Blog

Do It With Drupal: The Economist

Rob Purdie (robpurdie@economist.com) - Scrum Practice Leader
twitter.com/robpurdie
facebook.com/robpurdie

Overview

  • Moving incrementally and iteratively to Drupal - making improvements bit by bit as you move
  • User comments and recommendations served from Drupal, along with comment history pages, article comments pages
  • Syncing data to Drupal every 5 minutes - all content and comments
  • Soon, article pages served from Drupal - running into a few performance problems
  • Next: channel pages served from Drupal, third-party services, registration
  • This approach lets us benefit from Drupal sooner: rather than building the whole site in the background and seeing no benefit until the end, we get improved functionality as we go
  • "The Economist is so old that the guy who started it had to be painted rather than photographed"

The old way

  • 20-30 million page views, 3-4 million unique visitors per month - lots of performance and scalability issues
  • Want to build the foremost online destination for analyzing and debating the global agenda, and to bring visitors into that debate; the current system can't support this vision, which is why they moved to Drupal, particularly for comments
  • Increase publishing volume with user-generated content (more content w/o more costs)
  • The old way: custom CMS built on proprietary stack (MS, ColdFusion, Oracle)
  • Blogs were originally MovableType, now are all Drupal
  • Broken waterfall processes meant frequent fire-fighting
  • Needed to be more responsive to change, deliver business value sooner (projects take a long time to deliver value to organization), more sustainable, happier
  • Making these changes incrementally and iteratively; "perfect is the enemy of better"

Why Drupal?

  • Looked at OpenCMS, Alfresco, Joomla; met with other newspapers; considered building a custom system, buying a proprietary system, or going open source
  • Drupal as strategic fit: community and content publishing, robust development framework, development language, free software
  • Strength of Drupal community
  • Selling Drupal internally was a challenge: no suit-wearing Drupal sales force
  • Attended DrupalCon Boston 2008, networking within community, engaging w/ Lullabot for workshops and training
  • Proof-of-concept to reproduce article page in Drupal; how to use CCK fields to make a rich article content type

Using Scrum

  • 3 million registered users, articles - data migration is daunting
  • Manage the move using Scrum - selling it was easy with charts (developing business value sooner and throughout, management can see progress throughout, shining a spotlight on issues/dysfunction and attacking them along the way - risk decreases a lot faster)
  • Take requirements, prioritize based on business value: which are the most important to organization, do those first
  • Trained management team in Scrum, development team in Drupal, then started sprinting with help from consultants (2-week sprints, delivering something of value at the end)
  • "Maybe not the largest Drupal project, but the most expensive" - lots of consultants

Integrating CMSs

  • Proxy approach: Drupal sends JSON over HTTP back and forth with the existing ColdFusion system
  • Using native Drupal comments; comments have to be attached to nodes - there has to be a node for every piece of content on the legacy system
  • Create nodes on the fly for every ColdFusion request that comes in
  • Notion of proxy nodes is a pattern that comes up during integration of Drupal with other systems
  • Voting API votes used for recommends; these are also attached to proxy nodes
  • Started with proxy approach only; then moved to doing some with subdomain approach - hope to be doing neither soon after moving entirely to Drupal
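
The proxy-node pattern above can be sketched roughly as follows. This is a Python illustration, not Drupal code: `ProxyNodeStore`, its method names, and the legacy ID format are hypothetical, and the in-memory dictionaries stand in for Drupal's node and comment tables.

```python
import json
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ProxyNodeStore:
    """Toy stand-in for the node table: one proxy node per legacy item."""
    _next_nid: count = field(default_factory=lambda: count(1))
    nid_by_legacy_id: dict = field(default_factory=dict)
    comments_by_nid: dict = field(default_factory=dict)

    def get_or_create(self, legacy_id: str) -> int:
        """Return the proxy node for a legacy item, creating it on first request."""
        if legacy_id not in self.nid_by_legacy_id:
            nid = next(self._next_nid)
            self.nid_by_legacy_id[legacy_id] = nid
            self.comments_by_nid[nid] = []
        return self.nid_by_legacy_id[legacy_id]

    def add_comment(self, legacy_id: str, author: str, body: str) -> int:
        """Attach a comment to the proxy node, not to the legacy item itself."""
        nid = self.get_or_create(legacy_id)
        self.comments_by_nid[nid].append({"author": author, "body": body})
        return nid

    def comments_as_json(self, legacy_id: str) -> str:
        """Build the JSON payload the legacy system would fetch over HTTP."""
        nid = self.get_or_create(legacy_id)
        return json.dumps({"nid": nid, "comments": self.comments_by_nid[nid]})


store = ProxyNodeStore()
store.add_comment("article-101", "alice", "First")
store.add_comment("article-101", "bob", "Second")
print(store.comments_as_json("article-101"))
```

The point of the sketch is that the legacy system never needs to know node IDs: it asks by its own ID, and the node is created the first time anyone asks.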

Migrating data

  • Migrating and syncing data every 5 minutes - don't wait until the end to figure out that piece
  • Table Wizard and Migrate modules
  • Table Wizard writes Views integration for MySQL tables
  • Migrate lets you migrate certain views, push into Drupal as nodes/users/taxonomy terms/etc
  • Client is involved in how legacy data gets organized in Drupal
  • Sat down with client to browse through content and decide what data needs to be moved and what it means
  • Migrate keeps track of everything you've done, gives you a dashboard, tells you how far along you are - keeps a mapping table, legacy ID, you can check and see what came across and fix things; does your bookkeeping for you
  • Drupal expects to have all the info it needs in its database; something getting published in Oracle needs to be in Drupal promptly - synchronization
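
The bookkeeping idea behind the mapping table can be sketched like this. It is a simplified Python illustration, not the Migrate module's API: `sync_once`, the row shape, and the use of a title as the only field are assumptions made for the example.

```python
def sync_once(legacy_rows, mapping, nodes):
    """One sync pass: create nodes for new legacy rows, update changed ones.

    mapping: legacy ID -> node ID (the "mapping table")
    nodes:   node ID -> title (stand-in for the destination database)
    Returns (created, updated) so progress can be reported.
    """
    created = updated = 0
    for row in legacy_rows:
        legacy_id = row["id"]
        if legacy_id in mapping:
            nid = mapping[legacy_id]
            if nodes[nid] != row["title"]:
                nodes[nid] = row["title"]
                updated += 1
        else:
            # Assumes nodes are never deleted, so the next ID is len + 1.
            nid = len(nodes) + 1
            nodes[nid] = row["title"]
            mapping[legacy_id] = nid
            created += 1
    return created, updated


mapping, nodes = {}, {}
print(sync_once([{"id": "A1", "title": "Old headline"}], mapping, nodes))
print(sync_once([{"id": "A1", "title": "New headline"},
                 {"id": "A2", "title": "Another story"}], mapping, nodes))
```

Because the mapping survives between passes, running the same pass every few minutes only touches rows that are new or changed.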

Questions

  • How did you decide what to put into Drupal first?
    • Business value: comments, user profiles, recommends
  • How many Drupal servers does it take to scale that big?
    • Not entirely sure how many servers we have; let's say +/- 12
    • Master MySQL server, a few slave MySQL servers - more important aspects have to do with Pressflow
    • Pressflow = high performance variant of Drupal 6, completely API compatible with Drupal, but it takes some patches that are in Drupal 7 and backports them to Drupal 6
    • Use Varnish's full capability; Varnish = reverse proxy server, takes load off Drupal/PHP/MySQL
  • How do you stop people from trying to shove their emergencies into Scrum process?
    • Don't want people going directly to the team like they traditionally do
    • Team, Scrum Master, product owner - customer, person who represents the client, has to have power to make decisions on behalf of organization, responsible for managing stakeholders
    • Product owner comes to team w/ prioritized list of features for next sprint
    • Had two teams in New York and one team in London all doing 3-week iterations in parallel
    • Split up site into component parts: profiles, article pages, channel pages, had three product owners who had to manage stakeholders
    • Works reasonably well; now we're doing two teams, one system that shows what all teams will do; someone has to keep "product backlog" in order, stopping people from shoving in their "one little thing"

Features

  • Base theme is 960 px grid - laying out themes as a series of columns, all sections have to fit into the grid
  • Selenium for "user journey" testing; building environments to help manage configurations
  • Continuous integration using Hudson - needed a shared place where user tests could run
  • Set of servers running on Amazon; Hudson sets off user tests every time there's a commit to the SVN repository
  • Apache SOLR search hosted by Acquia - 100,000 articles that have to be available through site search
  • People were unhappy with relevance of matches in old site search
  • Acquia's hosted search service: really fast, good results
  • Apache SOLR: can start filtering results further and further - faceting
  • "How do I get SOLR running on my website?" - can self-host, but we went with Acquia
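
Faceting, as mentioned above, amounts to counting results per field value and letting the user narrow by one of those values. A minimal Python sketch of the idea follows; it does not use Solr's API, and the `ARTICLES` data, `facet_counts`, and `filter_by` are made up for illustration.

```python
from collections import Counter

# Hypothetical search results; a real search index would return these.
ARTICLES = [
    {"title": "Bank rescue", "section": "Business", "year": 2008},
    {"title": "Election night", "section": "Politics", "year": 2008},
    {"title": "Recovery watch", "section": "Business", "year": 2009},
]


def facet_counts(docs, field):
    """Count how many results fall under each value of one field."""
    return Counter(doc[field] for doc in docs if field in doc)


def filter_by(docs, **selected):
    """Keep only results matching every selected facet value."""
    return [d for d in docs if all(d.get(k) == v for k, v in selected.items())]


print(facet_counts(ARTICLES, "section"))
narrowed = filter_by(ARTICLES, section="Business")
print(facet_counts(narrowed, "year"))
```

Each narrowing step recomputes the counts on the smaller result set, which is what lets users keep "filtering results further and further".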

Questions

  • Other tools for managing people/process?
    • In Scrum, less about resource management - we just want dedicated co-located teams, don't worry about availability because of multiple projects: single focus
    • Redundancy of function - generalizing specialists, specialists can create bottlenecks/risks
    • "How many people need to be hit by a bus before your project fails?"
    • agilemanifesto.org
    • Use Google Docs a lot - project backlogs are all spreadsheets, a big wiki, project dashboards that "radiate information to the rest of the organization"
    • Focus is on people, not tools
    • Test-driven development, writing tests first can sometimes be hard with Drupal

Impediments to progress

  • Previous processes/structure/culture: command and control - hard habit to break
  • Project manager telling people what to do and when to do it by - this is bad management; it has an impact on people
  • We want self-organizing teams
  • Previously, black box development: low visibility during the project process
  • For Scrum, everything needs to be transparent, frequently inspect outcomes, adapt as we go - can't have a postmortem after everything's done, need to do that every day
  • Hero developers who go off and solve problems heroically aren't compatible with Scrum
  • Previously, developmental silos - departments based on function, these have been removed, but people still want to exist within their old silos
  • People want to work on multiple projects like they used to, rather than working on a single project in a dedicated manner
  • Previously, traditional line management: where you stack up in the line doesn't matter now, this was a big change
  • Engineering practices (specifically quality) - big issue; Scrum is a wrapper for your existing engineering practices, doesn't say anything about testing
  • Scrum assumes your engineering practices are great, or you'll make them great quickly
  • You can say "we're going to do Scrum" but old habits die hard - focusing on what "done" means and providing a deliverable at the end of each sprint; have to deliver quality too - have to go live successfully
  • Want to deliver "potentially shippable code" at the end of each sprint - have to have a testing environment that's representative of live environment; been bitten by differences in configuration
  • Everything has to be identical in the test environment (just with a scaled down number of servers) - same data center, same network issues, etc
  • Hard to bite the bullet on the costs involved in building a testing environment, but it's important
  • Hard to simulate kinds of traffic you get in production - plus, have to keep track of session cookies
  • Form fields can hurt you - replaying post requests
  • Cron jobs that run all the time - cron jobs can stack up and site starts to decay
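
One common guard against cron runs stacking up is to skip a run when the previous one is still going. The following is a minimal Python sketch of that idea, not what The Economist's team did: `single_run_lock`, `run_cron`, and the lock-file approach are assumptions for illustration.

```python
import os
from contextlib import contextmanager


@contextmanager
def single_run_lock(path):
    """Yield True if this run acquired the lock, False if another run holds it."""
    try:
        # O_EXCL makes creation fail if the lock file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        yield False
        return
    try:
        os.write(fd, str(os.getpid()).encode())
        yield True
    finally:
        os.close(fd)
        os.remove(path)


def run_cron(lock_path, job):
    """Run the job unless a previous run is still in progress."""
    with single_run_lock(lock_path) as acquired:
        if not acquired:
            return "skipped"
        job()
        return "ran"
```

A process that is killed outright leaves a stale lock file behind, so a real setup would also need some way to expire old locks.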

Questions

  • Migration of real-time data: code changes are easier to migrate than content changes, what's the process for moving bits of content from development to production?
    • When there's content you need to work on for a while before it goes live, work on the live servers but make sure end-users can't see it
    • Can use the unpublished flag on a Drupal node to do that; use "views" to see everything unpublished in sports category
    • For a small team, that's a reasonable solution
    • For bigger organizations with a lot of people working together, use "Workflow" module - nodes step through a series of states
    • If it's a business requirement that content has to start off on staging servers and only then push to live, use module "Deploy" - push-button way to push nodes and their dependencies (users, taxonomy terms, etc) to another environment
  • Technical reason for using external searching - why use SOLR at all? What about Drupal search?
    • Drupal 6 is better than previous search mechanisms, but falls apart at a certain scale
    • Slow queries, sub-optimal results
    • A lot of non-Drupal people have worked on Apache SOLR, Drupal has integrated it well
    • Self-hosting, or with Acquia - if you have the talent to run Java apps in your data center and keep it running, self-hosting is a great idea; will reduce latency
    • Most of us are struggling to keep PHP/MySQL up as it is, this is where Acquia comes in
    • Acquia service is pretty much plug-and-play
    • Built-in search doesn't come with facets; can add on facets with the "Faceted Search" module
    • SOLR is an enterprise search system; used by Netflix, Expedia, etc.
  • Could you use Views instead of facets?
    • There's a lot of overlap there, and different possible approaches.
    • Full-text searches need SOLR rather than Views
  • Some of the wins you've had with Scrum/Drupal, and some weaknesses
    • Wins by development teams - prefer this way of working, where business people are only concerned with relative priority of requirements, have no say in how long it takes to implement
    • Product owners prioritize "stories", developers size those stories relative to each other, rather than in hours of effort
    • Stops the cycle of cutting corners on quality in order to get it done in a shorter timeframe
    • Can't get productivity gains w/o changing the way you work
    • Product owners need to be involved, can't change requirements mid-sprint
    • Have "working agreements" - a kind of social contract
    • Scrum isn't a prescription - you can pick and choose the parts that you want that meet your organization's needs
    • Specific processes layered on top of simple framework of transparency, working together, and adapting to testing results, can vary
  • When will the Economist be fully on Drupal?
    • Description says "this month" - that was the plan
    • People paying the bills get to make decisions; is it most important for us to go all-Drupal ASAP, or extend functionality of site to be competitive?
    • Recent decision was for the latter
    • Don't know when

Do It With Drupal: Fantasy Sites - Stack Overflow

About Stack Overflow

  • Zero barriers to entry
  • Reward good content by putting the best answers first
  • Give people karma
  • Destroy Experts' Exchange and answers behind a paywall
  • Incredibly active, has sister sites Super User and Server Fault - people collaboratively build great answers to pressing questions
  • Spawning clones - can license software behind Stack Overflow
  • "I could do that" tinyurl.com/bitquabit-so, tinyurl.com/mythical-weekend - this ignores how much "soft work" went into it, how the community would work, etc.
  • 24 hours of actual site-building behind this

Behind the site

  • You've got questions, people have answers, people can vote up/down, people can favorite, community moderation, collaborative editing - every question/answer can turn into a wiki page so people can edit/improve/tweak/correct content
  • Lots of views of content - tagging, rich user profiles, badges, "karma crap" to get people hooked on contributing
  • Mapping out architecture of site and how things are presented:
    • Current active list of questions - shows you votes, answers, how many views, tags
    • Can sort by "karma bounty" (give up 100 points of my karma to person with best answer)
    • Can sort by hot questions, current week, current month
    • Newest, featured, highest vote-earning
    • Tag cloud view of entire site
    • View of all users and activity level
    • Badges: all the different awards people have earned
    • View of unanswered questions
  • "Ask a question" form
  • Moderation tools - editing and flagging, and post an answer right below the question
  • A lot of rich functionality, but totally dedicated to its core goal of Q&A

Drupal version

  • www.array-shift.com
  • Can flag taxonomy tags that are interesting, and just see related questions
  • Node add form: done some work to streamline
  • BUEditor plugin - not WYSIWYG, but a tag helper - it puts the tags there, provides buttons
  • Uses markdown, not HTML
  • Markdown module in Drupal just provides an input filter
  • BUEditor plugin, Markdown manager, Markdown = rough analog to Stack Overflow
  • New module "Active Tags" - lets you accumulate tags as little flagged items rather than having them be in a list; just click a little X and it goes away - pure client-side stuff, nicer way of presenting the tag lists
  • Turned it on, added some extra CSS to put nice boxes around it, that's it
  • "Wikify" module lets you invert normal node access - like "Private" (checkbox for 'only people in specific roles can see this'), "Wikify" has the same thing, but the checkbox is for editing
  • "Flag" module used for star, "user points" module awards karma points when something is starred
  • (array-shift.com has major CSS problems in Safari)
  • 100 lines of code that intercept Drupal events and award karma; could use "rules" module but it was easier to just do what I needed for this exercise
  • "user points" automatically assigns roles when people pass karma thresholds
  • "Flag" module lets you set up arbitrary toggle-able flags for things; even supports "when more than 10 people flag something, do X"
  • Flagging something as offensive takes karma away (that's why there's a confirmation page to avoid mistakes), 10 offensive flags unpublish questions
  • Module called "flag term" - taxonomy terms, that's how you track topics you care about
  • Pure theming differentiates word "flag" and image of a star
  • List of answers uses "node comments" module - stands in for built-in comments, has a content type "comment"
  • Can have a view that shows the comments, where the arrangement is based on the rating
  • Comments on comments wasn't implemented (Stack Overflow has in-line meta-discussion, we didn't have time to do this)
  • Node Comment lets you use normal Drupal comments on things too
  • User badges module exists for Drupal, but doesn't have enough API support to configure without a lot of work (Anyone want to rewrite User Badges from scratch?)
  • Tabs for "newest", "hot", "etc" - each of these is a display on one view, set up with tabs
  • Tags view is a view of taxonomy tags - sort by popularity/name
  • Users - uses Gravatar module to pull in global avatar; set up user pictures like normal for Drupal, but Gravatar sits in the middle; generates unique geometric icons if you don't specify your own picture
  • "Type to find users" - exposed filter
  • Node Form Settings module - lets you do things like hide the revisions field, hide name of the title - exercise control over chunks of the node form
  • Similar By Terms module used for "similar questions" - other questions tagged with the same tags
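
The karma rules described in this list (points for stars and votes, points lost for offensive flags, roles at karma thresholds, unpublishing at 10 offensive flags) can be sketched in a few lines. This is a Python illustration rather than the actual custom module: the point values, role names, and class names are hypothetical, and only the 10-flag limit comes from the talk.

```python
from dataclasses import dataclass, field

# Hypothetical point values and role thresholds.
POINTS = {"star": 5, "vote_up": 10, "vote_down": -2, "offensive": -10}
ROLE_THRESHOLDS = [(0, "member"), (100, "editor"), (500, "moderator")]
OFFENSIVE_LIMIT = 10  # from the talk: 10 offensive flags unpublish a question


@dataclass
class Post:
    author: str
    published: bool = True
    offensive_flags: int = 0


@dataclass
class Community:
    karma: dict = field(default_factory=dict)

    def on_event(self, post, event):
        """React to a vote or flag: adjust the author's karma, maybe unpublish."""
        self.karma[post.author] = self.karma.get(post.author, 0) + POINTS[event]
        if event == "offensive":
            post.offensive_flags += 1
            if post.offensive_flags >= OFFENSIVE_LIMIT:
                post.published = False

    def role_for(self, user):
        """Return the highest role whose threshold the user's karma has passed."""
        points = self.karma.get(user, 0)
        role = ROLE_THRESHOLDS[0][1]
        for threshold, name in ROLE_THRESHOLDS:
            if points >= threshold:
                role = name
        return role
```

Keeping the point values in one table is what makes a rules-style configuration possible later: the event handler itself never changes.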

Stepping back for a moment...

  • This thing has Q&A, voting (vote up/down module), karma (user points module), moderation (simple tools for flagging), ability to track interesting taxonomy terms (flag module), community editing (wikify - people with permissions can say "this is an article everyone should update to consolidate discussion"), a bunch of views that slice and dice content
  • Drupal didn't need a lot to do that basic functionality
  • What it doesn't have: meta-comments (this could be done, just didn't have time), in-line editing and AJAXy goodness (when you hit "edit", you go to the Drupal edit page; in Stack Overflow the body turns editable), karma bounties (could be written), user badges/awards (user badges module is pretty rough, doesn't work so well beyond its original use case), themed user profiles (Stack Overflow can pull in OpenID profiles, what you've voted on, chart of karma history, etc - there are tools for each of those: user points history module will generate the chart; views attach module stacks views onto the user profile)
  • Lots of polish isn't there
  • Queued messages - if you're not on the site when you've earned a new badge, it'll prepare a message for you when you come back (maybe Activity module?)
  • TOTALLY missing: performance tuning, no community around it (Stack Overflow has a great community - that's just as much work, and you can't install it)
  • Doesn't have a theme that can be distributed. Ever.

Under the hood

  • 20 contrib modules, 2 custom modules (one is just exported flags/views), 1 theme, lots of config work
  • 6 views with sub-tabs
  • 5 flags (just set up in flag modules user interface: favorite, wikify, offensive, interesting/ignored for taxonomy terms)
  • 3 behaviors: posting/editing, evaluating, filtering, suite of modules - vote up/down, voting API, user points; active tags, BU editor, markdown, markdown editor, node form settings
  • Did not theme node form - CSS + those modules
  • pathauto, token - clean URLs
  • CCK isn't even installed
  • 2 custom modules: export (held exported versions of those views and flags), tweaks (intercepting voting API/posting hooks to give karma) - could've used Rules module to do this
  • Theme has page templates
  • Views have unformatted view, row-style template (has title, number of votes, times node has been viewed, listing of tags) - give me an array, I'll write markup
  • Theming views was easy
  • Custom node templates for question and answer nodes to position things correctly
  • Flag module templates used to override things and get the little star
  • Pre-process hooks to pull in user karma points, but no crazy theming hacks
  • 30 lines of PHP in a template file
  • No overridden theme functions (user name, breadcrumbs, none of that)
  • Extra credit: just learned about a module called "inline registration" - if user is anonymous, can enter desired username and e-mail at the top of the node form; when they submit, it'll create user account, node, and assign node to user account in one step
  • Live preview of node editing can be done with "Live" module (used at groups.drupal.org)
  • blittr vs arrayshift
    • Analysis and evaluation of what the sites do, and what modules there are - spent more time on Array Shift doing that evaluation, bigger site, more ways of looking at that data than what Twitter provides
    • 10 hours on that stuff
    • Configuring the site - about 4 hours of going through and clicking on stuff in Twitter site, 3 hours for Array Shift
    • After analyzing what Stack Overflow looks like, how it works - implementing it in Drupal with user points, flags, views was pretty straightforward
    • Building the views took less time than doing it for the Twitter module - very clean mapping between the tabs and what they connect to
    • UX mapped itself well to building some views
    • Time for the more complex parts (custom code) kept going up for Twitter clone (7-ish hours) - kept going down for Stack Overflow (2 hours)
    • Theming - tricky any way you look at it; 9-ish hours for Twitter, 13-ish for Stack Overflow
    • Translating the info Drupal provides into the right markup for the theme
    • This time is based on starting from a design in HTML + CSS (coming up with an idea would take a lot longer)
  • "Magic" category - took 11+ hours, creating the whole install profile
  • Install profiles - use Aegir to install a copy of this on a sub-domain
  • Writing the install profile took as long as writing the theme - now can spawn infinite copies of this website
  • drupal.org/project/arrayshift - last pieces will be in place soon
  • BUT there will not be a pretty theme with it

Do It With Drupal: Anatomy of a Distribution: Open Atrium

  • Open Atrium is a "team portal in a box" (AKA Basecamp alternative)
  • Can be behind a firewall, is free, openatrium.com
  • Putting people in different groups
  • Comes with six features:
    • Blog: turned on/off on a group-by-group basis
    • Wiki
    • Calendar - iCal feeds too
    • Shoutbox - like private Twitter
    • Case Tracker - ticketing system
    • Group dashboard
  • 75,000 downloads since July 17
  • translate.openatrium.com - 31+ languages translated to various extents; get updates that don't overwrite your custom updates

What are people doing with it

  • Basic project management tool set
  • Sprite-based theme (5.5 kb, 13.7 kb)
  • Tailoring the system to your own needs
  • Drupal Core, modules, plus Features module power Open Atrium
  • People can customize their own dashboard
  • Cross-posting to different groups disabled; also, Organic Group configuration much more simple (clear distinction between public and private)

Migrating into Open Atrium

  • It's just a Drupal site, so in theory you can turn on the Open Atrium modules around your existing site (but this isn't suggested) - use some other way (Feeds module?) to aggregate existing content and put it into the new framework
  • Migration is a solvable problem, but probably not in a generic way useful for the core project

Extended features

  • Project status - time tracking and approval flow for a web shop
  • World Bank did a highly customized version; integration with Lotus Notes - their own intranet behind a firewall; faceted search across their pre-existing staff directory; extended events system to help with scheduling
  • Some custom coding went into the World Bank site, but a lot of what goes into it comes from configuring existing modules

How we use it

  • Over 50% tickets
  • Use blog instead of e-mail for the most part

Atrium's rules

  • Works out of the box
  • At least as simple as running straight from drupal.org
  • Once you install it, it's clear what the next step is - unlike Drupal, where you install it and wonder "what now?"
  • Works with Aegir
  • Doesn't hack core or contrib (except occasionally - there's a hack to Views that makes it translatable)
  • Doesn't do everything - does a few things that are widely useful for intranets, and you can extend it

Things we'll never do

  • Add a WYSIWYG; BUT, you can do that
  • Add CVS integration (but see features.blackstormsstudios.com)
  • Add Alfresco integration - but someone else has tried this
  • Investing some time in Google Docs integration
  • Won't ever clone Basecamp - but someone wrote a theme that looks a lot like it (drupal.org/project/atrium_simple)
  • Add Sharepoint integration to base package

Things we will do

  • Clearer branding - Drupalisms & Atriumisms beware!
  • Drag and drop dashboards (vimeo.com/7643255)
  • Better admin experience (drupal.org/project/admin)
  • Pluggable search
  • Improved l10n support - Drupal only supports one language at a time, we want to fix this
  • Rewriting core functionality - upgrading to Context and Spaces, when we say "beta", we mean it
  • Rework the "user space"
  • A calendar with a user story
  • Rewrite Case Tracker - this powers the to-do system, people want to customize the states cases can be in, kinds of cases, etc. (github.com/miccolis/casetracker)
  • This is going to be painful, we'll provide upgrade paths
  • Move to drush make (drupal.org/project/drush_make)
  • New on drupal.org: install profiles: lists of things that, all together, make a site
