Do It With Drupal: Drupal Under Pressure: Performance and Scalability

11 Dec 2009
Posted by Quinn
  • Browser | Apache | PHP | SQL queries | MySQL
  • Common pattern for optimization: inspect each layer and add little buckets of cache everywhere
  • "Fast track" requests through the different layers so they get served more efficiently
  • On the browser side: mod_expires sends headers that tell the browser, in effect, "You've already got this and you've already looked at it, so we're good"
  • Firebug shows you every individual request and how many KB each one downloads (if a refresh only downloads a little, that's good)
  • CDNs (Content Delivery Networks) and reverse proxy caches: anything that hasn't changed doesn't need to be handled by your own infrastructure; hand it off to geolocated servers optimized to serve it quickly
  • A proxy cache can sit in front of your infrastructure (offloading work Drupal would otherwise repeat over and over)
  • PHP level: opcode cache
  • MySQL level: query cache, which takes the read queries (most of the SELECT statements) and stores their results in memory
  • Query cache and opcode cache: half an hour or less to set up, significant improvements
  • Proxy caches and CDNs are a larger task
  • Component between the database and PHP: Memcache, which holds copies of some of Drupal's tables
  • Memcache takes all the cache tables and holds them in memory
  • Memcache can also be used for sessions; if your sessions table is locking up, your site is about to implode
  • Memcache can also speed up path alias lookups
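What Memcache does for Drupal's cache tables is the cache-aside pattern: look in memory first, fall back to the database on a miss, then keep the result for next time. A minimal Python sketch, assuming a plain dict stands in for memcached and a hypothetical `load_from_db` function stands in for the database query:

```python
import time
from typing import Any, Callable, Dict, Tuple


class CacheAside:
    """Cache-aside wrapper; a dict stands in for memcached (assumption)."""

    def __init__(self, loader: Callable[[str], Any], ttl_seconds: float = 300.0):
        self.loader = loader  # hypothetical slow lookup, e.g. a database query
        self.ttl = ttl_seconds
        self.store: Dict[str, Tuple[float, Any]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Any:
        entry = self.store.get(key)
        now = time.monotonic()
        if entry is not None and entry[0] > now:
            self.hits += 1  # served from memory, database untouched
            return entry[1]
        self.misses += 1
        value = self.loader(key)  # fall back to the slow path
        self.store[key] = (now + self.ttl, value)
        return value

    def delete(self, key: str) -> None:
        # Invalidate when the underlying data changes.
        self.store.pop(key, None)


db_calls = []


def load_from_db(key: str) -> str:
    # Hypothetical stand-in for a query against a Drupal cache table.
    db_calls.append(key)
    return f"value-for-{key}"


cache = CacheAside(load_from_db)
for _ in range(3):
    cache.get("path:node/1")
print(cache.hits, cache.misses, len(db_calls))  # 2 1 1
```

Three lookups of the same path alias cost one database query; the other two are answered from memory.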

Apache Requirements

  • Apache 1.3.x or 2.x, able to read .htaccess files (AllowOverride All)
  • Moving the .htaccess directives into the main Apache config file and turning off dynamic (per-directory) configuration with AllowOverride None is faster, though it might not be a huge performance bump
  • mod_rewrite (clean URLs), mod_php (Apache integration), mod_expires
  • MaxClients: the number of connections Apache can handle at once; set it too high for your server and you'll run out of memory
  • MaxClients = RAM / average Apache process size
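The MaxClients rule of thumb is plain division, rounded down. A small Python sketch; the 2048 MB and 40 MB figures are made-up examples, and the optional `reserved_mb` (memory set aside for the OS, MySQL and so on) is an addition that is not part of the talk's formula:

```python
def max_clients(ram_mb: int, avg_apache_process_mb: float, reserved_mb: int = 0) -> int:
    """RAM / average Apache process size, rounded down.

    reserved_mb is an optional assumption: memory kept back for the OS,
    MySQL, memcached, etc. before dividing the rest among Apache processes.
    """
    if avg_apache_process_mb <= 0:
        raise ValueError("average process size must be positive")
    usable = ram_mb - reserved_mb
    if usable <= 0:
        raise ValueError("nothing left for Apache after the reservation")
    # Round down: one process too many is what pushes the box into swap.
    return int(usable // avg_apache_process_mb)


# Illustrative numbers only: a 2 GB box with 40 MB Apache/mod_php processes.
print(max_clients(2048, 40))       # 51
print(max_clients(2048, 40, 512))  # 38
```

Measure the average process size on your own server (for example with top or ps) rather than trusting a generic figure.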

mod_expires

  • ExpiresDefault A1209600 (1,209,600 seconds after access, AKA "two weeks")
  • ExpiresByType text/html A1 (HTML expires one second after access; everything else, such as images, CSS and JavaScript, gets cached for two weeks)
  • We can't cache HTML this way in Drupal because it's generated dynamically
  • These directives tell Apache to send headers telling the browser it's OK to cache the file
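The "A" prefix means "N seconds after the access time", and mod_expires expresses the result as both an Expires date and a Cache-Control max-age. A Python sketch of the headers these two directives produce; the `expires_headers` helper and the type lookup are illustrative, not Apache code:

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# The two directives from the notes; "A" means "N seconds after access".
EXPIRES_DEFAULT = "A1209600"
EXPIRES_BY_TYPE = {"text/html": "A1"}


def access_seconds(code: str) -> int:
    """Turn an access-relative code like 'A1209600' into seconds."""
    if not code.startswith("A"):
        raise ValueError("this sketch only handles access-relative (A) codes")
    return int(code[1:])


def expires_headers(content_type: str, accessed: datetime) -> dict:
    """Approximate the caching headers mod_expires adds for one response."""
    seconds = access_seconds(EXPIRES_BY_TYPE.get(content_type, EXPIRES_DEFAULT))
    return {
        "Cache-Control": f"max-age={seconds}",
        "Expires": format_datetime(accessed + timedelta(seconds=seconds), usegmt=True),
    }


accessed = datetime(2009, 12, 11, 12, 0, 0, tzinfo=timezone.utc)
print(expires_headers("image/png", accessed))
print(expires_headers("text/html", accessed))
```

For an image fetched at noon UTC on 11 Dec 2009 this gives max-age=1209600 and an Expires date of Fri, 25 Dec 2009 12:00:00 GMT, since 1,209,600 = 14 x 86,400 seconds; HTML gets max-age=1.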

KeepAlive

  • There's overhead to opening TCP/IP connections
  • "We can have a conversation this long" - Apache and browser can keep a conversation going long enough to download an entire page
  • KeepAliveTimeout 2 (you can refine this by monitoring Apache processes to see when they switch into a waiting state)
  • Resources: linuxgazette.net/123/vishnu.html

PHP requirements

  • PHP 5.2.x, XML extension, GD image library, cURL support, register_globals off, safe_mode off
  • PHP opcode cache: stores the compiled operation codes, so later requests skip the parse-and-compile step and go straight to execution
  • APC: http://pecl.php.net/package/APC
  • php.ini: max_execution_time = 60, memory_limit = 96M
  • If you're uploading big files, you might need more; image handling (e.g. ImageCache dynamically creating image derivatives) may also require more memory
  • Does an opcode cache increase the size of each Apache process? Or not? (Debate ensued)
  • Either way, check whether Apache processes are holding onto more memory
  • Use PHP best practices (don't count things over and over: store the count and move on)

True or False?

  • The more modules you enable, the slower your site becomes (TRUE!)
    • Sometimes you don't need a module for that: 5 lines of code and it's done (you don't need a birthday module with candles etc. if you just need the number)
    • "Do I really need to enable this module?"
  • When my site is getting hammered, I should increase the MaxClients option to handle more traffic (FALSE!)
    • You'll run out of memory, start swapping, and die
  • echo() is faster than print() (WHO CARES?)
    • This is taking things a little too far

Database server

  • MySQL 5.0.x or 5.1.33 or higher (versions before 5.1.33 have some problems with CCK)
  • MyISAM by default
  • Drupal 7 changes this: MyISAM locks the entire table against writes whenever any row in it is being written, and the access column of the users table and the sessions table are written on every page request, which can cause problems
  • Drupal 7 uses InnoDB - row-level locking, transactions, foreign key support, more robustness (less likely to get corrupted tables)
  • If you have a table that's primarily read, MyISAM is a little faster
  • Query caching - specify query_cache_size (64M?), max_allowed_packet (16M?)
  • Is the query cache size relative to table size? Basically yes: it's a bucket for read-query results, so size it by how many result sets you want to keep in the query cache
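Think of the query cache as a dictionary keyed on the exact text of each SELECT: identical queries are answered from memory, and any write to a table discards every cached result that read from it, so frequently written tables like sessions gain little from it. A toy Python model using the standard-library sqlite3 module; the invalidation rule follows MySQL's documented behavior, but the regex-based table detection is a deliberate simplification:

```python
import re
import sqlite3
from typing import Dict, List, Set


class QueryCache:
    """Toy model of MySQL's query cache, with simplified rules (assumption).

    - Only SELECTs are cached, keyed on their exact text.
    - Any write to a table drops every cached result that read from it.
    """

    TABLE_RE = re.compile(r"\b(?:FROM|JOIN|INTO|UPDATE)\s+(\w+)", re.IGNORECASE)

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.results: Dict[str, List[tuple]] = {}
        self.tables_for: Dict[str, Set[str]] = {}
        self.hits = 0

    def _tables(self, sql: str) -> Set[str]:
        # Crude table detection; fine for this sketch, not for real SQL.
        return {name.lower() for name in self.TABLE_RE.findall(sql)}

    def execute(self, sql: str) -> List[tuple]:
        if sql.lstrip().upper().startswith("SELECT"):
            if sql in self.results:
                self.hits += 1
                return self.results[sql]
            rows = self.conn.execute(sql).fetchall()
            self.results[sql] = rows
            self.tables_for[sql] = self._tables(sql)
            return rows
        # A write: run it, then invalidate cached SELECTs on the same tables.
        self.conn.execute(sql)
        touched = self._tables(sql)
        for cached_sql in [q for q, tables in self.tables_for.items() if tables & touched]:
            del self.results[cached_sql]
            del self.tables_for[cached_sql]
        return []


conn = sqlite3.connect(":memory:")
qc = QueryCache(conn)
qc.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT)")
qc.execute("INSERT INTO node (title) VALUES ('Hello')")
first = qc.execute("SELECT title FROM node WHERE nid = 1")
second = qc.execute("SELECT title FROM node WHERE nid = 1")  # answered from the cache
qc.execute("UPDATE node SET title = 'Changed' WHERE nid = 1")
third = qc.execute("SELECT title FROM node WHERE nid = 1")  # cache was invalidated
print(first, second, third, qc.hits)  # [('Hello',)] [('Hello',)] [('Changed',)] 1
```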

Query optimization

  • Find a slow query (check MySQL's slow query log), then debug it with EXPLAIN, which shows what's being joined together along with many other details; save the query, save the world
  • log-slow-queries = /var/log/slow_query.log
  • long_query_time = 5 (5 seconds)
  • #log-queries-not-using-indexes (uncomment to enable): catches the little queries that get run a ton; tweak those and you'll optimize the site (e.g. Voting API casting a vote)
  • Add an index to reduce the number of rows the query has to look through (tradeoff: every write takes a little longer)
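The EXPLAIN-then-index loop can be tried without a MySQL server. This Python sketch uses SQLite's `EXPLAIN QUERY PLAN` (SQLite's counterpart to MySQL's EXPLAIN, with different output) on a hypothetical `votes` table loosely modeled on the Voting API example; before the index the plan is a full table scan, afterwards it is an index search:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table loosely modeled on Voting API's vote storage.
conn.execute(
    "CREATE TABLE votes (vote_id INTEGER PRIMARY KEY, content_id INTEGER, uid INTEGER, value INTEGER)"
)
conn.executemany(
    "INSERT INTO votes (content_id, uid, value) VALUES (?, ?, ?)",
    [(i % 500, i % 97, 1) for i in range(10_000)],
)

QUERY = "SELECT SUM(value) FROM votes WHERE content_id = ?"


def plan(sql: str) -> str:
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (42,)).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


before = plan(QUERY)  # full table scan: every row is examined
conn.execute("CREATE INDEX idx_votes_content ON votes (content_id)")
after = plan(QUERY)  # index lookup: only rows with content_id = 42
print("before:", before)
print("after: ", after)
```

The exact wording of the plan differs between SQLite versions, and MySQL's EXPLAIN reports the same idea through its `type`, `key` and `rows` columns instead.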

Drupal

  • Use Pressflow: same APIs as Drupal core, but supports MySQL replication, reverse proxy caching and PHP 5 optimizations
  • pressflow.org
  • Almost all Pressflow changes make it back to core Drupal for the next release
  • Cron is serious business - run it
  • Drupal performance screen (/admin/settings/performance)
  • We can't cache HTML like we can cache other things... but there's a way to do it
  • Page caching is disabled by default; normal mode stores the page generated for an anonymous visitor in the database and serves it to later anonymous visitors
  • Aggressive mode bypasses some of Drupal's normal startup (bootstrap) work
  • The aggressive setting tells you about any enabled modules that might be affected by it (such as the Devel module)
  • MTV runs on 4 web servers and a database server, with a ton of caching and CDN
  • CDN is great for a huge spike in traffic
  • If you don't have $$ for a CDN, use a reverse proxy like Varnish: don't ask Drupal to keep generating stuff for anonymous traffic
  • Block caching is good
  • Optimize CSS = aggregate and merge (20 requests for CSS files can go to 2)
  • JSAggregator compresses JavaScript (but make sure all your semicolons are in place)
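Aggregation is plain concatenation, which is why the semicolon warning matters: if one file ends without a semicolon and the next begins with a parenthesis, JavaScript does not insert one, and the two statements fuse into a function call. A Python sketch of an aggregator that guards against this by joining files with "\n;\n" (the newline first, so a trailing `//` comment can't swallow the semicolon); the file contents and the `aggregate` helper are hypothetical, not Drupal's or JSAggregator's code:

```python
import hashlib
from typing import Dict, List, Tuple


def aggregate(files: Dict[str, str], order: List[str], separator: str = "\n;\n") -> Tuple[str, str]:
    """Concatenate JS files in order; name the bundle after its content hash.

    The separator terminates any statement left open by a missing semicolon.
    If a file already ends with one, the extra ";" is just an empty
    statement, which JavaScript ignores.
    """
    bundle = separator.join(files[name].rstrip() for name in order) + "\n"
    digest = hashlib.md5(bundle.encode("utf-8")).hexdigest()
    return f"js_{digest}.js", bundle


files = {
    "a.js": "var greeting = 'hi'",  # note: no trailing semicolon
    "b.js": "(function () { console.log(greeting); })();",
}

# Plain newline join: JavaScript parses this as 'hi'(function ...)(),
# i.e. it tries to call the string and throws a TypeError.
naive = "\n".join(files[name] for name in ["a.js", "b.js"])

name, safe = aggregate(files, ["a.js", "b.js"])
print(safe)
print(name)
```

Naming the bundle after a hash of its contents means browsers can cache it for a long time (see mod_expires above) and still pick up a new file whenever the sources change.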

Tools of the trade

  • Reverse proxy caches are like your own mini CDN; e.g. Varnish (varnish-cache.com)
  • Set a time to live (TTL) for your content; this regulates how much traffic reaches the originating server
  • whitehouse.gov is being served entirely through Akamai; only when you search or post something do you start hitting the original Drupal site
  • ApacheBench: measures the impact of your code on your site
  • It ships with Apache (ab on the command line)
  • ab -n 10 -c 10 http://www.example.com/ (10 requests, 10 at a time)
  • You get back a number (requests per second your site can handle)
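  • Authenticated users are more complicated: first turn off all caching (for a worst-case scenario), get the session ID from your browser's cookie, then run: ab -n 10 -c 10 -C [session cookie name]=[session ID] http://www.example.com/ (Drupal names its session cookie SESS followed by a hash rather than PHPSESSID, so copy the exact name)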
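A reverse proxy like Varnish answers repeat anonymous requests from memory until the TTL runs out and passes anything carrying a session cookie straight through to Drupal. This Python sketch models only that decision; real Varnish is configured in VCL, and the SESS cookie check, 60-second TTL, and lambda backend here are assumptions for illustration:

```python
from typing import Callable, Dict, Tuple


class TtlProxyCache:
    """Toy reverse-proxy cache: anonymous GETs are cached for `ttl` seconds."""

    def __init__(self, backend: Callable[[str], str], ttl: float, clock: Callable[[], float]):
        self.backend = backend  # stands in for Apache + Drupal (hypothetical)
        self.ttl = ttl
        self.clock = clock  # injectable so the example is deterministic
        self.cache: Dict[str, Tuple[float, str]] = {}
        self.backend_hits = 0

    def handle(self, method: str, path: str, cookies: Dict[str, str]) -> str:
        # Logged-in users (session cookie) and non-GETs always reach Drupal.
        anonymous = not any(name.startswith("SESS") for name in cookies)
        cacheable = method == "GET" and anonymous
        if cacheable:
            cached = self.cache.get(path)
            if cached and cached[0] > self.clock():
                return cached[1]
        self.backend_hits += 1
        body = self.backend(path)
        if cacheable:
            self.cache[path] = (self.clock() + self.ttl, body)
        return body


now = [0.0]
proxy = TtlProxyCache(lambda p: f"<html>{p}</html>", ttl=60, clock=lambda: now[0])
for _ in range(100):  # a burst of anonymous traffic
    proxy.handle("GET", "/node/1", {})
proxy.handle("GET", "/node/1", {"SESSabc123": "xyz"})  # a logged-in user
now[0] = 61.0  # the TTL has expired
proxy.handle("GET", "/node/1", {})
print(proxy.backend_hits)  # 3
```

A hundred anonymous hits cost Drupal one page build; the TTL decides how stale anonymous visitors' pages may get in exchange.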

devel module

  • Not recommended for a production site; use the Masquerade module for switching users on a live site
  • Print out database queries for each page
  • Switch users
  • View session information
  • dsm()
  • db_queryd()
  • timer_start(), timer_stop()

MySQL Tuning Scripts

  • blog.mysqltuner.com
  • www.maatkit.org - makes human-friendly reports from the slow query log
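Report tools like Maatkit's mk-query-digest work by "fingerprinting" each logged query, replacing literal values so that many near-identical queries collapse into one line, then ranking the groups by total time. A simplified Python sketch of that idea; it does not parse the real slow-log format, and the sample entries are invented for illustration:

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple


def fingerprint(sql: str) -> str:
    """Replace literals so queries differing only in values group together."""
    sql = sql.strip().lower()
    sql = re.sub(r"'(?:[^'\\]|\\.)*'", "?", sql)  # quoted strings
    sql = re.sub(r"\b\d+\b", "?", sql)  # bare numbers
    return re.sub(r"\s+", " ", sql)  # collapse whitespace


def digest(entries: List[Tuple[float, str]]) -> List[Tuple[str, int, float]]:
    """Return (fingerprint, count, total seconds), worst total first."""
    count: Dict[str, int] = defaultdict(int)
    total: Dict[str, float] = defaultdict(float)
    for seconds, sql in entries:
        fp = fingerprint(sql)
        count[fp] += 1
        total[fp] += seconds
    return sorted(((fp, count[fp], total[fp]) for fp in count), key=lambda row: row[2], reverse=True)


# Invented sample of (query time in seconds, query text) pairs.
sample = [
    (6.0, "SELECT * FROM node WHERE type = 'story' ORDER BY created DESC"),
    (7.5, "SELECT * FROM node WHERE type = 'page'  ORDER BY created DESC"),
    (5.2, "SELECT value FROM votingapi_vote WHERE content_id = 12"),
    (5.1, "SELECT value FROM votingapi_vote WHERE content_id = 97"),
    (5.3, "SELECT value FROM votingapi_vote WHERE content_id = 4"),
]

for fp, n, secs in digest(sample):
    print(f"{secs:6.1f}s  x{n}  {fp}")
```

Grouped this way, the repeated vote lookup outranks the single slowest query, which is the "little ones that get run a ton" point from the query optimization notes.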

Kinds of scalability

  • Scalability - how long can you survive the load
  • Viral widgets: here the mantra isn't "protect the database", it's "protect the web servers", so get more web servers
  • Spike in anonymous user traffic (getting Slashdotted): the site is a place for authenticated users, so offload the anonymous traffic
  • Tons of authenticated users (e.g. 100k employees logging into an infrastructure from 9 to 5): big, beefy servers in a hosting location

Where do you start?

  • Do the quick wins first
  • Save time for load testing
  • RAM is cheap, MemCache is a nice solution
  • If you get warning of an upcoming traffic spike, that's the trigger for a reverse proxy cache or CDN
  • Work with hosting companies that know their infrastructure; build a relationship with them early on to have these kinds of conversations
  • Some crashes are just a misunderstanding about what Drupal needs (going from a static site to Drupal without making changes)

When your server's on fire

  • Always have breathing room if you can
  • If you've done MemCache, query caching, gone through all of that... add another box
  • Add another virtual server
  • Scalability = redundancy; back yourself up
  • If the site goes down, will you lose money? If yes, invest in infrastructure

Tools of the trade

Quinn, great post about scaling out Drupal. You might want to check out http://blitz.io which is kinda like 'ab', except on steroids. :) cURL'ish interface, multi-location, world-wide and comes with a ruby gem for continuous integration.

- @pcapr

