Note: This is Part 2 of an ongoing series on Drupal performance and load testing. If you haven’t already, read the introduction.
Summary
We measured Drupal’s performance under a naive configuration and under recommended configurations, and again using the Pantheon Drupal Platform, demonstrating a better than 15x improvement in user capacity.
Procedure
We created a Drupal installation on Amazon Elastic Compute Cloud (EC2), which allows us to start and customize a Drupal server in a matter of minutes. For these tests we used Amazon’s “Large” 64-bit instance, which corresponds roughly to a dual-core machine with 7.5 GB of memory.
Our baseline platform consisted of a stock Fedora Core 8 installation with Apache and PHP 5. The Drupal version was 6.19.
We wanted a realistic, high-stress scenario. Our test entailed a large number of users focusing on one story, which we imagined had been linked to by another high-traffic site. There were four test scenarios, which seemed to represent typical user behavior:
We configured our load test to ramp up, starting with a very small number of users and increasing over time until our Drupal server fell apart under load. We also collected a variety of statistics on the performance of the servers and the quality of the user experience.
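The ramp-up procedure, and the idea of a configuration’s “capacity,” can be sketched in a few lines of Python. This is a minimal sketch, not a description of how Web Performance Load Tester works internally: the step size, step length, and pass/fail thresholds (`max_duration_s`, `max_errors`) are hypothetical values chosen for illustration, since the exact performance criteria are not listed here.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass
class Sample:
    users: int             # concurrent virtual users during this measurement step
    avg_duration_s: float  # average page duration in seconds
    errors: int            # pages that failed during this step


def ramp_schedule(start: int, step: int, step_minutes: int,
                  max_users: int) -> Iterator[Tuple[int, int]]:
    """Yield (minute, users) pairs for a stepwise ramp-up from start to max_users."""
    minute, users = 0, start
    while users <= max_users:
        yield minute, users
        minute += step_minutes
        users += step


def capacity(samples: Iterable[Sample], max_duration_s: float = 1.0,
             max_errors: int = 0) -> int:
    """Return the highest user level reached before the first failing step.

    The default thresholds are hypothetical pass/fail criteria, not the article's.
    """
    best = 0
    for sample in sorted(samples, key=lambda item: item.users):
        if sample.errors > max_errors or sample.avg_duration_s > max_duration_s:
            break  # the server has started to fall apart; stop counting here
        best = sample.users
    return best
```

For example, `list(ramp_schedule(25, 25, 2, 100))` yields `[(0, 25), (2, 50), (4, 75), (6, 100)]`. Stopping at the first failing step, rather than taking the highest passing level anywhere in the run, matches the idea of ramping until the server falls apart.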
Configuration 1: Baseline
We load tested a basic configuration of Drupal: simply boot up Fedora Core 8, install Apache and PHP, and unpack the Drupal tarball into /var/www/. Under this configuration, Drupal began printing a few errors when we exceeded 175 users.
Average page durations on this system were about 0.6 seconds at the 100-user level, which is entirely acceptable but could be snappier.
What caused Drupal to fail? Our server monitoring software showed that CPU utilization on the server peaked at around the same time Drupal began printing errors. The server appeared to lose scalability at about the 20-minute mark and finally collapsed at the 40-minute mark.
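On Linux, a server monitor can derive CPU utilization from the kernel’s cumulative tick counters in /proc/stat: take two snapshots and compare how much of the elapsed time was spent idle. The sketch below is an assumption about how such a tool might compute the figure, not a description of the monitoring software we used; the function names are hypothetical.

```python
from typing import Tuple


def parse_cpu_line(line: str) -> Tuple[int, int]:
    """Return (idle_ticks, total_ticks) from the aggregate "cpu" line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Fields: user nice system idle iowait irq softirq steal [guest guest_nice].
    # Guest time is already included in user/nice, so only the first eight count.
    ticks = [int(v) for v in fields[1:9]]
    idle = ticks[3] + ticks[4]  # idle + iowait
    return idle, sum(ticks)


def cpu_utilization(before: str, after: str) -> float:
    """Fraction of CPU time spent busy between two /proc/stat snapshots."""
    idle0, total0 = parse_cpu_line(before)
    idle1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        raise ValueError("take the second snapshot after the first")
    return 1.0 - (idle1 - idle0) / elapsed


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Read the aggregate CPU line (Linux only)."""
    with open(path) as f:
        return f.readline()
```

On a Linux host, call `read_cpu_line()`, wait a second or so, call it again, and pass both lines to `cpu_utilization`. Sustained values near 1.0 indicate the kind of CPU saturation described above.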
Configuration 2: PHP Acceleration
To optimize Drupal, we started by installing APC, a popular PHP accelerator. APC works under the hood to cache PHP bytecode; without it, the PHP source has to be recompiled for each request. When we load tested Drupal with APC installed, Drupal didn’t print any errors until we exceeded 500 users.
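APC itself is a PHP extension, but the idea behind an opcode cache carries over to other interpreted languages. As an analogy only, this Python sketch compiles each script once with the built-in `compile()`, keeps the resulting code object in memory, and recompiles only when the file’s modification time changes, similar in spirit to APC’s default behavior of checking file timestamps. The class name `BytecodeCache` is hypothetical.

```python
import os
from types import CodeType
from typing import Dict, Tuple


class BytecodeCache:
    """Compile each script once and reuse the code object until the file changes."""

    def __init__(self) -> None:
        self._cache: Dict[str, Tuple[int, CodeType]] = {}  # path -> (mtime_ns, code)
        self.compilations = 0

    def get(self, path: str) -> CodeType:
        mtime = os.stat(path).st_mtime_ns
        cached = self._cache.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]  # cache hit: skip reading and compiling the source
        with open(path, encoding="utf-8") as f:
            source = f.read()
        code = compile(source, path, "exec")
        self.compilations += 1
        self._cache[path] = (mtime, code)
        return code

    def run(self, path: str) -> dict:
        """Execute the (possibly cached) code object in a fresh namespace."""
        namespace: dict = {}
        exec(self.get(path), namespace)
        return namespace
```

Without the cache, every call to `run` would pay the cost of reading and compiling the source; with it, repeated requests for the same unchanged script skip straight to execution, which is the kind of saving an accelerator like APC gives PHP.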
Drupal was also slightly snappier with acceleration, with average page durations at the 100-user level of just over 0.4 seconds, probably not a large enough difference for a human user to notice.
Configuration 3: Caching
Next, we enabled Drupal’s built-in caching facilities. These features are normally disabled during development but enabled on production systems. Users with administrative privileges can find these settings under Administer > Site configuration > Performance. Specifically, we set the caching mode to “normal,” set block caching to “enabled,” and enabled Drupal’s built-in JavaScript and CSS optimizers. We didn’t set a minimum cache lifetime.
To explain: Drupal’s block cache can cache individual widgets, such as “recently viewed comments,” which lets Drupal compose separately cached elements into one HTML document. Drupal’s JavaScript and CSS optimizers combine and compress multiple files into one.
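In Drupal 6, the “normal” caching mode stores whole rendered pages and serves them only to anonymous visitors; logged-in users still get freshly built pages. The Python sketch below illustrates that rule in miniature. `AnonymousPageCache` and its `render` callback are hypothetical names, and because no minimum cache lifetime was set in our configuration, the sketch has no expiry logic.

```python
from typing import Callable, Dict


class AnonymousPageCache:
    """Whole-page cache that is consulted only for anonymous requests."""

    def __init__(self, render: Callable[[str], str]) -> None:
        self._render = render            # builds the full HTML for a path
        self._pages: Dict[str, str] = {}
        self.renders = 0                 # how many times a page was actually built

    def get(self, path: str, logged_in: bool) -> str:
        if logged_in:
            # Authenticated users bypass the page cache entirely.
            self.renders += 1
            return self._render(path)
        if path not in self._pages:
            self.renders += 1
            self._pages[path] = self._render(path)
        return self._pages[path]

    def clear(self) -> None:
        """Drop every cached page, e.g. after content is edited."""
        self._pages.clear()
```

When many anonymous visitors request the same story, a single render can serve all of them.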
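The block cache works at a finer grain than whole-page caching. Here is a minimal sketch of the idea, assuming each block is cached separately under a key that includes the viewer’s role. Drupal 6 lets each block choose its own granularity (for example per role, per user, or per page); this sketch hard-codes per role. `BlockCache` and `render_page` are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple


class BlockCache:
    """Cache rendered blocks individually, keyed by (block id, role)."""

    def __init__(self) -> None:
        self._blocks: Dict[Tuple[str, str], str] = {}
        self.renders = 0

    def get(self, block_id: str, role: str, render: Callable[[], str]) -> str:
        key = (block_id, role)
        if key not in self._blocks:
            self.renders += 1
            self._blocks[key] = render()
        return self._blocks[key]

    def invalidate(self, block_id: str) -> None:
        """Forget a block for every role, e.g. after a new comment is posted."""
        for key in [k for k in self._blocks if k[0] == block_id]:
            del self._blocks[key]


def render_page(cache: BlockCache, role: str, main_html: str,
                sidebar: List[Tuple[str, Callable[[], str]]]) -> str:
    """Compose freshly built main content with cached sidebar blocks."""
    blocks = "".join(cache.get(block_id, role, render) for block_id, render in sidebar)
    return f"<html><body><main>{main_html}</main><aside>{blocks}</aside></body></html>"
```

Because blocks are cached per role rather than per visitor, this kind of caching can also help logged-in users, whom Drupal 6’s page cache does not serve.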
With caching enabled, Drupal seemed to have a capacity of 1700 users.
Caching also improved Drupal’s page durations to a little over 0.2 seconds. Very snappy!
But can we do better?
Configuration 4: Pantheon
For the next stage of our quest to optimize Drupal, we sought out some Drupal experts from Chapter Three and Four Kitchens, who recommended the Pantheon Drupal Platform. Pantheon is a complete, optimized, best-practices Drupal stack combining Pressflow, memcache, and Varnish. The Pantheon AMIs were extremely easy to set up, and we were able to casually migrate our entire database back and forth between Drupal and Pressflow with mysqldump.
We started up Load Tester and threw as much traffic at Pantheon as we could, until the number of concurrent logged-in users exceeded the number of user accounts we had populated in the database. We added a few thousand users to our SQL table and tried again.
Not only did we reach a whopping 2700 simultaneous users, but Pantheon was also snappier at every user level. Average page durations for Pantheon at the 1500-user level were 0.14 seconds, faster than any other configuration at any user level.
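Adding test accounts in bulk is a small scripting job. The sketch below uses Python’s built-in sqlite3 module as a stand-in for MySQL, with a hypothetical subset of Drupal 6’s `users` table (the real table has more columns); the `loaduser` naming scheme and shared password are also assumptions. Drupal 6 stores passwords as unsalted MD5 hashes, which is why the sketch uses `hashlib.md5`. Against a real MySQL server you would use a MySQL driver, whose placeholder syntax differs.

```python
import hashlib
import sqlite3
import time


def create_users_table(conn: sqlite3.Connection) -> None:
    # Hypothetical subset of Drupal 6's users table, for illustration only.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS users (
               uid INTEGER PRIMARY KEY,
               name TEXT UNIQUE NOT NULL,
               pass TEXT NOT NULL,
               mail TEXT,
               status INTEGER NOT NULL DEFAULT 0,
               created INTEGER NOT NULL DEFAULT 0
           )"""
    )


def add_test_users(conn: sqlite3.Connection, count: int, first_uid: int,
                   password: str = "loadtest") -> int:
    """Insert count active accounts named loaduser<uid>; return how many were added."""
    now = int(time.time())
    pass_hash = hashlib.md5(password.encode("utf-8")).hexdigest()
    rows = [
        (uid, f"loaduser{uid}", pass_hash, f"loaduser{uid}@example.com", 1, now)
        for uid in range(first_uid, first_uid + count)
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO users (uid, name, pass, mail, status, created) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            rows,
        )
    return len(rows)
```

Starting at `first_uid=2` leaves uid 1, the administrator account, untouched; in a populated database you would start after the highest existing uid.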
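The headline improvement figures follow directly from the capacities reported for each configuration. This short Python calculation reproduces them; the configuration labels are ours.

```python
from typing import Dict

# User capacities reported in this article for each configuration.
CAPACITY: Dict[str, int] = {
    "1: baseline": 175,
    "2: APC": 500,
    "3: Drupal caching": 1700,
    "4: Pantheon": 2700,
}


def improvement_over(capacities: Dict[str, int], baseline: str) -> Dict[str, float]:
    """Capacity of each configuration divided by the baseline's capacity."""
    base = capacities[baseline]
    return {name: users / base for name, users in capacities.items()}


for name, ratio in improvement_over(CAPACITY, "1: baseline").items():
    print(f"{name:18s} {ratio:5.1f}x")
```

The ratios come out to about 2.9x for APC, 9.7x with Drupal’s caching enabled, and 15.4x for Pantheon, which is where the “better than 15x” figure comes from.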
We actually tested Pantheon all the way up to 7500 users, and, although it didn’t satisfy our performance criteria beyond 2700, we had to throw a lot of traffic at it before it became thoroughly unusable. We asked Pantheon developer Josh Koenig what tools and techniques his team used to optimize Pantheon:
The tools of the trade we tend to use are a mix of standard monitoring utilities (top on the shell, munin to get graphs, maatkit for digesting a query log, etc.) and some Drupal/PHP specific tools like the apc.php file and Drupal’s own devel.module.
For digging deeper into what might be bottlenecking the CPU, we use xdebug to generate cachegrind files, which can be analyzed to determine what functions are costing us the most, or where there are functional paths which could be truncated via caching.
David [Strauss]’s also had some good experience with xhprof, and we’re looking at demoing some of the upcoming php monitoring tools from New Relic (who set a good standard for the Java and Ruby realms).
Beyond that, it’s just a lot of hands-on experience and knowing how to walk the full stack to see what parts of the system aren’t performing up to par, and then knowing enough about Drupal, PHP and MySQL to see what can be done, and being hungry enough about learning cool new stuff to see what’s coming down the pipe that can help.
Conclusion
We were pleased with how easily we could use Web Performance Load Tester to collect performance metrics on various Drupal configurations.
Some of the gains we demonstrated represented fairly low-hanging fruit. Installing a PHP accelerator like APC is a no-brainer on almost any production platform and can nearly triple the capacity of a heavily trafficked web site. Caching is also very easy to enable.
On the other hand, the Pantheon Drupal Platform outperformed our expectations with better than an order of magnitude improvement in user capacity. Pantheon represents many long hours of tweaking and optimization by Drupal experts, but in the end Pantheon was extremely easy to use and compatible with the vanilla Drupal database schema.
— Lane, Engineer at Web Performance