Drupal: Caching and Database Scalability - Web Performance

Drupal: Caching and Database Scalability

Note: This is Part 4 of an ongoing series on Drupal performance and load testing. If you haven’t already, read the introduction.

Summary

We measured Drupal’s performance with respect to database size, demonstrating flat performance regardless of the size of the database.  We also got some good data demonstrating Drupal’s behavior with caching.

Procedure

We re-created our previous test platform: a stock Drupal installation on an Amazon Elastic Compute Cloud (EC2) m1.large instance with both the Alternative PHP Cache (APC) and Drupal’s built-in caching capabilities.  In this test, however, instead of scaling the number of simultaneous users, we held the test at 400 users and varied the amount of content in our database.

We also used a different test scenario.  As in the previous scenario, each user visited a single “hot” page, presumably a page that had been linked from another high-traffic site.  However, each user then browsed to a different arbitrarily selected page, forcing Drupal, over time, to serve the entire contents of its database.  The graphs below include one line for each of the three pages.
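The scenario above can be sketched as a function that builds the list of URLs one virtual user requests. This is only an illustration, not the actual test script: the `/headline-article` path is hypothetical, the `/node/<id>` pattern assumes Drupal's default node paths with contiguous node IDs, and handing out nodes in sequence is an assumption about how the "different" page was chosen for each user.

```python
from typing import List

FRONT_PAGE = "/"                     # assumed front page path
HOT_ARTICLE = "/headline-article"    # hypothetical path for the "hot" page


def build_test_case(case_index: int, node_count: int) -> List[str]:
    """Return the three URLs requested by one test case.

    case_index is the zero-based number of the test case; each test case
    gets the next node in sequence (an assumption), so the whole database
    is served once every node_count test cases.
    """
    node_id = case_index % node_count + 1  # wrap around after the last node
    return [FRONT_PAGE, HOT_ARTICLE, f"/node/{node_id}"]


if __name__ == "__main__":
    for i in (0, 1, 9_999, 10_000):
        print(i, build_test_case(i, node_count=10_000))
```

Under these assumptions, the first two URLs are identical for every user, while the third walks through the database and only repeats once every node has been requested.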

It should be noted that the machine in question runs with 7.5 gigabytes of memory, theoretically enough to cache in RAM the entirety of the database we threw at it.

We were looking for any sign that Drupal would struggle to service a large database.

Results

The first test group simply adjusted the total size of the database in bytes:

To put this number into context, 7 gigabytes is equivalent to 15,000 full-length novels, larger than all but the most popular blogs and wikis, but certainly dwarfed by a multi-terabyte resource like Wikipedia.

The following is the graph of average page duration over time against the 10,000 node database.

Figure: average page duration over time, 10,000-node database (10000nodes_time_based_page_durations)

What’s happening here?  The two pages with fixed URLs (the front page and headline article, which was large and therefore a little bit slow), both behave very consistently.  But the “page” that was actually configured to dynamically crawl all of the stories in the database shows interesting behavior.  For the first half of the test, page durations seem to increase gradually, and then quickly drop to one tenth of a second.  This test features slightly over 20,000 test case completions against 10,000 nodes, so the simplest explanation is that roughly halfway through our test, once every node had been requested once, we had fully populated Drupal’s built-in cache and just started returning every page request from the cache.  If you look very closely, the average page durations of the other two steps in the test case decreased very slightly as load on the server was reduced.
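The "cache fills up halfway through" explanation can be checked with a small simulation. This is a rough model under stated assumptions, not a measurement: it assumes the crawl visits nodes in sequence, that the cache starts empty, never evicts, and holds every page, and it rounds the run to exactly 20,000 test case completions.

```python
from typing import List


def hit_fractions(node_count: int, completions: int, windows: int) -> List[float]:
    """Fraction of crawled-page requests served from cache in each time window.

    Assumes a sequential crawl over node_count nodes and an unbounded,
    initially empty cache that never evicts entries.
    """
    cached = set()
    hits = [0] * windows
    totals = [0] * windows
    for i in range(completions):
        node_id = i % node_count + 1           # sequential crawl, wrapping around
        window = i * windows // completions    # which slice of the test we are in
        totals[window] += 1
        if node_id in cached:
            hits[window] += 1                  # page already built once: cache hit
        else:
            cached.add(node_id)                # first visit: build page, then cache it
    return [h / t for h, t in zip(hits, totals)]


if __name__ == "__main__":
    # 10,000 nodes, 20,000 completions, test split into four equal windows.
    print(hit_fractions(10_000, 20_000, 4))    # -> [0.0, 0.0, 1.0, 1.0]
```

In this model every request in the first half of the run is a miss and every request in the second half is a hit, which matches the abrupt drop in the graph. A purely random page selection would instead produce a gradual rise in hit rate, so the sharp transition is at least consistent with a sequential crawl.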

The next graph shows average page duration over time against the 50,000 node database.  (Ignore the swap in colors between the green and blue lines.  I also don’t care that the scale on the y-axis has changed — this is caused by relative sizes of the spikes in the graph, which in this case seem to be nothing more than ordinary noise.)

Figure: average page duration over time, 50,000-node database (50000nodes)

This graph looks like a reverse of the previous graph.  Why?  Because at the beginning of this test Drupal is still using the cache from the previous test.  Once it exhausts the first 10,000 table rows it must begin constructing pages again and this more than doubles the page duration.  Even after the 20-minute mark of the test, average page durations are still a little lower than what we saw in the 10,000 node test, probably because some virtual users continue to revisit the cached pool.
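The effect of a partially warm cache on the average can be illustrated with a simple weighted mean. The 0.1 second cached duration comes from the 10,000 node run; the 0.25 second uncached duration and the 20% revisit rate are hypothetical values chosen only to show the shape of the effect, not measurements from this test.

```python
def blended_duration(hit_fraction: float, cached_s: float, uncached_s: float) -> float:
    """Average page duration when hit_fraction of requests come from cache."""
    if not 0.0 <= hit_fraction <= 1.0:
        raise ValueError("hit_fraction must be between 0 and 1")
    return hit_fraction * cached_s + (1.0 - hit_fraction) * uncached_s


if __name__ == "__main__":
    CACHED_S = 0.10     # from the 10,000 node run: roughly one tenth of a second
    UNCACHED_S = 0.25   # hypothetical cost of building a page from the database

    for hit_fraction in (1.0, 0.2, 0.0):
        avg = blended_duration(hit_fraction, CACHED_S, UNCACHED_S)
        print(f"hit fraction {hit_fraction:.0%}: average {avg:.3f} s")
```

With these illustrative numbers, even a modest share of requests landing on already-cached pages pulls the average below the fully uncached figure, which is the pattern described after the 20-minute mark.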

Technically, this is a mistake on my part — good science would seem to require that I clear the cache at the beginning of each test run.  Nevertheless, I feel we’ve discovered what we were looking for in this test run: Drupal scales cleanly with increasing database size, and configuration details dominate scalability issues in a default Drupal installation.

What about the logical node count?

When talking about database scalability, there are actually two things that interest us: scalability by total number of bytes and scalability by the logical number of database rows.  Drupal also performed flawlessly up to 650,000 nodes (but with no comments).  There was no discernible performance degradation across this range.

Conclusion

We weren’t really expecting to discover any horrible scalability bugs in Drupal (or MySQL, for that matter), and we didn’t.  In fact, there was no detectable performance penalty associated with increased database size.  By any conventional measure, Drupal demonstrated excellent scalability.

— Lane, engineer at Web Performance.

Copyright © 2024 Web Performance, Inc.

A Durham web design company

×

(1) 919-845-7601 9AM-5PM EST

Just complete this form and we will get back to you as soon as possible with a quote. Please note: Technical support questions should be posted to our online support system.

About You
How Many Concurrent Users