©2008 Web Performance, Inc.
Enabling mod_deflate can reduce the bandwidth usage on a particular file by up to 70%, but also reduces the maximum load a server can handle and may actually reduce site performance if the site compresses large dynamic files.
This article examines the trade-off in CPU utilization versus bandwidth utilization when compressing content with mod_deflate in Apache. The test uses a set of static test files in a range of sizes to simulate total page size, and measures server CPU utilization and bandwidth utilization across various traffic levels.
The analysis is based on a series of load tests performed in our test lab. We tested five sample file sizes, and ran the same test on each file twice - once without compression, and once with compression. In each case, we measured the bandwidth necessary to serve the load, CPU utilization, and hits per second. Since we are testing mod_deflate, all content is compressed on the fly and uncached. No dynamic content is used, so that content generation scripts do not contribute to CPU utilization; however, since no content is cached, this effectively simulates large amounts of unique dynamic content. The target web server, Apache 2.2.3 on CentOS 5.2, was rebooted between tests. For more details, see Appendix A.
Below are the results for 10KB, 50KB, and 100KB file sizes. The dotted lines are with compression turned off, and the solid lines are with compression turned on.
The chart below shows the impact of compression on CPU utilization and bandwidth consumption while serving 10KB files. At each load level, turning on compression decreased bandwidth consumption by ~55% and increased CPU load by a factor of 4 compared to the uncompressed equivalent.
The chart below shows the impact of compression on CPU utilization and bandwidth consumption while serving 50KB files. At each load level, turning on compression decreased bandwidth consumption by ~65% and increased CPU load by a factor of ~10 compared to the uncompressed equivalent.
The chart below shows the impact of compression on CPU utilization and bandwidth consumption while serving 100KB files. At this high data rate, the CPU was overwhelmed during the second ramp-up. At the first load level, turning on compression decreased bandwidth consumption by ~63% and increased CPU load by a factor of ~30 compared to the uncompressed equivalent.
Since gzip is a block compression algorithm, one might expect that the amount of CPU usage versus the data rate would show a roughly linear increase. However, the measured increase in CPU utilization appears to be non-linear as file size increases; further, the slope of the curve steepens across both load and file size.
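One way to check the linear-scaling expectation in isolation is to time the compressor on a single thread across the tested file sizes. The Python sketch below is an illustration only and is not the method used in these tests: make_text and cpu_seconds_per_kb are hypothetical helpers, the generated pseudo-text is a stand-in for the book data, and Python's zlib module at its default level is assumed to be a reasonable proxy for mod_deflate. If the per-byte cost were constant, the reported figure would be roughly the same for every size.

```python
import random
import time
import zlib

# Small hypothetical vocabulary used to build compressible pseudo-text.
WORDS = ["server", "compress", "bandwidth", "apache", "request", "page",
         "client", "load", "cache", "response", "the", "of", "and", "a"]


def make_text(size_bytes: int, seed: int = 0) -> bytes:
    """Build deterministic pseudo-text of exactly size_bytes bytes."""
    rng = random.Random(seed)
    parts = []
    total = 0
    while total < size_bytes:
        word = rng.choice(WORDS) + " "
        parts.append(word)
        total += len(word)
    return "".join(parts).encode("ascii")[:size_bytes]


def cpu_seconds_per_kb(size_kb: int, repeats: int = 50) -> float:
    """Average CPU seconds spent compressing one KB of a size_kb payload."""
    data = make_text(size_kb * 1024)
    start = time.process_time()  # CPU time of this process, not wall time
    for _ in range(repeats):
        zlib.compress(data)
    elapsed = time.process_time() - start
    return elapsed / (repeats * size_kb)


if __name__ == "__main__":
    for size_kb in (10, 25, 50, 75, 100):
        micros = cpu_seconds_per_kb(size_kb) * 1e6
        print(f"{size_kb:>4} KB: {micros:.1f} CPU-us per KB")
```

Because it runs one compression at a time, this sketch excludes the scheduling and contention effects that a loaded server experiences.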
There are a number of possible factors that could influence this data. One, of course, is CPU contention and scheduling. We cannot directly compare the compression of a single large file to simultaneous compressions of many smaller files due to the operating system overhead of scheduling and switching between threads and/or processes. Such overhead would add CPU utilization as load increased. Due to Apache's design, however, such overhead is inescapable, so it must be factored into real-world performance analysis.
The gzip algorithm is also sensitive to data type and structure, so it is possible that our data was significantly more difficult to compress in the larger files. The design of the test cases should prevent that from being a factor; the book data is as uniform as possible without being repetitive across the various file sizes (see Appendix A for details). Nonetheless, this remains a possibility.
What are we getting for the use of these CPU resources? The gzip algorithm is known to be fast and efficient on text files, and indeed we see total bandwidth reductions of 60-70% on pure compressible content such as in our test. Clearly this has a sizable and easily quantifiable impact on the cost of the bandwidth to service the site.
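As a rough way to estimate the savings for a given page before enabling compression, the Python sketch below computes the fraction of payload bytes removed by deflate. It is an approximation under stated assumptions: bandwidth_reduction is a hypothetical helper, Python's zlib module stands in for mod_deflate, and the result covers only the response body, whereas the figures in this report were measured as total traffic.

```python
import zlib


def bandwidth_reduction(payload: bytes, level: int = -1) -> float:
    """Return the fraction of payload bytes saved by deflate compression.

    level=-1 asks zlib for its default compression level.
    """
    if not payload:
        return 0.0
    compressed = zlib.compress(payload, level)
    return 1.0 - len(compressed) / len(payload)


if __name__ == "__main__":
    # Hypothetical sample; it is far more repetitive than real prose,
    # so it compresses better than the book text used in the tests.
    sample = ("<p>The quick brown fox jumps over the lazy dog.</p>\n" * 200).encode("utf-8")
    print(f"{bandwidth_reduction(sample):.0%} of payload bytes saved")
```

To get a realistic figure, pass the bytes of an actual page from the site in question rather than the repetitive sample.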
Interestingly, if we measure the percentage of traffic reduction by file size, we find that Apache's zlib compression performance was not uniform; there is an apparent optimal efficiency around the 50KB file size.
While this graph is interesting, there are several possible explanations for the curve that do not directly involve the Apache compression algorithm design or performance. Since we are measuring the traffic output and not the size of the output files, the smaller files could simply be suffering from the increased overhead of many more files being compressed and the overhead of more packets being sent partially empty (i.e. less efficiently).
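The packet-overhead explanation can be illustrated with a simple model. The Python sketch below is a toy calculation, not a measurement: the segment size, per-packet header size, and HTTP header size are all assumed values, connection setup and acknowledgement traffic are ignored, and the same body-level savings is applied to every file size. Under those assumptions, the reduction seen on the wire is smaller than the reduction in the body, and the gap is wider for small files.

```python
import math

MSS = 1460           # assumed TCP payload bytes per packet (typical Ethernet)
HEADER_BYTES = 40    # assumed IPv4 + TCP header bytes per packet, no options
HTTP_HEADERS = 300   # assumed size of the HTTP response headers


def wire_bytes(body_bytes: int) -> int:
    """Bytes sent for one response, including per-packet header overhead."""
    payload = HTTP_HEADERS + body_bytes
    packets = math.ceil(payload / MSS)
    return payload + packets * HEADER_BYTES


def wire_reduction(body_bytes: int, body_savings: float) -> float:
    """Traffic reduction on the wire for a given reduction in body size."""
    compressed_body = round(body_bytes * (1.0 - body_savings))
    return 1.0 - wire_bytes(compressed_body) / wire_bytes(body_bytes)


if __name__ == "__main__":
    for size_kb in (10, 25, 50, 75, 100):
        reduction = wire_reduction(size_kb * 1024, 0.65)
        print(f"{size_kb:>4} KB body: {reduction:.1%} reduction on the wire")
```

This model accounts only for the lower efficiency at small sizes; it does not explain the apparent drop in efficiency above 50KB.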
Another possible influence is the data itself; the gzip algorithm is sensitive to the type of data being compressed, and it may be that a section of the file is simply more easily compressed than the others. For the larger files, the sample size was smaller; we did not include the differences in bandwidth once the CPU was at 100%, because the comparison between the uncompressed bandwidth and the compressed bandwidth is no longer valid at that point.
Finally, there are the various options and settings of mod_deflate and zlib itself, which could impact the optimal file size and behavior of the compression. Thus we do not draw any conclusion from this data, other than more testing is needed to determine how the efficiency of the compression is affected by configuration across multiple types of data and file sizes.
The advantages of compression are lower bandwidth usage and faster data transfer for large pages, which should result in a better user experience. Steve Souders, Yahoo's chief performance Yahoo, recommends compression for those exact reasons. These results have been measured previously in a variety of tests, both casual and rigorous, and the results are fairly unanimous that compression is valuable (see Appendix C). However, most of these tests look only at the bandwidth advantages of compression, and not at the impact on the server, as we do in this report. Further, there is little information on how the type of file being compressed affects the CPU utilization, the impact of misconfiguration (compressing already-compressed files, or avoiding compression of files that could be compressed), or how static file compression costs can be mitigated with various caching schemes.
We recommend that dynamic websites that are already CPU-limited be cautious when enabling compression, particularly where it could affect large files. The trend in the data is clear: sites that are compressing large dynamic content will be expending a significant amount of CPU on compression while under load. This CPU usage does not scale well and is almost certain to delay page generation, eliminating the advantages of compression on the user experience, but not the impacts on bandwidth. Such a site could easily end up in a situation where the perceived performance of the site under load is considerably worse than without compression. Alternatively, where reduction of bandwidth usage is a priority for a CPU-limited application, more CPU power (or servers) must be deployed to support a given load and perceived performance level. Using a hardware-based compression device, as is found on some load-balancers, would also be an option to assist with compression performance.
The traditional software solution for decreasing the CPU load from compression is to cache the result and serve that instead, as long as the file doesn't change. However, for modern websites, at least a portion of the content is dynamic and must be compressed on the fly if it is to be compressed at all. Thus there is a clear need to cleanly separate dynamic content from static content; to minimize the size of dynamic content as much as possible; and to be cautious when enabling mod_deflate in situations where the dynamic content is large. In a future report, we will look at how the use of mod_cache can mitigate the performance impacts of enabling compression.
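The general idea of caching the compressed result until the file changes can be sketched in a few lines of Python. This is an illustration of the technique only and does not describe how mod_cache works: gzip_cached is a hypothetical helper, the cache is a plain in-memory dictionary with no size limit, and a change is detected by comparing the file's modification time and size.

```python
import gzip
import os

# Maps path -> ((mtime_ns, size), compressed bytes). Unbounded; sketch only.
_cache: dict = {}


def gzip_cached(path: str) -> bytes:
    """Return the gzip-compressed file, recompressing only when it changes."""
    stat = os.stat(path)
    key = (stat.st_mtime_ns, stat.st_size)
    entry = _cache.get(path)
    if entry is not None and entry[0] == key:
        return entry[1]  # cache hit: no CPU spent on compression
    with open(path, "rb") as handle:
        compressed = gzip.compress(handle.read())
    _cache[path] = (key, compressed)
    return compressed
```

With this approach the compression cost is paid once per file version rather than once per request, which is why it helps static content but not content that differs on every request.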
Mod_cache was disabled throughout the test. Otherwise, the configuration was the CentOS default Apache configuration. The entire configuration is available in Appendix B.
The load testing software was Web Performance Load Tester version 3.5.6556, with the Server Agent of the same version installed on the test web server.
The target Apache web server was running on a Dell PowerEdge SC1420 with a 2.8GHz Xeon, family 15, model 4 – Pentium 4 architecture (Gallatin) with hyperthreading on, 1MB of L2 cache, 1GB of RAM, and an 800 MHz system bus. The target server was running CentOS 5.2, default server installation, updated to the current packages in the CentOS yum repositories as of 23 October 2008. The kernel was the default 2.6.18-92.1.13.el5 Linux kernel provided by the operating system.
Three of the load-generating engines were each running on a Dell PowerEdge 300 with 2 x 800MHz Pentium III processors and 1GB of RAM. The fourth load-generating engine was running on a Pentium 4 2.4GHz with 2.25GB of RAM. Each engine was running the Web Performance Load Engine 3.5 Boot Disk version 3.5.6556.
The engines and server were networked via a Dell 2324 PowerConnect switch. The server was connected to a 1Gb port and the load engines were each connected to 100Mb ports.
The Load Tester GUI was running on Windows XP SP3, on a Dell Dimension DIM3000, Pentium 4 2.8GHz with 1GB of RAM. This machine was connected to a 100Mb port on a second switch. This had no impact on the results, since the bandwidth requirements are small, but is mentioned here for completeness.
The test cases chosen were five HTML files composed primarily of text, cut using the "dd" tool to 10KB, 25KB, 50KB, 75KB, and 100KB from the 336KB source file (an HTML version of Cory Doctorow's Down and Out in the Magic Kingdom). The load engines did not decompress or parse this file in any way. There is a delay of one second between test cases. A book was chosen because the content was relatively homogeneous across the entire file without being repetitive, as repetition of large amounts of data makes compression significantly easier.
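For readers who want to build a similar set of test files, the Python sketch below produces fixed-size cuts of a source file. It is an approximation of the procedure rather than the commands actually used: cut_prefixes and the output file names are hypothetical, 1KB is assumed to mean 1024 bytes, and each file is assumed to be taken from the start of the source.

```python
from pathlib import Path

SIZES_KB = (10, 25, 50, 75, 100)


def cut_prefixes(source: Path, out_dir: Path, sizes_kb=SIZES_KB) -> list:
    """Write one file per size, each holding the first size_kb KB of source."""
    data = source.read_bytes()
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for size_kb in sizes_kb:
        target = out_dir / f"test_{size_kb}kb.html"  # hypothetical name
        target.write_bytes(data[: size_kb * 1024])
        written.append(target)
    return written
```

Cutting at a byte offset can split an HTML tag or a word in half; that does not matter for a compression test in which the client never parses the file.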
We used static HTML files to avoid any CPU usage related to dynamic generation of the file, so that the impact of the compression could be clearly seen and not confused with other web server activity; a dynamically generated HTML page would add CPU usage to the load based on how difficult it was to generate. In short, these test cases represent the best-case bandwidth scenario, with 100% highly compressible content. They also represent a worst-case CPU scenario, with all content being compressed on the fly and no caching available to prevent repetitive compression. It is likely that a real-world website will have a lower CPU utilization due to some requested files not being run through the compression. This would also, however, reduce the bandwidth gains, since those files are often images or other static files that are already compressed and thus would use the same amount of bandwidth in either scenario.
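The effect of mixing compressible and already-compressed resources on the overall bandwidth gain can be estimated with a weighted sum. The Python sketch below is a back-of-the-envelope model: page_savings is a hypothetical helper, the 65% default is simply a value within the range observed in this report, and already-compressed resources are assumed to be served unchanged.

```python
from typing import Iterable, Tuple


def page_savings(resources: Iterable[Tuple[int, bool]],
                 text_savings: float = 0.65) -> float:
    """Overall fraction of bytes saved for (size_bytes, compressible) pairs."""
    items = list(resources)
    total = sum(size for size, _ in items)
    if total == 0:
        return 0.0
    saved = sum(size * text_savings for size, compressible in items if compressible)
    return saved / total


if __name__ == "__main__":
    # Hypothetical page: 50,000 bytes of HTML plus 150,000 bytes of images.
    page = [(50_000, True), (150_000, False)]
    print(f"{page_savings(page):.2%} overall reduction")
```

In this hypothetical page only a quarter of the bytes are compressible, so the overall reduction is a quarter of the text-only figure.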
The test data repository file is available for examination; the demo version of Load Tester can view the test cases, load configurations, detailed raw metrics, and the test reports. Test reports are also available in HTML; see Appendix C.
Comments about this report may be posted at the company blog post.
v1.0 - 1st public release (4 Dec 08)
v1.1 - email cleanup (23 Jan 09)