Leading up to the release of Load Tester 5.0, the Web Performance development team focused heavily on improving our capability to run massive load tests. Today, Load Tester 5.0 is specifically engineered to deliver as many as 1 million virtual users while controlling 500 remote load engines.
This is a “how to” article for Load Tester 5.0 users wanting to run their own massive load tests.
There are a few things you absolutely need. First and foremost is a modern workstation for the controller. By modern, I mean a 64-bit architecture with at least 7 GB of working memory; this is an absolute must. To benefit from the 64-bit architecture, you will also need to run the 64-bit version of the controller. CPU power is less critical: we ran successful tests on an in-house Intel i7-860 and an Amazon EC2 High-CPU Extra Large Instance (c1.xlarge). The controller can take advantage of up to three hardware threads.
You may also save time during the engine-initialization phase by having plenty of upload bandwidth, especially if you have a testcase that uploads a lot of data or if you have a very large dataset that is marked both re-usable and sharable. Most testcases don’t fit this description, but if this does describe you, consider installing Load Tester on an Amazon cloud instance where it will have oodles of bidirectional bandwidth.
(Remember, these specifications are for those wanting to run load tests to 1 million users — for a typical 1-hour test running between 100 and 10,000 users, system requirements are largely a non-issue.)
Finally, you’ll need a lot of cloud instances. Load Tester is integrated with Amazon EC2, so you can launch arbitrary numbers of cloud instances from within Load Tester at the push of a button. You’ll need to estimate the number of cloud instances required to successfully execute your test. For example, if you run 10 load engines to 20,000 virtual users before they cap out, then you’ll need a bare minimum of 500 load engines to run to 1 million users. Not all test cases will scale this efficiently, especially those that have very short think times. Load engines self-monitor and will stop adding virtual users when they are at capacity.
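The extrapolation above can be sketched in a few lines. The helper name and the optional headroom factor are illustrative, not part of Load Tester; the headroom pads the estimate for test cases that scale less efficiently, such as those with very short think times.

```python
import math

def engines_needed(target_users, engines_tested, users_reached, headroom=1.0):
    """Estimate the load engines required, extrapolating from a trial run.

    headroom > 1.0 pads the estimate for test cases that scale less
    efficiently than the trial did.
    """
    users_per_engine = users_reached / engines_tested
    return math.ceil(target_users * headroom / users_per_engine)

# The example from the text: 10 engines capped out at 20,000 users,
# so reaching 1 million users takes a bare minimum of 500 engines.
print(engines_needed(1_000_000, 10, 20_000))       # -> 500

# With 20% headroom for a less efficient test case:
print(engines_needed(1_000_000, 10, 20_000, 1.2))  # -> 600
```

Because engines self-monitor and stop adding users at capacity, treat the result as a floor, not a guarantee.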
Be aware that Amazon imposes a quota on the maximum number of cloud instances that can be run by any customer at any given time. The default quota at this time is 20 engines per region. It usually takes a few days for the quota increase to be approved, so plan accordingly. (Amazon has a contact form for this specific purpose at: http://aws.amazon.com/contact-us/ec2-request/.)
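The quota arithmetic is worth doing before you file the request. A quick sketch, assuming you spread the engines evenly across a chosen number of regions (the function name and the four-region split are my illustration, not a recommendation from Amazon or Web Performance):

```python
import math

DEFAULT_QUOTA_PER_REGION = 20  # Amazon's default at the time of writing

def quota_needed_per_region(total_engines, regions):
    """Per-region instance quota required to fit total_engines
    when they are spread evenly across the given regions."""
    return math.ceil(total_engines / regions)

# 500 engines across 4 regions needs a quota of 125 per region,
# far above the default of 20 -- so request the increase early.
print(quota_needed_per_region(500, 4))  # -> 125
```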
This probably goes without saying, but your website will need to be reachable from the Amazon cloud.
You’ll need to allocate extra memory to Load Tester, and you’ll need to do this up-front so that Load Tester’s pre-flight self-checks don’t report a shortage. Edit the “webperformance.ini” file in Load Tester’s installation directory. The two lines “-Xms1000m” and “-Xmx1000m” (the values you see might be different, but the flags are the same) control the initial and maximum size of Load Tester’s memory heap. If you set this number too low, Load Tester will run out of memory. If you set it too high, you might starve processes that live outside of the heap. About two-thirds of available hardware memory is a good rule of thumb. For example, on an 8-gigabyte workstation, I would set both values to about 5500m.
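The two-thirds rule of thumb can be sketched as a small calculation (the helper is mine; the fraction is the article's rule of thumb, not a hard limit):

```python
def heap_size_mb(physical_gb, fraction=2/3):
    """Suggested -Xms/-Xmx value in megabytes: roughly two-thirds of
    physical memory, rounded to the nearest 100 MB."""
    return round(physical_gb * 1024 * fraction / 100) * 100

# 8 GB workstation -> roughly the 5500m suggested in the text.
print(f"-Xms{heap_size_mb(8)}m")  # -> -Xms5500m
print(f"-Xmx{heap_size_mb(8)}m")  # -> -Xmx5500m
```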
In the load configuration, disable the options named “Detailed Page Durations” and “Individual URL Metrics.” These options involve features of Load Tester that don’t scale well to hundreds of thousands of users; “Detailed Page Durations” in particular collects a data sample for every page load. Also, under “Error Recording”, don’t raise the “Number of descriptions” or “Number of pages” settings above their default values, since these counts already apply per-engine.
Best practice is to consider the failure of even one load engine to invalidate the test results from the moment of failure forward. When testing the high-performance features of Load Tester, we never encountered a load engine failure mid-test; however, as with all engineered systems, increasing the number of components increases the probability of a failure. Evidence of a lost load engine will first manifest when composite live statistics (such as the yellow average page duration chart) stop updating. These statistics are only updated after all load engines have reported. Eventually, either the backlog will clear or Load Tester’s built-in self-checks will determine that a material failure has occurred and report it explicitly.
Use the Engines View to monitor the health of the load engines as they are running. This view monitors CPU utilization, memory, upstream and downstream bandwidth, and ping time (round-trip latency). Excessive values for any of these numbers can be considered evidence of a problem.
The Status View monitors the health of the controller. The “Memory” sub-heading of this view can report memory pressure. You can allocate more memory to Load Tester’s heap (using the procedure described above) if this value seems problematic. Using the “Cleanup…” tool before each load test to remove unneeded materials from the repository can also reduce memory pressure. The previous advice regarding “Detailed Page Durations” and “Individual URL Metrics” also applies here, as those options strongly impact memory utilization.
The “Diagnostic” sub-heading monitors Load Tester’s two data-processing pipelines and will only show activity during massive load tests. The visualization queue handles chart updates; you can clear any backlog on this queue by disabling unneeded charts. The metric storage queue performs a critical function, and its backlog should stay at zero, although momentary bursts are acceptable. If this queue is persistently overloaded, consider moving the controller to a machine with more CPU and memory.