Load Testing Data Population: How Many Rows?

2 May 2011
Tags: calculator, data estimates, Load Testing, load testing data
Posted in Load Testing

A common problem when setting up a load testing configuration in Load Tester is figuring out how many rows of data you need for a particular test. For example, you need to have a set of user names and passwords to be used during the test, but how many do you need to ensure that the test will complete?

To answer this question, you need to know three things: the duration of the test, the expected duration of the test case, and how many concurrent users the test will simulate. Fortunately, these things are usually easy to determine. The test duration is set by you in the load configuration:

Setting the test duration in Load Tester

Next, you need to know the expected test case duration. Fortunately, Load Tester calculates this value for you in the Test Case Editor View:

Load Tester calculates the test case duration automatically

As a rule of thumb, round to the nearest minute, and round down when in doubt. This number is not always accurate: it does not include test case looping (the Restart Options tab in the right-click Test Case Properties dialog on a test case), nor does it account for page processors that refresh a page repeatedly while waiting for a particular result. In such cases, you will need to manually estimate the test case durations you are likely to see. In general, it’s a good idea to underestimate the test case duration. A shorter test case duration yields a larger row estimate, and the test will only fail if you come up short; having too many dataset rows is never a problem for the test.

Finally, the number of concurrent users is also set by you in the load configuration:

Configuring Maximum Users

Note that the maximum users field is calculated from the fields above it, and is not directly set.

Now that we have the key information we need, how do we put it together? If all the users in a test started at the very beginning and continued to the very end, the calculation is simple: divide the test duration by the test case duration, round the result up, and multiply by the number of users. That gives you the number of dataset rows used.

Visually, we might represent such a test something like this (Calculation #1):

The Simplest Case

So we multiply the number of users (4) by the test duration (60 minutes) divided by the test case duration (13 minutes), with the test duration / test case duration value (about 4.6) rounded up to 5:

4 * ceil(60 / 13) = 4 * ceil(4.6) = 4 * 5 = 20

So we need 20 rows of data to satisfy the data demand for this test.
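The same arithmetic can be sketched in a few lines of Python. This is only an illustration of Calculation #1; the function name `rows_for_constant_load` is made up for this example and is not part of Load Tester.

```python
import math

def rows_for_constant_load(users, test_minutes, test_case_minutes):
    """Rows needed when every user runs from the start of the test to the end.

    Hypothetical helper for illustration; not a Load Tester API.
    """
    # Each user repeats the test case this many times (partial runs count as one).
    iterations_per_user = math.ceil(test_minutes / test_case_minutes)
    return users * iterations_per_user

print(rows_for_constant_load(4, 60, 13))  # 20
```

Rounding up before multiplying matters: a user who starts a fifth pass through the test case still consumes a row, even if the test ends before that pass completes.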

Of course, most load tests ramp up during the course of the test, instead of starting all the users at the beginning. If your data creation process is automated, you could still use this simple method: it will overestimate, but the test will work. However, if your data creation is time-consuming or manual, you probably want a better estimate of how many data rows you actually need. A more standard load test would look something like this (Calculation #2):

The Normal Case

Estimating the number of rows necessary in this case is just like calculating the area of a triangle in high school geometry: divide the previous calculation by two. For the example above, that gives 20 / 2 = 10 rows. As long as the user ramp is even and terminates within one test case duration of the end of the test, this estimate will be accurate.
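As a rough sketch, Calculation #2 is the full-load estimate cut in half. The function name `rows_for_even_ramp` is hypothetical, and rounding the halved value up is an assumption made here so that an odd result never comes up short.

```python
import math

def rows_for_even_ramp(max_users, test_minutes, test_case_minutes):
    """Rows needed when users ramp evenly from zero up to max_users.

    Hypothetical helper for illustration; not a Load Tester API.
    """
    # Rows the test would need if all users ran for the whole test.
    full_load_rows = max_users * math.ceil(test_minutes / test_case_minutes)
    # An even ramp covers half of that "rectangle" (the triangle area).
    # Rounding up is an assumption, chosen to err on the side of extra rows.
    return math.ceil(full_load_rows / 2)

print(rows_for_even_ramp(4, 60, 13))  # 10
```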

What about in the case of a test where you want to ramp up to a certain number of users, and then hold at that level for an extended period of time? In such a case, to estimate the number of data rows needed, split the test into two parts for the purposes of the estimate – one in which you’re ramping, and one in which you’re holding at a fixed user level:

Separate Calculations For Different Test Sections

Once you calculate both of these values, add them together to get the total number of dataset rows needed. You can use this technique to estimate for any test, even one that ramps unevenly: divide the test into ramping periods and steady load periods, estimate each one, and then add them all together. If a test has multiple test cases whose durations are not similar, estimate each test case separately in the same way and add the results.
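The section-by-section approach might be sketched as follows. The section labels ("ramp" and "hold"), the function names, and the example numbers (100 users, a 20-minute ramp, a 40-minute hold, a 5-minute test case) are all assumptions chosen for illustration.

```python
import math

def section_rows(kind, users, minutes, test_case_minutes):
    """Estimate rows for one section of a test (hypothetical helper)."""
    full = users * math.ceil(minutes / test_case_minutes)
    if kind == "hold":
        # Steady load: every user runs for the whole section.
        return full
    if kind == "ramp":
        # Even ramp from zero: half of the steady-load figure, rounded up.
        return math.ceil(full / 2)
    raise ValueError(f"unknown section kind: {kind!r}")

def total_rows(sections, test_case_minutes):
    """Add up the estimates for every (kind, users, minutes) section."""
    return sum(
        section_rows(kind, users, minutes, test_case_minutes)
        for kind, users, minutes in sections
    )

# Ramp to 100 users over 20 minutes, then hold at 100 users for 40 minutes.
sections = [("ramp", 100, 20), ("hold", 100, 40)]
print(total_rows(sections, 5))  # 200 + 800 = 1000
```

Note that this sketch only handles ramps that start from zero users. A ramp between two nonzero levels would need a different estimate for that section.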

Finally, you want to pad the end estimate to protect the test from running into issues where the load engines become unbalanced – we usually recommend adding at least 25% to the estimate. When one load engine has significantly more virtual users than the others and the datasets are divided evenly amongst the load engines, that engine is at risk of running out of data before the others. In addition to the padding, you may also need to apply a user limit to that load engine, or even all the load engines, to ensure that they do not get out of hand.
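Applying the padding is a single multiplication. The sketch below uses the 25% minimum mentioned above as the default; the function name `pad_rows` is hypothetical, and rounding up to a whole row is an assumption.

```python
import math

def pad_rows(estimate, padding=0.25):
    """Add a safety margin to a row estimate (hypothetical helper).

    padding is a fraction: 0.25 means 25% extra rows.
    """
    # Round up so the padded figure is always a whole number of rows.
    return math.ceil(estimate * (1 + padding))

print(pad_rows(20))  # 25
print(pad_rows(10))  # 13
```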

We’ve provided a calculator for calculation #2 with 25% padding on our site.

Happy Testing!

Matt Drew

Web Performance Test Engineer

