If you’ve been using average page load times to evaluate your website’s performance, you could be missing bottlenecks that hurt your conversion rates and give customers a frustrating experience. Averages are easy to understand, but they hide the details: if a page’s average load time is 2 seconds, many users are seeing load times slower than that, and we don’t know how much slower. The industry standard of expressing site performance in percentiles is technically more accurate, but it still doesn’t give the full picture.
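To make that concrete, here is a minimal sketch with synthetic numbers (not data from any real site) showing how an average can look healthy while the percentiles reveal a painful slow tail:

```python
import numpy as np

# Synthetic page load times in seconds: most responses are fast,
# but a long tail of slow ones drags the real experience down.
rng = np.random.default_rng(0)
load_times = np.concatenate([
    rng.normal(1.5, 0.3, 900),   # 900 fast responses around 1.5 s
    rng.normal(9.0, 2.0, 100),   # 100 slow responses around 9 s
])
load_times = np.clip(load_times, 0.1, None)

print(f"average:         {load_times.mean():.1f} s")               # ~2.2 s, looks fine
print(f"95th percentile: {np.percentile(load_times, 95):.1f} s")   # ~9 s
print(f"99th percentile: {np.percentile(load_times, 99):.1f} s")   # worse still
print(f"slowest:         {load_times.max():.1f} s")
```

In this sketch roughly one visitor in ten waits around 9 seconds, yet the average alone would never show it.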
Here’s a good example. A recent client, a high-end retailer, showed excellent average page load times. Judging by the averages alone, there was no slowdown at all in page load times as the load increased to 2,250 concurrent users.
What the average was covering up was a much wider range of response times than you’d imagine just by looking at the above chart. That’s why we added scatter plots to Web Performance Load Tester; you can also generate them in most modern load testing tools. They plot every single web page response time in a test, or can be narrowed to a single test case or even a single web page.
These plots provide a readily comprehensible overview of the range of page load times seen by customers in different parts of the website. While the average page load time was fast, the all-important checkout and shopping cart process had a substantial number of responses in the 5-15 second range even at low load levels! If you had relied on the average load times alone, you would have missed a detail that could be driving down the conversion rate and costing the company sales.
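If your load testing tool exports raw results, you can build the same kind of plot yourself. Here is a minimal sketch using pandas and matplotlib, assuming a CSV export with one row per page request; the file name and column names (elapsed_s, load_time_s, page) are placeholders rather than the format of any particular tool:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Assumed CSV layout: one row per page request, with the elapsed test time
# (seconds since the test started) and the measured page load time.
results = pd.read_csv("loadtest_results.csv")  # columns: elapsed_s, load_time_s, page

fig, ax = plt.subplots(figsize=(10, 5))
ax.scatter(results["elapsed_s"], results["load_time_s"], s=4, alpha=0.3)
ax.set_xlabel("Elapsed test time (s)")
ax.set_ylabel("Page load time (s)")
ax.set_title("Every page response in the test")
plt.show()

# The same plot narrowed to a single page, e.g. a checkout step:
checkout = results[results["page"] == "checkout"]
plt.scatter(checkout["elapsed_s"], checkout["load_time_s"], s=4, alpha=0.3)
plt.show()
```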
Another concern is the spikes circled in red. When the scattered load times bunch up into recognizable vertical lines, it’s an artifact of how web page response times are measured: a response time is only recorded once the page has fully loaded, so until then the page’s load time is unknown. One user requests a page that the back end cannot serve, followed by another and another. The first bottleneck in the image above spans 35 seconds, meaning that whatever was blocking the page took 35 seconds to clear and return the page content. Once it did, the blocked page was served to all of the waiting users at once, creating the vertical stack pattern. The first user to request the blocked page waits the longest, the second a little less, and so on; the most recent user may hardly wait at all. But no matter how long they waited, they all saw the page load at approximately the same time.
You’ll see scattered load times line up like this when a call to a web page’s back end, whether directly to the database or to an application server, is blocked. Often the culprit is a locked database table, but it can be a failure at any level of the stack. In this particular example, the servers were at about 40% CPU load when the blocks started occurring, so the problem wasn’t server capacity.
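One way to spot these stacks programmatically is to look at completion times rather than start times: a blocked resource releases a burst of responses at the same instant. A rough sketch, continuing with the hypothetical results data from the plot above (the 5-second and 20-response thresholds are arbitrary and would need tuning for your traffic):

```python
# Completion time = the moment the response finally came back to the user.
results["completed_s"] = results["elapsed_s"] + results["load_time_s"]

# Bucket completions into 1-second bins and flag bins where an unusually
# large number of slow responses all finished together -- the vertical stacks.
slow = results[results["load_time_s"] > 5.0]
stacks = slow.groupby(slow["completed_s"].astype(int)).size()
print(stacks[stacks > 20])   # each entry is a moment when a block released
```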
Some typical causes of a blocked/locked page:
- A database query holding a lock on a table that other queries need
- An exhausted database connection pool or application server thread pool
- A blocking call to a slow or unresponsive external service
- A shared resource (a file, cache entry, or synchronized code section) held by one request while others wait
Once you know something is blocking a web page or API call for long periods, the next step is to work with the rest of the DevOps team to comb through logs and monitoring tools to find the source of the block. A lot of the time it’s a database query that has locked a table, which is easy for a DBA to check; other problems take more digging to pinpoint.
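For example, if the back end happens to be PostgreSQL (an assumption here, not something stated in this example), a DBA can check for blocking locks directly while the test is running. A sketch using psycopg2 and PostgreSQL’s built-in pg_blocking_pids() function; the connection details are placeholders:

```python
import psycopg2

# pg_blocking_pids() (PostgreSQL 9.6+) lists the sessions blocking a given query,
# so joining pg_stat_activity to itself shows who is waiting on whom.
QUERY = """
SELECT blocked.pid    AS blocked_pid,
       blocked.query  AS blocked_query,
       blocking.pid   AS blocking_pid,
       blocking.query AS blocking_query
FROM pg_stat_activity blocked
JOIN pg_stat_activity blocking
  ON blocking.pid = ANY(pg_blocking_pids(blocked.pid))
WHERE blocked.wait_event_type = 'Lock';
"""

with psycopg2.connect("dbname=shop user=dba") as conn:  # placeholder connection string
    with conn.cursor() as cur:
        cur.execute(QUERY)
        for row in cur.fetchall():
            print(row)
```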