In my last post in this series, I asserted that you must test your production system and then promptly dismissed all the popular reasons for not doing so. But in the real world, things aren’t so simple. There will be cases where the production system cannot be tested – for example, because test data cannot be effectively purged from the system without a significant investment.
So, if you are in that situation, what can you do? If you cannot test your production environment, then you must recreate the production environment as precisely as possible. Every place where the test system deviates from the production environment is a potential difference in performance – and thus an opportunity for failure. It only takes one small setting in a config file to bring a system down under load!
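One way to hunt for those deviations is to compare the settings of the two environments mechanically rather than by eye. The following is a minimal sketch, assuming simple `key = value` config files; the setting names and values (`max_connections`, `keepalive_timeout`, `worker_threads`) are hypothetical and only there for illustration.

```python
MISSING = "<missing>"  # placeholder for a key present in only one environment


def parse_settings(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def config_differences(production, test):
    """Return {key: (production_value, test_value)} for every key that differs."""
    diffs = {}
    for key in sorted(production.keys() | test.keys()):
        prod_value = production.get(key, MISSING)
        test_value = test.get(key, MISSING)
        if prod_value != test_value:
            diffs[key] = (prod_value, test_value)
    return diffs


# Hypothetical config contents, standing in for files read from each environment.
PRODUCTION_CONFIG = """
# production
max_connections = 500
keepalive_timeout = 65
worker_threads = 32
"""

TEST_CONFIG = """
# test rig
max_connections = 100
keepalive_timeout = 65
"""

differences = config_differences(
    parse_settings(PRODUCTION_CONFIG), parse_settings(TEST_CONFIG)
)
for key, (prod_value, test_value) in differences.items():
    print(f"{key}: production={prod_value} test={test_value}")
```

Every line this prints is a setting to either align or consciously accept as a known risk. Real configs are rarely this flat, so treat the parser as a stand-in for whatever format your servers actually use.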
Here is one example to avoid: “Our production system has ten beefy front-end servers. They are really expensive. Can we just substitute five smaller machines in our test rig and then increase our estimated capacity by a factor of two?” No. The problem with performance limitations in complex systems is that they are frequently not linear. Some are (at least up to the capacity of the hardware) but many are not. In the worst cases, performance can go from great to horrid with only a mild increase in applied load.
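To get a feel for how non-linear this can be, here is a rough sketch using the textbook M/M/1 queueing formula, where mean response time is the service time divided by (1 - utilization). This is an assumed, idealized model of a single server, not a description of any particular system, and the 10 ms service time is a hypothetical number chosen for illustration.

```python
def mean_response_time(service_time_s, arrival_rate_per_s):
    """Mean response time of an idealized M/M/1 queue: R = S / (1 - utilization)."""
    utilization = arrival_rate_per_s * service_time_s
    if utilization >= 1.0:
        # At or past saturation the queue grows without bound.
        return float("inf")
    return service_time_s / (1.0 - utilization)


SERVICE_TIME_S = 0.010  # hypothetical: 10 ms of work per request

for rate in (50, 80, 90, 95, 99):
    response_ms = mean_response_time(SERVICE_TIME_S, rate) * 1000
    print(f"{rate:>3} requests/s -> {response_ms:7.1f} ms")
```

In this model, going from 50 to 80 requests per second adds about 30 ms to the mean response time, while going from 95 to 99 adds about 800 ms. Real systems have more moving parts than a single queue, but the shape of the curve is the reason a scaled-down test rig cannot simply be multiplied up.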
Some practical tips to help out:
Have any suggestions for others in this situation? Add yours in the comments – we’d love to hear them!
Chris Merrill, Chief Engineer