First published on Tuesday, 26 September 2017
Checklist For A Reliable Load-Test
Setting up a load-test that produces results you can rely on is not that simple. But without realistic test results you cannot be sure that your application can handle sudden traffic increases, rapid spikes, or even the initial go-live. Nor can you estimate at what number of users you should scale up your hardware. Both details are very important to keep the application running at all times and to guarantee that no revenue or developer sleep is lost because of outages.
We have a large checklist of points we go through when setting up performance tests with customers and I wanted to discuss some of the more important points in this blog post.
Dedicated Hardware For The Load-Test Generator
Without dedicated hardware to run the load-tests on, you can immediately discard all of your results – especially when you try to run the tests from your local development machine.
You have to make sure that you don't run into client-side network bandwidth problems which increase the latency but are not an indicator of a slow application.
You also have to make sure that the load-test machines don't run into hardware or software resource limits such as the number of open ports, the number of open files, or CPU capacity. This can skew your testing results as well.
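One way to guard against such limits is a pre-flight check on the load generator itself. The following is a minimal sketch, assuming a Unix machine and Python; the threshold of 10,000 connections is an arbitrary example value, not a recommendation:

```python
import resource

def check_open_file_limit(required_connections=10_000):
    # Each open TCP socket counts against the open-file limit, so the
    # soft limit must cover the planned number of concurrent connections.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft < required_connections:
        raise RuntimeError(
            f"Open-file soft limit is {soft}, but the test plans "
            f"{required_connections} concurrent connections. "
            "Raise it with `ulimit -n` before starting the load generator."
        )
    return soft
```

Running a check like this before every test run catches silently lowered limits that would otherwise show up as mysterious connection errors mid-test.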
We usually run the dedicated hardware right next to the tested application, because we can get more realistic results for the performance of the actual web application. This obviously doesn't include the network performance between the internet and your hoster. If you want to test this as well we recommend setting up both internal and external test clusters.
Testing Must Use The Production Cluster
Unless your staging setup is really exactly the same as production, down to every configuration variable, machine setup, software and database contents, you cannot expect tests against staging or development machines to be any indicator of the future load capabilities of your production system.
You can use development or staging machines for benchmarking by using relative performance comparisons during code optimizations – but not for the verification of your production setup. We have seen countless times that initially the staging system had much higher throughput than the production system, because it wasn't a distributed but a single node system. Network file systems can, for example, have a huge negative performance impact on PHP applications and are often only used in production.
If you are load-testing a new application you have to provide realistic test data. In e-commerce systems this is usually simple, because the product catalogue contains the bulk of the data and is already fully available. For other application types, generating realistic production data at the right scale can be a difficult task.
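A small sketch of such a generator, using assumed field names and value ranges for illustration (real test data should mirror your actual schema and value distributions):

```python
import random
import string

def generate_products(count, seed=42):
    # Fixed seed makes test runs reproducible and comparable.
    rng = random.Random(seed)
    for product_id in range(1, count + 1):
        yield {
            "id": product_id,
            "sku": "".join(rng.choices(string.ascii_uppercase + string.digits, k=8)),
            "price_cents": rng.randint(99, 99_999),
            # Skewed stock levels: most products plentiful, some near
            # sold-out, so the test also exercises "sold out" code paths.
            "stock": rng.choice([0, 1, 2, 10, 100, 1000]),
        }

catalogue = list(generate_products(10_000))
```

The important design choice is scale: a test against 100 products behaves very differently from one against the 100,000 products production actually serves.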
Working With External/Third Party Services
No man is an island, and for modern web-applications this holds true given the number of webservices they usually interact with. These can be internal or third-party systems, and during your load test they will face two challenges:
With increased load on the main application, the web services might be the first to fail. You should notify third party services of a load test beforehand. Using a sandbox of a third party service is not necessarily a good idea, because it might have different performance characteristics.
The flood of non-production test data generated by the load tests has to be "ignored". You don't want your inventory system to ship hundreds of packages to John Doe customers because nobody knew you were testing.
Preparing your application to talk to external services with test-data and in various test-modes is one of the more complicated parts of load-testing setup.
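One common approach, sketched here rather than prescribed, is to tag every synthetic order with a marker that downstream systems (fulfilment, invoicing, analytics) agree to filter out. The field names `is_load_test` and `load_test_run` are assumed conventions for this example:

```python
LOAD_TEST_MARKER = "load-test-2017-09"

def tag_order(order, marker=LOAD_TEST_MARKER):
    """Return a copy of the order flagged as synthetic load-test data."""
    tagged = dict(order)
    tagged["is_load_test"] = True
    tagged["load_test_run"] = marker
    return tagged

def should_fulfil(order):
    """Downstream guard: never ship packages for load-test orders."""
    return not order.get("is_load_test", False)
```

The marker also makes cleanup after the test a single query instead of a forensic exercise.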
Don't make the mistake of not triggering pages with external service integrations! They are usually much slower because of the HTTP overhead and are often a source of contention, bottlenecks and throughput decline for other parts of your application.
If your load-test doesn't simulate realistic user scenarios, it will most likely lead you to the wrong conclusions. Usually the scenarios which are most complicated to automate trigger the most complicated SQL queries or long-running external service calls, and they can be a major source of performance decline when running alongside many small and faster requests. An example of this is the checkout of an online shop and the resulting cache purges when certain products are sold out.
Maybe your application uses caching heavily, so your load-test is built in a way that produces extremely high cache hit ratios of more than 90%. But if your real-world users only reach a hit ratio of 50%, then your results can be worthless.
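To know which hit ratio to calibrate the test against, you can measure the real one from production access logs. A sketch, assuming each log entry carries a cache status field like Varnish's "HIT"/"MISS" (the log format here is an assumption):

```python
def cache_hit_ratio(log_entries):
    # Share of requests served from the cache, 0.0 if the log is empty.
    hits = sum(1 for entry in log_entries if entry["cache"] == "HIT")
    return hits / len(log_entries) if log_entries else 0.0

production_log = [
    {"url": "/product/1", "cache": "HIT"},
    {"url": "/product/2", "cache": "MISS"},
    {"url": "/cart",      "cache": "MISS"},
    {"url": "/product/1", "cache": "HIT"},
]
# 2 of 4 requests hit the cache: a 50% ratio, far below a 90% test
# setup, and a signal that the test scenario needs rebalancing.
```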
Or your database can only handle a much lower number of concurrent requests than your webservers. If your test requests mostly hit the cache instead of opening database connections in realistic proportions, the results will give you a false sense of throughput in production.
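The arithmetic behind this is simple but worth writing down: the database only sees the cache *misses*. A back-of-the-envelope sketch with illustrative numbers:

```python
def db_requests_per_second(total_rps, cache_hit_ratio):
    # Only cache misses reach the database.
    return total_rps * (1.0 - cache_hit_ratio)

# At 1000 req/s total, a test tuned to a 90% hit ratio sends the
# database roughly 100 req/s. At the production hit ratio of 50% the
# database would see 500 req/s, five times the load the test verified.
```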
The same goes for external systems: if 99% of your traffic relies on a call to an external system that will fail once you start simulating the other 1% of write traffic to that system, then failing to trigger the writes will leave you with the impression that the external system will hold up under the load.
This is why it's very important to work with the stakeholders of the application to define a realistic set of "personas" who are using the various features of your application in a realistic way. For an e-commerce system this could mean:
Anonymous users coming from Google
Anonymous users clicking a link on a newsletter
Google (and other bots) crawling deep into all links
Logged in users for which the pages are not cached by Varnish
Both logged in and anonymous users going through the checkout
You can set up metrics or use webserver access logs to find out which share of each persona should be used during your load-tests.
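A sketch of how this could look: derive persona shares from access logs and pick personas with matching probabilities during the test. The classification rules (user-agent, session-cookie and URL checks) are simplified assumptions for illustration:

```python
import random
from collections import Counter

def classify(entry):
    # Crude persona detection from a single access-log entry.
    if "Googlebot" in entry.get("user_agent", ""):
        return "crawler"
    if entry.get("session_cookie"):
        return "logged_in"
    if "utm_source=newsletter" in entry.get("url", ""):
        return "newsletter"
    return "anonymous"

def persona_weights(log_entries):
    # Fraction of real traffic each persona accounts for.
    counts = Counter(classify(e) for e in log_entries)
    total = sum(counts.values())
    return {persona: n / total for persona, n in counts.items()}

def pick_persona(weights, rng=random):
    # Weighted random choice, so the load-test reproduces the real mix.
    personas = list(weights)
    return rng.choices(personas, weights=[weights[p] for p in personas])[0]
```

Each simulated virtual user then runs the scenario belonging to its drawn persona, so the overall traffic mix mirrors production instead of an arbitrary script order.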
Compared to simple benchmarks, setting up a load-test to generate realistic results is a lot of work that requires careful planning – but it is very well worth the investment. Especially if your application exhibits sudden spikes due to advertisement (TV, newsletter, events) or long-lasting traffic increases such as the Christmas shopping spree (including Cyber Monday). When your application load is usually very low outside these spikes, you need a load-test that triggers the anticipated higher traffic to know if it will work, or fail.