Chicken and Egg Problem in Performance Testing
Over the last 20 years, I have been involved in hundreds of performance testing projects. Some were completed within a few days; others took several years. We've worked in teams of up to 15 performance engineers on core banking replacement projects, and our fast-forward approach has pushed us into a kind of front-runner role. We've uncovered dramatic reliability problems and surprised decision-makers with how insightful performance engineering, performance testing, and performance monitoring can be.
But things are not always so straightforward.
If you start on a greenfield, would you ensure proper performance monitoring first, or focus on performance testing as a first step?
Before 2010, a big proportion of my projects ignored the importance of powerful performance monitoring that traces all transactions across all layers. We simply identified the performance requirements, created workload models, implemented and executed load, stress, and scalability test scenarios, and worked with developers on problem analysis and tuning.

I remember plenty of situations in which finger-pointing started and teams blamed each other. Load testing results demonstrated that response times and error rates were beyond the agreed expectations, but we lacked the insight to understand what had caused these problems. We set up war rooms, demonstrated our findings, and asked all attendees to proceed with the analysis on their end. In some cases we were lucky and a system engineer came back with the root cause. For the majority of our performance tests, however, we ended up repeating the test, hoping to reproduce the issue and gain more insight. This was always very frustrating because we had no way to speed up the analysis process. Analysis and tuning were always a massive exercise and a very uncontrollable element in our performance testing projects.
In the last 5 years, things have changed in a big way
Performance engineers are much more in the driver's seat when it comes to monitoring. We no longer rely on system monitoring or log files alone. Our monitoring solutions are at the heart of our applications. This powerful concept of application performance monitoring seems poised to replace many other monitoring approaches soon, and there are very good reasons for that.
Let's start with the role of performance monitoring in the development environment. Engineers code new features in isolated environments. They run component and unit-level checks to ensure that everything works as intended. CI/CD pipelines build, deploy, and test the new code, including its performance. A traffic-light-based quality gate provides evidence that everything is in good shape from a functionality, performance, and security perspective. Automated performance validation in such a CI/CD pipeline is only possible if you have powerful performance monitoring in place.
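A traffic-light quality gate like the one described above can be as simple as comparing measured metrics against agreed thresholds. The following is a minimal sketch; the metric names, thresholds, and 20% warning margin are illustrative assumptions, not a specific tool's API.

```python
# Minimal sketch of a traffic-light quality gate for a CI/CD pipeline.
# Metric names, thresholds, and the 20% warning margin are made-up examples.

def evaluate_quality_gate(metrics, thresholds):
    """Return 'green', 'yellow', or 'red' for the given measurements.

    A metric within its threshold passes; up to 20% above it is a
    warning (yellow); anything worse fails the gate (red).
    """
    status = "green"
    for name, limit in thresholds.items():
        value = metrics.get(name)
        if value is None:
            continue  # metric not collected in this run
        if value <= limit:
            continue  # within the agreed expectation
        if value <= limit * 1.2:
            status = "yellow"  # degraded, but not a hard failure
        else:
            return "red"  # clearly beyond expectations: fail the build
    return status

# Example: p95 response time in milliseconds and error rate in percent.
thresholds = {"p95_response_ms": 500, "error_rate_pct": 1.0}
metrics = {"p95_response_ms": 430, "error_rate_pct": 0.4}
print(evaluate_quality_gate(metrics, thresholds))  # green
```

In a real pipeline, a red result would typically fail the build step, which is what makes the gate an automated, rather than manual, decision.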
Once several sprints have been successfully tested for performance, releases are deployed to integration and acceptance environments. Performance engineers start to build their production-like regular, peak, and stress test scenarios. Hundreds of services can be involved, and functional issues can hold back the entire team. Performance monitoring not only helps to identify the root cause of performance problems, it is also a good way to trace functional problems and uncover the reasons behind them in complex environments. My favorite use cases are architectural analysis, horizontal and vertical drill-down to the root cause of performance bottlenecks, and powerful AI-based anomaly detection. Imagine you had to uncover hotspots in a distributed, microservice-based environment hosted on a container platform without such monitoring. It would be too much effort and too big a risk for such an undertaking.
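The drill-down described above relies on tracing data: each service reports spans that reference their parent, so a slow transaction can be attributed to the layer that actually spent the time. This sketch is a simplified illustration of that idea; the services, operations, and timings are invented, and real tracing tools (e.g. those following the OpenTelemetry model) record far richer data.

```python
# Illustrative sketch of hotspot drill-down on distributed tracing data.
# Services, operations, and durations below are hypothetical examples.
import uuid

def new_span(service, operation, duration_ms, parent_id=None):
    """A span records one operation in one service, linked to its parent."""
    return {"id": str(uuid.uuid4()), "parent": parent_id,
            "service": service, "operation": operation,
            "duration_ms": duration_ms}

# One traced transaction crossing three layers.
web = new_span("web", "GET /checkout", 840)
orders = new_span("orders", "createOrder", 790, web["id"])
db = new_span("orders-db", "INSERT order", 720, orders["id"])
trace = [web, orders, db]

def hotspot(trace):
    """Compute each span's self time (own duration minus time spent in
    its children) and return the biggest contributor."""
    children_ms = {}
    for s in trace:
        if s["parent"]:
            children_ms[s["parent"]] = (
                children_ms.get(s["parent"], 0) + s["duration_ms"])
    return max(trace, key=lambda s: s["duration_ms"] - children_ms.get(s["id"], 0))

slowest = hotspot(trace)
print(slowest["service"], slowest["operation"])  # orders-db INSERT order
```

The point of the example: the web tier looks slow (840 ms), but its self time is only 50 ms; the drill-down attributes the bulk of the latency to the database layer, which is exactly the kind of answer that ends finger-pointing.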
The question now is: what should come first on a greenfield?
The chicken or the egg: performance testing or performance monitoring? The former is not successful without the latter, and there are many cases in which performance monitoring provides tremendous help in resolving issues in a complex environment. My recommendation is to improve observability first: ensure high-quality performance monitoring of all involved services before you start implementing or executing performance tests. Such an approach pays off in several ways. You will be able to detect more issues earlier instead of ending up in trial-and-error debugging sessions.
Keep up the great work. Happy Performance Engineering!