Updated: May 20, 2022
We often find ourselves busy with too many things at the same time. Living in this kind of multitasking mode affects not only our health but also our daily work; in fact, it often leads to disaster.
Customers feel frustrated when our new IT services fail to meet their performance expectations. And when you discover that the new application simply doesn’t perform well under real-world conditions, you realize that all the investment in building it was wasted. This post is the first in a series called “Rethink performance engineering” that will address the areas of performance engineering where we tend to be error-prone. In this series, meant as a wake-up call for all readers involved in the value stream, we’ll take a look at the following topics:
Business case for performance engineering
This first blog post offers a few glimpses into the problem patterns that commonly hold performance engineering back and waste time and money.
When the team from one of my recent projects contacted me during user-acceptance testing, the scheduled deployment to production was two weeks away. They wanted me to validate their performance requirements. A manual walk-through already exposed a few slow-loading pages, and I still had to implement a realistic load test and present the results a week before their go-live date. Unsurprisingly, the application was full of performance issues: I identified memory leaks, slow SQL, capacity shortfalls, problems in the web design, and end-to-end response times far beyond the agreed thresholds. Management was quickly alerted to the bad news because only one week remained to fix those issues. Performance problems are typically time-consuming and difficult to solve. After spending another three weeks remedying all the high-severity problems, the company faced not only the massive costs incurred but also a loss of trust among the end users, who were left waiting for an application that was now about two weeks overdue.
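To make “realistic load test” more concrete, here is a minimal sketch of the kind of check involved: fire concurrent requests and compare percentile latencies against an agreed threshold. Everything here is an illustrative assumption, including the simulated service call, the user counts, and the 500 ms budget; a real assignment would use a proper load-testing tool against the actual application.

```python
import random
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def call_service() -> float:
    """Hypothetical stand-in for a real HTTP call; returns latency in seconds."""
    start = time.perf_counter()
    time.sleep(random.uniform(0.01, 0.05))  # simulated service latency
    return time.perf_counter() - start

def run_load_test(users: int, requests_per_user: int) -> dict:
    # Drive concurrent virtual users and collect per-request latencies.
    with ThreadPoolExecutor(max_workers=users) as pool:
        latencies = sorted(pool.map(lambda _: call_service(),
                                    range(users * requests_per_user)))
    return {
        "p50": statistics.median(latencies),
        "p95": latencies[int(len(latencies) * 0.95) - 1],
        "max": max(latencies),
    }

results = run_load_test(users=10, requests_per_user=5)
THRESHOLD_P95 = 0.5  # assumed agreed response-time threshold, in seconds
assert results["p95"] <= THRESHOLD_P95, f"p95 {results['p95']:.3f}s exceeds threshold"
```

The point is not the tool but the practice: a check like this, run well before user-acceptance testing, would have surfaced the slow pages months earlier.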
We all know the old adage that “a fool with a tool is still a fool,” but what does that mean for our performance-engineering teams? Is there such a thing as tool obsession? I’ve seen many performance-testing assignments end as complete flops, and the chosen load-generation approach often played a major role.
I recently conducted a performance audit for one of my clients. Several weeks of performance testing had been done before the roll-out of the new application stack to end customers. Somewhat surprisingly, the real customers were not happy with the new application and reported a frustrating user experience, even though every monitoring dashboard showed component-level health in dark green. A review of the user-simulation strategy, together with a closer look at the monitoring cockpit, brought light into the darkness: their load-testing approach focused on protocol- and API-level testing, the same methodology they used for monitoring in production. But fast back-end services alone are no longer enough. With today’s rich browser-based applications, many business functions are offloaded to the client layer, so the last mile must be part of both performance testing and production monitoring. Tools play a critical role here in detecting and mitigating performance issues. If you have only a hammer in your toolbox, you’ll notice only the nails. Don’t make this mistake in your next performance-engineering assignment.
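The gap between green dashboards and frustrated users can be illustrated with a few hypothetical numbers (all values below are assumed for illustration): a protocol-level test sees only the back-end slice of the time a real user actually waits.

```python
# Hypothetical timings in milliseconds; all values are illustrative.
backend_ms = 120        # what a protocol/API-level load test measures
network_ms = 80         # transferring the response to the client
client_render_ms = 900  # JS parsing, layout, and paint in the browser

protocol_level_view_ms = backend_ms  # this is all the dashboard charts
user_perceived_ms = backend_ms + network_ms + client_render_ms

print(protocol_level_view_ms, user_perceived_ms)  # prints: 120 1100
```

In this scenario the monitoring cockpit reports a healthy 120 ms while the user waits over a second, which is why last-mile, browser-level measurements belong in both testing and production monitoring.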
Test coverage is king. Thinking about the test pyramid helps you visualize how this story goes: build a reliable foundation by covering all functions with unit tests, then re-run those tests in early performance testing to create a continuous-performance baseline. These old ideas are still good ones, but sadly, once the pressure on developers increases, they are commonly neglected. Insufficient test coverage is a critical sign of shortcuts in software quality. The problem is hard to detect, though, because its effects show up only weeks or even months later, and once you start to see the error rate increasing, no short-term fix can be applied. Don’t run too fast without the required measures in place, and keep an eye on test-coverage metrics. Performance can’t be implemented like a new feature; if it’s not on everyone’s daily agenda, you’ll pay a high price later.
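A continuous-performance baseline can start as something very small: a unit-level timing check that runs in CI on every build. A minimal sketch follows, in which the function under test, the iteration count, and the 0.5-second budget are all assumptions for illustration; real baselines would track your own critical code paths and budgets.

```python
import timeit

def build_report(rows: int) -> list:
    """Hypothetical function under test; substitute a real critical code path."""
    return [{"id": i, "total": i * 2} for i in range(rows)]

# The baseline expressed as a repeatable check, so a performance
# regression fails the build instead of surfacing weeks later.
BASELINE_SECONDS = 0.5  # assumed time budget for 10 runs of this path

elapsed = timeit.timeit(lambda: build_report(10_000), number=10)
assert elapsed < BASELINE_SECONDS, f"regression: {elapsed:.3f}s exceeds baseline"
```

Because the check is just another test, it rides along with the unit-test suite the pyramid already demands, giving you the early-warning signal the paragraph above describes.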
We hope the applications we develop will last for years or even decades, so why do teams rush through a process as critical as designing reliable applications? Think of your ongoing projects: from Dev through Ops, the biggest goal is to push new features to production as quickly as possible. Long-term thinking seems to have no place in our development projects these days. Short-term thinking introduces high risk because we don’t consider carefully enough how our applications will handle realistic production-load scenarios. In some cases, response-time checks are part of development testing. If we’re very lucky, a continuous-performance baseline is maintained, but it’s rare to find a performance engineer embedded in a scrum team who ensures that the user experience gets the full attention it needs.
Currently, hardly any university teaches the theoretical or practical principles of performance testing, monitoring, or continuous performance improvement. Developers are left to their own devices for lack of relevant performance-engineering guidance or design patterns. We all hate sitting in front of screens watching slow-loading websites or business applications. The only winners in this situation are those of us who guide businesses in resolving the bottlenecks in their applications. The drawback is that such engagements are often too short and fail to fully address the real root cause; they tend to identify breaking points far too late and spend too much time on troubleshooting.
In the next part of this series I’ll give you more insider tips on how to make performance engineering part of your SDLC. Happy performance engineering!