Over the last 20 years, I have worked with many customers to reduce performance risks in critical IT systems. Most of these performance engineering and load testing projects were started after severe production outages had affected these businesses. Unfortunately, system performance is an invisible element during software development. We notice performance bottlenecks when it's almost too late, and often only when customers are affected and escalate such problems.
When severe performance problems and downtime hit their customers, businesses tend to overreact and spend all their money on a poorly planned performance improvement exercise, which results in even more significant issues.
In this post, I am sharing the idea of performance engineering budget quadrants as a guideline and reality check.
Ad hoc (low cost, high risk)
Performance engineering is not part of your value stream. However, you run load and performance tests from time to time when you identify some long-running requests. This unstructured practice introduces a high risk because it's up to the product teams to decide whether a performance test is required. Time pressure leads to shortcuts and higher risk tolerance. Your investment in performance engineering is small, but the risk of running into significant performance issues is too high.
Ineffective (high cost, high risk)
Performance testing and monitoring involve a lot of manual work. They block your best engineers because your teams are reinventing the wheel by building unique solutions. Creating new performance engineering tools is not as easy as it first looked. You need to pull a lot of data together from different sources, analyze these data lakes, and implement proper reporting. You will realize that manual analysis is very time-consuming, so you start using machine learning algorithms to run experiments and build a better problem detection solution. In addition, due to time constraints, performance problems are often not identified during software development, and teams end up doing too much troubleshooting in production.
Waste (high cost, low risk)
You have the best toolbox for performance engineering in place, but its implementation is underdeveloped. As a result, your teams spend too much time on manual load testing and analysis, and performance problems appear late in the software development life cycle. Decision-makers consider performance engineering a waste because it's simply too expensive and increases time to market.
Continuous (low cost, low risk)
You always avoid both overspending and taking too high a risk. Instead, you balance the two in the most effective way. You have excellent risk-based performance engineering and monitoring in place. All changes undergo a performance risk analysis and, depending on the identified risk, performance validations are part of every stage in your development process. This practice keeps the risk of performance-related problems low and your performance engineering budget at a minimum.
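To make the risk-based idea more tangible, here is a minimal Python sketch of how a change could be mapped to performance validations. Everything in it is an assumption made for illustration: the risk factors, the weights, the thresholds, and the validation names are hypothetical and would need to be replaced with whatever fits your own systems.

```python
from dataclasses import dataclass


@dataclass
class Change:
    # Hypothetical risk factors; choose the ones that matter for your system.
    touches_critical_path: bool
    changes_data_access: bool
    adds_external_call: bool


def risk_level(change: Change) -> str:
    """Turn the risk factors into a coarse level (illustrative weights only)."""
    score = (
        2 * int(change.touches_critical_path)
        + int(change.changes_data_access)
        + int(change.adds_external_call)
    )
    if score >= 3:
        return "high"
    if score >= 1:
        return "medium"
    return "low"


# Hypothetical mapping from risk level to the validations a change must pass.
VALIDATIONS = {
    "low": ["production monitoring"],
    "medium": ["component performance test", "production monitoring"],
    "high": [
        "component performance test",
        "full load test before release",
        "production monitoring",
    ],
}


def planned_validations(change: Change) -> list[str]:
    """Return the validations for a change, depending on its identified risk."""
    return VALIDATIONS[risk_level(change)]


if __name__ == "__main__":
    change = Change(
        touches_critical_path=True,
        changes_data_access=True,
        adds_external_call=False,
    )
    print(risk_level(change), planned_validations(change))
```

The point of the sketch is the shape of the decision, not the numbers: low-risk changes pass with lightweight checks, while only high-risk changes pay for a full load test. That is how spending stays proportional to risk.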
Are you interested in learning more? Our team is here to bring your performance engineering budget into the right quadrant.
Keep doing the great work! Happy performance engineering!