Updated: Feb 14
Over the past few years, I’ve worked with companies on transforming their monitoring strategy, and the outcome was fantastic: the user experience and reliability of their business-critical applications improved dramatically. In fact, a modern application monitoring strategy is mostly a matter of doing the right things.
Organizations often rely on an outdated monitoring approach. They have no active monitoring of their business-critical applications in place. Instead, they learn about problems only when customers who work with the applications create a ticket because the expected functionality doesn’t work properly. Whenever a ticket arrives, a support analyst tries to reproduce the reported problem, which is typically not possible because too little information and data are available. Regrettably, the problem is solved only hours or even days later, and the customers are unhappy that they had to wait so long for a solution.
Outages are a pain because they lead to lost revenue and, in the worst case, to a damaged reputation. There is no error-free software, so you have to find ways to deal with this uncertainty. Here are three simple steps that help you mitigate those risks and gain excellent insight into your business applications.
First, actively monitor user experience in production applications. A robot executes your important use cases on a specified schedule and, depending on the result of those executions, alerts your support team. This synthetic execution of important use cases is especially valuable outside working hours, when nobody is using your application. When it comes to tools, I recommend Silk Performance Manager from Micro Focus because it’s easy to use and very powerful.
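To make the idea of a synthetic check more concrete, here is a minimal sketch in Python. It is not how Silk Performance Manager works internally; the names `CheckResult`, `run_check`, and `run_all` are hypothetical, and each use case is assumed to be a plain function that raises an exception when the scripted steps fail.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class CheckResult:
    name: str
    ok: bool
    seconds: float
    detail: str


def run_check(name: str, use_case: Callable[[], None], max_seconds: float) -> CheckResult:
    """Run one scripted use case and judge it by success and duration."""
    start = time.perf_counter()
    try:
        use_case()  # hypothetical script, e.g. log in and open the order page
    except Exception as exc:  # any exception counts as a failed execution
        return CheckResult(name, False, time.perf_counter() - start, f"failed: {exc}")
    elapsed = time.perf_counter() - start
    if elapsed > max_seconds:
        return CheckResult(name, False, elapsed, "too slow")
    return CheckResult(name, True, elapsed, "ok")


def run_all(
    checks: List[Tuple[str, Callable[[], None], float]],
    alert: Callable[[CheckResult], None],
) -> List[CheckResult]:
    """Run every check once and call `alert` for each one that did not pass."""
    results = []
    for name, use_case, max_seconds in checks:
        result = run_check(name, use_case, max_seconds)
        if not result.ok:
            alert(result)  # assumption: this notifies the support team
        results.append(result)
    return results
```

A scheduler such as cron would call `run_all` at the chosen interval, including nights and weekends, and `alert` could send an e-mail or open a ticket.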
Second, monitor all transactions from the end user’s perspective. Some problems have an impact on several users, while others affect the whole user community. This kind of monitoring is essential for ongoing improvements and efficient root-cause analysis. dynaTrace is the market leader in this user experience and application monitoring field. Their platform provides many outstanding features such as automatic problem detection, artificial intelligence, and excellent integration possibilities.
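One question this kind of monitoring lets you answer is how many users a problem actually touches. The following sketch assumes each captured transaction is available as a pair of user ID and success flag; that record format and the function name `affected_user_share` are my own assumptions, not something a particular product exports.

```python
from typing import Iterable, Tuple


def affected_user_share(transactions: Iterable[Tuple[str, bool]]) -> float:
    """Return the fraction of distinct users with at least one failed transaction."""
    all_users = set()
    failed_users = set()
    for user_id, ok in transactions:
        all_users.add(user_id)
        if not ok:
            failed_users.add(user_id)
    # With no transactions at all, nobody is affected.
    return len(failed_users) / len(all_users) if all_users else 0.0
```

A value close to 1.0 points to a problem that hits the whole user community, while a small value suggests an issue limited to a few users, which usually calls for a different root-cause analysis.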
Finally, collect system monitoring metrics. Your application won’t deliver an adequate user experience if CPU, memory, network, or I/O metrics are permanently in critical ranges. Therefore, collect low-level metrics and raise alerts when thresholds are exceeded. Tool-wise, you can choose between commercial and open source solutions, and the low-level monitoring landscape is huge. Most companies already have this kind of monitoring in place. Look at the solution from Nagios if you need to close gaps in this discipline. A good user experience and performance monitoring solution also provides infrastructure monitoring features.
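A short spike is rarely a problem; a metric that stays above its threshold is. As a rough sketch of that rule, the function below reports a breach only when a number of consecutive samples exceed the threshold. The name `sustained_breach` and the example values are hypothetical, and real tools such as Nagios have their own configuration for this.

```python
from typing import Iterable


def sustained_breach(samples: Iterable[float], threshold: float, min_consecutive: int) -> bool:
    """Return True if `min_consecutive` samples in a row exceed `threshold`."""
    streak = 0
    for value in samples:
        # Extend the streak on a breach, reset it on any normal sample.
        streak = streak + 1 if value > threshold else 0
        if streak >= min_consecutive:
            return True
    return False
```

For example, with CPU utilization sampled once per minute, `sustained_breach(cpu_samples, 90.0, 5)` would raise an alert only after five minutes above 90 percent.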
Once you’ve implemented your proactive monitoring strategy, don’t forget to review the collected metrics continuously. Take 30 minutes per month for each of your applications and review the captured user experience, response times, throughput, error rate, and system resource utilization metrics of the last 30 days.
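If your tooling lets you export the raw requests, a few lines of Python are enough to prepare such a review. This is only a sketch under my own assumptions: the `Request` record and the `summarize` function are hypothetical, and the 95th percentile uses the simple nearest-rank method.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List


@dataclass
class Request:
    seconds: float  # response time of one request
    failed: bool    # True if the request ended in an error


def summarize(requests: List[Request], period_days: int = 30) -> Dict[str, float]:
    """Summarize throughput, error rate, and response times for a review period."""
    if not requests:
        return {"requests_per_day": 0.0, "error_rate": 0.0,
                "median_seconds": 0.0, "p95_seconds": 0.0}
    times = sorted(r.seconds for r in requests)
    # Nearest-rank 95th percentile, computed with integers to avoid rounding surprises.
    rank = (95 * len(times) + 99) // 100
    return {
        "requests_per_day": len(requests) / period_days,
        "error_rate": sum(r.failed for r in requests) / len(requests),
        "median_seconds": median(times),
        "p95_seconds": times[rank - 1],
    }
```

Comparing these numbers month over month makes slow degradations visible long before they turn into tickets.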
Some data scientists argue:
“The truth is in your data”
I fully agree with this argument, and I believe that once you’ve implemented a forward-thinking application monitoring strategy, you will share this opinion.