
The four golden signals for performance and machine learning experts

  • Writer: Josef Mayrhofer
  • 7 days ago

Artificial intelligence is revolutionizing our world. Performance engineering and Machine Learning experts work hard behind the scenes to ensure fast response times and optimal predictions. Both domains focus on performance metrics, but a closer look reveals significant differences.


What are the top 4 performance metrics for machine learning?

For machine learning experts, it is crucial to determine how well ML models perform on datasets to solve given problems. Accuracy, Precision, Recall, and F1-Score are essential performance metrics when validating machine learning models because they guide us in finding the best solution to a given problem.


Let's consider a modern approach to detecting ransomware using machine learning models. Would it be acceptable if our ML model detected only 90 percent of ransomware and missed the remaining 10 percent (false negatives)? When detecting ransomware attacks, a missed alert is more costly than a false alert, and therefore, Recall is our most important performance metric.


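Accuracy = (TP+TN) / (TP+TN+FP+FN): The share of all predictions that are correct, where TP, TN, FP, and FN are the counts of true positives, true negatives, false positives, and false negatives. On imbalanced datasets, where attacks are rare, high accuracy can still hide many missed positives.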


Precision = TP / (TP+FP): The share of predicted positives that are actually positive. High Precision means few false positives (false alerts).


Recall = TP / (TP+FN): The share of actual positives that the model identifies. High Recall means few false negatives (missed alerts), which is crucial for ransomware and zero-day detection.


F1 Score = 2 * (Precision * Recall) / (Precision + Recall): The harmonic mean of Precision and Recall. It balances both metrics, which is crucial for fraud detection cases.
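As a minimal sketch, the four formulas above can be computed directly from confusion-matrix counts. The function name `classification_metrics` and the sample counts are hypothetical and chosen only for illustration; they are not taken from a real ransomware dataset.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute Accuracy, Precision, Recall, and F1 from confusion-matrix counts.

    Returns 0.0 for a metric whose denominator would be zero.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts: 1,000 scanned files, 100 of them ransomware.
# The model catches 90, misses 10, and raises 20 false alerts.
metrics = classification_metrics(tp=90, tn=880, fp=20, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this made-up example, Accuracy is 0.97 even though one in ten attacks goes undetected, which is why Recall deserves separate attention in detection use cases.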


Performance problems in machine learning models can lead to poor decision-making and, in the worst case, unacceptable business risks.


What are the top 4 performance metrics for performance engineers?

Performance engineers work hard to build scalable systems by reducing latency and optimizing system resource utilization. It's often a tradeoff between reducing response time and maximizing throughput because, at some point, every service slows down, and design decisions turn into bottlenecks. In the performance engineering industry, we call the load levels at which these bottlenecks appear knee points. Some bottlenecks are acceptable; others put our customers' performance requirements at risk, and we must find strategies to remediate them.


Latency (Response Time): Measures how long it takes to complete a request. It's a crucial metric when we validate the performance requirements of real-time applications, APIs, and web applications.


Throughput: Number of transactions or operations handled per unit of time. A system can process only a certain number of requests during a given time. Usually, we measure it in transactions per second (TPS) or requests per second (RPS).
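A rough sketch of how latency and throughput might be derived from raw measurements follows. It assumes we already have a list of response times in milliseconds and know the length of the measurement window; the helper names `percentile` and `throughput`, the nearest-rank percentile method, and the sample values are all illustrative assumptions.

```python
import math


def percentile(values, pct):
    """Return the nearest-rank percentile of a list of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Nearest-rank method: smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def throughput(request_count, window_seconds):
    """Return completed requests per second for a measurement window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return request_count / window_seconds


# Hypothetical response times (ms) collected during a 2-second window.
latencies_ms = [120, 95, 110, 480, 105, 98, 102, 130, 115, 101]

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
rps = throughput(len(latencies_ms), window_seconds=2.0)

print(f"p50 latency: {p50} ms, p95 latency: {p95} ms, throughput: {rps} RPS")
```

Percentiles are used here instead of an average because a single slow request, such as the 480 ms sample, is easy to miss in a mean but shows up clearly in the upper percentiles.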


Utilization: Measures hardware resource consumption, such as CPU, Network, Memory, or I/O. It becomes a critical metric for mission-critical and high-performance systems. We balance latency, throughput, and utilization to keep service costs low.


Error Rate: Measures the percentage of failed operations. Both functional and performance problems can lead to rising error rates. Error rate metrics are a crucial indicator of our systems' service quality. We must always include error rate metrics when optimizing services for reliability and robustness.
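The error rate can be sketched in a few lines. This example assumes that each operation is an HTTP request and that a status code of 500 or higher counts as a failure; both the function name `error_rate` and that failure rule are assumptions for illustration, and a real service may define failures differently (for example, including timeouts).

```python
def error_rate(status_codes, is_failure=lambda code: code >= 500):
    """Return the percentage of failed operations.

    By default (an assumption), any HTTP status code >= 500 is a failure.
    """
    if not status_codes:
        return 0.0
    failed = sum(1 for code in status_codes if is_failure(code))
    return 100 * failed / len(status_codes)


# Hypothetical sample: 97 successful requests and 3 server errors.
codes = [200] * 97 + [500, 503, 504]
print(f"error rate: {error_rate(codes):.1f}%")
```

Passing a different `is_failure` function lets the same sketch cover other definitions of a failed operation without changing the calculation.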


System performance problems result in an unacceptable user experience, a decline in trust and reputation, and high troubleshooting and infrastructure costs.


1+1 = 3

Machine learning requires models that predict accurately and systems that run those models efficiently and reliably. Together, performance and machine learning experts build systems that solve tomorrow's problems.


Keep up the great work! Happy Performance Engineering!




