
Accuracy (in machine learning)

Definition

Accuracy is one of the most fundamental evaluation metrics in machine learning, used to assess the performance of classification models. It is defined as the ratio of the number of correct predictions to the total number of predictions made by the model. In other words, accuracy measures the proportion of instances that the model correctly classified, whether they belong to positive or negative classes. Accuracy ranges from 0 to 1 (or 0% to 100%), with higher values indicating better performance. While accuracy provides a straightforward measure of overall correctness, it is most appropriate when classes are balanced and all misclassifications carry equal weight.

Calculation

The accuracy of a classification model is calculated using the following formula:

\[\text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}} \tag{1}\]

Where:

  • TP (True Positives): The number of positive instances correctly classified as positive.
  • TN (True Negatives): The number of negative instances correctly classified as negative.
  • FP (False Positives): The number of negative instances incorrectly classified as positive (Type I error).
  • FN (False Negatives): The number of positive instances incorrectly classified as negative (Type II error).

The denominator represents the total number of instances in the dataset.
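Equation (1) translates directly into code. A minimal sketch in plain Python (the function name `accuracy` is my own; scikit-learn's `accuracy_score` computes the same quantity from label arrays):

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Accuracy = (TP + TN) / (TP + TN + FP + FN), as in Equation (1)."""
    correct = tp + tn          # all correct predictions
    total = tp + tn + fp + fn  # every instance in the dataset
    return correct / total
```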

Example

Consider a binary classification model that predicts whether emails are spam or not:

  • TP = 90 (spam emails correctly identified)
  • TN = 850 (legitimate emails correctly identified)
  • FP = 20 (legitimate emails incorrectly marked as spam)
  • FN = 40 (spam emails missed)

\[\text{Accuracy} = \frac{90 + 850}{90 + 850 + 20 + 40} = \frac{940}{1000} = 0.94 \text{ or } 94\%\]
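The same result can be checked programmatically by reconstructing label lists from the four counts; a small sketch in plain Python (spam is encoded as 1, legitimate as 0):

```python
# Reconstruct labels from the confusion counts: TP, TN, FP, FN in order.
y_true = [1] * 90 + [0] * 850 + [0] * 20 + [1] * 40  # actual labels
y_pred = [1] * 90 + [0] * 850 + [1] * 20 + [0] * 40  # model predictions

correct = sum(t == p for t, p in zip(y_true, y_pred))
acc = correct / len(y_true)
print(acc)  # 0.94
```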

Necessity

Accuracy is important in machine learning for several reasons:

  • Simplicity and Interpretability: Accuracy is easy to understand and communicate to non-technical stakeholders, making it ideal for initial model assessment.
  • General Performance Overview: It provides a quick snapshot of how well a model performs across all classes, useful for comparing different models.
  • Balanced Datasets: When classes are evenly distributed, accuracy effectively reflects model performance.
  • Baseline Metric: Accuracy serves as a baseline metric before exploring more sophisticated evaluation measures tailored to specific problem domains.
  • Regulatory Compliance: In some applications, overall correctness is a key requirement for compliance or certification purposes.

Limitations and Alternatives

Despite its popularity, accuracy has significant limitations:

  • Class Imbalance Problem: In datasets with imbalanced classes (e.g., 99% negative, 1% positive), a model that predicts everything as negative would achieve 99% accuracy while being useless. This misleads practitioners into thinking the model performs well.
  • Equal Cost Assumption: Accuracy treats all misclassifications equally, ignoring scenarios where false positives and false negatives carry different costs (e.g., in medical diagnosis, a false negative may be more dangerous).
  • No Information on Error Types: Accuracy does not distinguish between different types of errors, obscuring whether the model is better at detecting one class over another.
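The class-imbalance problem is easy to demonstrate: a degenerate classifier that always predicts the majority class scores 99% accuracy while detecting nothing. A minimal sketch (the 990/10 split is illustrative):

```python
# 1000 examples: 990 negative, 10 positive (a 99% / 1% imbalance).
y_true = [0] * 990 + [1] * 10
y_pred = [0] * 1000  # a "model" that always predicts the majority class

acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = sum(t == p == 1 for t, p in zip(y_true, y_pred)) / sum(y_true)

print(acc)     # 0.99 -- looks excellent
print(recall)  # 0.0  -- yet not a single positive instance is found
```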

Alternatives and Complementary Metrics

  • Precision: $\frac{\text{TP}}{\text{TP} + \text{FP}}$ measures the proportion of positive predictions that are correct, useful when false positives are costly.
  • Recall (Sensitivity): $\frac{\text{TP}}{\text{TP} + \text{FN}}$ measures the proportion of actual positives correctly identified, critical when false negatives are costly.
  • F1-Score: The harmonic mean of precision and recall, balancing both metrics and preferred in imbalanced datasets.
  • ROC-AUC: The area under the Receiver Operating Characteristic curve, useful for evaluating classifier performance across different thresholds.
  • Confusion Matrix: Provides a detailed breakdown of all prediction types (TP, TN, FP, FN), enabling comprehensive analysis.
  • Matthews Correlation Coefficient (MCC): A balanced measure for binary classification that performs well even with imbalanced datasets.
  • Balanced Accuracy: Calculates the average of recall for each class, addressing class imbalance issues: $\frac{\text{Recall}_{\text{positive}} + \text{Recall}_{\text{negative}}}{2}$.
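All of these complementary metrics derive from the same four confusion-matrix counts. A sketch computing them from TP/TN/FP/FN directly (the function name is my own; scikit-learn exposes equivalents such as `precision_score`, `f1_score`, `balanced_accuracy_score`, and `matthews_corrcoef`):

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Complementary metrics computed from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)              # sensitivity, recall of positives
    specificity = tn / (tn + fp)         # recall of the negative class
    f1 = 2 * precision * recall / (precision + recall)
    balanced_acc = (recall + specificity) / 2
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return precision, recall, f1, balanced_acc, mcc

# For the spam example above (TP=90, TN=850, FP=20, FN=40):
p, r, f1, ba, mcc = classification_metrics(90, 850, 20, 40)
print(round(f1, 2))  # 0.75 -- notably lower than the 0.94 accuracy
```

Note how the 94%-accurate spam filter drops to an F1 of 0.75 once the 40 missed spam emails are weighed through recall.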

Derived and Related Concepts

Accuracy has led to the development of several related evaluation frameworks and concepts:

  • Multi-Class Accuracy: Extension of binary accuracy to problems with more than two classes, calculated the same way but applied across all classes.
  • Weighted Accuracy: Assigns different weights to different classes to address imbalance issues.
  • Top-K Accuracy: Used in multi-class problems, it checks if the true label is among the top K predictions, common in image classification tasks.
  • Macro and Micro Averaging: Methods for computing accuracy in multi-label classification settings by aggregating results across labels.
  • Stratified Cross-Validation: Uses accuracy as an evaluation metric while ensuring class distribution is maintained across folds during model validation.
  • Threshold Tuning: Adjusting the classification threshold to optimize accuracy for specific use cases, often balanced against other metrics like precision and recall.
  • Fairness Metrics: Extensions of accuracy that assess whether model performance is consistent across different demographic groups, addressing bias and fairness concerns.
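Of these, Top-K accuracy is the most mechanical extension: a prediction counts as correct if the true label is anywhere among the model's K highest-scored classes. A sketch assuming per-class score rows as input (scikit-learn offers `top_k_accuracy_score` for the same purpose):

```python
def top_k_accuracy(y_true, scores, k=3):
    """Fraction of samples whose true label is among the k top-scored classes.

    y_true: integer class labels; scores: one list of per-class scores
    per sample (an assumed input format for this illustration).
    """
    hits = 0
    for label, row in zip(y_true, scores):
        # Indices of the k classes with the highest scores for this sample.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return hits / len(y_true)

scores = [[0.1, 0.5, 0.4], [0.7, 0.2, 0.1]]
print(top_k_accuracy([2, 1], scores, k=1))  # 0.0 -- neither top-1 guess is right
print(top_k_accuracy([2, 1], scores, k=2))  # 1.0 -- both labels are in the top 2
```

With k=1 this reduces to ordinary accuracy, which is why Top-K is usually reported alongside it in image-classification benchmarks.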

References

  • scikit-learn developers. (2023). sklearn.metrics.accuracy_score. Scikit-learn: Machine Learning in Python.

    https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html

  • James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). An introduction to statistical learning (2nd ed.). Springer.

    https://www.statlearning.com/

  • Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction (2nd ed.). Springer.

    https://hastie.su.domains/ElemStatLearn/

  • Wikipedia contributors. (2024). Accuracy and precision. In Wikipedia, The Free Encyclopedia.

    https://en.wikipedia.org/wiki/Accuracy_and_precision

  • Chicco, D., & Jurman, G. (2020). The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics, 21(6), 6.

    https://doi.org/10.1186/s12864-019-6413-7

This post is licensed under CC BY 4.0 by the author.