Accuracy in machine learning (ML) is an evaluation metric that measures how often a model’s predictions are correct. Formally, accuracy is the number of correct predictions divided by the total number of predictions. For example, if your computer vision model classified 100 images and got 90 of them right, its accuracy would be 90%. In other words, it is the model’s overall correctness or success rate: the percentage of predictions it gets right.
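Concretely, the calculation looks like this (a minimal sketch with made-up numbers):

```python
# Hypothetical example: accuracy = correct predictions / total predictions
correct_predictions = 90
total_predictions = 100

accuracy = correct_predictions / total_predictions
print(f"Accuracy: {accuracy:.0%}")  # Accuracy: 90%
```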
Accuracy is one of the most common evaluation metrics for classification. In terms of a confusion matrix, accuracy is the fraction of all examples the model classified correctly, i.e., the sum of the diagonal divided by the total. It’s a convenient single‑number summary of performance and often the default metric for classifiers.
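For instance, with scikit-learn and a handful of toy labels (invented here purely for illustration), you can see that accuracy equals the diagonal of the confusion matrix divided by the total number of examples:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Toy ground-truth labels and model predictions
y_true = ["cat", "cat", "dog", "dog", "dog", "bird"]
y_pred = ["cat", "dog", "dog", "dog", "dog", "bird"]

cm = confusion_matrix(y_true, y_pred)
print(cm)

# Accuracy = correctly classified examples (diagonal) / all examples
print(np.trace(cm) / cm.sum())         # 0.833...
print(accuracy_score(y_true, y_pred))  # same value
```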
However, relying on accuracy alone can be misleading. Accuracy doesn’t tell you which kinds of mistakes a model makes. On an imbalanced dataset, a model can achieve high accuracy by always predicting the most common class. For example, if only 1% of your data is class A, always predicting “not A” will be 99% accurate despite being useless. A medical classifier that says “no disease” for everyone can score 99% accuracy in a rare‑disease scenario while missing every actual case.
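Here is a small sketch of that failure mode with synthetic labels: a “classifier” that always predicts the majority class scores 99% accuracy while never finding a single positive case.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Synthetic imbalanced labels: only 1% of examples are the positive class
y_true = np.array([1] * 10 + [0] * 990)  # 10 positives, 990 negatives

# A "model" that always predicts the majority (negative) class
y_pred = np.zeros_like(y_true)

print(accuracy_score(y_true, y_pred))  # 0.99 -- looks great on paper
print(recall_score(y_true, y_pred))    # 0.0  -- misses every positive case
```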
For deeper analysis, tools like FiftyOne let you go beyond a single number. You can compute not just accuracy but also precision, recall, and more, and inspect individual results to understand why your model’s accuracy is what it is and how to improve it.
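As a rough sketch, a FiftyOne classification evaluation looks something like the following. The dataset name and the "ground_truth" and "predictions" field names are placeholders for your own data:

```python
import fiftyone as fo

# Assumes an existing dataset whose samples have classification labels in
# "ground_truth" and model outputs in "predictions" (names are illustrative)
dataset = fo.load_dataset("my_classification_dataset")  # hypothetical name

# Evaluate predictions against ground truth and store per-sample results
results = dataset.evaluate_classifications(
    "predictions",
    gt_field="ground_truth",
    eval_key="eval",
)

# Aggregate metrics beyond accuracy: per-class precision, recall, and F1
results.print_report()

# Open the App to inspect individual correct and incorrect predictions
session = fo.launch_app(dataset)
```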
In everyday use, “accuracy” and “precision” are often treated as synonyms, a classic mix‑up, but in ML they are distinct metrics. Precision and recall are additional measures that clarify the types of errors a model makes.
For precise definitions you can refer to the scikit‑learn metrics documentation, which outlines the mathematical formulas used to compute precision, recall, and related metrics.
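To make those formulas concrete with a toy binary example (counts chosen only for illustration), precision is TP / (TP + FP) and recall is TP / (TP + FN):

```python
from sklearn.metrics import precision_score, recall_score

# Toy binary labels (1 = positive)
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

# TP = 2, FP = 1, FN = 1
print(precision_score(y_true, y_pred))  # 2 / (2 + 1) = 0.666...
print(recall_score(y_true, y_pred))     # 2 / (2 + 1) = 0.666...
```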
When comparing accuracy versus precision, accuracy is how often you’re right overall, while precision is how often you’re right when you predict “positive.” A model with high precision might have lower overall accuracy if it’s very selective. A model with high accuracy could still have low precision if it predicts “positive” too often and gets many of those predictions wrong. Recall focuses on not missing positives: a high‑recall model finds most of the positives but may also raise false alarms. In practice, there’s a trade‑off. If false alarms are costly, you’d prioritize precision; if missing positives is worse, you’d emphasize recall. A strong model balances accuracy, precision, and recall rather than maximizing one at the expense of the others. By looking at all three, you get a much clearer picture of model performance than any single metric provides alone.
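The comparison below, using toy predictions invented for illustration, shows how two models with the same accuracy can sit at opposite ends of the precision‑recall trade‑off:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Synthetic ground truth: 4 positives, 6 negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Model A is selective: few positive predictions, all correct
# (high precision, low recall)
y_a = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# Model B is liberal: catches every positive but raises false alarms
# (high recall, lower precision)
y_b = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]

for name, y_pred in [("Model A", y_a), ("Model B", y_b)]:
    print(
        name,
        f"accuracy={accuracy_score(y_true, y_pred):.2f}",
        f"precision={precision_score(y_true, y_pred):.2f}",
        f"recall={recall_score(y_true, y_pred):.2f}",
    )
# Both models score 0.80 accuracy, but their precision/recall profiles differ sharply
```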