What Is F1 Score?
The F1 score is a machine learning evaluation metric that combines precision and recall into a single number using their harmonic mean, providing a balanced measure of a classifier's performance that accounts for both false positives and false negatives.
How F1 Score Works
In classification tasks, precision measures how many of the model's positive predictions are correct, while recall measures how many actual positives the model catches. The F1 score is the harmonic mean of these two: F1 = 2 * (precision * recall) / (precision + recall). It ranges from 0 to 1, with 1 being perfect.

The harmonic mean ensures that a model must perform well on both metrics; a model with perfect precision but zero recall scores 0. F1 is especially useful for imbalanced datasets, where accuracy alone can be misleading: a spam filter that marks everything as "not spam" could have 99% accuracy if only 1% of emails are spam, but would have an F1 of 0 for the spam class.
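The formula above can be sketched in a few lines of Python. The function below is a minimal illustration (not a production implementation); the zero-division guard handles the degenerate case where precision and recall are both 0:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * (precision * recall) / (precision + recall)

# Perfect precision but zero recall still scores 0,
# because the harmonic mean punishes any zero component.
print(f1_score(1.0, 0.0))  # 0.0

# A model that is strong on both metrics scores high.
print(f1_score(0.9, 0.8))  # ~0.847
```

Note how the harmonic mean sits closer to the smaller of the two inputs, unlike the arithmetic mean (which would give 0.5 in the first case).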
Real-World Examples
A spam filter achieving an F1 score of 0.95, indicating it catches most spam (high recall) with few false flags (high precision)
A medical diagnostic AI with F1 of 0.89 for detecting tumors, balancing the need to catch all tumors vs. avoiding false alarms
A named entity recognition model reporting F1 scores per entity type: Person 0.93, Organization 0.87, Location 0.91
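Per-class F1 scores like those in the NER example are computed from each class's own true positives, false positives, and false negatives. The sketch below uses made-up counts purely for illustration; real workflows typically rely on a library such as scikit-learn rather than hand-rolled counting:

```python
def per_class_f1(tp: int, fp: int, fn: int) -> float:
    """F1 for one class from its confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * (precision * recall) / (precision + recall)

# Hypothetical counts for an NER model (illustrative only).
counts = {
    "Person":       {"tp": 90, "fp": 5,  "fn": 8},
    "Organization": {"tp": 70, "fp": 12, "fn": 9},
    "Location":     {"tp": 80, "fp": 7,  "fn": 9},
}

for entity, c in counts.items():
    print(f"{entity}: F1 = {per_class_f1(c['tp'], c['fp'], c['fn']):.2f}")
```

Reporting F1 per class, rather than one pooled number, reveals which entity types the model handles well and which need more training data.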