Language | Est. 2020

MMLU

Massive Multitask Language Understanding (MMLU) measures knowledge across 57 academic subjects spanning STEM, the humanities, the social sciences, and other areas such as professional law and medicine. Its four-option multiple-choice questions test both world knowledge and problem-solving ability, with difficulty ranging from elementary to advanced professional level.
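MMLU items are four-option multiple-choice questions, and models are commonly evaluated few-shot: a subject header, a handful of solved examples, then the unanswered test question. The sketch below shows one way to assemble such a prompt. It is modeled on the widely used few-shot template, but this page does not state which prompt format its leaderboard uses, and the helper names `format_question` and `build_prompt` are hypothetical.

```python
from typing import Optional, Sequence, Tuple

# MMLU questions have exactly four answer options, labeled A to D.
CHOICE_LABELS = ("A", "B", "C", "D")


def format_question(question: str, options: Sequence[str], answer: Optional[str] = None) -> str:
    """Render one item; pass `answer` only for the solved few-shot examples."""
    if len(options) != len(CHOICE_LABELS):
        raise ValueError("expected exactly four options")
    lines = [question]
    lines += [f"{label}. {text}" for label, text in zip(CHOICE_LABELS, options)]
    prompt = "\n".join(lines) + "\nAnswer:"
    if answer is not None:
        prompt += f" {answer}\n\n"
    return prompt


def build_prompt(
    subject: str,
    exemplars: Sequence[Tuple[str, Sequence[str], str]],
    test_item: Tuple[str, Sequence[str]],
) -> str:
    """Hypothetical template: header + solved exemplars + unanswered test question."""
    header = (
        "The following are multiple choice questions (with answers) about "
        f"{subject.replace('_', ' ')}.\n\n"
    )
    shots = "".join(format_question(q, opts, ans) for q, opts, ans in exemplars)
    question, options = test_item
    return header + shots + format_question(question, options)


prompt = build_prompt(
    "high_school_physics",
    [("What is the SI unit of force?", ["Joule", "Newton", "Watt", "Pascal"], "B")],
    ("What is the SI unit of power?", ["Joule", "Newton", "Watt", "Pascal"]),
)
print(prompt)
```

The prompt ends at "Answer:", so the model's choice among the letters A to D can be compared directly with the answer key.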

Metrics

Accuracy (%) across 57 subjects
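This page does not say whether the reported accuracy is pooled over all questions (micro-average) or averaged per subject (macro-average). The two can differ because the 57 subjects contain different numbers of questions. As a sketch under that uncertainty, the hypothetical function `mmlu_accuracy` below computes both from (subject, predicted letter, gold letter) records.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def mmlu_accuracy(records: Iterable[Tuple[str, str, str]]) -> Tuple[float, float, Dict[str, float]]:
    """Return (micro %, macro %, per-subject %) from (subject, pred, gold) tuples."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for subject, pred, gold in records:
        total[subject] += 1
        correct[subject] += int(pred == gold)
    if not total:
        raise ValueError("no records to score")
    # Per-subject accuracy as a percentage.
    per_subject = {s: 100.0 * correct[s] / total[s] for s in total}
    # Micro: every question weighs the same, regardless of subject size.
    micro = 100.0 * sum(correct.values()) / sum(total.values())
    # Macro: every subject weighs the same, regardless of question count.
    macro = sum(per_subject.values()) / len(per_subject)
    return micro, macro, per_subject


records = [
    ("anatomy", "A", "A"),
    ("anatomy", "B", "C"),
    ("anatomy", "D", "D"),
    ("virology", "C", "C"),
]
micro, macro, per_subject = mmlu_accuracy(records)
print(f"micro={micro:.1f}%  macro={macro:.1f}%")  # micro=75.0%  macro=83.3%
```

In this toy example, 3 of 4 answers are correct (75.0% micro), but because the single virology question is answered correctly, the macro-average rises to 83.3%.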

Created By

Dan Hendrycks et al.

Top Model Scores

Rank | Model           | Score | Date (YYYY-MM)
1    | GPT-5.2         | 92.4% | 2026-03
2    | Claude Opus 4.6 | 91.8% | 2026-02
3    | Gemini 3 Ultra  | 91.2% | 2026-01
4    | Grok 4          | 90.5% | 2026-02
5    | Llama 4 405B    | 88.7% | 2026-01