MMMU
Massive Multi-discipline Multimodal Understanding (MMMU) evaluates multimodal models on college-level subject knowledge and deliberate reasoning across six core disciplines, 30 subjects, and 183 subfields, using heterogeneous images such as charts, diagrams, and domain-specific visualizations.
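For readers who want to inspect the benchmark itself, here is a minimal sketch of loading one subject and looking at a question. It assumes the dataset is published on the Hugging Face hub under the ID `MMMU/MMMU` with one configuration per subject, and the field names used below (`question`, `options`, `answer`, `image_1`) follow the original release; verify both against the current hub page.

```python
# Minimal sketch: inspect MMMU questions via Hugging Face datasets.
# Hub ID, config names, and field names are assumptions based on the
# original MMMU release -- check the hub page before relying on them.
from datasets import load_dataset

ds = load_dataset("MMMU/MMMU", "Accounting", split="validation")

ex = ds[0]
print(ex["question"])  # question text; may reference <image 1> etc.
print(ex["options"])   # multiple-choice options
print(ex["answer"])    # gold answer letter, e.g. "B"
# ex["image_1"] holds the first associated image for the question.
```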
Metrics
Accuracy (%) across 30 college subjects; see the scoring sketch after the table below.
Created By
IN2 Lab / Waterloo
Paper
View paper →
Website
Visit website →
Top Model Scores
| Rank | Model | Score | Date |
|---|---|---|---|
| 1 | GPT-5.2 | 74.6% | 2026-03 |
| 2 | Gemini 3 Ultra | 73.8% | 2026-01 |
| 3 | Claude Opus 4.6 | 72.1% | 2026-02 |
| 4 | Grok 4 | 68.5% | 2026-02 |
| 5 | InternVL 3 | 65.2% | 2026-01 |
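The scores above are plain question-level accuracy aggregated over all subjects. As an illustration only (not MMMU's official evaluation harness), the sketch below computes overall and per-subject accuracy from graded predictions; the record fields `subject`, `answer`, and `prediction` are assumed names for this example.

```python
from collections import defaultdict

def accuracy_report(records):
    """Overall and per-subject accuracy (%) from graded predictions.

    Each record is assumed to carry:
      subject    -- e.g. "Accounting", one of the 30 MMMU subjects
      answer     -- gold answer letter, e.g. "B"
      prediction -- the model's parsed answer
    """
    per_subject = defaultdict(lambda: [0, 0])  # subject -> [correct, total]
    for r in records:
        per_subject[r["subject"]][0] += r["prediction"] == r["answer"]
        per_subject[r["subject"]][1] += 1
    total_correct = sum(c for c, _ in per_subject.values())
    total = sum(n for _, n in per_subject.values())
    return {
        "overall": 100.0 * total_correct / total,
        "by_subject": {s: 100.0 * c / n
                       for s, (c, n) in sorted(per_subject.items())},
    }

# Toy example with two subjects (hypothetical data, not real MMMU records):
records = [
    {"subject": "Accounting", "answer": "B", "prediction": "B"},
    {"subject": "Accounting", "answer": "C", "prediction": "A"},
    {"subject": "Art", "answer": "D", "prediction": "D"},
]
print(accuracy_report(records))
# -> {'overall': 66.66..., 'by_subject': {'Accounting': 50.0, 'Art': 100.0}}
```

Reporting both views is useful because a model can post a strong overall number while trailing badly in a handful of subjects.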
Related Multimodal Benchmarks
MMLU-Pro Vision
MMLU-Pro Vision extends MMLU-Pro to multimodal settings where questions include images, diagrams, charts, and figures alongside text. It tests whether vision-language models can leverage visual information for academic reasoning.
Top: GPT-5.2 — 68.4%
MathVista
MathVista evaluates mathematical reasoning in visual contexts by combining math problems with charts, plots, geometry figures, scientific diagrams, and synthetic scenes. It aggregates 6,141 examples from 28 existing datasets and 3 new datasets, covering five task types and seven mathematical reasoning abilities. Models must interpret visual information accurately and apply mathematical reasoning to derive correct answers.
Top: GPT-5.2 — 72.6%