Multimodal | Est. 2024

MMLU-Pro Vision

MMLU-Pro Vision extends MMLU-Pro to multimodal settings where questions include images, diagrams, charts, and figures alongside text. It tests whether vision-language models can leverage visual information for academic reasoning.

Metrics

Accuracy (%) on visual academic questions
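
As a rough illustration of the metric, the sketch below computes accuracy as the percentage of questions answered correctly. It assumes each answer is a single option letter and that scoring is a case-insensitive exact match. The benchmark's actual answer-extraction and scoring rules are not described here, and the function name `accuracy_percent` and the sample data are hypothetical.

```python
def accuracy_percent(predictions, answers):
    """Return the percentage of predictions that match the gold answers.

    Assumption: answers are option letters compared case-insensitively
    after trimming whitespace; the real benchmark may score differently.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return 100.0 * correct / len(answers)


# Hypothetical data: four questions, three answered correctly.
gold = ["A", "C", "B", "D"]
pred = ["A", "C", "D", "D"]
print(f"{accuracy_percent(pred, gold):.1f}%")  # prints 75.0%
```

A score such as 68.4% would then correspond to roughly 68 correct answers per 100 questions under this definition.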

Created By

TIGER-AI Lab

Top Model Scores

Rank  Model            Score  Date
1     GPT-5.2          68.4%  2026-03
2     Gemini 3 Ultra   67.1%  2026-01
3     Claude Opus 4.6  65.8%  2026-02
4     Grok 4           61.3%  2026-02
5     InternVL 3       58.7%  2026-01