Vision | Est. 2024

RealWorldQA

RealWorldQA evaluates vision-language models on practical visual understanding of real-world photographs. Tasks include spatial reasoning, reading text in images, scene understanding, and answering practical questions.

Metrics

Accuracy (%) on real-world visual questions
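
As a rough illustration of the metric, accuracy is the share of questions answered correctly, expressed as a percentage. The sketch below assumes case-insensitive exact-match scoring after trimming whitespace; the benchmark's actual answer-matching rules are not stated here, and the function name accuracy_percent is hypothetical.

```python
def accuracy_percent(predictions, references):
    """Return the percentage of predictions that match their reference answer.

    Assumption: answers are compared by case-insensitive exact match after
    stripping surrounding whitespace. The real benchmark may score differently.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one question is required")

    def normalize(answer):
        return answer.strip().lower()

    correct = sum(
        normalize(pred) == normalize(ref)
        for pred, ref in zip(predictions, references)
    )
    return 100.0 * correct / len(references)


# Hypothetical example: 3 of 4 answers match, so accuracy is 75.0
print(accuracy_percent(["A", "b", "C", "D"], ["a", "B", "c", "A"]))
```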

Created By

xAI

Top Model Scores

Rank  Model            Score  Date
1     Gemini 3 Ultra   79.6%  2026-01
2     GPT-5.2          78.3%  2026-03
3     Claude Opus 4.6  76.7%  2026-02
4     Grok 4           73.1%  2026-02
5     InternVL 3       70.8%  2026-01