RealWorldQA
RealWorldQA evaluates vision-language models on practical, real-world visual understanding tasks: spatial reasoning over real photographs, reading text in images, scene understanding, and answering practical questions.
Metrics
Accuracy (%) on real-world visual questions
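Since RealWorldQA questions have short, fixed answers, accuracy is typically computed as the percentage of exact matches after light normalization. The sketch below is an assumption about that scoring style, not the official evaluation script; the normalization rules (lowercasing, stripping whitespace and a trailing period) are illustrative.

```python
# Hedged sketch of exact-match accuracy scoring for short-answer VQA.
# The normalization here is an assumption, not the official RealWorldQA script.

def normalize(ans: str) -> str:
    """Lowercase, strip surrounding whitespace, and drop a trailing period."""
    return ans.strip().lower().rstrip(".")

def accuracy(predictions: list[str], references: list[str]) -> float:
    """Percentage of predictions that exactly match their reference answer."""
    assert len(predictions) == len(references)
    correct = sum(
        normalize(p) == normalize(r) for p, r in zip(predictions, references)
    )
    return 100.0 * correct / len(references)

preds = ["Two", "left ", "a stop sign"]
refs  = ["two", "left", "A stop sign."]
print(accuracy(preds, refs))  # → 100.0
```

A real harness would also need to handle multiple-choice letter answers and model outputs that embed the answer in a longer sentence, which simple exact match does not cover.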
Created By
xAI
Paper
View paper →
Website
Visit website →
Top Model Scores
| Rank | Model | Score | Date |
|---|---|---|---|
| 1 | Gemini 3 Ultra | 79.6% | 2026-01 |
| 2 | GPT-5.2 | 78.3% | 2026-03 |
| 3 | Claude Opus 4.6 | 76.7% | 2026-02 |
| 4 | Grok 4 | 73.1% | 2026-02 |
| 5 | InternVL 3 | 70.8% | 2026-01 |
Related Vision Benchmarks
VQAv2
Visual Question Answering v2 is a large-scale benchmark for visual question answering containing over 1 million questions about images from COCO. It tests the ability to answer open-ended questions that require understanding image content.
Top: Gemini 3 Ultra — 88.9%
DocVQA
Document Visual Question Answering evaluates the ability of models to understand and answer questions about document images including forms, invoices, scientific papers, and handwritten notes.
Top: Gemini 3 Ultra — 95.2%
ChartQA
ChartQA tests the ability of models to answer questions about charts and visualizations, requiring both visual understanding of chart elements and reasoning about the underlying data.
Top: GPT-5.2 — 90.1%