Vision · Est. 2017

VQAv2

Visual Question Answering v2 (VQAv2) is a large-scale benchmark containing over 1 million questions about images from COCO. It tests a model's ability to answer open-ended, natural-language questions whose answers depend on understanding the image content.

Metrics

Accuracy (%) on visual questions
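
As a rough illustration of how an accuracy figure like this is typically computed, the sketch below uses the commonly cited VQA scoring rule: each question has 10 human answers, and a prediction earns min(matches / 3, 1) credit. It is an assumption that the scores listed on this page follow exactly this rule. The function names are hypothetical, and the lowercase/strip normalization is a simplification: the official evaluation script applies more extensive answer normalization and averages over subsets of annotators.

```python
from collections import Counter


def vqa_accuracy(predicted: str, human_answers: list[str]) -> float:
    """Score one prediction against the human answers for one question.

    Simplified rule (assumption): full credit if at least 3 annotators
    gave the same answer, partial credit otherwise.
    """
    # Hypothetical minimal normalization: trim whitespace and lowercase.
    counts = Counter(a.strip().lower() for a in human_answers)
    matches = counts[predicted.strip().lower()]
    return min(matches / 3.0, 1.0)


def benchmark_accuracy(predictions: list[str], references: list[list[str]]) -> float:
    """Average per-question scores and report the result as a percentage."""
    if not predictions or len(predictions) != len(references):
        raise ValueError("predictions and references must be non-empty and aligned")
    scores = [vqa_accuracy(p, r) for p, r in zip(predictions, references)]
    return 100.0 * sum(scores) / len(scores)
```

Under this rule, an answer given by only two of the ten annotators earns 2/3 credit rather than zero, so the metric tolerates some disagreement among annotators.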

Created By

Virginia Tech / Georgia Tech

Top Model Scores

Rank  Model            Score  Date
1     Gemini 3 Ultra   88.9%  2026-01
2     GPT-5.2          88.3%  2026-03
3     Claude Opus 4.6  87.1%  2026-02
4     InternVL 3       85.4%  2026-01
5     Grok 4           84.8%  2026-02