Reasoning | Est. 2019

WinoGrande

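WinoGrande is a large-scale dataset of about 44,000 Winograd-style problems that require commonsense reasoning to resolve pronoun ambiguity. It was adversarially filtered to reduce dataset-specific biases, which makes it challenging for models that rely on statistical word associations rather than commonsense reasoning.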

Metrics

Accuracy (%) on commonsense pronoun resolution
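
As a rough illustration of the metric, accuracy here is the share of problems for which the model picks the correct one of the two candidate answers, expressed as a percentage. The sketch below assumes answers are encoded as the strings "1" and "2" (first or second option); the function name accuracy_percent and the sample labels are made up for illustration and are not taken from any official evaluation script.

```python
def accuracy_percent(predictions, answers):
    """Return the percentage of predictions that match the gold answers."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count exact matches between predicted and gold option labels.
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


# Made-up gold labels and predictions ("1" = first option, "2" = second option).
gold = ["1", "2", "2", "1"]
predicted = ["1", "2", "1", "1"]
print(f"{accuracy_percent(predicted, gold):.1f}%")  # 3 of 4 correct -> 75.0%
```

Because each problem has exactly two options, random guessing is expected to score about 50% under this metric.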

Created By

Allen Institute for AI

Top Model Scores

Rank  Model            Score  Date
1     GPT-5.2          96.4%  2026-03
2     Claude Opus 4.6  96.1%  2026-02
3     Gemini 3 Ultra   95.7%  2026-01
4     Grok 4           95.2%  2026-02
5     Llama 4 405B     94.1%  2026-01