Math · Est. 2021

GSM8K

Grade School Math 8K (GSM8K) is a dataset of 8,500 high-quality, linguistically diverse grade school math word problems. It tests multi-step mathematical reasoning: each problem takes between 2 and 8 steps to solve.

Metrics

Accuracy (%) on 8,500 math word problems
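
As a rough illustration of how an accuracy figure like this could be computed, here is a minimal Python sketch. It relies on the GSM8K convention that each reference solution ends with a line of the form "#### <number>". It also assumes, purely for illustration, that model outputs are written in the same format. The names extract_answer and accuracy_percent are hypothetical and do not come from any official evaluation harness.

```python
import re
from decimal import Decimal, InvalidOperation

# Matches the final-answer marker used in GSM8K reference solutions,
# e.g. "#### 18" or "#### 1,234".
ANSWER_RE = re.compile(r"####\s*(-?[\d,]+(?:\.\d+)?)")


def extract_answer(text):
    """Return the number after the last '####' marker as a Decimal, or None."""
    matches = ANSWER_RE.findall(text)
    if not matches:
        return None
    try:
        # Strip thousands separators before parsing.
        return Decimal(matches[-1].replace(",", ""))
    except InvalidOperation:
        return None


def accuracy_percent(predictions, references):
    """Percentage of predictions whose final answer equals the reference answer."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = 0
    for pred, ref in zip(predictions, references):
        gold = extract_answer(ref)
        # A prediction with no parseable answer counts as wrong.
        if gold is not None and extract_answer(pred) == gold:
            correct += 1
    return 100.0 * correct / len(references)


if __name__ == "__main__":
    # Toy strings for illustration only; these are not real dataset entries.
    refs = ["2 + 3 = 5\n#### 5", "10 * 4 = 40\n#### 40"]
    preds = ["The total is 5.\n#### 5", "I get 41.\n#### 41"]
    print(accuracy_percent(preds, refs))
```

Comparing parsed numbers rather than raw strings means that "18" and "18.0", or "1,234" and "1234", count as the same answer. Real evaluation setups may normalize answers differently.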

Created By

OpenAI

Top Model Scores

Rank  Model            Score  Date
1     GPT-5.2          98.1%  2026-03
2     Claude Opus 4.6  97.8%  2026-02
3     Gemini 3 Ultra   97.5%  2026-01
4     Grok 4           97.2%  2026-02
5     Llama 4 405B     96.3%  2026-01