Reasoning · Est. 2018

ARC (AI2 Reasoning Challenge)

The AI2 Reasoning Challenge (ARC) contains 7,787 genuine grade-school, multiple-choice science questions, split into an Easy set and a Challenge set. The Challenge set contains only questions that both a retrieval-based algorithm and a word co-occurrence algorithm answer incorrectly.
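The Easy/Challenge split described above can be sketched as a simple filter. This is a minimal illustration, not the dataset authors' code: the record layout (`id`, `answer`) and the two solver callables are hypothetical stand-ins for the retrieval-based and word co-occurrence baselines, and placing every remaining question in the Easy set is an assumption of the sketch.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical record layout: {"id": ..., "question": ..., "answer": "A"}.
Question = Dict[str, str]
# A solver maps a question to a predicted answer label (assumed interface).
Solver = Callable[[Question], str]


def split_easy_challenge(
    questions: List[Question],
    retrieval_solver: Solver,
    cooccurrence_solver: Solver,
) -> Tuple[List[Question], List[Question]]:
    """Return (easy, challenge) lists.

    A question goes to the Challenge list only if BOTH baseline solvers
    answer it incorrectly; every other question goes to the Easy list.
    """
    easy: List[Question] = []
    challenge: List[Question] = []
    for q in questions:
        retrieval_wrong = retrieval_solver(q) != q["answer"]
        cooccurrence_wrong = cooccurrence_solver(q) != q["answer"]
        if retrieval_wrong and cooccurrence_wrong:
            challenge.append(q)
        else:
            easy.append(q)
    return easy, challenge
```

A question that either baseline gets right stays out of the Challenge list, which is what makes that set resistant to surface-level retrieval and co-occurrence cues.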

Metrics

Accuracy (%) on ARC-Challenge set
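As a rough illustration of the metric, accuracy here can be read as the percentage of questions for which the predicted answer label matches the gold label. The sketch below assumes plain exact-match scoring over answer labels; the function name and inputs are hypothetical, and individual leaderboard entries may use a different evaluation harness.

```python
from typing import Sequence


def accuracy_percent(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Percentage of predictions that exactly match the gold answer labels."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)


# Three of four answers match the gold labels.
print(accuracy_percent(["A", "B", "C", "D"], ["A", "B", "C", "A"]))  # 75.0
```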

Created By

Allen Institute for AI

Top Model Scores

Rank  Model            Score  Date
1     GPT-5.2          98.2%  2026-03
2     Claude Opus 4.6  97.9%  2026-02
3     Gemini 3 Ultra   97.6%  2026-01
4     Grok 4           97.1%  2026-02
5     Llama 4 405B     96.5%  2026-01