Reasoning | Est. 2023

GPQA (Diamond)

Graduate-Level Google-Proof Q&A (GPQA) Diamond is a challenging benchmark of expert-level questions in biology, physics, and chemistry. Questions are designed to be answerable by domain experts but extremely difficult for non-experts, even with web search.

Metrics

Accuracy (%) on expert-level science questions
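As a rough illustration of the metric, accuracy here is the share of questions answered correctly, expressed as a percentage. The sketch below assumes a multiple-choice format with one letter per question. The function name `accuracy_percent` and the sample answers are hypothetical and are not taken from the benchmark or any official evaluation harness.

```python
def accuracy_percent(predictions, answer_key):
    """Return the percentage of questions answered correctly.

    Both arguments are equal-length sequences of answer labels
    (for example "A" to "D"); this format is an assumption.
    """
    if len(predictions) != len(answer_key):
        raise ValueError("predictions and answer_key must have the same length")
    if not answer_key:
        raise ValueError("answer_key must not be empty")
    # Count positions where the predicted label matches the reference label.
    correct = sum(p == a for p, a in zip(predictions, answer_key))
    return 100.0 * correct / len(answer_key)


# Hypothetical toy data, not real benchmark questions or model outputs.
answer_key = ["B", "D", "A", "C"]
predictions = ["B", "D", "C", "C"]
print(f"{accuracy_percent(predictions, answer_key):.1f}%")  # 3 of 4 correct -> 75.0%
```

A real evaluation would also need to fix details this sketch leaves out, such as how answers are extracted from model output and whether answer order is shuffled.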

Created By

David Rein et al.

Top Model Scores

Rank  Model            Score  Date
1     GPT-5.2          94.7%  2026-03
2     Claude Opus 4.6  93.2%  2026-02
3     Gemini 3 Ultra   91.8%  2026-01
4     Grok 4           89.5%  2026-02
5     DeepSeek V3      87.1%  2026-01