Language | Est. 2024

SimpleQA

SimpleQA evaluates factual accuracy on straightforward, unambiguous questions that have short, verifiable answers. It specifically tests whether models give correct factual information rather than hallucinating plausible-sounding but incorrect answers.

Metrics

Factual accuracy (%) on simple questions
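The metric itself is simple arithmetic: the share of questions answered correctly, expressed as a percentage. The following is a minimal sketch under stated assumptions. It assumes each model answer has already been graded and given one label per question. The label strings ("correct", "incorrect", "not_attempted") and the function name `factual_accuracy` are hypothetical illustrations, not SimpleQA's own grader or API.

```python
from collections import Counter

# Hypothetical label for a correctly answered question (assumption, not SimpleQA's API).
CORRECT = "correct"


def factual_accuracy(grades: list[str]) -> float:
    """Return the percentage of questions whose grade is CORRECT.

    In this sketch, any other label (e.g. "incorrect" or "not_attempted")
    counts against the score, so accuracy = correct / total * 100.
    """
    if not grades:
        raise ValueError("need at least one graded question")
    counts = Counter(grades)
    return 100.0 * counts[CORRECT] / len(grades)


# Usage: 2 of 4 questions graded correct gives 50.0%.
grades = ["correct", "incorrect", "correct", "not_attempted"]
print(f"{factual_accuracy(grades):.1f}%")  # 50.0%
```

Counting unattempted answers as misses keeps the sketch to a single number. A benchmark could also report accuracy over attempted questions only, which rewards a model for declining to answer when unsure.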

Created By

OpenAI

Top Model Scores

Rank  Model             Score  Date
1     GPT-5.2           52.8%  2026-03
2     Claude Opus 4.6   48.3%  2026-02
3     Gemini 3 Ultra    46.7%  2026-01
4     Grok 4            43.1%  2026-02
5     Llama 4 405B      38.9%  2026-01