General

Est. 2024

Arena-Hard-Auto

Arena-Hard-Auto is an automated benchmark that correlates highly with Chatbot Arena rankings. It uses 500 challenging user queries and automated judge evaluation to approximate human preferences at a fraction of the cost.

Metrics

Win rate (%) vs baseline model
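
As an illustration of the metric, here is a minimal sketch of a win-rate tally over per-query judge verdicts. The verdict labels ("win", "tie", "loss"), the `win_rate` function name, and the choice to count a tie as half a win are assumptions made for this example; the benchmark's own scoring procedure may aggregate judgments differently.

```python
from collections import Counter


def win_rate(verdicts, tie_weight=0.5):
    """Return the candidate's win rate against the baseline, in percent.

    verdicts: one label per query, each "win", "tie", or "loss"
              (hypothetical labels chosen for this sketch).
    tie_weight: credit given for a tie (assumed to be half a win).
    """
    if not verdicts:
        raise ValueError("verdicts must not be empty")
    counts = Counter(verdicts)
    unknown = set(counts) - {"win", "tie", "loss"}
    if unknown:
        raise ValueError(f"unexpected verdict labels: {sorted(unknown)}")
    # Wins count fully, ties partially, losses not at all.
    score = counts["win"] + tie_weight * counts["tie"]
    return 100.0 * score / len(verdicts)


# Example: 10 queries, 6 wins, 2 ties, 2 losses for the candidate model.
verdicts = ["win"] * 6 + ["tie"] * 2 + ["loss"] * 2
print(f"{win_rate(verdicts):.1f}%")  # prints "70.0%"
```

Under this convention, a model that performs exactly as well as the baseline scores 50%, so scores above 50% indicate the judge preferred the candidate more often than not.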

Created By

LMSYS Org

Top Model Scores

Rank  Model            Score  Date
1     GPT-5.2          92.1%  2026-03
2     Claude Opus 4.6  90.6%  2026-02
3     Gemini 3 Ultra   87.3%  2026-01
4     Grok 4           84.8%  2026-02
5     DeepSeek V3      81.2%  2026-01