Code · Est. 2023

HumanEval+

HumanEval+ augments the original HumanEval benchmark with 80x more test cases per problem, providing a more rigorous evaluation of code correctness. Many models that score well on HumanEval see significant drops on HumanEval+.
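To see why denser test suites matter, here is a hypothetical illustration (not an actual HumanEval+ problem): a buggy solution that passes a sparse, base-style test set but fails on an edge case of the kind the augmented suite adds.

```python
def median(xs):
    # Buggy implementation: correct only for odd-length lists.
    xs = sorted(xs)
    return xs[len(xs) // 2]

# Sparse base-style tests happen to use only odd lengths,
# so the bug goes undetected:
assert median([1, 3, 2]) == 2
assert median([5]) == 5

# An augmented edge case with an even-length input exposes it:
# median([1, 2, 3, 4]) should be 2.5, but this returns 3.
```

This is the failure mode HumanEval+ targets: solutions that look correct under the original ~10 tests per problem but break under the expanded suite.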

Metrics

Pass@1 (%) with augmented test cases
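Pass@1 is usually computed with the unbiased pass@k estimator introduced alongside HumanEval: generate n samples per problem, count the c samples that pass all tests, and estimate the probability that at least one of k drawn samples passes. A minimal sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n samples generated per problem,
    c of which pass every test case. Returns the estimated probability
    that at least one of k randomly drawn samples passes."""
    if n - c < k:
        return 1.0  # fewer failing samples than k: some draw must pass
    return 1.0 - comb(n - c, k) / comb(n, k)

# With k = 1 this reduces to the passing fraction c / n:
print(pass_at_k(10, 3, 1))  # 0.3
```

For HumanEval+, "pass" means the sample clears the augmented test suite, not just the original tests, which is why scores drop relative to plain HumanEval.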

Created By

EvalPlus Team

Top Model Scores

Rank  Model              Score   Date
1     Claude Opus 4.6    90.2%   2026-02
2     GPT-5.2            89.4%   2026-03
3     Gemini 3 Ultra     87.1%   2026-01
4     DeepSeek Coder V3  85.8%   2026-01
5     Grok 4             84.3%   2026-02