Code · Est. 2021

HumanEval

HumanEval evaluates the functional correctness of code generated by language models. It consists of 164 hand-written programming problems, each with a function signature, docstring, and unit tests; models are scored by pass@1 and pass@k rates.
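To illustrate the problem format described above, here is a sketch in the HumanEval style: a function signature with a docstring, plus a `check` function holding the unit tests. This is a made-up example for illustration only, not an actual task from the benchmark.

```python
def running_max(nums):
    """Return a list where element i is the maximum of nums[:i + 1]."""
    result, current = [], None
    for x in nums:
        current = x if current is None else max(current, x)
        result.append(current)
    return result


def check(candidate):
    # The harness calls check() on the model's completion; any failed
    # assertion marks that sample as incorrect.
    assert candidate([1, 3, 2]) == [1, 3, 3]
    assert candidate([5]) == [5]
    assert candidate([]) == []


check(running_max)
```

In the benchmark itself the model sees only the signature and docstring and must produce the body; the hidden tests then decide pass or fail.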

Metrics

Pass@1 (%) on 164 problems
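The pass@k metric behind these scores can be computed with the unbiased estimator introduced in the original HumanEval paper: draw n samples per problem, count the c that pass the unit tests, and estimate the chance that at least one of k draws succeeds. A minimal sketch:

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n - c, k) / C(n, k),
    where n samples were generated and c of them passed the tests."""
    if n - c < k:
        return 1.0  # fewer than k failing samples: success is guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For k = 1 this reduces to the fraction of correct samples, e.g. 30 correct out of 200 gives pass@1 = 0.15; averaging over all 164 problems yields the percentages in the table below.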

Created By

OpenAI

Top Model Scores

Rank | Model             | Score | Date
-----|-------------------|-------|--------
1    | Claude Opus 4.6   | 96.3% | 2026-02
2    | GPT-5.2           | 95.7% | 2026-03
3    | Gemini 3 Ultra    | 94.1% | 2026-01
4    | DeepSeek Coder V3 | 93.4% | 2026-01
5    | Grok 4            | 92.8% | 2026-02