Language | Est. 2023

IFEval

IFEval (Instruction Following Evaluation) tests how well models follow verifiable instructions, meaning instructions whose compliance can be checked programmatically. Examples include word-count constraints, inclusion or exclusion of specific phrases, formatting requirements, and structural constraints.
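
Each instruction type listed above can be checked by a small function rather than by a human or model judge. The following is a minimal sketch in Python, not the benchmark's actual implementation; the function names (`check_max_words`, `check_includes`, `check_excludes`, `check_bullet_count`) and the sample response are hypothetical and chosen only for illustration.

```python
import re


def check_max_words(response: str, limit: int) -> bool:
    """Word-count constraint: at most `limit` whitespace-separated words."""
    return len(response.split()) <= limit


def check_includes(response: str, phrase: str) -> bool:
    """Inclusion constraint: the phrase must appear (case-insensitive)."""
    return phrase.lower() in response.lower()


def check_excludes(response: str, phrase: str) -> bool:
    """Exclusion constraint: the phrase must not appear (case-insensitive)."""
    return phrase.lower() not in response.lower()


def check_bullet_count(response: str, count: int) -> bool:
    """Structural constraint: exactly `count` lines start with '* ' or '- '."""
    bullets = re.findall(r"^[ \t]*[*-] ", response, flags=re.MULTILINE)
    return len(bullets) == count


# Hypothetical model response, checked against four hypothetical instructions.
response = "* Paris is the capital.\n* It sits on the Seine."
results = [
    check_max_words(response, 20),
    check_includes(response, "Paris"),
    check_excludes(response, "London"),
    check_bullet_count(response, 2),
]
```

Every check returns a plain boolean, which is what makes the instructions "verifiable": the same response always yields the same verdict.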

Metrics

Strict accuracy (%) on instruction following
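
The page does not say whether the reported score is aggregated per prompt or per instruction, so the sketch below computes both. It assumes that "strict" means the unmodified response must pass each check, and that `results[i][j]` records whether response `i` passed its `j`-th instruction; the function name `strict_accuracy` and the sample outcomes are hypothetical.

```python
def strict_accuracy(results: list[list[bool]]) -> tuple[float, float]:
    """Return (prompt-level %, instruction-level %) strict accuracy.

    results[i][j] is True when response i passed its j-th instruction check.
    """
    # A prompt counts as followed only if every one of its instructions passed.
    prompt_hits = sum(all(r) for r in results)
    # Instruction-level accuracy pools all instructions across prompts.
    instr_hits = sum(sum(r) for r in results)
    instr_total = sum(len(r) for r in results)
    prompt_level = 100.0 * prompt_hits / len(results)
    instruction_level = 100.0 * instr_hits / instr_total
    return prompt_level, instruction_level


# Hypothetical outcomes: three prompts with 2, 2, and 1 instructions.
outcomes = [[True, True], [True, False], [True]]
prompt_acc, instr_acc = strict_accuracy(outcomes)
```

With these sample outcomes, two of three prompts pass fully while four of five individual instructions pass, so the two aggregations can differ noticeably on the same data.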

Created By

Google Research

Top Model Scores

Rank  Model            Score  Date
1     Claude Opus 4.6  91.2%  2026-02
2     GPT-5.2          90.6%  2026-03
3     Gemini 3 Ultra   88.3%  2026-01
4     Grok 4           86.7%  2026-02
5     Llama 4 405B     84.1%  2026-01