Language · Est. 2023

LegalBench

LegalBench is a collaboratively built benchmark for evaluating legal reasoning in language models. It consists of 162 tasks spanning six types of legal reasoning: issue-spotting, rule-recall, rule-application, rule-conclusion, interpretation, and rhetorical-understanding.

Metrics

Accuracy (%) across 162 legal reasoning tasks
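The benchmark score is accuracy averaged over the individual tasks. A minimal sketch of that aggregation, assuming a simple macro-average over per-task prediction/label pairs (the function names and toy tasks here are illustrative, not LegalBench's actual harness):

```python
def task_accuracy(preds, labels):
    # Fraction of predictions that exactly match the gold labels for one task.
    correct = sum(p == l for p, l in zip(preds, labels))
    return correct / len(labels)

def macro_average_accuracy(task_results):
    # task_results maps task name -> (predictions, labels).
    # Each task contributes equally, regardless of its number of examples.
    accs = [task_accuracy(p, l) for p, l in task_results.values()]
    return 100 * sum(accs) / len(accs)

# Toy example with two hypothetical tasks:
results = {
    "task_a": (["yes", "no"], ["yes", "no"]),   # 2/2 correct
    "task_b": (["yes", "no"], ["yes", "yes"]),  # 1/2 correct
}
print(round(macro_average_accuracy(results), 1))  # 75.0
```

Macro-averaging keeps small tasks from being drowned out by large ones; a per-example micro-average would weight tasks by size instead.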

Created By

Stanford HAI / Hazy Research

Top Model Scores

Rank  Model            Score   Date
1     Claude Opus 4.6  84.6%   2026-02
2     GPT-5.2          83.9%   2026-03
3     Gemini 3 Ultra   81.2%   2026-01
4     Grok 4           78.7%   2026-02
5     Llama 4 405B     75.3%   2026-01