LegalBench
LegalBench is a collaboratively built benchmark for evaluating legal reasoning in language models. It consists of 162 tasks spanning six types of legal reasoning: issue-spotting, rule-recall, rule-application, rule-conclusion, interpretation, and rhetorical understanding.
Metrics
Accuracy (%) across 162 legal reasoning tasks
Created By
Stanford HAI / Hazy Research
Paper
View paper →
Website
Visit website →
Top Model Scores
| Rank | Model | Accuracy | Date |
|---|---|---|---|
| 1 | Claude Opus 4.6 | 84.6% | 2026-02 |
| 2 | GPT-5.2 | 83.9% | 2026-03 |
| 3 | Gemini 3 Ultra | 81.2% | 2026-01 |
| 4 | Grok 4 | 78.7% | 2026-02 |
| 5 | Llama 4 405B | 75.3% | 2026-01 |
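To make the headline metric concrete, the sketch below shows one plausible way to turn per-task results into a single accuracy figure: score each task with exact match, then macro-average so every task counts equally. The task names, example data, and the always-"Yes" stand-in model are hypothetical, and the official harness handles prompting, answer parsing, and any task-specific metrics itself.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in data: two tiny tasks with (input_text, gold_label) pairs.
# Real LegalBench tasks ship with their own prompt templates and answer sets.
TASKS: Dict[str, List[Tuple[str, str]]] = {
    "hypothetical_yes_no_task": [
        ("Is clause X a condition precedent? ...", "Yes"),
        ("Is clause Y a condition precedent? ...", "No"),
    ],
    "hypothetical_label_task": [
        ("Classify the holding ...", "affirmed"),
    ],
}

def task_accuracy(examples: List[Tuple[str, str]],
                  predict: Callable[[str], str]) -> float:
    """Exact-match accuracy for one task (most LegalBench tasks are classification-style)."""
    correct = sum(predict(text).strip().lower() == gold.strip().lower()
                  for text, gold in examples)
    return correct / len(examples)

def macro_average(tasks: Dict[str, List[Tuple[str, str]]],
                  predict: Callable[[str], str]) -> float:
    """Average per-task accuracies so every task counts equally, regardless of size."""
    per_task = [task_accuracy(examples, predict) for examples in tasks.values()]
    return 100.0 * sum(per_task) / len(per_task)

# A trivial stand-in "model" that always answers "Yes".
print(macro_average(TASKS, lambda text: "Yes"))  # 25.0
```

Macro-averaging keeps tasks of very different sizes from dominating the aggregate, which matters when a benchmark spans as many distinct tasks as this one.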
Related Language Benchmarks
MMLU
Massive Multitask Language Understanding measures knowledge across 57 academic subjects including STEM, humanities, social sciences, and more. It tests both world knowledge and problem-solving ability at varying difficulty levels from elementary to professional.
Top: GPT-5.2 — 92.4%
MT-Bench
MT-Bench evaluates multi-turn conversation ability using 80 high-quality multi-turn questions across 8 categories: writing, roleplay, extraction, reasoning, math, coding, knowledge, and STEM. Responses are judged by GPT-4 on a 1-10 scale.
Top: GPT-5.2 — 9.72
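For a sense of how per-turn judge ratings of this kind become one leaderboard number, the snippet below averages hypothetical 1-10 scores into per-category and overall means. The data is invented for illustration; the real MT-Bench harness generates responses, prompts the judge, and parses its verdicts itself.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical judge outputs: (category, turn, score on a 1-10 scale).
ratings = [
    ("writing", 1, 9), ("writing", 2, 8),
    ("math", 1, 6), ("math", 2, 5),
    ("coding", 1, 7), ("coding", 2, 7),
]

by_category = defaultdict(list)
for category, _turn, score in ratings:
    by_category[category].append(score)

category_means = {cat: mean(scores) for cat, scores in by_category.items()}
overall = mean(score for _, _, score in ratings)

print(category_means)     # {'writing': 8.5, 'math': 5.5, 'coding': 7.0}
print(round(overall, 2))  # 7.0
```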
AlpacaEval 2.0
AlpacaEval 2.0 is an automatic evaluation benchmark that measures instruction-following ability. It uses a length-controlled win rate against a reference model, reducing length bias that affected the original version.
Top: Claude Opus 4.6 — 72.1%
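The score here is a head-to-head win rate against a fixed reference model. The sketch below computes a plain win rate from hypothetical pairwise judgments, counting ties as half a win; the length-controlled variant that AlpacaEval 2.0 actually reports additionally fits a regression to discount wins attributable to longer responses, which is omitted here.

```python
from typing import List, Literal

Outcome = Literal["win", "tie", "loss"]

def win_rate(outcomes: List[Outcome]) -> float:
    """Share of comparisons the candidate model wins; ties count as half a win."""
    points = sum(1.0 if o == "win" else 0.5 if o == "tie" else 0.0
                 for o in outcomes)
    return 100.0 * points / len(outcomes)

# Hypothetical judgments versus the reference model.
print(win_rate(["win", "win", "tie", "loss", "win"]))  # 70.0
```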
WildBench
WildBench evaluates AI models on challenging real-world user queries collected from the wild. It focuses on complex, multi-constraint instructions that test practical model capabilities beyond academic benchmarks.
Top: Claude Opus 4.6 — 68.7%