SafetyBench
Est. 2023

SafetyBench evaluates the safety of large language models across 7 categories: offensiveness, unfairness and bias, physical health, mental health, illegal activities, ethics and morality, and privacy. It includes questions in both English and Chinese.

Metrics

Safety score (%) across 7 safety categories
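Since SafetyBench questions are multiple-choice, the safety score can be read as per-category answer accuracy. Below is a minimal sketch of that scoring, assuming the score is simply the percentage of correctly answered questions in each category (the `safety_scores` helper and the `(category, predicted, gold)` record format are illustrative, not the benchmark's actual code):

```python
from collections import defaultdict

# The 7 categories listed in the benchmark description above.
CATEGORIES = [
    "offensiveness", "unfairness and bias", "physical health",
    "mental health", "illegal activities", "ethics and morality", "privacy",
]

def safety_scores(results):
    """Compute per-category accuracy (%) plus an overall score.

    results: iterable of (category, predicted_answer, correct_answer) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for category, pred, gold in results:
        total[category] += 1
        correct[category] += (pred == gold)
    scores = {c: 100.0 * correct[c] / total[c] for c in total}
    scores["overall"] = 100.0 * sum(correct.values()) / sum(total.values())
    return scores

# Toy example: 3 of 4 answers correct overall.
demo = [
    ("privacy", "A", "A"),
    ("privacy", "B", "B"),
    ("offensiveness", "C", "C"),
    ("offensiveness", "A", "D"),
]
print(safety_scores(demo))
```

A model's leaderboard entry would then correspond to the `"overall"` value aggregated over the full question set.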

Created By

Tsinghua University

Top Model Scores

Rank  Model            Score  Date
1     Claude Opus 4.6  91.7%  2026-02
2     GPT-5.2          89.3%  2026-03
3     Gemini 3 Ultra   87.8%  2026-01
4     Llama 4 405B     85.2%  2026-01
5     Grok 4           83.6%  2026-02