Code · Est. 2024

BFCL (Berkeley Function Calling Leaderboard)

The Berkeley Function Calling Leaderboard (BFCL) evaluates how accurately models generate function (tool) calls with correct parameters. It covers API call generation, parameter extraction, and multi-tool orchestration.
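To make "correct function call with correct parameters" concrete, here is a minimal sketch of one way a single call could be checked. It is not BFCL's actual evaluation harness: the `call_matches` helper, the `get_weather` function name, and the dictionary layout are all hypothetical, and exact matching is an assumed, simplified criterion.

```python
from typing import Any


def call_matches(predicted: dict[str, Any], expected: dict[str, Any]) -> bool:
    """Return True when the function name and all arguments agree exactly.

    Hypothetical helper: assumes each call is a dict with "name" and
    "arguments" keys. Argument order does not matter because Python dict
    equality ignores key order.
    """
    return (
        predicted.get("name") == expected.get("name")
        and predicted.get("arguments") == expected.get("arguments")
    )


# Illustrative reference call and two model outputs (made-up data).
expected = {"name": "get_weather", "arguments": {"city": "Berkeley", "unit": "celsius"}}
good = {"name": "get_weather", "arguments": {"unit": "celsius", "city": "Berkeley"}}
bad = {"name": "get_weather", "arguments": {"city": "Berkeley"}}  # missing "unit"

print(call_matches(good, expected))  # True
print(call_matches(bad, expected))   # False
```

A real harness would likely need looser rules than exact equality, for example accepting several valid values for an argument or tolerating omitted optional parameters.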

Metrics

Overall accuracy (%) on function calling
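As a rough illustration of an accuracy percentage, the sketch below turns per-test pass/fail outcomes into a single score. It assumes an unweighted average over all test cases; the leaderboard may weight categories differently, and the `accuracy_percent` name and the sample outcomes are hypothetical.

```python
def accuracy_percent(results: list[bool]) -> float:
    """Share of passing test cases, as a percentage (unweighted, by assumption)."""
    if not results:
        raise ValueError("at least one test case is required")
    # sum() counts True values because bool is a subclass of int.
    return 100.0 * sum(results) / len(results)


# Made-up outcomes: three of four generated calls judged correct.
outcomes = [True, True, False, True]
print(f"{accuracy_percent(outcomes):.1f}%")  # 75.0%
```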

Created By

UC Berkeley

Top Model Scores

Rank  Model            Score  Date
1     Claude Opus 4.6  93.7%  2026-02
2     GPT-5.2          92.4%  2026-03
3     Gemini 3 Ultra   90.8%  2026-01
4     Grok 4           88.3%  2026-02
5     DeepSeek V3      86.1%  2026-01