Compare 107 large language models side by side across five benchmarks: MMLU, HumanEval, Math, Reasoning, and Coding. The table below is sorted by overall score, highest first.
Total Models: 107 · Frontier Models: 27 · Open Source: 51
| # | Model | Provider | MMLU | HumanEval | Math | Reasoning | Coding | Overall ↓ | Context |
|---|---|---|---|---|---|---|---|---|---|
| 1 | OpenAI o3 Multimodal | OpenAI | 93.8 | 94.5 | 96.2 | 96.8 | 94 | 95.1 | 200K |
| 2 | GPT-5.2 Multimodal | OpenAI | 95.2 | 95.8 | 93.1 | 95.5 | 95.3 | 95 | 512K |
| 3 | Claude Opus 4.6 Multimodal | Anthropic | 94.8 | 95.5 | 91.8 | 95 | 95.2 | 94.5 | 400K |
| 4 | Gemini 3 Ultra Multimodal | Google | 94.5 | 93 | 92.5 | 94.8 | 93.2 | 93.6 | 2M |
| 5 | DeepSeek R2 Open Source | DeepSeek | 92 | 93.5 | 95 | 94.5 | 93 | 93.6 | 256K |
| 6 | OpenAI o1 | OpenAI | 91.8 | 92.4 | 94.8 | 95.1 | 92 | 93.2 | 200K |
| 7 | GPT-5 Multimodal | OpenAI | 93.5 | 94 | 89.5 | 94.2 | 93.8 | 93 | 256K |
| 8 | DeepSeek R1 Open Source | DeepSeek | 90.8 | 92 | 94.3 | 93.8 | 91.5 | 92.5 | 128K |
| 9 | Grok 4 Multimodal | xAI | 93 | 92.5 | 91 | 93.5 | 92 | 92.4 | 256K |
| 10 | Claude Opus 4 Multimodal | Anthropic | 92.3 | 95.2 | 85.6 | 93.5 | 95 | 92.3 | 200K |
| 11 | Gemini 2.5 Pro Multimodal | Google | 92 | 93.2 | 86.4 | 92.8 | 93 | 91.5 | 1M |
| 12 | OpenAI o4-mini | OpenAI | 88.2 | 91.5 | 92.8 | 92 | 91 | 91.1 | 200K |
| 13 | Llama 4 Behemoth Open Source Multimodal | Meta | 92.5 | 91 | 89.5 | 92 | 90.5 | 91.1 | 256K |
| 14 | Claude Sonnet 4.6 Multimodal | Anthropic | 91.5 | 94 | 85.2 | 90.8 | 93.5 | 91 | 200K |
| 15 | Qwen 3 Max Multimodal | Alibaba | 91.5 | 91 | 88.5 | 90.5 | 90 | 90.3 | 256K |
| 16 | Gemini 3 Pro Multimodal | Google | 91 | 90.5 | 87.8 | 91.5 | 90 | 90.2 | 1M |
| 17 | Grok 3 Multimodal | xAI | 91.2 | 90.5 | 85 | 91.5 | 90 | 89.6 | 128K |
| 18 | OpenAI o3-mini | OpenAI | 86.9 | 90 | 90.2 | 90.5 | 89.8 | 89.5 | 200K |
| 19 | Claude Sonnet 4 Multimodal | Anthropic | 89.5 | 93.9 | 80.1 | 89.8 | 93.5 | 89.4 | 200K |
| 20 | Qwen 3 235B-A22B Open Source | Alibaba | 89.5 | 91.5 | 85 | 90.2 | 91 | 89.4 | 128K |
| 21 | DeepSeek V3.5 Open Source | DeepSeek | 90.5 | 91 | 86.5 | 89 | 90 | 89.4 | 128K |
| 22 | OpenAI o1-mini | OpenAI | 85.2 | 90 | 90 | 88.5 | 89.5 | 88.6 | 128K |
| 23 | Mistral Large 3 Multimodal | Mistral AI | 90 | 89.5 | 84 | 89.5 | 89 | 88.4 | 256K |
| 24 | Claude 3.5 Sonnet Multimodal | Anthropic | 88.7 | 93.7 | 78.3 | 87.6 | 93 | 88.3 | 200K |
| 25 | Llama 4 Maverick Open Source Multimodal | Meta | 89.2 | 91 | 82.5 | 88 | 90 | 88.1 | 1M |
| 26 | Grok 3.5 | xAI | 89.5 | 89 | 84.5 | 89 | 88.5 | 88.1 | 128K |
| 27 | GPT-4.5 Multimodal | OpenAI | 90.8 | 88.5 | 81.2 | 91.3 | 88 | 88 | 128K |
| 28 | GPT-4o Multimodal | OpenAI | 88.7 | 90.2 | 76.6 | 86.4 | 89.5 | 86.3 | 128K |
| 29 | DeepSeek V3 Open Source | DeepSeek | 87.1 | 89 | 82 | 84.5 | 88 | 86.1 | 128K |
| 30 | Llama 4 Scout Open Source Multimodal | Meta | 87.5 | 89.2 | 79 | 85.8 | 88 | 85.9 | 10M |
| 31 | Qwen 3 Plus Multimodal | Alibaba | 87.5 | 87 | 82 | 86 | 86.5 | 85.8 | 128K |
| 32 | Ernie 5.0 Multimodal | Baidu | 88 | 85 | 82.5 | 87.5 | 84 | 85.4 | 256K |
| 33 | Gemini 2.5 Flash Multimodal | Google | 87 | 88.5 | 78.5 | 85.5 | 87 | 85.3 | 1M |
| 34 | GPT-5.2 Mini Multimodal | OpenAI | 86.5 | 89.2 | 78.4 | 84 | 88.6 | 85.3 | 256K |
| 35 | Llama 3.1 405B Open Source | Meta | 88.6 | 89 | 73.8 | 85 | 88.5 | 85 | 128K |
| 36 | Llama 3.3 70B Open Source | Meta | 86 | 88.4 | 77 | 83.5 | 87 | 84.4 | 128K |
| 37 | Jamba 2 | AI21 Labs | 87 | 84.5 | 79 | 86 | 83.5 | 84 | 512K |
| 38 | Qwen 2.5 72B Open Source | Alibaba | 85.3 | 86.4 | 80 | 82.5 | 85.5 | 83.9 | 128K |
| 39 | Gemini 3 Flash Multimodal | Google | 85.5 | 86 | 78.5 | 84 | 85.5 | 83.9 | 1M |
| 40 | Yi-Lightning 2 | 01.AI | 86 | 84.5 | 80 | 85 | 83.5 | 83.8 | 128K |
| 41 | Pixtral Large 2 Multimodal | Mistral AI | 86 | 84 | 78.5 | 85.5 | 83.5 | 83.5 | 128K |
| 42 | Claude Haiku 4.5 Multimodal | Anthropic | 85 | 88.5 | 74.2 | 82.5 | 87 | 83.4 | 200K |
| 43 | Grok 2 Multimodal | xAI | 87.5 | 85 | 76 | 83.2 | 84.5 | 83.2 | 128K |
| 44 | Reka Core 2 Multimodal | Reka | 86.5 | 83.5 | 78 | 85.5 | 82.5 | 83.2 | 128K |
| 45 | DeepSeek Coder V3 Open Source | DeepSeek | 78.5 | 94 | 72 | 76.5 | 93.5 | 82.9 | 128K |
| 46 | Gemini 2.0 Flash Multimodal | Google | 85.8 | 86.5 | 73.4 | 82.1 | 85 | 82.6 | 1M |
| 47 | Claude 3 Opus Multimodal | Anthropic | 86.8 | 84.9 | 72 | 85 | 84.5 | 82.6 | 200K |
| 48 | Command R+ 2 | Cohere | 86.5 | 83 | 76 | 85.5 | 82 | 82.6 | 256K |
| 49 | WizardLM 3 Open Source | Microsoft | 82.5 | 85 | 78 | 82 | 84.5 | 82.4 | 128K |
| 50 | Phi-4 Open Source | Microsoft | 84.8 | 82.6 | 80.4 | 81 | 82 | 82.2 | 16K |
| 51 | Nemotron-4 340B Open Source | NVIDIA | 85.5 | 82 | 78.5 | 84 | 81 | 82.2 | 128K |
| 52 | QwQ-32B-Preview Open Source | Alibaba | 79.5 | 80 | 85.5 | 87 | 78.5 | 82.1 | 32K |
| 53 | Mistral Large 2 | Mistral AI | 84 | 84.5 | 74.5 | 82.8 | 84 | 82 | 128K |
| 54 | Falcon 3 180B Open Source | TII | 85 | 82.5 | 77 | 84 | 81.5 | 82 | 128K |
| 55 | DBRX 2 Open Source | Databricks | 84 | 83.5 | 75.5 | 83 | 82.5 | 81.7 | 128K |
| 56 | Claude 3.5 Haiku Multimodal | Anthropic | 84 | 88.1 | 69.3 | 78.2 | 86.5 | 81.2 | 200K |
| 57 | HyperCLOVA X 2 Multimodal | Naver | 85 | 80.5 | 76.5 | 84 | 79.5 | 81.1 | 128K |
| 58 | GPT-4o Mini Multimodal | OpenAI | 82 | 87 | 70.2 | 78.5 | 85.3 | 80.6 | 128K |
| 59 | Gemini 1.5 Pro Multimodal | Google | 85.9 | 84.1 | 67.7 | 82 | 83.5 | 80.6 | 2M |
| 60 | Qwen 3 Turbo Open Source | Alibaba | 82 | 83.5 | 75 | 80.5 | 82 | 80.6 | 128K |
| 61 | Mistral Medium 3 | Mistral AI | 83.5 | 83 | 71.5 | 81 | 82.5 | 80.3 | 128K |
| 62 | Codestral 2 | Mistral AI | 72 | 93.5 | 68 | 74 | 93 | 80.1 | 256K |
| 63 | Qwen 2.5 Coder 32B Open Source | Alibaba | 74.2 | 92.7 | 68.5 | 72 | 92 | 79.9 | 128K |
| 64 | Nous Hermes 3 Open Source | Nous Research | 80.5 | 82 | 73 | 79.5 | 81 | 79.2 | 128K |
| 65 | Inflection Pi-3 | Inflection AI | 84 | 78.5 | 73 | 83 | 77 | 79.1 | 128K |
| 66 | Arctic 2 Open Source | Snowflake | 81 | 80.5 | 74 | 80 | 79.5 | 79 | 128K |
| 67 | DeepSeek Coder V2 Open Source | DeepSeek | 71 | 90.2 | 73.5 | 69 | 89.5 | 78.6 | 128K |
| 68 | Nemotron 70B Open Source | NVIDIA | 83.5 | 80 | 68.5 | 80.5 | 79.5 | 78.4 | 128K |
| 69 | Command A | Cohere | 82.8 | 80.5 | 68 | 80.2 | 79.5 | 78.2 | 256K |
| 70 | Llama 3.1 70B Open Source | Meta | 82 | 80.5 | 68 | 79 | 80 | 77.9 | 128K |
| 71 | Dolphin 3 Open Source | Cognitive Computations | 79 | 80.5 | 71 | 78 | 79.5 | 77.6 | 128K |
| 72 | GLM-4-Plus Multimodal | Zhipu AI | 82 | 78 | 70.5 | 79 | 77.5 | 77.4 | 128K |
| 73 | Gemma 3 27B Open Source Multimodal | Google | 80.5 | 80 | 68 | 78.2 | 79 | 77.1 | 128K |
| 74 | Mistral Small 3 Open Source | Mistral AI | 80.5 | 81 | 66.5 | 76.3 | 80 | 76.9 | 128K |
| 75 | Yi-Lightning | 01.AI | 82 | 78.5 | 68 | 78 | 77.5 | 76.8 | 16K |
| 76 | MiniMax-01 Open Source | MiniMax | 82.5 | 78 | 66.5 | 79 | 77.5 | 76.7 | 4M |
| 77 | Codestral | Mistral AI | 70.5 | 90 | 60 | 68.5 | 91 | 76 | 256K |
| 78 | Falcon 3 40B Open Source | TII | 78.5 | 76 | 70 | 77 | 75.5 | 75.4 | 64K |
| 79 | Gemini 2.0 Flash Lite Multimodal | Google | 80.2 | 78.5 | 65 | 74.3 | 77.8 | 75.2 | 1M |
| 80 | Solar 2 Pro Multimodal | Upstage | 77.5 | 76 | 70.5 | 76.5 | 75 | 75.1 | 64K |
| 81 | Amazon Nova Pro Multimodal | Amazon | 80.2 | 77.5 | 64 | 77 | 76.5 | 75 | 300K |
| 82 | Command R 2 | Cohere | 78 | 76.5 | 68.5 | 77 | 75 | 75 | 128K |
| 83 | Mixtral 8x22B Open Source | Mistral AI | 77.8 | 79 | 64.2 | 75.5 | 78 | 74.9 | 64K |
| 84 | CodeLlama 2 70B Open Source | Meta | 68 | 86 | 62 | 68.5 | 85.5 | 74 | 128K |
| 85 | Jamba 1.5 Large Open Source | AI21 Labs | 80 | 75.5 | 62.4 | 76.8 | 75 | 73.9 | 256K |
| 86 | Gemma 3 9B Open Source Multimodal | Google | 74.5 | 76 | 68 | 73 | 75.5 | 73.4 | 128K |
| 87 | Yi-34B-Chat Open Source | 01.AI | 76.5 | 74 | 66 | 74.5 | 73 | 72.8 | 32K |
| 88 | StarCoder 3 Open Source | BigCode | 65 | 88.5 | 58 | 65 | 87.5 | 72.8 | 64K |
| 89 | Cohere Aya 3 Open Source | Cohere | 78.5 | 72 | 66.5 | 76 | 71 | 72.8 | 64K |
| 90 | Qwen 2.5 7B Open Source | Alibaba | 74.2 | 75.6 | 65 | 71.5 | 74 | 72.1 | 128K |
| 91 | BLOOM-3 Open Source | BigScience | 76 | 70.5 | 65 | 74.5 | 69.5 | 71.1 | 64K |
| 92 | Phi-4 Mini Open Source | Microsoft | 72.5 | 72 | 68.3 | 70 | 71.5 | 70.9 | 128K |
| 93 | WizardLM 2 8x22B Open Source | Microsoft | 75.5 | 74 | 58 | 73.5 | 73.5 | 70.9 | 64K |
| 94 | Reka Core Multimodal | Reka AI | 79 | 72.5 | 56 | 75 | 72 | 70.9 | 128K |
| 95 | Claude 3 Haiku Multimodal | Anthropic | 75.2 | 75.9 | 57.3 | 71 | 74.5 | 70.8 | 200K |
| 96 | Gemma 2 27B Open Source | Google | 75.2 | 73.5 | 58.3 | 72 | 73 | 70.4 | 8K |
| 97 | Command R+ | Cohere | 75.7 | 71.5 | 57.8 | 74 | 72 | 70.2 | 128K |
| 98 | InternLM 2.5 20B Open Source | Shanghai AI Lab | 73.8 | 72 | 62 | 71 | 71.5 | 70.1 | 1M |
| 99 | DBRX Open Source | Databricks | 73.7 | 70.1 | 54.2 | 69.5 | 69.8 | 67.5 | 32K |
| 100 | Amazon Nova Lite Multimodal | Amazon | 72.5 | 68 | 55 | 68 | 67.5 | 66.2 | 300K |
| 101 | Jamba 1.5 Mini Open Source | AI21 Labs | 72 | 68 | 52.5 | 68.5 | 67 | 65.6 | 256K |
| 102 | Aya Expanse 32B Open Source | Cohere | 73 | 65 | 52 | 70 | 64 | 64.8 | 128K |
| 103 | Mistral Nemo Open Source | Mistral AI | 68 | 67.5 | 52 | 65.5 | 66.5 | 63.9 | 128K |
| 104 | Falcon 180B Open Source | TII | 70.5 | 62 | 45 | 65 | 60.5 | 60.6 | 2K |
| 105 | Snowflake Arctic Open Source | Snowflake | 67 | 64.5 | 45.5 | 62 | 63.5 | 60.5 | 4K |
| 106 | Llama 3.1 8B Open Source | Meta | 68.4 | 62.1 | 47.2 | 62 | 61.5 | 60.2 | 128K |
| 107 | StarCoder 2 15B Open Source | BigCode | 52 | 73.5 | 35 | 48.5 | 75 | 56.8 | 16K |
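
The page does not say how the Overall column is calculated. In the rows checked here, it matches the unweighted mean of the five benchmark scores rounded to one decimal place. The sketch below reproduces that arithmetic for three rows from the table; the function name `overall_score` and the equal weighting are assumptions inferred from the numbers, not a documented formula.

```python
from statistics import mean


def overall_score(mmlu, humaneval, math_score, reasoning, coding):
    """Unweighted mean of the five benchmark scores, rounded to one decimal.

    Assumption: equal weights, inferred from the table rows rather than
    stated anywhere on the page.
    """
    return round(mean([mmlu, humaneval, math_score, reasoning, coding]), 1)


# (model, (MMLU, HumanEval, Math, Reasoning, Coding), Overall as listed)
rows = [
    ("OpenAI o3", (93.8, 94.5, 96.2, 96.8, 94), 95.1),
    ("Claude Opus 4.6", (94.8, 95.5, 91.8, 95, 95.2), 94.5),
    ("StarCoder 2 15B", (52, 73.5, 35, 48.5, 75), 56.8),
]

for model, scores, listed in rows:
    computed = overall_score(*scores)
    print(f"{model}: computed {computed}, listed {listed}")
```

Because the weighting is equal, a specialist such as StarCoder 2 15B ranks low overall despite strong HumanEval and Coding scores. If one capability matters most for your use case, compare models on that column instead of on Overall.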