Multimodal | December 6, 2023 | Google DeepMind
Gemini: A Family of Highly Capable Multimodal Models
Google DeepMind
Abstract
We report on Gemini, a family of highly capable multimodal models that demonstrate strong generalist capabilities across image, audio, video, and text understanding. The Gemini Ultra model advances the state of the art on 30 of 32 benchmarks and is the first model to reach human-expert performance on the MMLU exam benchmark.
Key Findings
- First model to achieve human-expert performance on MMLU (90.0%)
- Natively multimodal: trained on text, images, audio, and video simultaneously
- Achieved state-of-the-art on 30 of 32 benchmarks across modalities
- Demonstrated strong reasoning across text, code, and visual inputs
- Introduced a family of models (Ultra, Pro, Nano) for different deployment scenarios
Impact & Significance
Gemini established Google DeepMind as a direct competitor to OpenAI at the frontier of model development. The model family powers Google's AI products, including the Bard/Gemini chat assistant, AI features in Google Search, and AI capabilities on Android.