Multimodal | December 6, 2023 | Google DeepMind

Gemini: A Family of Highly Capable Multimodal Models

Google DeepMind

Abstract

We report on Gemini, a family of highly capable multimodal models that demonstrate strong generalist capabilities across image, audio, video, and text understanding. The Gemini Ultra model advances the state of the art in 30 of 32 benchmarks and is the first model to reach human-expert performance on the MMLU exam benchmark.

Key Findings

1. First model to achieve human-expert performance on MMLU (90.0%)
2. Natively multimodal: trained on text, images, audio, and video simultaneously
3. Achieved state-of-the-art results on 30 of 32 benchmarks across modalities
4. Demonstrated strong reasoning across text, code, and visual inputs
5. Introduced a family of models (Ultra, Pro, Nano) for different deployment scenarios

Impact & Significance

Gemini established Google DeepMind as a competitor to OpenAI at the frontier. The model family powers Google's AI products including Bard/Gemini chat, Google Search AI features, and Android AI capabilities.

Related Tools

Gemini, Google AI Studio

Related Papers

The Llama 3 Herd of Models

Qwen2 Technical Report

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

The Claude 3 Model Family: Opus, Sonnet, and Haiku