Multimodal | December 6, 2023 | Google DeepMind

Gemini: A Family of Highly Capable Multimodal Models

Google DeepMind

Abstract

We report on Gemini, a family of highly capable multimodal models that demonstrate strong generalist capabilities across image, audio, video, and text understanding. The Gemini Ultra model advances the state of the art on 30 of 32 benchmarks and is the first model to reach human-expert performance on the MMLU exam benchmark.

Key Findings

  1. First model to achieve human-expert performance on MMLU (90.0%)
  2. Natively multimodal: trained on text, images, audio, and video simultaneously
  3. Achieved state-of-the-art results on 30 of 32 benchmarks across modalities
  4. Demonstrated strong reasoning across text, code, and visual inputs
  5. Introduced a family of models (Ultra, Pro, Nano) for different deployment scenarios

Impact & Significance

Gemini established Google DeepMind as a direct competitor to OpenAI at the frontier of large-model development. The model family powers Google's AI products, including the Gemini chat assistant (formerly Bard), AI features in Google Search, and on-device AI capabilities in Android.
