A Generalist Agent
Scott Reed, Konrad Zolna, Emilio Parisotto, Sergio Gomez Colmenarejo, Alexander Novikov, Gabriel Barth-Maron, Mai Gimenez, Yury Sulsky, Jackie Kay, Jost Tobias Springenberg, Tom Eccles, Jake Bruce, Ali Razavi, Ashley Edwards, Nicolas Heess, Yutian Chen, Raia Hadsell, Oriol Vinyals, Mahyar Bordbar, Nando de Freitas
Abstract
We introduce Gato, a single generalist agent: a multi-modal, multi-task, multi-embodiment policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm, and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens.
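The unifying trick is that every modality is serialized into one flat token sequence: text via a subword vocabulary, and continuous values (such as joint torques) mu-law companded and discretized into bins placed after the text vocabulary. A minimal sketch of that tokenization, with the paper's reported constants (μ = 100, M = 256, 1024 bins, 32,000 text tokens) but otherwise simplified:

```python
import numpy as np

TEXT_VOCAB = 32_000   # SentencePiece text tokens occupy ids [0, 32000)
NUM_BINS = 1024       # continuous values get 1024 discrete bins

def tokenize_continuous(x, mu=100.0, M=256.0):
    """Map a continuous value (e.g. a joint torque) to a token id."""
    # mu-law compand into [-1, 1], then clip
    y = np.sign(x) * np.log(np.abs(x) * mu + 1.0) / np.log(M * mu + 1.0)
    y = np.clip(y, -1.0, 1.0)
    # discretize to an integer bin and shift past the text vocabulary
    bin_id = int((y + 1.0) / 2.0 * (NUM_BINS - 1))
    return TEXT_VOCAB + bin_id

def tokenize_discrete(a):
    """Discrete actions (e.g. Atari button ids) are kept as plain ids."""
    return int(a)
```

With this shared vocabulary, an episode from any environment becomes one sequence of integers that a single transformer can model autoregressively.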
Key Findings
- Created a single model performing 604 distinct tasks across modalities
- Demonstrated a unified architecture for text, vision, and robot control
- Showed that a single set of weights can handle diverse embodiments
- Achieved over 50% of expert score on more than 450 of the 604 tasks
- Pointed toward the possibility of foundation models for robotics
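Handling diverse embodiments with one set of weights means only the set of legal output tokens changes per task: the same model is queried, and its logits are masked to the current embodiment's action space before sampling. A hedged sketch of that idea (the names `ACTION_SPECS` and `sample_action` are illustrative, not from the paper):

```python
import numpy as np

# Illustrative per-embodiment action vocabularies: Atari uses 18 discrete
# actions; a robot arm uses the discretized-continuous token range.
ACTION_SPECS = {
    "atari": range(0, 18),
    "robot_arm": range(32_000, 33_024),
}

def sample_action(logits, embodiment, rng=np.random.default_rng(0)):
    """Mask logits to the embodiment's legal tokens, then sample one."""
    legal = np.array(list(ACTION_SPECS[embodiment]))
    masked = np.full_like(logits, -np.inf)
    masked[legal] = logits[legal]
    # softmax over the masked logits (illegal tokens get probability 0)
    probs = np.exp(masked - masked[legal].max())
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))
```

The model itself never branches on the task; the context in the prompt and the output mask are what select between chat replies, button presses, and torques.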
Impact & Significance
Gato demonstrated the feasibility of building generalist AI agents that can operate across modalities and embodiments, influencing the vision of AGI as a single model that can do everything from chat to robot control.