Agents · May 12, 2022 · Google DeepMind

A Generalist Agent

Scott Reed, Konrad Zolna, Emilio Parisotto, Sergio Gomez Colmenarejo, Alexander Novikov, Gabriel Barth-Maron, Mai Gimenez, Yury Sulsky, Jackie Kay, Jost Tobias Springenberg, Tom Eccles, Jake Bruce, Ali Razavi, Ashley Edwards, Nicolas Heess, Yutian Chen, Raia Hadsell, Oriol Vinyals, Mahyar Bordbar, Nando de Freitas

Abstract

We introduce Gato, a single generalist agent that works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm, and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens.
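The abstract's idea that one network emits "text, joint torques, button presses, or other tokens" relies on mapping every modality into a single integer vocabulary. The following is a minimal sketch of that idea, not the authors' implementation. The constants (a 32,000-entry text vocabulary, 1,024 bins, mu-law parameters 100 and 256), the id-range layout, and all function names are assumptions chosen for illustration; none of them come from this summary.

```python
import math

# All constants below are illustrative assumptions, not values from this page.
TEXT_VOCAB_SIZE = 32000                           # assumed text subword vocabulary size
NUM_BINS = 1024                                   # assumed bins for continuous values
NUM_BUTTONS = 18                                  # assumed size of a discrete action set
CONTINUOUS_OFFSET = TEXT_VOCAB_SIZE               # hypothetical id range for continuous values
DISCRETE_OFFSET = CONTINUOUS_OFFSET + NUM_BINS    # hypothetical id range for discrete actions
MU, M = 100.0, 256.0                              # assumed mu-law companding constants


def mu_law(x: float) -> float:
    """Compress a real value logarithmically, then clip the result to [-1, 1]."""
    y = math.copysign(math.log(abs(x) * MU + 1.0) / math.log(M * MU + 1.0), x)
    return max(-1.0, min(1.0, y))


def continuous_to_token(x: float) -> int:
    """Map one continuous value (e.g. a joint torque) to a token id."""
    y = mu_law(x)
    # Uniformly bin [-1, 1] into NUM_BINS buckets; the top edge falls in the last bin.
    bin_index = min(NUM_BINS - 1, int((y + 1.0) / 2.0 * NUM_BINS))
    return CONTINUOUS_OFFSET + bin_index


def discrete_to_token(action: int) -> int:
    """Map one discrete action (e.g. a button press) to a token id."""
    if not 0 <= action < NUM_BUTTONS:
        raise ValueError(f"action {action} outside [0, {NUM_BUTTONS})")
    return DISCRETE_OFFSET + action


def build_sequence(text_ids, torques, buttons):
    """Flatten text ids, continuous values, and discrete actions into one token list."""
    for t in text_ids:
        if not 0 <= t < TEXT_VOCAB_SIZE:
            raise ValueError(f"text id {t} outside [0, {TEXT_VOCAB_SIZE})")
    return (
        list(text_ids)
        + [continuous_to_token(x) for x in torques]
        + [discrete_to_token(b) for b in buttons]
    )
```

With this layout, `build_sequence([5, 7], [0.0], [3])` returns `[5, 7, 32512, 33027]`: the two text ids pass through unchanged, the zero torque lands in the middle bin of the continuous range, and the button press is shifted into its own range. Because each modality occupies a disjoint id range in this sketch, a sequence model trained on such lists can signal which kind of output it is producing simply by which range the predicted token falls in.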

Key Findings

  • Created a single model that performs 604 distinct tasks across modalities
  • Demonstrated a unified architecture for text, vision, and robot control
  • Showed that a single set of weights can handle diverse embodiments
  • Reached more than 50% of expert score on over 450 of the 604 tasks
  • Pointed toward the possibility of foundation models for robotics

Impact & Significance

Gato demonstrated that a single generalist agent can operate across modalities and embodiments. It influenced the view of AGI as one model that handles tasks ranging from chat to robot control.
