PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, et al. (Google Research)
Abstract
We trained a 540-billion-parameter, dense, decoder-only Transformer language model, which we call the Pathways Language Model (PaLM), on 6144 TPU v4 chips using Pathways, a new ML system that enables highly efficient training across multiple TPU Pods. PaLM achieves state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks, and it demonstrates breakthrough capabilities on reasoning tasks that require multi-step logical inference.
Key Findings
- Scaled to 540B parameters using Google's Pathways system across 6144 TPU v4 chips
- Achieved state-of-the-art results on hundreds of benchmarks with few-shot prompting (see the prompting sketch after this list)
- Showed breakthrough performance on reasoning and code-generation tasks
- Demonstrated discontinuous improvements in reasoning ability as scale increased
- Established new training-efficiency records at this scale, reporting 46.2% model FLOPs utilization (a back-of-envelope check follows below)
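Few-shot prompting means conditioning the frozen model on a handful of worked examples rather than fine-tuning it; PaLM's largest reasoning gains came when those exemplars included chain-of-thought rationales. The sketch below shows how such a prompt is assembled. It is illustrative only: PaLM is not publicly served, so there is no model call here, and the `build_prompt` helper and the bakery question are hypothetical; the tennis-ball exemplar is the canonical chain-of-thought example used in this line of work.

```python
# Minimal sketch of few-shot chain-of-thought prompting: worked examples
# with step-by-step rationales are concatenated before the new question,
# so the model continues the pattern of reasoning before answering.

EXEMPLARS = [
    {
        "question": "Roger has 5 tennis balls. He buys 2 more cans of "
                    "3 tennis balls each. How many tennis balls does he "
                    "have now?",
        "rationale": "Roger started with 5 balls. 2 cans of 3 balls is "
                     "6 balls. 5 + 6 = 11.",
        "answer": "11",
    },
]

def build_prompt(exemplars, question):
    """Format each exemplar as a Q/A pair whose answer shows its
    reasoning, then append the new question with an open answer slot."""
    parts = []
    for ex in exemplars:
        parts.append(f"Q: {ex['question']}\n"
                     f"A: {ex['rationale']} The answer is {ex['answer']}.")
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

prompt = build_prompt(
    EXEMPLARS,
    "A bakery sells muffins in boxes of 4. If it sells 7 boxes, "
    "how many muffins is that?",
)
print(prompt)
```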
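The efficiency claim can be sanity-checked with model FLOPs utilization (MFU), the metric the PaLM paper introduced. Below is a back-of-envelope sketch; the 6N FLOPs-per-token approximation ignores attention FLOPs and the throughput figure is rounded, so the result lands slightly below the paper's reported 46.2% MFU.

```python
# Back-of-envelope MFU check: observed training FLOP/s (~6 FLOPs per
# parameter per token) divided by the aggregate peak FLOP/s of the chips.

params = 540e9                 # N: model parameters
tokens_per_sec = 238e3         # reported training throughput, rounded
num_chips = 6144               # TPU v4 chips
peak_flops_per_chip = 275e12   # TPU v4 peak bf16 FLOP/s

observed_flops = 6 * params * tokens_per_sec   # ~6N FLOPs per token
peak_flops = num_chips * peak_flops_per_chip
mfu = observed_flops / peak_flops
print(f"MFU ~ {mfu:.1%}")      # ~45.6%, close to the reported 46.2%
```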
Impact & Significance
PaLM demonstrated Google's ability to train models at unprecedented scale and laid the groundwork for the Gemini model family. Its strong reasoning capabilities influenced the industry's focus on emergent abilities at scale.