PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, et al. (Google Research)
Abstract
We trained a 540-billion-parameter, dense, decoder-only Transformer language model, which we call the Pathways Language Model (PaLM), on 6144 TPU v4 chips using Pathways, a new ML system that enables highly efficient training across multiple TPU Pods. PaLM achieves state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks, and it demonstrates breakthrough capabilities on reasoning tasks that require multi-step logical inference.
Key Findings
- Scaled to 540B parameters using Google's Pathways system across 6144 TPU v4 chips
- Achieved state-of-the-art results on hundreds of benchmarks with few-shot prompting (see the prompting sketch after this list)
- Showed breakthrough performance on reasoning and code-generation tasks
- Demonstrated discontinuous improvements in reasoning ability as scale increased
- Established new training-efficiency records at this scale, reporting 46.2% model FLOPs utilization (a back-of-envelope check follows below)
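Few-shot prompting means conditioning the frozen model on a handful of worked examples rather than fine-tuning it; PaLM's largest reasoning gains came when those exemplars included chain-of-thought rationales. The sketch below shows how such a prompt is assembled. It is illustrative only: PaLM is not publicly served, so there is no model call here, and the `build_prompt` helper and the bakery question are hypothetical; the tennis-ball exemplar is the canonical chain-of-thought example used in this line of work.

```python
# Minimal sketch of few-shot chain-of-thought prompting: worked examples
# with step-by-step rationales are concatenated before the new question,
# so the model continues the pattern of reasoning before answering.

EXEMPLARS = [
    {
        "question": "Roger has 5 tennis balls. He buys 2 more cans of "
                    "3 tennis balls each. How many tennis balls does he "
                    "have now?",
        "rationale": "Roger started with 5 balls. 2 cans of 3 balls is "
                     "6 balls. 5 + 6 = 11.",
        "answer": "11",
    },
]

def build_prompt(exemplars, question):
    """Format each exemplar as a Q/A pair whose answer shows its
    reasoning, then append the new question with an open answer slot."""
    parts = []
    for ex in exemplars:
        parts.append(f"Q: {ex['question']}\n"
                     f"A: {ex['rationale']} The answer is {ex['answer']}.")
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

prompt = build_prompt(
    EXEMPLARS,
    "A bakery sells muffins in boxes of 4. If it sells 7 boxes, "
    "how many muffins is that?",
)
print(prompt)
```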
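The efficiency claim can be sanity-checked with model FLOPs utilization (MFU), the metric the PaLM paper introduced. Below is a back-of-envelope sketch; the 6N FLOPs-per-token approximation ignores attention FLOPs and the throughput figure is rounded, so the result lands slightly below the paper's reported 46.2% MFU.

```python
# Back-of-envelope MFU check: observed training FLOP/s (~6 FLOPs per
# parameter per token) divided by the aggregate peak FLOP/s of the chips.

params = 540e9                 # N: model parameters
tokens_per_sec = 238e3         # reported training throughput, rounded
num_chips = 6144               # TPU v4 chips
peak_flops_per_chip = 275e12   # TPU v4 peak bf16 FLOP/s

observed_flops = 6 * params * tokens_per_sec   # ~6N FLOPs per token
peak_flops = num_chips * peak_flops_per_chip
mfu = observed_flops / peak_flops
print(f"MFU ~ {mfu:.1%}")      # ~45.6%, close to the reported 46.2%
```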
Impact & Significance
PaLM demonstrated Google's ability to train models at unprecedented scale and laid the groundwork for the Gemini model family. Its strong reasoning capabilities influenced the industry's focus on emergent abilities at scale.