LLM · January 28, 2022 · Google Brain

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou

Abstract

We explore how generating a chain of thought — a series of intermediate reasoning steps — significantly improves the ability of large language models to perform complex reasoning. We show that chain-of-thought prompting substantially outperforms standard prompting on arithmetic, commonsense, and symbolic reasoning benchmarks, with improvements most dramatic in the largest models.

Key Findings

  1. Demonstrated that chain-of-thought prompting dramatically improves LLM reasoning
  2. Showed that providing step-by-step reasoning examples unlocks emergent capabilities
  3. Achieved state-of-the-art results on the GSM8K math word-problem benchmark with prompting alone
  4. Found that chain-of-thought reasoning is an emergent ability, appearing primarily in large models
  5. Required no fine-tuning, only changes to the prompt format
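Since the technique changes only the prompt format, it can be illustrated with plain string construction. The sketch below (function and variable names are ours, not the paper's) contrasts a standard few-shot exemplar, whose answer is given directly, with a chain-of-thought exemplar, whose answer spells out the intermediate reasoning steps first:

```python
# Minimal sketch of chain-of-thought prompt construction.
# Names (build_prompt, exemplar variables) are illustrative, not from the paper.

def build_prompt(exemplars, question):
    """Concatenate few-shot Q/A exemplars and the target question into one prompt."""
    parts = [f"Q: {q}\nA: {a}" for q, a in exemplars]
    parts.append(f"Q: {question}\nA:")  # the model completes this final answer
    return "\n\n".join(parts)

exemplar_question = (
    "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?"
)

# Standard prompting: the exemplar gives only the final answer.
standard_exemplar = (exemplar_question, "The answer is 11.")

# Chain-of-thought prompting: the exemplar's answer walks through the reasoning.
cot_exemplar = (
    exemplar_question,
    "Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. "
    "5 + 6 = 11. The answer is 11.",
)

target_question = (
    "The cafeteria had 23 apples. If they used 20 to make lunch and "
    "bought 6 more, how many apples do they have?"
)

standard_prompt = build_prompt([standard_exemplar], target_question)
cot_prompt = build_prompt([cot_exemplar], target_question)
```

Both prompts end with an unanswered `A:`; the only difference is whether the exemplar models step-by-step reasoning, which is what nudges a sufficiently large model to produce its own reasoning chain before the final answer.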

Impact & Significance

Chain-of-thought prompting became one of the most widely used techniques in prompt engineering and influenced how models like GPT-4 and Claude are designed to reason through complex problems step by step.
