Language Models are Few-Shot Learners
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al.
Abstract
We demonstrate that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. We train GPT-3, an autoregressive language model with 175 billion parameters, and test its performance in the few-shot setting. GPT-3 achieves strong performance on many NLP datasets without any gradient updates or fine-tuning.
Key Findings
1. Demonstrated that 175B-parameter models exhibit strong few-shot learning abilities
2. Showed that in-context learning emerges at sufficient scale
3. Achieved competitive results without any gradient updates or fine-tuning
4. Revealed scaling laws: bigger models show qualitatively different capabilities
5. Introduced prompting as a new paradigm for using LLMs
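The few-shot setting above amounts to assembling a prompt from a task description, K demonstration pairs, and the query to complete; the model then predicts the continuation with no weight updates. A minimal sketch of such prompt construction (the `Input:`/`Output:` labels and helper name are illustrative, not the paper's exact format):

```python
def build_few_shot_prompt(task_description, examples, query):
    """Assemble a GPT-3-style few-shot prompt: a natural-language task
    description, K worked examples, and the query left for the model
    to complete. No gradient updates are involved; the examples are
    consumed purely as context at inference time."""
    lines = [task_description, ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")  # blank line between demonstrations
    lines.append(f"Input: {query}")
    lines.append("Output:")  # the model continues from here
    return "\n".join(lines)

# Example: English-to-French translation with K=2 demonstrations
# (a task format used in the paper; the pairs here are illustrative).
prompt = build_few_shot_prompt(
    "Translate English to French.",
    [("sea otter", "loutre de mer"), ("cheese", "fromage")],
    "peppermint",
)
print(prompt)
```

Varying K recovers the paper's zero-shot (K=0), one-shot (K=1), and few-shot (K in the tens) settings from the same template.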
Impact & Significance
GPT-3 launched the era of large language models and demonstrated that scale enables qualitatively new capabilities. It made AI accessible through APIs and natural language prompts, directly enabling the creation of ChatGPT and the AI application ecosystem.