Textbooks Are All You Need
Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio Cesar Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sebastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee, Yuanzhi Li
Abstract
We introduce phi-1, a 1.3 billion parameter Transformer model for code generation, trained on a combination of filtered web data and synthetically generated textbook-quality data. Despite its small size, phi-1 achieves pass@1 accuracy of 50.6% on HumanEval and 55.5% on MBPP, substantially outperforming existing models of similar or even much larger size.
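For readers unfamiliar with the metric, pass@1 measures functional correctness: a problem counts as solved only if the generated completion passes that problem's unit tests. The snippet below is a minimal sketch of the standard unbiased pass@k estimator from the HumanEval evaluation protocol; the function name and the example sample counts are illustrative and not taken from the phi-1 codebase.

```python
import numpy as np

def pass_at_k(num_samples: int, num_correct: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k).

    num_samples: completions generated per problem (n)
    num_correct: completions that pass all unit tests (c)
    k: number of attempts scored (k <= n)
    """
    if num_samples - num_correct < k:
        return 1.0  # every size-k subset contains at least one correct sample
    # Product form of 1 - C(n-c, k) / C(n, k), numerically stable for large n
    return 1.0 - np.prod(1.0 - k / np.arange(num_samples - num_correct + 1, num_samples + 1))

# Example: 200 samples per problem, 101 passing -> pass@1 = 0.505
print(pass_at_k(200, 101, 1))
```

With k = 1 this reduces to the fraction of sampled completions that pass, averaged over problems, which is the number reported for HumanEval and MBPP above.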
Key Findings
- A 1.3B-parameter model achieving 50.6% pass@1 on HumanEval, outperforming much larger models
- Demonstrated that high-quality, textbook-like training data enables small models to excel
- Used synthetic data generation to create clean, educational training examples (see the sketch after this list)
- Showed that data quality can trump data quantity and model size
- Established the foundation for the Phi model series
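The data-curation idea behind these findings can be pictured as scoring candidate training documents for educational value and keeping only the high scorers. The sketch below is an illustrative approximation, not the paper's pipeline: it assumes a hypothetical `llm_complete` callable that queries any instruction-tuned model, whereas phi-1 used GPT-4 annotations to bootstrap a lightweight classifier over code embeddings so that the full web corpus could be filtered cheaply.

```python
# Illustrative sketch of educational-value filtering (not the phi-1 pipeline).
# `llm_complete` is a hypothetical helper: prompt string in, model reply string out.

PROMPT = (
    "Rate the educational value of the following Python snippet for a student "
    "learning basic coding concepts. Answer with one integer from 1 (low) to 5 (high).\n\n"
    "{snippet}\n\nScore:"
)

def educational_score(snippet: str, llm_complete) -> int:
    """Ask an LLM to grade how textbook-like a code snippet is."""
    reply = llm_complete(PROMPT.format(snippet=snippet))
    try:
        return int(reply.strip().split()[0])
    except (ValueError, IndexError):
        return 1  # treat unparseable answers as low quality

def filter_corpus(snippets, llm_complete, threshold: int = 4):
    """Keep only snippets judged sufficiently educational."""
    return [s for s in snippets if educational_score(s, llm_complete) >= threshold]
```

Scoring every document with a large model is expensive, which is why distilling the judgments into a small classifier matters in practice; the sketch keeps the LLM in the loop only to make the filtering criterion visible.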
Impact & Significance
This paper launched the Phi model series and proved that carefully curated training data can make tiny models remarkably capable. It changed how the industry thinks about data quality vs. quantity in model training.
Related Papers
- The Llama 3 Herd of Models (Meta AI)
- Qwen2 Technical Report (Alibaba Cloud / Qwen Team)
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model (DeepSeek AI)
- The Claude 3 Model Family: Opus, Sonnet, and Haiku (Anthropic)