Efficiency · December 12, 2023 · Microsoft Research
Phi-2: The Surprising Power of Small Language Models
Microsoft Research
Abstract
We present Phi-2, a 2.7-billion-parameter language model that demonstrates strong reasoning and language understanding, matching or outperforming models up to 25x larger. Phi-2 is trained on carefully curated synthetic and filtered web data, showing that data quality can compensate for model size in achieving strong performance.
Key Findings
1. A 2.7B model matching or outperforming models 25x larger on benchmarks
2. Demonstrated that data quality can matter more than model size
3. Used carefully curated synthetic and filtered web data for training
4. Achieved strong performance on coding, math, and reasoning tasks
5. Showed that small models can be viable for edge deployment
Impact & Significance
Phi-2 advanced the small-model efficiency paradigm, showing that careful data curation can produce surprisingly capable small models. It influenced the development of on-device AI and edge deployment strategies.