Efficiency · December 12, 2023 · Microsoft Research
Phi-2: The Surprising Power of Small Language Models
Microsoft Research
Abstract
We present Phi-2, a 2.7-billion-parameter language model that demonstrates strong reasoning and language understanding, matching or outperforming models up to 25x larger. Phi-2 is trained on carefully curated synthetic and filtered web data, showing that data quality can compensate for model size in achieving strong performance.
Key Findings
1. A 2.7B model matching or outperforming models 25x larger on benchmarks
2. Demonstrated that data quality can matter more than model size
3. Used carefully curated synthetic and filtered web data for training
4. Achieved strong performance on coding, math, and reasoning tasks
5. Showed that small models can be viable for edge deployment
Impact & Significance
Phi-2 advanced the small-model efficiency paradigm, showing that careful data curation can produce surprisingly capable small models. It influenced the development of on-device AI and edge deployment strategies.