December 12, 2023

Phi-2: The Surprising Power of Small Language Models

Microsoft Research

Abstract

We present Phi-2, a 2.7-billion-parameter language model that demonstrates outstanding reasoning and language understanding capabilities, matching or outperforming models up to 25x larger. Phi-2 is trained on carefully curated synthetic and filtered web data, showing that data quality can compensate for model size in achieving strong performance.
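To make the size comparison concrete, here is a back-of-the-envelope sketch; the ~70B reference point is an assumption based on publicly known models at that scale (e.g., Llama-2-70B), not a figure stated in this abstract:

```python
# Rough arithmetic behind the "up to 25x larger" claim (illustrative only).
phi2_params = 2.7e9   # Phi-2: 2.7 billion parameters
scale_factor = 25     # "up to 25x larger"

comparison_params = phi2_params * scale_factor
print(f"{comparison_params / 1e9:.1f}B")  # → 67.5B, roughly the scale of 70B-class models
```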

Key Findings

  • A 2.7B model matching or outperforming models 25x larger on benchmarks
  • Demonstrated that data quality is more important than model size
  • Used carefully curated synthetic and filtered web data for training
  • Achieved strong performance on coding, math, and reasoning tasks
  • Showed that small models can be viable for edge deployment

Impact & Significance

Phi-2 advanced the efficient small-model paradigm, proving that careful data curation can produce surprisingly capable small models. It influenced the development of on-device AI and edge-deployment strategies.
