What Is Knowledge Distillation?
Knowledge distillation is a model compression technique where a smaller 'student' model is trained to replicate the behavior of a larger 'teacher' model, producing a compact model that retains much of the teacher's performance at a fraction of the computational cost.
How Knowledge Distillation Works
Large AI models are powerful but expensive to run. Knowledge distillation addresses this by training a smaller model to mimic a larger one's outputs, including its full probability distributions over answers (soft labels), not just its final predictions. The student model learns from the teacher's 'dark knowledge' — the nuanced relationships between possible answers that hard labels don't capture. This produces a model that is significantly smaller and faster while often retaining most of the teacher's performance. Distillation is widely used to create models suitable for mobile devices, edge deployment, and real-time applications.
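The training objective described above can be sketched in a few lines. The following is a minimal illustration, not a production implementation: it combines a temperature-softened KL-divergence term (matching the teacher's soft labels) with a standard cross-entropy term on the hard label. The hyperparameter names `temperature` and `alpha` are illustrative, though both appear in the standard distillation formulation.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax: a higher temperature flattens the
    distribution, exposing the teacher's 'dark knowledge' about
    non-top classes."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, hard_label,
                      temperature=4.0, alpha=0.5):
    """Weighted sum of (a) KL divergence between the teacher's and
    student's softened distributions and (b) cross-entropy against
    the ground-truth hard label."""
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    # KL(teacher || student) on the softened distributions, scaled by
    # T^2 so gradient magnitudes stay comparable across temperatures.
    soft_loss = temperature ** 2 * sum(
        ti * math.log(ti / si) for ti, si in zip(t, s) if ti > 0)
    # Standard cross-entropy on the unsoftened student distribution.
    hard_loss = -math.log(softmax(student_logits)[hard_label])
    return alpha * soft_loss + (1 - alpha) * hard_loss
```

When the student's logits exactly match the teacher's, the soft-label term vanishes; training drives the student toward that point while the hard-label term keeps it anchored to the ground truth.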
Real-World Examples
OpenAI's GPT-4o mini, widely reported to draw on distillation from larger GPT-4-class models to create a smaller, faster, and cheaper option
Hugging Face creating DistilBERT, which is 40% smaller and 60% faster than BERT while retaining 97% of its language-understanding performance
A company distilling a large vision model into a lightweight version that runs on smartphones for real-time object detection