What Is Semi-Supervised Learning?
Semi-supervised learning is a machine learning approach that trains models using a small set of labeled data combined with a large set of unlabeled data, leveraging the structure of the unlabeled data to improve performance beyond what labeled data alone could achieve.
How Semi-Supervised Learning Works
Labeling data is often the most expensive and time-consuming part of building AI systems. Semi-supervised learning addresses this by using a small amount of labeled data to get the model started, then using the model's own predictions on unlabeled data (pseudo-labels) to expand the training set. The model learns from both the labeled examples and the patterns it discovers in the unlabeled data. This approach is especially valuable in domains like medical imaging, where expert labels are expensive, but raw data is abundant. Semi-supervised methods can achieve near-supervised-level performance with a fraction of the labeled data.
Real-World Examples
A medical imaging system trained on 100 labeled X-rays and 10,000 unlabeled X-rays achieving accuracy close to a model trained on 10,000 labeled examples
A text classification system using 500 labeled reviews and millions of unlabeled reviews to build an accurate sentiment analyzer
A speech recognition model improving its accuracy by learning from both transcribed and untranscribed audio