What Is Perplexity (Metric)?
Perplexity is a language-model evaluation metric that measures how well the model predicts a given sequence of text. Lower perplexity indicates the model assigns higher probability to the actual text, meaning it is less "surprised" and better at modeling the language.
How Perplexity (Metric) Works
Perplexity can be intuitively understood as the effective number of equally likely next-token choices the model considers at each step. A perplexity of 10 means the model is, on average, as uncertain as if it were choosing between 10 equally likely tokens. Lower perplexity means the model makes more confident and accurate predictions. Perplexity is computed as the exponential of the average cross-entropy loss (the average negative log-probability per token) over the test data.

While perplexity is a useful intrinsic measure of language modeling quality, it does not directly measure a model's usefulness for downstream tasks: a model with lower perplexity is not necessarily better at following instructions or reasoning. It is most useful for comparing language models trained on similar data and evaluated with the same tokenizer, since perplexity values are not comparable across different tokenizations.
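The computation above can be sketched in a few lines. This is a minimal illustration, not tied to any particular model library: it assumes you already have the model's log-probability for each actual token in the test sequence, and shows that a model choosing uniformly among 10 tokens gets a perplexity of exactly 10.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the average negative log-probability per token.

    token_logprobs: natural-log probabilities the model assigned to each
    actual token in the evaluated sequence.
    """
    avg_nll = -sum(token_logprobs) / len(token_logprobs)  # average cross-entropy
    return math.exp(avg_nll)

# Sanity check of the "effective choices" intuition: a model that is
# uniformly uncertain over 10 tokens assigns each one probability 0.1,
# and its perplexity comes out to exactly 10.
uniform_logprobs = [math.log(0.1)] * 50
print(perplexity(uniform_logprobs))  # → 10.0 (up to floating-point error)
```

In practice the per-token log-probabilities come from running the model over the test set and reading off the log-probability of each ground-truth token; the formula itself is the same regardless of the model.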
Real-World Examples
The largest GPT-2 model achieving a perplexity of 18.34 on the WikiText-2 test set, indicating strong predictive performance on Wikipedia text
A researcher comparing two model checkpoints and selecting the one with lower perplexity for further fine-tuning
A language model pre-trained on English text showing very high perplexity on Japanese text, indicating poor Japanese modeling