January 6, 2022 · OpenAI

Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets

Alethea Power, Yuri Burda, Harri Edwards, Igor Babuschkin, Vedant Misra

Abstract

We show that neural networks can learn to generalize on algorithmic tasks long after memorizing the training data, a phenomenon we call grokking. In some cases, networks achieve perfect generalization thousands of training steps after reaching perfect training accuracy. This challenges conventional wisdom about the relationship between memorization and generalization.
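To make the setup concrete, here is a minimal sketch of a grokking experiment, written as a hypothetical PyTorch reimplementation rather than the authors' code: it trains on modular addition and logs train and test accuracy long past the point of memorization. The paper trains a small transformer; a small MLP over concatenated token embeddings stands in here to keep the sketch short, and the hyperparameters are illustrative.

```python
# Hypothetical sketch of a grokking experiment (not the authors' code):
# train on (a + b) mod P, keep optimizing long after train accuracy
# reaches 100%, and watch test accuracy eventually jump.
import itertools
import torch
import torch.nn as nn

P = 97  # small prime modulus, in the spirit of the paper's tasks

# The full dataset is every (a, b) pair with label (a + b) mod P.
pairs = torch.tensor(list(itertools.product(range(P), repeat=2)))
labels = (pairs[:, 0] + pairs[:, 1]) % P
perm = torch.randperm(len(pairs))
split = len(pairs) // 2  # train on half the table, hold out the rest
train_idx, test_idx = perm[:split], perm[split:]

embed = nn.Embedding(P, 128)
model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, P))
params = list(embed.parameters()) + list(model.parameters())
# The paper reports that regularization such as weight decay strongly
# affects how quickly generalization appears after memorization.
opt = torch.optim.AdamW(params, lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

def accuracy(idx):
    with torch.no_grad():
        x = embed(pairs[idx]).flatten(1)  # concatenate both embeddings
        return (model(x).argmax(-1) == labels[idx]).float().mean().item()

for step in range(100_000):  # far more steps than memorization requires
    x = embed(pairs[train_idx]).flatten(1)
    loss = loss_fn(model(x), labels[train_idx])
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        print(step, accuracy(train_idx), accuracy(test_idx))
```

In a grokking run, the first printed column saturates at 1.0 early, while the last column stays near chance for a long stretch before climbing to 1.0.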

Key Findings

  • Discovered that generalization can occur long after memorization (grokking)
  • Showed networks achieving 100% test accuracy thousands of steps after reaching 100% train accuracy
  • Demonstrated this on modular arithmetic, permutation groups, and other algorithmic tasks (see the dataset sketch after this list)
  • Challenged assumptions about when to stop training neural networks
  • Opened new research directions in understanding deep learning generalization
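The tasks share one recipe: take the complete table of a binary operation on a small finite set and reveal only a random fraction of its cells. The sketch below shows that recipe for modular addition and for composition in the permutation group S5; the function names are ours, for illustration, and do not come from the paper.

```python
# Hypothetical sketch of the paper's task family: each task is the full
# "multiplication table" of a binary operation on a small finite set,
# and the network trains on a random subset of the table's cells.
import itertools
import random

def mod_add_table(p):
    """Modular addition: cell (a, b) holds (a + b) mod p."""
    return {(a, b): (a + b) % p for a in range(p) for b in range(p)}

def s5_composition_table():
    """Composition in the permutation group S5, one of the paper's tasks."""
    elems = list(itertools.permutations(range(5)))
    index = {g: i for i, g in enumerate(elems)}
    compose = lambda g, h: tuple(g[h[i]] for i in range(5))  # (g∘h)(i) = g(h(i))
    return {(index[g], index[h]): index[compose(g, h)]
            for g in elems for h in elems}

def split_table(table, train_fraction, seed=0):
    """Reveal a random fraction of cells; the paper finds that smaller
    training fractions delay generalization dramatically."""
    cells = list(table.items())
    random.Random(seed).shuffle(cells)
    k = int(train_fraction * len(cells))
    return cells[:k], cells[k:]

train, test = split_table(s5_composition_table(), train_fraction=0.5)
```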

Impact & Significance

Grokking revealed fundamental insights into how neural networks learn and generalize, challenging conventional training practices such as stopping at the onset of overfitting. It has influenced subsequent research on training dynamics and on distinguishing when models truly learn versus merely memorize.
