LLM · October 11, 2018 · Google AI Language

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova

Abstract

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. The pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

Key Findings

  • Introduced bidirectional pre-training for language understanding
  • Achieved state-of-the-art results on 11 NLP benchmarks simultaneously
  • Demonstrated the effectiveness of masked language modeling pre-training
  • Showed that fine-tuning a pre-trained model beats task-specific architectures
  • Popularized the pre-train-then-fine-tune paradigm in NLP
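The masked language modeling objective above can be sketched in plain Python. Per the BERT paper, 15% of input tokens are selected for prediction; of those, 80% are replaced with a [MASK] token, 10% with a random vocabulary token, and 10% are left unchanged. The function below is a minimal illustrative sketch of that masking procedure (the function name and data layout are my own, not from the paper or any library):

```python
import random

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """BERT-style masking sketch: select ~15% of tokens for prediction.
    Of the selected tokens, 80% become "[MASK]", 10% become a random
    vocab token, and 10% stay unchanged. Returns (masked, labels),
    where labels hold the original token at selected positions and
    None elsewhere (positions the loss ignores)."""
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # model must predict the original token here
            r = rng.random()
            if r < 0.8:
                masked.append("[MASK]")      # 80%: replace with mask token
            elif r < 0.9:
                masked.append(rng.choice(vocab))  # 10%: random token
            else:
                masked.append(tok)           # 10%: keep original
        else:
            labels.append(None)
            masked.append(tok)
    return masked, labels
```

Keeping 10% of selected tokens unchanged forces the model to maintain a contextual representation of every input token, since it cannot know which positions will be scored.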

Impact & Significance

BERT revolutionized NLP by establishing the pre-training and fine-tuning paradigm. It became the backbone of Google Search and influenced virtually every NLP system built after 2018. BERT models power search, classification, and understanding tasks globally.
