BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
Abstract
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. The pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
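The "one additional output layer" idea can be made concrete with a minimal sketch: during fine-tuning, a single linear layer is placed on top of the pre-trained encoder's pooled representation and trained for the downstream task. The sketch below uses numpy with random stand-in values for the encoder output; the names (`cls_vector`, `num_labels`) and sizes are illustrative assumptions, not the paper's code.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Stand-in for the pooled [CLS] representation from a pre-trained encoder.
# In real fine-tuning this comes from BERT's final layer; here it is random.
hidden_size, num_labels = 768, 3
rng = np.random.default_rng(0)
cls_vector = rng.standard_normal((1, hidden_size))

# The single task-specific output layer added during fine-tuning
# (its weights W, b are the only newly initialized parameters).
W = rng.standard_normal((hidden_size, num_labels)) * 0.02
b = np.zeros(num_labels)

logits = cls_vector @ W + b
probs = softmax(logits)
print(probs.shape)  # (1, 3)
```

During fine-tuning both this new layer and the encoder weights are updated end-to-end, which is what distinguishes BERT's recipe from feature-based approaches.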
Key Findings
1. Introduced bidirectional pre-training for language understanding
2. Achieved state-of-the-art results on 11 NLP benchmarks simultaneously
3. Demonstrated the effectiveness of masked language modeling pre-training
4. Showed that fine-tuning a pre-trained model beats task-specific architectures
5. Popularized the pre-train then fine-tune paradigm in NLP
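The masked language modeling objective named in finding 3 can be sketched in a few lines: roughly 15% of input positions are selected, and of those, 80% are replaced with a [MASK] token, 10% with a random token, and 10% left unchanged, with the model trained to recover the originals. The helper below is an illustrative stand-in (the token vocabulary and function name are assumptions, not the paper's implementation).

```python
import random

MASK = "[MASK]"
VOCAB = ["the", "cat", "dog", "sat", "on", "mat"]  # toy vocabulary for illustration

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Sketch of BERT-style masked-LM corruption: select ~mask_prob of
    positions; of those, 80% -> [MASK], 10% -> random token, 10% unchanged."""
    rng = random.Random(seed)
    corrupted, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            targets[i] = tok  # the model must predict the original token here
            r = rng.random()
            if r < 0.8:
                corrupted[i] = MASK
            elif r < 0.9:
                corrupted[i] = rng.choice(VOCAB)
            # else: token is left unchanged but still predicted
    return corrupted, targets
```

Keeping 10% of selected tokens unchanged prevents the model from learning that a visible token is always correct, since any position may be a prediction target.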
Impact & Significance
BERT revolutionized NLP by establishing the pre-training and fine-tuning paradigm. It became a backbone of Google Search ranking and influenced virtually every NLP system built after 2018. BERT-style models now underpin search, text classification, and language-understanding systems worldwide.
Related Papers
- The Llama 3 Herd of Models (Meta AI)
- Qwen2 Technical Report (Alibaba Cloud / Qwen Team)
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model (DeepSeek AI)
- The Claude 3 Model Family: Opus, Sonnet, and Haiku (Anthropic)