What Is Encoder-Decoder Architecture?
The encoder-decoder architecture is a neural network design pattern where an encoder processes an input sequence into a compressed internal representation, and a decoder generates an output sequence from that representation, enabling sequence-to-sequence tasks like translation and summarization.
How Encoder-Decoder Architecture Works
The encoder reads the entire input and compresses it into a representation that captures its meaning: a single fixed-length vector in early RNN models, or a variable-length sequence of vectors in Transformers. The decoder then uses this representation to generate the output one step at a time. In the original Transformer, the encoder uses self-attention layers, while the decoder uses masked self-attention over the tokens generated so far and additionally attends to the encoder's output through cross-attention. Models like T5 and BART use the full encoder-decoder architecture, while GPT is decoder-only and BERT is encoder-only. Each variant suits different tasks: encoder-decoder for translation, decoder-only for text generation, and encoder-only for understanding and classification.
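The encoder/decoder attention flow above can be sketched in a few lines of NumPy. This is a toy illustration with random vectors and made-up dimensions, not a trained model: it only shows how cross-attention lets decoder states query the encoder's output.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention: each query row mixes the value rows,
    # weighted by query-key similarity.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

d = 8                             # model dimension (toy size)
src = rng.normal(size=(5, d))     # input sequence: 5 token vectors
tgt = rng.normal(size=(3, d))     # decoder states: 3 output steps so far

# Encoder: self-attention over the input (queries, keys, and values all
# come from the input) produces the "memory" the decoder will consult.
memory = attention(src, src, src)

# Decoder cross-attention: queries come from the decoder's own states,
# keys and values come from the encoder's memory.
out = attention(tgt, memory, memory)

print(out.shape)  # → (3, 8): one input summary per decoder step
```

Note the shapes: the input length (5) and output length (3) can differ freely, which is exactly what makes this pattern fit sequence-to-sequence tasks like translation.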
Real-World Examples
Google Translate using an encoder-decoder model to encode an English sentence and decode it into French
T5 using its encoder to understand a long document and its decoder to generate a concise summary
A speech-to-text system encoding audio into a representation and decoding it into written text