What Is Beam Search?
Beam search is a decoding algorithm used in AI text generation that maintains multiple candidate sequences (beams) at each step, expanding the most promising ones to find a higher-probability overall output than greedy decoding would produce.
How Beam Search Works
When generating text, a language model must decide which token to output at each step. Greedy decoding always picks the single most likely token, but this can lead to suboptimal sequences because the locally best choice is not always globally best. Beam search addresses this by keeping track of multiple partial sequences (called beams) simultaneously.

At each generation step, the algorithm expands every current beam with its top candidate next tokens, scores each extended sequence by its cumulative log-probability, and keeps only the top-B sequences, where B is the beam width. When generation finishes, the highest-scoring complete sequence is selected as the output. A beam width of 1 is equivalent to greedy decoding, while larger beam widths explore more possibilities at the cost of additional computation.

Beam search is particularly valuable in machine translation, speech recognition, and summarization, where finding a globally high-probability sequence matters. However, it tends to produce less diverse and sometimes repetitive outputs compared to sampling-based methods, which is why modern conversational AI systems often prefer top-p or temperature-based sampling for more natural-sounding responses.
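The expand-score-prune loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a production decoder: the `next_token_logprobs` callback and the toy model below are hypothetical stand-ins for a real language model's next-token distribution.

```python
import math
from typing import Callable, Dict, Tuple

def beam_search(
    next_token_logprobs: Callable[[Tuple[str, ...]], Dict[str, float]],
    beam_width: int = 4,
    max_len: int = 10,
    eos: str = "<eos>",
) -> Tuple[Tuple[str, ...], float]:
    """Return the highest-scoring sequence found with the given beam width."""
    # Each beam is a (sequence, cumulative log-probability) pair.
    beams = [((), 0.0)]
    completed = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            # Expand each beam with every candidate next token.
            for tok, logp in next_token_logprobs(seq).items():
                new = (seq + (tok,), score + logp)
                (completed if tok == eos else candidates).append(new)
        if not candidates:
            break
        # Prune: keep only the top-B partial sequences.
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_width]
    completed.extend(beams)  # fall back to unfinished beams if needed
    return max(completed, key=lambda b: b[1])

def toy_model(seq: Tuple[str, ...]) -> Dict[str, float]:
    # Hypothetical distribution where the greedy first choice ("a")
    # does not lead to the best overall sequence (which starts with "b").
    table = {
        (): {"a": math.log(0.6), "b": math.log(0.4)},
        ("a",): {"<eos>": math.log(0.5), "x": math.log(0.5)},
        ("b",): {"<eos>": math.log(0.9), "x": math.log(0.1)},
        ("a", "x"): {"<eos>": 0.0},
        ("b", "x"): {"<eos>": 0.0},
    }
    return table[seq]

greedy_seq, _ = beam_search(toy_model, beam_width=1, max_len=5)
best_seq, _ = beam_search(toy_model, beam_width=2, max_len=5)
```

With this toy model, a beam width of 1 (greedy) commits to "a" (probability 0.6) and can do no better than a total probability of 0.3, while a beam width of 2 keeps "b" alive and finds the higher-probability sequence ("b", "&lt;eos&gt;") with probability 0.36.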
Real-World Examples
Google Translate using beam search with a width of 4 to find the most fluent translation of a complex sentence
A speech-to-text system using beam search to consider multiple interpretations of ambiguous audio and select the most coherent transcription
A summarization model using beam search with length normalization to generate a concise, high-quality summary of a long article
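The length normalization mentioned in the last example exists because summing log-probabilities biases beam search toward short outputs: every added token makes the total more negative. A common remedy is to divide the score by a length penalty; the sketch below uses the formulation popularized by Google's neural machine translation system, with its commonly cited default constants (treat the exact values as illustrative).

```python
def length_normalized_score(logprob: float, length: int, alpha: float = 0.6) -> float:
    """Divide a cumulative log-probability by a length penalty.

    Penalty form ((5 + length) / 6) ** alpha follows the GNMT paper;
    alpha = 0.6 is the commonly quoted default, assumed here.
    """
    penalty = ((5 + length) / 6) ** alpha
    return logprob / penalty

# A 12-token candidate with raw score -5.0 loses to a 4-token candidate
# with raw score -4.0 on raw log-probability, but wins after normalization.
raw_short, raw_long = -4.0, -5.0
norm_short = length_normalized_score(raw_short, 4)
norm_long = length_normalized_score(raw_long, 12)
```

Without normalization the shorter candidate wins (-4.0 &gt; -5.0); with it, the longer candidate scores higher, which is why summarization and translation decoders typically enable a length penalty.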