Efficiency · December 1, 2023 · Carnegie Mellon / Princeton
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Albert Gu, Tri Dao
Abstract
We introduce Mamba, a new architecture for sequence modeling based on structured state space models (SSMs) with a selection mechanism. Mamba achieves performance comparable to Transformers while scaling linearly with sequence length instead of quadratically. On language modeling, Mamba matches or exceeds Transformers of the same size while being 5x faster at inference.
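To make the abstract concrete, here is a minimal sketch of the core idea: a diagonal state space model whose step size, B, and C are computed from the current input (the "selection" mechanism), scanned sequentially so the cost is linear in sequence length. All parameter names and shapes here are illustrative assumptions, not Mamba's actual layer layout.

```python
import numpy as np

def selective_ssm_scan(x, A, W_B, W_C, W_dt):
    """Sequential scan of a toy diagonal selective SSM.

    x:    (T, D) input sequence
    A:    (D, N) diagonal state matrix (entries should be negative for stability)
    W_B, W_C: (D, N) illustrative projections making B_t, C_t input-dependent
    W_dt: (D,) illustrative projection for the input-dependent step size
    """
    T, D = x.shape
    h = np.zeros_like(A)          # hidden state, (D, N)
    y = np.zeros((T, D))
    for t in range(T):
        # Selection: step size, B, and C are functions of the current input,
        # so the model can decide what to remember or ignore per token.
        dt = np.log1p(np.exp(x[t] * W_dt))   # softplus -> positive step, (D,)
        B_t = x[t][:, None] * W_B            # (D, N)
        C_t = x[t][:, None] * W_C            # (D, N)
        # Zero-order-hold style discretization of the continuous-time SSM.
        A_bar = np.exp(dt[:, None] * A)      # (D, N)
        B_bar = dt[:, None] * B_t            # first-order approximation
        # Linear recurrence: O(D * N) per step, hence O(T) in sequence length.
        h = A_bar * h + B_bar * x[t][:, None]
        y[t] = (h * C_t).sum(axis=1)
    return y
```

Because each step touches only a fixed-size state, inference cost per token is constant, in contrast to attention, whose per-token cost grows with the length of the context.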
Key Findings
- Achieved linear-time sequence modeling, versus the quadratic cost of Transformer attention
- Matched Transformer performance on language modeling benchmarks
- Delivered 5x faster inference than equivalent-size Transformers
- Introduced a selection mechanism for input-dependent state space dynamics
- Demonstrated strong performance on long-sequence tasks without attention
Impact & Significance
Mamba challenged the dominance of Transformer attention for sequence modeling and sparked intense research into SSM-based and hybrid architectures. It demonstrated a viable path to more efficient sequence models at scale.