Efficiency · December 1, 2023 · Carnegie Mellon / Princeton

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Albert Gu, Tri Dao

Abstract

We introduce Mamba, a new architecture for sequence modeling based on structured state space models (SSMs) with a selection mechanism. Mamba achieves performance comparable to Transformers while scaling linearly with sequence length instead of quadratically. On language modeling, Mamba matches or exceeds Transformers of the same size while being 5x faster at inference.
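The selection mechanism makes the SSM parameters functions of the current input, so the model can decide what to keep in its state at each step while still doing a fixed amount of work per token. Below is a minimal conceptual sketch of such an input-dependent recurrence in plain NumPy; the function name, projection weights, and the simplified Euler-style discretization are illustrative assumptions, not the paper's hardware-aware parallel scan.

```python
import numpy as np

def selective_ssm_scan(x, A, W_B, W_C, W_delta):
    """Conceptual selective SSM recurrence (illustrative sketch, not Mamba's kernel).

    x:       (L, D) input sequence
    A:       (D, N) per-channel diagonal state dynamics (kept negative for stability)
    W_B:     (D, N) weights that make the input matrix B depend on the current input
    W_C:     (D, N) weights that make the output matrix C depend on the current input
    W_delta: (D,)   weights that make the step size delta depend on the current input
    """
    L, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))      # fixed-size recurrent state: one N-dim state per channel
    y = np.zeros((L, D))
    for t in range(L):
        xt = x[t]                                        # (D,)
        delta = np.log1p(np.exp(xt * W_delta))[:, None]  # softplus -> positive step size, (D, 1)
        B = xt[:, None] * W_B                            # input-dependent B, (D, N)
        C = xt[:, None] * W_C                            # input-dependent C, (D, N)
        A_bar = np.exp(delta * A)                        # discretized state transition
        B_bar = delta * B                                # simplified (Euler) discretization of B
        h = A_bar * h + B_bar * xt[:, None]              # state update: O(D*N) work per token
        y[t] = (h * C).sum(axis=1)                       # readout
    return y
```

Because each step touches only the fixed-size state, the total cost is O(L · D · N), i.e. linear in the sequence length L, whereas self-attention compares every pair of positions and scales quadratically.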

Key Findings

  • Achieved linear-time sequence modeling, compared with the quadratic cost of Transformer attention (see the usage sketch after this list)
  • Matched Transformer performance on language modeling benchmarks
  • 5x faster inference than equivalent-size Transformers
  • Introduced a selection mechanism for input-dependent state space dynamics
  • Demonstrated strong performance on long-sequence tasks without attention
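As a quick usage sketch of the recurrence above (all sizes are made-up illustrations): total work grows linearly with the sequence length L, and at inference the recurrent state stays a fixed (D, N) array, which is what allows roughly constant cost per generated token regardless of context length.

```python
import numpy as np

rng = np.random.default_rng(0)
L, D, N = 1024, 16, 8                        # illustrative sizes only
x = rng.standard_normal((L, D))
A = -np.exp(rng.standard_normal((D, N)))     # negative dynamics -> decaying, stable state
W_B = 0.1 * rng.standard_normal((D, N))
W_C = 0.1 * rng.standard_normal((D, N))
W_delta = 0.1 * rng.standard_normal(D)

y = selective_ssm_scan(x, A, W_B, W_C, W_delta)
print(y.shape)  # (1024, 16); doubling L roughly doubles the work, unlike quadratic attention
```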

Impact & Significance

Mamba challenged the dominance of Transformer attention for sequence modeling and sparked intense research into SSM-based and hybrid architectures. It demonstrated a viable path to more efficient sequence models at scale.
