Vision · December 20, 2021 · LMU Munich / Stability AI
High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer
Abstract
By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results. We apply diffusion models in the latent space of powerful pretrained autoencoders, achieving a near-optimal point between complexity reduction and detail preservation, greatly boosting visual fidelity.
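The core idea above — run the diffusion process on a compact latent produced by a pretrained autoencoder rather than on raw pixels — can be sketched in a few lines. This is a toy illustration, not the paper's architecture: the "encoder" and "decoder" are random linear projections, the dimensions and noise schedule are made up, and no denoising network is trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the pretrained autoencoder: a linear "encoder" mapping a
# flattened 64x64 image to a 16-dim latent, and a matching "decoder".
# Random projections, not trained networks -- for illustration only.
IMG_DIM, LATENT_DIM = 64 * 64, 16
W_enc = rng.normal(size=(IMG_DIM, LATENT_DIM)) / np.sqrt(IMG_DIM)
W_dec = rng.normal(size=(LATENT_DIM, IMG_DIM)) / np.sqrt(LATENT_DIM)

def encode(x):  # image -> compact latent z
    return x @ W_enc

def decode(z):  # latent z -> image
    return z @ W_dec

# Linear noise schedule over T diffusion steps (values are illustrative).
T = 100
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)

def q_sample(z0, t):
    """Forward diffusion in LATENT space:
    z_t = sqrt(alpha_bar_t) * z0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = rng.normal(size=z0.shape)
    return np.sqrt(alphas_bar[t]) * z0 + np.sqrt(1.0 - alphas_bar[t]) * eps, eps

x = rng.normal(size=(1, IMG_DIM))   # a fake "image"
z0 = encode(x)                      # diffusion operates on this 16-dim latent,
z_t, eps = q_sample(z0, t=50)       # not on the 4096-dim pixel array
x_rec = decode(z_t)

print(z0.shape, z_t.shape, x_rec.shape)
```

The computational saving is the point: the denoiser sees a 16-dimensional latent here instead of 4,096 pixels, which is why diffusion in latent space is so much cheaper per step than pixel-space diffusion.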
Key Findings
1. Introduced Latent Diffusion Models (LDMs) operating in a compressed latent space
2. Achieved high-quality image synthesis with dramatically reduced computational cost
3. Enabled text-to-image generation accessible on consumer hardware
4. Demonstrated cross-attention conditioning for flexible guided synthesis
5. Achieved state-of-the-art results on image inpainting, super-resolution, and generation
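The cross-attention conditioning in the findings above works by letting latent tokens query conditioning tokens (e.g. text embeddings): queries come from the latent, keys and values from the condition. A minimal sketch, with made-up dimensions and random weights rather than the paper's trained projections:

```python
import numpy as np

rng = np.random.default_rng(1)

def cross_attention(z, c, W_q, W_k, W_v):
    """Scaled dot-product cross-attention: queries from latent tokens z,
    keys/values from conditioning tokens c (e.g. text embeddings)."""
    Q, K, V = z @ W_q, c @ W_k, c @ W_v
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over cond tokens
    return weights @ V

# Hypothetical sizes: latent dim, conditioning dim, attention dim.
D_Z, D_C, D_ATTN = 16, 32, 8
W_q = rng.normal(size=(D_Z, D_ATTN))
W_k = rng.normal(size=(D_C, D_ATTN))
W_v = rng.normal(size=(D_C, D_ATTN))

z = rng.normal(size=(64, D_Z))   # 64 latent "tokens"
c = rng.normal(size=(7, D_C))    # 7 conditioning tokens, e.g. a text prompt
out = cross_attention(z, c, W_q, W_k, W_v)
print(out.shape)
```

Because the conditioning enters only through K and V, the same denoiser backbone can be guided by text, semantic maps, or other modalities simply by swapping the conditioning encoder.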
Impact & Significance
This paper is the foundation of Stable Diffusion, which democratized AI image generation by making it possible to run on consumer GPUs. It spawned an entire ecosystem of open-source generative art tools and models.
Related Papers
- The Llama 3 Herd of Models · Meta AI (LLM, July 23, 2024)
- Qwen2 Technical Report · Alibaba Cloud / Qwen Team (LLM, July 15, 2024)
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model · DeepSeek AI (Efficiency, May 7, 2024)
- The Claude 3 Model Family: Opus, Sonnet, and Haiku · Anthropic (LLM, March 4, 2024)