Vision | December 20, 2021 | LMU Munich / Stability AI

High-Resolution Image Synthesis with Latent Diffusion Models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer

Abstract

By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results. We apply diffusion models in the latent space of powerful pretrained autoencoders, achieving a near-optimal point between complexity reduction and detail preservation, greatly boosting visual fidelity.
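The core idea can be sketched in a few lines: encode an image into a much smaller latent with a pretrained autoencoder, run the entire diffusion process on that latent, and decode only once at the end. The toy `encode`, `decode`, and `denoiser` functions below are hypothetical stand-ins (the paper uses a learned KL/VQ autoencoder and a U-Net noise predictor); the DDPM-style loop and the latent-space placement are the point of the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the paper's components, for illustration only:
# a "pretrained autoencoder" mapping a 64x64 image to an 8x8 latent and back.
def encode(x):          # E(x): 64x64 pixels -> 8x8 latent (64x fewer elements)
    return x.reshape(8, 8, 8, 8).mean(axis=(1, 3))

def decode(z):          # D(z): 8x8 latent -> 64x64 image
    return np.kron(z, np.ones((8, 8)))

def denoiser(z_t, t):   # stand-in for the learned noise predictor eps_theta(z_t, t)
    return np.zeros_like(z_t)   # a real model predicts the noise added at step t

x = rng.random((64, 64))
z = encode(x)

# Diffusion schedule (toy values); all diffusion math happens in latent space.
T = 10
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)

# Forward process: noise the latent.
z_t = np.sqrt(alphas_bar[-1]) * z \
    + np.sqrt(1.0 - alphas_bar[-1]) * rng.standard_normal(z.shape)

# Reverse process: iteratively denoise the latent, never touching pixel space.
for t in reversed(range(T)):
    eps = denoiser(z_t, t)
    alpha = 1.0 - betas[t]
    z_t = (z_t - betas[t] / np.sqrt(1.0 - alphas_bar[t]) * eps) / np.sqrt(alpha)
    if t > 0:
        z_t += np.sqrt(betas[t]) * rng.standard_normal(z_t.shape)

x_hat = decode(z_t)     # a single decode at the end: the key cost saving of LDMs
print(x_hat.shape)      # (64, 64)
```

Because every denoising step operates on the 8x8 latent rather than the 64x64 image, the per-step cost drops by roughly the compression factor, which is what makes training and sampling tractable on modest hardware.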

Key Findings

  • Introduced Latent Diffusion Models (LDMs) operating in a compressed latent space
  • Achieved high-quality image synthesis with dramatically reduced computational cost
  • Enabled text-to-image generation accessible on consumer hardware
  • Demonstrated cross-attention conditioning for flexible guided synthesis
  • Achieved state-of-the-art results on image inpainting, super-resolution, and generation
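The cross-attention conditioning mentioned above can be illustrated with plain NumPy: latent "image" tokens act as queries, and conditioning tokens (e.g. from a text encoder) supply keys and values. The shapes, dimension `d`, and projection matrices here are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(1)
d = 16                                # shared attention dimension (illustrative)
z = rng.standard_normal((64, d))      # flattened latent tokens -> queries
tau = rng.standard_normal((8, d))     # conditioning tokens (e.g. text) -> keys/values

# Hypothetical learned projections W_Q, W_K, W_V.
W_Q, W_K, W_V = (rng.standard_normal((d, d)) for _ in range(3))
Q, K, V = z @ W_Q, tau @ W_K, tau @ W_V

# Each latent token attends over the 8 conditioning tokens.
attn = softmax(Q @ K.T / np.sqrt(d))  # shape (64, 8); rows sum to 1
out = attn @ V                        # conditioned latent features, shape (64, 16)
print(out.shape)
```

Injecting the conditioning this way, inside the denoiser's intermediate layers, is what lets one model handle text, layouts, or other modalities as guidance.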

Impact & Significance

This paper is the foundation of Stable Diffusion, which democratized AI image generation by making it possible to run on consumer GPUs. It spawned an entire ecosystem of open-source generative art tools and models.
