Vision | December 20, 2021 | LMU Munich / Stability AI

High-Resolution Image Synthesis with Latent Diffusion Models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer

Abstract

By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results. We apply diffusion models in the latent space of powerful pretrained autoencoders, achieving a near-optimal point between complexity reduction and detail preservation, greatly boosting visual fidelity.
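The core idea can be sketched in a few lines: encode an image into a much smaller latent with a pretrained autoencoder, run the entire diffusion process on that latent, and decode only once at the end. The toy `encode`, `decode`, and `denoiser` functions below are hypothetical stand-ins (the paper uses a learned KL/VQ autoencoder and a U-Net noise predictor); the DDPM-style loop and the latent-space placement are the point of the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the paper's components, for illustration only:
# a "pretrained autoencoder" mapping a 64x64 image to an 8x8 latent and back.
def encode(x):          # E(x): 64x64 pixels -> 8x8 latent (64x fewer elements)
    return x.reshape(8, 8, 8, 8).mean(axis=(1, 3))

def decode(z):          # D(z): 8x8 latent -> 64x64 image
    return np.kron(z, np.ones((8, 8)))

def denoiser(z_t, t):   # stand-in for the learned noise predictor eps_theta(z_t, t)
    return np.zeros_like(z_t)   # a real model predicts the noise added at step t

x = rng.random((64, 64))
z = encode(x)

# Diffusion schedule (toy values); all diffusion math happens in latent space.
T = 10
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)

# Forward process: noise the latent.
z_t = np.sqrt(alphas_bar[-1]) * z \
    + np.sqrt(1.0 - alphas_bar[-1]) * rng.standard_normal(z.shape)

# Reverse process: iteratively denoise the latent, never touching pixel space.
for t in reversed(range(T)):
    eps = denoiser(z_t, t)
    alpha = 1.0 - betas[t]
    z_t = (z_t - betas[t] / np.sqrt(1.0 - alphas_bar[t]) * eps) / np.sqrt(alpha)
    if t > 0:
        z_t += np.sqrt(betas[t]) * rng.standard_normal(z_t.shape)

x_hat = decode(z_t)     # a single decode at the end: the key cost saving of LDMs
print(x_hat.shape)      # (64, 64)
```

Because every denoising step operates on the 8x8 latent rather than the 64x64 image, the per-step cost drops by roughly the compression factor, which is what makes training and sampling tractable on modest hardware.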

Key Findings

  • Introduced Latent Diffusion Models (LDMs) operating in a compressed latent space
  • Achieved high-quality image synthesis with dramatically reduced computational cost
  • Enabled text-to-image generation accessible on consumer hardware
  • Demonstrated cross-attention conditioning for flexible guided synthesis
  • Achieved state-of-the-art results on image inpainting, super-resolution, and generation
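The cross-attention conditioning mentioned above can be illustrated with plain NumPy: latent "image" tokens act as queries, and conditioning tokens (e.g. from a text encoder) supply keys and values. The shapes, dimension `d`, and projection matrices here are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(1)
d = 16                                # shared attention dimension (illustrative)
z = rng.standard_normal((64, d))      # flattened latent tokens -> queries
tau = rng.standard_normal((8, d))     # conditioning tokens (e.g. text) -> keys/values

# Hypothetical learned projections W_Q, W_K, W_V.
W_Q, W_K, W_V = (rng.standard_normal((d, d)) for _ in range(3))
Q, K, V = z @ W_Q, tau @ W_K, tau @ W_V

# Each latent token attends over the 8 conditioning tokens.
attn = softmax(Q @ K.T / np.sqrt(d))  # shape (64, 8); rows sum to 1
out = attn @ V                        # conditioned latent features, shape (64, 16)
print(out.shape)
```

Injecting the conditioning this way, inside the denoiser's intermediate layers, is what lets one model handle text, layouts, or other modalities as guidance.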

Impact & Significance

This paper is the foundation of Stable Diffusion, which democratized AI image generation by making it possible to run on consumer GPUs. It spawned an entire ecosystem of open-source generative art tools and models.
