2 Amazing Ideas in Latent Diffusion Models LDM w/ VAE, U-Net & CLIP: Generative AI

Published: 23 October 2022
on channel: Discover AI


Latent Diffusion Models (LDM), by Rombach & Blattmann, 2022, run the diffusion process in latent space instead of pixel space, which lowers training cost and speeds up inference. Includes insights from a theoretical physicist on Markov chains and U-Net data augmentation. Keywords: stable ai art, generative AI.

LDM loosely decomposes generative learning into perceptual compression and semantic compression: an autoencoder first trims off pixel-level redundancy, and a diffusion process then manipulates and generates semantic concepts on the learned latent. Architecture-wise, latent diffusion models consist of a Variational Autoencoder, a U-Net and a CLIP Text Encoder (or BERT) for Generative AI.
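As a rough illustration of what "diffusion process on the learned latent" means, here is a minimal sketch of the forward (noising) step applied to a latent vector. It assumes the DDPM-style closed form z_t = sqrt(alpha_bar_t) * z_0 + sqrt(1 - alpha_bar_t) * eps with a linear beta schedule; the schedule values, the function names and the tiny 4-value latent are illustrative assumptions, not taken from the video or the paper's code.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, linearly spaced (assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def noise_latent(z0, t, alpha_bars, rng):
    """Sample z_t from q(z_t | z_0) in closed form, one value at a time."""
    signal = math.sqrt(alpha_bars[t])
    noise = math.sqrt(1.0 - alpha_bars[t])
    return [signal * z + noise * rng.gauss(0.0, 1.0) for z in z0]

betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)

z0 = [0.5, -1.0, 0.25, 2.0]                       # hypothetical latent from the autoencoder
z_early = noise_latent(z0, 10, alpha_bars, rng)   # still close to z0
z_late = noise_latent(z0, 999, alpha_bars, rng)   # almost pure Gaussian noise
print(alpha_bars[10], alpha_bars[999])
```

The only difference from pixel-space diffusion in this sketch is what z0 is: a compressed latent rather than the raw image.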

Remember: Stable Diffusion is a text-to-image latent diffusion model created by the researchers and engineers from CompVis, Stability AI and LAION.

The key difference between standard diffusion and latent diffusion models: in latent diffusion the model is trained to generate latent (compressed) representations of the images.
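To make the saving concrete, here is a small back-of-the-envelope calculation. It assumes a 512x512 RGB image, a downsampling factor of 8 and 4 latent channels (the settings commonly associated with Stable Diffusion v1); these numbers are assumptions for illustration and are not stated in the description above.

```python
def tensor_size(height, width, channels):
    """Number of values in a height x width x channels tensor."""
    return height * width * channels

def latent_shape(height, width, downsample_factor, latent_channels):
    """Spatial size shrinks by the downsampling factor on each side."""
    return (height // downsample_factor,
            width // downsample_factor,
            latent_channels)

# Assumed settings: 512x512 RGB image, factor 8, 4 latent channels.
pixel_values = tensor_size(512, 512, 3)
h, w, c = latent_shape(512, 512, downsample_factor=8, latent_channels=4)
latent_values = tensor_size(h, w, c)
ratio = pixel_values / latent_values

print(f"pixel space:  {pixel_values} values")
print(f"latent space: {latent_values} values ({h}x{w}x{c})")
print(f"the diffusion model works on {ratio:.0f}x fewer values")
```

Under these assumptions the U-Net processes 48 times fewer values per denoising step than it would in pixel space.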

There are three main components in latent diffusion models:

1. Variational AutoEncoder (VAE).
2. A U-Net (2015).
3. A text-encoder, e.g. CLIP's Text Encoder.
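The way these three components hand data to each other at generation time can be sketched with toy stand-ins. Everything below is hypothetical: the function names, the fake "embedding", the simplistic update rule and the 8x repeat used as a "decoder" are placeholders chosen so the script runs without any ML library. It shows only the data flow (text -> conditioning, noise -> denoised latent, latent -> image), not a real sampler or real models.

```python
import random

def encode_text(prompt):
    """Stand-in for the text encoder (e.g. CLIP): prompt -> conditioning vector."""
    # Toy numbers derived from characters; NOT how CLIP computes embeddings.
    return [(ord(ch) % 7) / 7.0 for ch in prompt[:4].ljust(4)]

def predict_noise(latent, t, cond):
    """Stand-in for the U-Net: estimates noise from latent, timestep and text."""
    # Toy rule: treat the distance to the conditioning as the noise estimate.
    return [0.1 * (z - c) for z, c in zip(latent, cond)]

def decode_latent(latent):
    """Stand-in for the VAE decoder: expands the latent into a larger output."""
    return [z for z in latent for _ in range(8)]

def generate(prompt, num_steps=50, seed=0):
    rng = random.Random(seed)
    cond = encode_text(prompt)                    # 1. text -> conditioning
    latent = [rng.gauss(0.0, 1.0) for _ in cond]  # 2. start from noise in latent space
    for t in reversed(range(num_steps)):          # 3. iterative denoising in latent space
        eps = predict_noise(latent, t, cond)
        latent = [z - e for z, e in zip(latent, eps)]
    return decode_latent(latent)                  # 4. latent -> image

image = generate("a cat")
print(len(image))
```

Note that only the decoder half of the VAE appears here; the encoder half is what produces latents from real images during training.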

Explained:
CompVis - Machine Vision and Learning LMU Munich
Machine Vision and Learning research group at Ludwig Maximilian University of Munich (formerly Computer Vision Group at Heidelberg University)

Notable links:

High-Resolution Image Synthesis with Latent Diffusion Models
https://arxiv.org/pdf/2112.10752.pdf

U-Net: Convolutional Networks for Biomedical Image Segmentation
https://arxiv.org/pdf/1505.04597.pdf

https://lilianweng.github.io/posts/20...
https://deepsense.ai/the-recent-rise-...

00:00 Latent Diffusion Model explained
00:37 Nonequilibrium Thermodynamics 2015
02:32 Generative Markov Chains
05:10 UNet Data Augmentation 2015
06:39 UNet Architecture
08:12 LDM 2022 pretrained Autoencoders w/ cross-attention layers
10:18 Schema of LDM - Latent Diffusion Model
13:07 Summary 5 Videos

#text-to-image
#stablediffusion
#ai
#generativeai

