#lumiere #texttovideoai #google
LUMIERE by Google Research tackles globally consistent text-to-video generation by extending the U-Net downsampling concept to the temporal axis of videos.
OUTLINE:
0:00 - Introduction
8:20 - Problems with keyframes
16:55 - Space-Time U-Net (STUNet)
21:20 - Extending U-Nets to video
37:20 - MultiDiffusion for fusing SSR predictions
44:00 - Stylized generation by swapping weights
49:15 - Training & Evaluation
53:20 - Societal Impact & Conclusion
Paper: https://arxiv.org/abs/2401.12945
Website: https://lumiere-video.github.io/
Abstract:
We introduce Lumiere - a text-to-video diffusion model designed for synthesizing videos that portray realistic, diverse and coherent motion -- a pivotal challenge in video synthesis. To this end, we introduce a Space-Time U-Net architecture that generates the entire temporal duration of the video at once, through a single pass in the model. This is in contrast to existing video models which synthesize distant keyframes followed by temporal super-resolution -- an approach that inherently makes global temporal consistency difficult to achieve. By deploying both spatial and (importantly) temporal down and up-sampling and leveraging a pre-trained text-to-image diffusion model, our model learns to directly generate a full-frame-rate, low-resolution video by processing it in multiple space-time scales. We demonstrate state-of-the-art text-to-video generation results, and show that our design easily facilitates a wide range of content creation tasks and video editing applications, including image-to-video, video inpainting, and stylized generation.
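To make the abstract's key idea concrete, here is a minimal plain-Python sketch of temporal down- and up-sampling over a sequence of frames. This is not the paper's implementation: the real STUNet operates on 4-D video tensors with learned space-time convolutions, whereas here each "frame" is a single number and pooling is a fixed average. The function names are illustrative only.

```python
# Toy sketch of temporal down/up-sampling, the idea behind Lumiere's
# Space-Time U-Net (STUNet). Illustrative only: real STUNet blocks use
# learned convolutions on video tensors; here a "frame" is one float.

def temporal_downsample(frames):
    """Halve the temporal resolution by averaging adjacent frame pairs."""
    return [(frames[i] + frames[i + 1]) / 2
            for i in range(0, len(frames) - 1, 2)]

def temporal_upsample(frames):
    """Double the temporal resolution by repeating each frame (nearest-neighbor)."""
    return [f for frame in frames for f in (frame, frame)]

video = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]  # 8 "frames"
coarse = temporal_downsample(video)       # 4 frames
coarser = temporal_downsample(coarse)     # 2 frames: the bulk of a U-Net's compute runs here
restored = temporal_upsample(temporal_upsample(coarser))  # back to 8 frames
```

The point of the sketch: because the network compresses the *whole* clip along time and processes it at the coarsest scale in one pass, it can enforce global temporal consistency, unlike the keyframe-then-interpolate pipelines the abstract contrasts against.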
Authors: Omer Bar-Tal, Hila Chefer, Omer Tov, Charles Herrmann, Roni Paiss, Shiran Zada, Ariel Ephrat, Junhwa Hur, Yuanzhen Li, Tomer Michaeli, Oliver Wang, Deqing Sun, Tali Dekel, Inbar Mosseri
Links:
Homepage: https://ykilcher.com
Merch: https://ykilcher.com/merch
YouTube: / yannickilcher
Twitter: / ykilcher
Discord: https://ykilcher.com/discord
LinkedIn: / ykilcher
If you want to support me, the best thing to do is to share out the content :)
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannick...
Patreon: / yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
Video by Yannic Kilcher, published 04 February 2024.