Lumiere

A spatio-temporal diffusion model for video generation

Tags: Video, Video Synthesis, Text-to-Video
Lumiere is a text-to-video diffusion model designed to synthesize videos that portray realistic, diverse, and coherent motion, a central challenge in video synthesis. We introduce a spatio-temporal U-Net architecture that generates the entire temporal duration of a video at once, in a single pass through the model. This contrasts with existing video models, which first synthesize distant keyframes and then apply temporal super-resolution, an approach that inherently makes global temporal consistency difficult to achieve. By deploying both spatial and, importantly, temporal downsampling and upsampling, and by leveraging a pre-trained text-to-image diffusion model, our model learns to directly generate a full-frame-rate, low-resolution video by processing it at multiple spatio-temporal scales. We demonstrate state-of-the-art text-to-video generation results and show that our design readily supports a wide range of content creation tasks and video editing applications, including image-to-video, video inpainting, and stylized generation.
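The difference between downsampling only in space and downsampling in both space and time can be illustrated by tracking activation shapes through the encoder levels of a U-Net. The sketch below is a hypothetical illustration, not Lumiere's code: the function names, the (frames, height, width) layout, the (16, 64, 64) input shape, the factor of 2 per level, and the use of average pooling and frame repetition in place of learned layers are all assumptions made for the example.

```python
from typing import List, Tuple

# (frames T, height H, width W); an illustrative layout, not Lumiere's actual tensor format
Shape = Tuple[int, int, int]


def unet_level_shapes(shape: Shape, levels: int, factor: int = 2,
                      downsample_time: bool = True) -> List[Shape]:
    """Return the (T, H, W) shape at each encoder level of a hypothetical U-Net.

    With downsample_time=True, every level shrinks time and space by `factor`
    (space-time U-Net); with False, only H and W shrink, as in a purely spatial
    U-Net whose temporal length stays fixed at every level.
    """
    t, h, w = shape
    shapes = [(t, h, w)]
    for _ in range(levels):
        if h % factor or w % factor or (downsample_time and t % factor):
            raise ValueError(f"shape {(t, h, w)} is not divisible by {factor}")
        if downsample_time:
            t //= factor
        h //= factor
        w //= factor
        shapes.append((t, h, w))
    return shapes


def temporal_downsample(frames: List[float], factor: int = 2) -> List[float]:
    """Average-pool groups of `factor` adjacent frames (assumed stand-in for a learned strided temporal layer)."""
    if len(frames) % factor:
        raise ValueError("frame count must be divisible by factor")
    return [sum(frames[i:i + factor]) / factor for i in range(0, len(frames), factor)]


def temporal_upsample(frames: List[float], factor: int = 2) -> List[float]:
    """Repeat each frame `factor` times (assumed nearest-neighbour stand-in for a learned upsampling layer)."""
    return [f for f in frames for _ in range(factor)]


if __name__ == "__main__":
    # Space-time U-Net: [(16, 64, 64), (8, 32, 32), (4, 16, 16), (2, 8, 8)]
    print(unet_level_shapes((16, 64, 64), levels=3))
    # Spatial-only U-Net: [(16, 64, 64), (16, 32, 32), (16, 16, 16), (16, 8, 8)]
    print(unet_level_shapes((16, 64, 64), levels=3, downsample_time=False))
    coarse = temporal_downsample([0.0, 1.0, 2.0, 3.0])
    print(coarse, temporal_upsample(coarse))  # [0.5, 2.5] [0.5, 0.5, 2.5, 2.5]
```

In the space-time variant, two time steps at the coarsest level summarize all 16 input frames, so processing there operates on a compact representation of the whole clip; the spatial-only variant still carries all 16 frames at every level.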

Lumiere Visit Over Time

Monthly Visits: 20,899,836
Bounce Rate: 46.04%
Pages per Visit: 5.2
Visit Duration: 00:04:57
