MCVD

A general-purpose model for video generation, prediction, and interpolation

Tags: Common Product, Video, Video Generation, Video Prediction
MCVD (Masked Conditional Video Diffusion) is a general-purpose model for video generation, prediction, and interpolation. It is trained with a score-based diffusion loss: Gaussian noise is injected into the current frame, and the model learns to denoise it while conditioning on past and/or future frames. During training, the past and/or future frames are randomly masked, so a single model covers four tasks: unconditional generation (both masked), future prediction (future masked), past reconstruction (past masked), and interpolation (neither masked). The backbone is a 2D convolutional U-Net that conditions on past and future frames either by concatenation or by spatiotemporal adaptive normalization, and it produces high-quality, diverse video samples. The model is trained on 1-4 GPUs and can be scaled to more channels. Despite being a simple, non-recurrent 2D convolutional architecture, MCVD generates videos of arbitrary length and achieves state-of-the-art (SOTA) results.
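The masking scheme described above can be illustrated with a small NumPy sketch. This is not the MCVD implementation: the function names, the masking probability `p_mask=0.5`, the plain additive noise with a single `sigma`, and the choice to zero out masked frames are all assumptions made for illustration. Frames are stacked along the first (channel) axis, which mirrors the concatenation-style conditioning mentioned in the description.

```python
import numpy as np

def mask_conditioning(past, future, rng, p_mask=0.5):
    """Independently drop past and/or future frames (hypothetical helper).

    Masked frames are replaced by zeros; the returned label names the
    task that the resulting conditioning pattern corresponds to.
    """
    keep_past = rng.random() >= p_mask
    keep_future = rng.random() >= p_mask
    past_in = past if keep_past else np.zeros_like(past)
    future_in = future if keep_future else np.zeros_like(future)
    if keep_past and keep_future:
        task = "interpolation"
    elif keep_past:
        task = "future prediction"
    elif keep_future:
        task = "past reconstruction"
    else:
        task = "unconditional generation"
    return past_in, future_in, task

def make_training_input(current, past, future, sigma, rng, p_mask=0.5):
    """Build one denoising example (simplified, assumed noise model).

    Gaussian noise is added to the current frames only; the conditioning
    frames are concatenated along the channel axis for a 2D network.
    """
    noise = rng.standard_normal(current.shape)
    noisy_current = current + sigma * noise
    past_in, future_in, task = mask_conditioning(past, future, rng, p_mask)
    net_input = np.concatenate([past_in, noisy_current, future_in], axis=0)
    return net_input, noise, task

# Toy grayscale clip: 2 past, 1 current, 2 future frames of size 8x8.
rng = np.random.default_rng(0)
past = rng.random((2, 8, 8))
current = rng.random((1, 8, 8))
future = rng.random((2, 8, 8))
net_input, noise, task = make_training_input(current, past, future, 0.1, rng)
print(net_input.shape, task)
```

A denoising network would take `net_input` and be trained to recover `noise` (or the clean frame) for the current-frame channels. Because the four conditioning patterns all share one input layout, the same network can be used for all four tasks at sampling time.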

MCVD Visit Over Time

Monthly Visits: 386
Bounce Rate: 41.25%
Pages per Visit: 1.0
Visit Duration: 00:00:00
