MCVD
A general-purpose model for video generation, prediction, and interpolation
Common Product, Video, Video Generation, Video Prediction
MCVD is a general-purpose model for video generation, prediction, and interpolation. It is trained with a score-based diffusion loss: Gaussian noise is injected into the current frames, and the model denoises them while conditioning on past and/or future frames. During training, the past and/or future frames are randomly masked, so a single model learns four tasks: unconditional generation (both masked), future prediction (future masked), past reconstruction (past masked), and interpolation (neither masked). The model is a 2D convolutional U-Net that conditions on past and future frames through either concatenation or spatiotemporal adaptive normalization, and it produces high-quality, diverse video samples. The models were trained on 1 to 4 GPUs, and the architecture can be scaled up by adding channels. Despite being a simple, non-recurrent 2D convolutional architecture, MCVD generates videos of arbitrary length and achieves state-of-the-art results.
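The masking scheme described above can be illustrated with a small sketch. This is not MCVD's actual code: the function names, the masking probability `p_mask`, the use of zeros for masked frames, and the representation of frames as flat lists of floats are all assumptions made for illustration.

```python
import random

def sample_masks(rng, p_mask=0.5):
    """Independently decide whether to hide the past and the future frames.
    p_mask is a hypothetical hyperparameter, not a value from the source."""
    mask_past = rng.random() < p_mask
    mask_future = rng.random() < p_mask
    return mask_past, mask_future

def task_for_masks(mask_past, mask_future):
    """Name the task implied by which conditioning frames remain visible."""
    if mask_past and mask_future:
        return "unconditional generation"
    if mask_future:
        return "future prediction"      # only the past is visible
    if mask_past:
        return "past reconstruction"    # only the future is visible
    return "interpolation"              # both past and future are visible

def mask_frames(frames, masked):
    """Return copies of the frames, zeroed out if masked (zeroing is an assumption)."""
    if masked:
        return [[0.0] * len(frame) for frame in frames]
    return [list(frame) for frame in frames]

def add_gaussian_noise(frame, sigma, rng):
    """Inject Gaussian noise with standard deviation sigma into a frame."""
    return [x + sigma * rng.gauss(0.0, 1.0) for x in frame]

def make_training_example(past, current, future, sigma, rng, p_mask=0.5):
    """Build one training example: noisy current frame plus (possibly masked) context."""
    mask_past, mask_future = sample_masks(rng, p_mask)
    return {
        "task": task_for_masks(mask_past, mask_future),
        "past": mask_frames(past, mask_past),
        "noisy_current": add_gaussian_noise(current, sigma, rng),
        "future": mask_frames(future, mask_future),
    }

if __name__ == "__main__":
    rng = random.Random(0)
    example = make_training_example(
        past=[[0.1, 0.2]], current=[0.3, 0.4], future=[[0.5, 0.6]],
        sigma=0.1, rng=rng,
    )
    print(example["task"])
```

Because the two masks are sampled independently, every training batch mixes all four tasks, which is what lets one set of weights serve generation, prediction, reconstruction, and interpolation at inference time.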
MCVD Visit Over Time
Monthly Visits: 386
Bounce Rate: 41.25%
Pages per Visit: 1.0
Visit Duration: 00:00:00