Motion-I2V

A controllable image-to-video generation framework

Common Product, Image, Image Generation, Video Generation
Motion-I2V is a novel framework for consistent and controllable image-to-video (I2V) generation. Unlike previous methods that directly learn the complex image-to-video mapping, Motion-I2V factorizes I2V into two stages with explicit motion modeling. In the first stage, a diffusion-based motion field predictor focuses on inferring the trajectories of the reference image's pixels. In the second stage, a motion-augmented temporal attention module enhances the limited one-dimensional temporal attention in the video latent diffusion model; guided by the trajectories predicted in the first stage, it effectively propagates reference image features to the synthesized frames. Compared with existing methods, Motion-I2V generates more consistent videos even in the presence of large motion and viewpoint changes. By training a sparse trajectory control network for the first stage, Motion-I2V lets users precisely control motion trajectories and motion regions with sparse trajectory and region annotations, offering finer-grained control than text descriptions alone. Furthermore, the second stage of Motion-I2V naturally supports zero-shot video-to-video translation. Qualitative and quantitative comparisons show that Motion-I2V outperforms prior methods in consistent and controllable image-to-video generation.
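To make the second stage more concrete, below is a minimal, self-contained PyTorch sketch of the idea: reference-image features are warped to every target frame along the displacement fields predicted in stage one, and each frame then attends over these trajectory-aligned features at its own spatial locations. All names (warp_with_trajectories, MotionAugmentedTemporalAttention), tensor shapes, and the single-attention-layer design are illustrative assumptions, not the released Motion-I2V implementation.

```python
# Illustrative sketch only: module names and shapes are assumptions,
# not the official Motion-I2V code.
import torch
import torch.nn as nn
import torch.nn.functional as F


def warp_with_trajectories(ref_feat, flow):
    """Warp reference-frame features to each target frame using predicted
    per-frame displacement fields (the stage-1 output).

    ref_feat: (B, C, H, W)    features of the reference image
    flow:     (B, T, 2, H, W) displacement from each target frame back to
                              the reference frame, in pixels
    returns:  (B, T, C, H, W) reference features aligned to every frame
    """
    b, t, _, h, w = flow.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(
        torch.arange(h, device=ref_feat.device, dtype=ref_feat.dtype),
        torch.arange(w, device=ref_feat.device, dtype=ref_feat.dtype),
        indexing="ij",
    )
    base = torch.stack((xs, ys), dim=0)                    # (2, H, W)
    coords = base.unsqueeze(0).unsqueeze(0) + flow         # (B, T, 2, H, W)
    # Normalize sampling coordinates to [-1, 1] for grid_sample.
    coords_x = 2.0 * coords[:, :, 0] / (w - 1) - 1.0
    coords_y = 2.0 * coords[:, :, 1] / (h - 1) - 1.0
    grid = torch.stack((coords_x, coords_y), dim=-1)       # (B, T, H, W, 2)
    ref = ref_feat.unsqueeze(1).expand(-1, t, -1, -1, -1)  # (B, T, C, H, W)
    warped = F.grid_sample(
        ref.reshape(b * t, *ref_feat.shape[1:]),
        grid.reshape(b * t, h, w, 2),
        align_corners=True,
    )
    return warped.reshape(b, t, -1, h, w)


class MotionAugmentedTemporalAttention(nn.Module):
    """Toy stand-in for the stage-2 module: at each spatial location, every
    frame's features attend over the trajectory-aligned reference features."""

    def __init__(self, channels, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, frame_feat, warped_ref):
        # frame_feat, warped_ref: (B, T, C, H, W)
        b, t, c, h, w = frame_feat.shape
        q = frame_feat.permute(0, 3, 4, 1, 2).reshape(b * h * w, t, c)
        kv = warped_ref.permute(0, 3, 4, 1, 2).reshape(b * h * w, t, c)
        out, _ = self.attn(self.norm(q), kv, kv)
        out = (q + out).reshape(b, h, w, t, c).permute(0, 3, 4, 1, 2)
        return out


if __name__ == "__main__":
    b, t, c, h, w = 1, 8, 64, 32, 32
    ref_feat = torch.randn(b, c, h, w)        # reference-image features
    flow = torch.randn(b, t, 2, h, w) * 2.0   # stage-1 predicted motion fields
    frame_feat = torch.randn(b, t, c, h, w)   # intermediate per-frame features
    warped = warp_with_trajectories(ref_feat, flow)
    fused = MotionAugmentedTemporalAttention(c)(frame_feat, warped)
    print(fused.shape)  # torch.Size([1, 8, 64, 32, 32])
```

In the actual framework this augmentation sits inside the temporal attention layers of a video latent diffusion model; the sketch only shows how predicted trajectories can steer which reference features each synthesized frame gathers.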

Motion-I2V Visit Over Time

Monthly Visits: 880
Bounce Rate: 41.24%
Pages per Visit: 1.0
Average Visit Duration: 00:00:00


Motion-I2V Alternatives