Motion-I2V
A controllable image-to-video generation framework
Motion-I2V is a novel framework for consistent and controllable image-to-video (I2V) generation. Unlike previous methods that learn the complex image-to-video mapping directly, Motion-I2V decomposes I2V into two stages and models motion explicitly. In the first stage, a diffusion-based motion field predictor infers the trajectories of the reference image's pixels. In the second stage, motion-augmented temporal attention extends the limited one-dimensional temporal attention of the video latent diffusion model; guided by the trajectories predicted in the first stage, this module propagates reference-image features to the synthesized frames. As a result, Motion-I2V generates more consistent videos than existing methods, even under large motions and viewpoint changes. Because a sparse trajectory control network is trained for the first stage, users can precisely control motion trajectories and motion regions with sparse trajectory and region annotations, which gives finer control than text descriptions alone. In addition, the second stage naturally supports zero-shot video-to-video translation. Qualitative and quantitative comparisons show that Motion-I2V outperforms prior methods in consistent and controllable image-to-video generation.
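The second stage relies on carrying reference-image features along predicted trajectories. This page gives no implementation details, so the following is only a minimal sketch of flow-guided feature propagation using plain bilinear backward warping in NumPy. It assumes the first stage outputs, for each frame, a dense displacement field that maps every target pixel back to its source location in the reference image. The function names `warp_with_flow` and `propagate_reference` are hypothetical. In Motion-I2V itself the warped signal guides motion-augmented temporal attention rather than being used directly as the output frame.

```python
import numpy as np


def warp_with_flow(ref: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Backward-warp reference features with a dense displacement field.

    ref:  (H, W, C) reference-image features.
    flow: (H, W, 2) per-pixel (dx, dy) offsets, assumed to point from each
          target pixel back to its source location in the reference image.
    Returns (H, W, C) features sampled with bilinear interpolation;
    source coordinates outside the image are clamped to the border.
    """
    H, W, _ = ref.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Source coordinates in the reference image for every target pixel.
    sx = np.clip(xs + flow[..., 0], 0, W - 1)
    sy = np.clip(ys + flow[..., 1], 0, H - 1)
    # Integer corners of the bilinear cell around each source point.
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    # Fractional offsets act as interpolation weights (broadcast over C).
    wx = (sx - x0)[..., None]
    wy = (sy - y0)[..., None]
    top = (1 - wx) * ref[y0, x0] + wx * ref[y0, x1]
    bottom = (1 - wx) * ref[y1, x0] + wx * ref[y1, x1]
    return (1 - wy) * top + wy * bottom


def propagate_reference(ref: np.ndarray, flows: np.ndarray) -> np.ndarray:
    """Warp the reference features into every frame.

    flows: (T, H, W, 2), one hypothetical displacement field per frame.
    Returns (T, H, W, C) propagated features.
    """
    return np.stack([warp_with_flow(ref, f) for f in flows])
```

For example, a constant flow of `dx = -1` makes each pixel sample its left neighbour, so the content shifts one pixel to the right. Backward warping is used here because it gives every target pixel exactly one value, whereas forward splatting can leave holes.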
Motion-I2V Visits Over Time
Monthly Visits: 145
Bounce Rate: 47.34%
Pages per Visit: 1.0
Visit Duration: 00:00:00