MuVi
A video-to-music generation framework that achieves semantic alignment and rhythmic synchronization of audio and visual content.
MuVi analyzes video content to extract contextually and temporally relevant features, then generates music that matches the video's mood and theme as well as its rhythm and tempo. The framework introduces a contrastive music-visual pre-training scheme that exploits the periodic nature of musical phrases to ensure synchronization. Its flow-matching-based music generator also exhibits in-context learning ability, which allows control over the style and genre of the generated music. MuVi demonstrates superior performance in audio quality and temporal synchronization, offering a new approach to integrating audio and video content for more immersive experiences.
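To make the idea of contrastive pre-training between two modalities concrete, here is a minimal sketch of a generic symmetric contrastive (InfoNCE-style) loss over paired video and music embeddings. It is not MuVi's actual objective, which is not specified here; the function name `contrastive_loss`, the `temperature` value, and the toy embeddings are assumptions chosen for illustration.

```python
import math


def _normalize(vec):
    # Scale a vector to unit length (assumes the vector is non-zero).
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _diagonal_cross_entropy(logits):
    # Mean cross-entropy where the correct class for row i is column i.
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)
        log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def contrastive_loss(video_emb, music_emb, temperature=0.07):
    # Hypothetical helper: video_emb[i] and music_emb[i] are assumed to be
    # embeddings of the same clip; all other pairings act as negatives.
    v = [_normalize(x) for x in video_emb]
    m = [_normalize(x) for x in music_emb]
    # Cosine similarity of every video with every music clip, scaled.
    logits = [
        [sum(a * b for a, b in zip(vi, mj)) / temperature for mj in m]
        for vi in v
    ]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the video-to-music and music-to-video directions.
    return 0.5 * (_diagonal_cross_entropy(logits) + _diagonal_cross_entropy(logits_t))


video = [[1.0, 0.0], [0.0, 1.0]]
aligned = contrastive_loss(video, [[1.0, 0.0], [0.0, 1.0]])
swapped = contrastive_loss(video, [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

The loss is low when each video embedding is closest to its own music embedding and high when the pairing is scrambled, which is the property a contrastive scheme relies on to encourage alignment.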