Recently, a research team from the University of Illinois at Urbana-Champaign, Sony AI, and Sony Group introduced a new technology called MMAudio, which aims to achieve high-quality video-to-audio synthesis through multimodal joint training.
The core innovation of MMAudio lies in its ability to generate synchronized audio from video and text inputs, which broadens the application scenarios for audio generation. It accepts either video or text as input; when given a video, it produces sound effects that align with the video content.
MMAudio is designed to be trained on a variety of audio-visual and audio-text datasets. This multimodal joint training improves the quality of the synthesized audio, while a dedicated synchronization module aligns the generated audio with the video frames, keeping the audio consistent with what happens on screen.
The MMAudio codebase is still under development. According to the researchers, single-example inference already works, while the training code will be released in a future version. The code has been tested on Ubuntu, and installation instructions are provided. Users need Python 3.9 or higher, along with suitable versions of PyTorch and ffmpeg, and can then install MMAudio with a single command.
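Before installing, it can help to confirm that the three prerequisites named above are present. The following is a minimal sketch, not part of MMAudio itself: the function names `meets_min_python` and `check_prerequisites` are hypothetical, and the script only checks that PyTorch and ffmpeg can be found, not that their versions are the ones the installation guide calls for.

```python
import importlib.util
import shutil
import sys

# Minimum Python version stated in the article.
MIN_PYTHON = (3, 9)


def meets_min_python(version_info=None, minimum=MIN_PYTHON):
    """Return True if a (major, minor, ...) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


def check_prerequisites():
    """Report whether Python, PyTorch, and ffmpeg can be found (hypothetical helper)."""
    return {
        # Running interpreter is 3.9 or newer.
        "python": meets_min_python(),
        # PyTorch is importable; find_spec avoids actually importing it.
        "torch": importlib.util.find_spec("torch") is not None,
        # An ffmpeg executable is on the PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

Any item reported as missing should be installed first; the exact install command for MMAudio is given in the project's repository.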
MMAudio's audio generation still has some limitations: it occasionally produces unintelligible speech or unwanted background music, and it struggles with certain unfamiliar concepts. The research team believes that more high-quality training data can help address these issues, and expects performance to improve as the research continues.
Try it out: https://huggingface.co/spaces/hkchengrex/MMAudio
Code: https://github.com/hkchengrex/MMAudio
Key Points:
🌟 MMAudio achieves high-quality video-to-audio synthesis through multimodal joint training.
📦 Users can easily install MMAudio on Ubuntu for audio generation.
⚠️ The current version has some limitations, which the research team believes can be addressed with more high-quality training data.