Recently, the Adobe research team collaborated with researchers from the University of Michigan to develop an artificial intelligence system called MultiFoley, which generates sound effects for movies and videos to aid post-production.
MultiFoley's innovation is that it lets users create sound effects from text prompts, reference audio, or video examples. In demonstrations, the system can even transform a cat's meow into a lion's roar or convert the sound of a typewriter into piano notes, while keeping the result in sync with the on-screen action.
MultiFoley outputs full-bandwidth audio at 48 kHz, which is attributed mainly to the researchers training the system on both online videos and professional sound effect libraries. Unlike previous systems, MultiFoley is the first to integrate multiple input methods (text, audio, and video references) into a single model. It analyzes visual features at 8 frames per second and upsamples them to match the 40 Hz rate used on the audio side, so that the generated audio remains tightly synchronized with the video.
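The rate matching described above, from 8 visual frames per second to 40 Hz, amounts to a fivefold upsampling of the visual feature sequence. The sketch below illustrates only that arithmetic. The article does not say which interpolation method MultiFoley uses, so simple frame repetition is an assumption here, and the function and constant names are hypothetical.

```python
from typing import List, Sequence

VIDEO_FPS = 8        # visual feature rate reported in the article
AUDIO_RATE_HZ = 40   # audio-side rate reported in the article


def upsample_features(
    frames: Sequence[Sequence[float]],
    src_rate: int = VIDEO_FPS,
    dst_rate: int = AUDIO_RATE_HZ,
) -> List[List[float]]:
    """Repeat each visual feature vector so the sequence runs at dst_rate.

    Assumption: nearest-neighbor repetition. The real system may use a
    different (for example learned or interpolated) upsampling scheme.
    """
    if src_rate <= 0 or dst_rate % src_rate != 0:
        raise ValueError("dst_rate must be a positive multiple of src_rate")
    factor = dst_rate // src_rate  # 40 // 8 == 5
    return [list(frame) for frame in frames for _ in range(factor)]


# Two seconds of video at 8 fps gives 16 feature vectors (toy 1-D features).
video_features = [[float(i)] for i in range(2 * VIDEO_FPS)]
aligned = upsample_features(video_features)

# After upsampling there is one visual feature per 40 Hz audio step.
print(len(video_features), "->", len(aligned))
```

With this alignment, each 40 Hz audio step has exactly one corresponding visual feature, which is what allows a model to condition every audio step on what is happening on screen at that moment.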
In testing, MultiFoley excelled at synchronizing audio with video and at matching sound effects to text descriptions. Its average synchronization error was 0.8 seconds, compared with the delay of more than one second typical of traditional systems. In user studies, 85.8% of participants judged that MultiFoley outperformed the runner-up in semantic consistency, and 94.5% preferred its synchronization results.
Although MultiFoley demonstrates strong potential, the research team also pointed out some current limitations, such as a relatively small training dataset, which restricts the variety of sound effects it can produce. Additionally, the system faces challenges in generating multiple simultaneous sound effects. The research team plans to release the source code and model soon.
While Adobe has not yet announced plans to integrate MultiFoley into its products, this technology aligns well with the existing AI features in Adobe Premiere Pro video editing software, promising to bring convenience to individual creators and production companies in the sound design process.
Key Points:
🎬 MultiFoley is an AI sound effect generation system developed by Adobe in collaboration with the University of Michigan, capable of generating sound effects through various input methods.
🔊 The system outputs 48 kHz audio with an average synchronization error of 0.8 seconds, outperforming traditional sound effect systems.
📈 User research indicates that MultiFoley received high ratings for both semantic consistency and synchronization of sound effects.