Adobe Research, in collaboration with Northwestern University, has developed a groundbreaking AI system called Sketch2Sound. This technology can transform simple vocal imitations and text descriptions into professional-grade sound effects, potentially revolutionizing the way sound design is done in the industry.
The system analyzes three key properties of the vocal input: loudness, brightness (an aspect of timbre), and pitch. It then combines these time-varying signals with a text description to generate the desired sound.
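The paper's exact analysis pipeline isn't spelled out here, but a rough sketch of how such per-frame control curves could be extracted with the librosa library might look like the following; the file name, hop size, and pitch range are illustrative assumptions rather than Sketch2Sound's actual settings.

```python
import librosa

# Illustrative sketch: extract per-frame loudness, brightness, and pitch
# curves from a vocal imitation. File name, hop size, and pitch range are
# assumptions for this example, not Sketch2Sound's settings.
y, sr = librosa.load("vocal_imitation.wav", sr=22050, mono=True)
hop = 512

# Loudness: frame-wise RMS energy
loudness = librosa.feature.rms(y=y, hop_length=hop)[0]

# Brightness: spectral centroid, a common proxy for perceived brightness
brightness = librosa.feature.spectral_centroid(y=y, sr=sr, hop_length=hop)[0]

# Pitch: probabilistic YIN fundamental-frequency estimate (NaN where unvoiced)
f0, voiced_flag, voiced_prob = librosa.pyin(
    y,
    fmin=librosa.note_to_hz("C2"),
    fmax=librosa.note_to_hz("C6"),
    sr=sr,
    hop_length=hop,
)

# Each curve now holds one value per analysis frame and can act as a
# time-varying conditioning signal alongside a text prompt.
print(loudness.shape, brightness.shape, f0.shape)
```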
Video: García et al., Adobe Research
What makes Sketch2Sound interesting is its ability to understand context. For example, if someone enters the text prompt "forest ambiance" and makes short vocal sounds, the system infers that those sounds should become bird calls, without needing explicit instructions.
The same intelligence applies to music. When creating a drum pattern, users can enter "bass drum, snare drum" and then hum the rhythm with alternating low and high tones. The system automatically assigns the bass drum hits to the low tones and the snare hits to the high tones.
Providing Fine Control for Professionals
The research team built in an adjustable filtering step applied to the control curves, which governs how strictly the generated sound follows the input. Sound designers can choose precise, detail-following control or a looser, approximate interpretation, depending on their needs.
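One common way to realize this kind of adjustable precision is to smooth each control curve with a median filter whose window size sets how much temporal detail survives. The sketch below illustrates the idea; the window sizes are chosen purely for illustration and are not taken from the paper.

```python
import numpy as np
from scipy.signal import medfilt

def smooth_control(curve: np.ndarray, window: int) -> np.ndarray:
    """Median-filter a per-frame control curve (e.g. a loudness envelope).

    A small window keeps fine temporal detail; a large window keeps only the
    broad gesture, leaving the generator more freedom. Window sizes here are
    illustrative and not taken from the Sketch2Sound paper.
    """
    if window <= 1:
        return curve
    return medfilt(curve, kernel_size=window | 1)  # medfilt needs an odd kernel

# Example: a noisy loudness curve sampled once per frame
rng = np.random.default_rng(0)
loudness = np.abs(np.sin(np.linspace(0, 6 * np.pi, 400))) + 0.1 * rng.standard_normal(400)

precise = smooth_control(loudness, window=5)    # follows the input closely
loose = smooth_control(loudness, window=101)    # keeps only the rough shape
```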
This flexibility makes Sketch2Sound particularly valuable for sound designers, the professionals who create sound effects for movies and television shows. They can create effects more quickly from voice and text descriptions instead of physically manipulating props to record Foley-style sounds.
The researchers noted that the acoustics of the space in which a vocal imitation is recorded can sometimes carry over into the generated sounds in undesirable ways, an issue they are working to address. Adobe has not yet announced when, or whether, Sketch2Sound will become a commercial product.