Adobe Research, in collaboration with Northwestern University, has developed an artificial intelligence system called Sketch2Sound, a tool expected to transform the way sound designers work. Sketch2Sound lets users create professional sound effects and ambient sounds through humming, vocal imitation, and simple text descriptions.
The system analyzes three key features of the user's vocal input: loudness, timbre (roughly, how bright or dark a sound is), and pitch. It then combines these signals with the user's text prompt to generate the desired sound. For example, when a user types "forest ambiance" and makes a few short vocal chirps, the system renders those sounds as birdsong without needing explicit instructions.
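To make the three control signals concrete, here is a minimal sketch (not Adobe's implementation) of how loudness, brightness, and pitch curves could be extracted from a short vocal recording using librosa. The file name, hop size, and pitch range are illustrative assumptions.

```python
import librosa
import numpy as np

HOP = 512  # analysis hop size in samples (assumed)

# Load a short vocal imitation (hypothetical file name).
y, sr = librosa.load("vocal_imitation.wav", sr=None, mono=True)

# Loudness proxy: frame-wise RMS energy, converted to decibels.
rms = librosa.feature.rms(y=y, hop_length=HOP)[0]
loudness_db = librosa.amplitude_to_db(rms, ref=np.max)

# Brightness proxy: spectral centroid (a higher centroid sounds "brighter").
brightness_hz = librosa.feature.spectral_centroid(y=y, sr=sr, hop_length=HOP)[0]

# Pitch: fundamental-frequency track via pYIN (NaN where the voice is unvoiced).
f0, voiced_flag, _ = librosa.pyin(
    y,
    fmin=librosa.note_to_hz("C2"),
    fmax=librosa.note_to_hz("C6"),
    sr=sr,
    hop_length=HOP,
)

# Per-frame curves like these could then condition a text-prompted audio generator.
print(loudness_db.shape, brightness_hz.shape, f0.shape)
```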
Another highlight of Sketch2Sound is its grasp of context. When creating music, a user can type "bass drum, snare drum" and hum a rhythm; the system places the bass drum on the low-pitched parts of the hummed pattern and the snare drum on the higher-pitched parts. This contextual interpretation greatly simplifies the sound design process.
To meet the needs of professionals, the research team also integrated a filtering technique that lets users adjust how precisely the generated sound follows their input. Sound designers can opt for very tight control or a looser, more approximate interpretation. This flexibility may make Sketch2Sound particularly appealing to Foley artists, who create sound effects for film and television: with this tool, they can quickly produce effects from voice and text descriptions instead of manipulating physical props.
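The article does not specify Adobe's exact filtering method, but the adjustable-precision idea can be illustrated with a simple median filter: a small window keeps the control curve close to the vocal input, while a large window keeps only its broad gesture. The window sizes and curve below are made-up example values.

```python
import numpy as np
from scipy.signal import medfilt

rng = np.random.default_rng(0)
# Fake per-frame loudness curve from a hummed input (arbitrary example data).
loudness = np.clip(np.cumsum(rng.normal(0, 0.1, 300)), -3.0, 3.0)

precise = medfilt(loudness, kernel_size=3)    # follow the input very closely
relaxed = medfilt(loudness, kernel_size=51)   # keep only the broad loudness gesture

# A sound designer would pick the smoothing that matches how literally the
# generated effect should track their vocal sketch.
```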
The researchers note that spatial characteristics captured in recorded inputs, such as room reverberation, can sometimes degrade the generated sounds, an issue they are working to address. Adobe has not yet announced whether Sketch2Sound will be released as a commercial product, nor a launch date.
Project link: https://hugofloresgarcia.art/sketch2sound/
Key points:
🎵 Sketch2Sound is a newly developed AI tool that creates sound effects through humming and text descriptions.
🔊 The system analyzes volume, timbre, and pitch, combining the user's vocal input with text to generate target sound effects.
🎬 Especially suited to Foley artists, it can quickly generate film and TV sound effects, improving workflow efficiency.