Sketch2Sound
A model that generates controllable audio from time-varying control signals and sound imitations.
Common Product, Music, Audio Generation, Sound Imitation
Sketch2Sound is a generative audio model that creates high-quality sound from text prompts and a set of interpretable, time-varying control signals: loudness, brightness, and pitch. It can be implemented on top of any text-to-audio latent diffusion transformer (DiT) and requires only 40k steps of fine-tuning and one separate linear layer per control, making it more lightweight than existing methods such as ControlNet. Its main advantage is that it can synthesize arbitrary sounds from a sound imitation: the output follows the general intent of the input controls while staying faithful to the text prompt and preserving audio quality. This lets sound artists combine the semantic flexibility of text prompts with the expressiveness and precision of a sound gesture or sound imitation.
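To make the idea of control signals concrete, here is a minimal Python sketch, not the Sketch2Sound implementation. It assumes loudness can be approximated by frame-wise RMS level in dB and brightness by the spectral centroid, and it models "one linear layer per control" as a per-control projection added to a latent sequence. The function names, frame sizes, latent width, and the choice to add the projections to the latents are all illustrative assumptions. Pitch is left out because it needs a dedicated pitch tracker.

```python
import numpy as np

def frame_signal(audio, frame_len=1024, hop=512):
    """Slice a mono signal into overlapping frames, shape (n_frames, frame_len)."""
    n_frames = 1 + (len(audio) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return audio[idx]

def loudness_db(frames, eps=1e-8):
    """Frame-wise RMS level in dB (assumed stand-in for a loudness control)."""
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return 20.0 * np.log10(rms + eps)

def brightness_hz(frames, sample_rate, eps=1e-8):
    """Frame-wise spectral centroid in Hz (assumed stand-in for brightness)."""
    window = np.hanning(frames.shape[1])
    mag = np.abs(np.fft.rfft(frames * window, axis=1))
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sample_rate)
    return (mag * freqs).sum(axis=1) / (mag.sum(axis=1) + eps)

def add_controls(latents, controls, weights, biases):
    """Add one linear projection per control to a (T, D) latent sequence.

    Hypothetical conditioning scheme: each control is a length-T signal
    mapped to D channels by its own weight vector and bias.
    """
    out = latents.copy()
    for name, signal in controls.items():
        out += signal[:, None] * weights[name][None, :] + biases[name][None, :]
    return out

# Stand-in for a vocal imitation: a sweep that gets louder and brighter.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
envelope = np.linspace(0.1, 1.0, sample_rate)
audio = envelope * np.sin(2 * np.pi * (200 * t + 1500 * t ** 2))

frames = frame_signal(audio)
loud = loudness_db(frames)
bright = brightness_hz(frames, sample_rate)

# Hypothetical latent sequence with one step per control frame.
rng = np.random.default_rng(0)
latent_dim = 8
latents = rng.standard_normal((len(frames), latent_dim))
controls = {"loudness": loud, "brightness": bright}
weights = {k: 0.01 * rng.standard_normal(latent_dim) for k in controls}
biases = {k: np.zeros(latent_dim) for k in controls}
conditioned = add_controls(latents, controls, weights, biases)
```

Because each control contributes only a weight vector and a bias, the number of added parameters stays small, which is consistent with the lightweight claim above. A real system would also need to resample the controls to the latent frame rate and normalize their ranges.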
Sketch2Sound Visits Over Time
Monthly Visits: 776
Bounce Rate: 39.50%
Pages per Visit: 1.2
Visit Duration: 00:01:37