Sketch2Sound

A model that generates controllable audio from time-varying control signals and sound imitations.

Common Product, Music, Audio Generation, Sound Imitation
Sketch2Sound is a generative audio model that creates high-quality sound from text prompts and a set of interpretable, time-varying control signals: loudness, brightness, and pitch. It can be implemented on top of any text-to-audio latent diffusion transformer (DiT) and requires only 40k steps of fine-tuning and a single linear layer per control, making it more lightweight than existing methods such as ControlNet. Its main advantage is the ability to synthesize arbitrary sounds from a sound imitation (for example, a vocal imitation): the output follows the general intent of the input controls while staying faithful to the text prompt and preserving audio quality. This lets sound artists combine the semantic flexibility of text prompts with the expressiveness and precision of a sound gesture or sound imitation.
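To make the idea of control signals concrete, here is a minimal sketch, not the Sketch2Sound implementation, of how two of the three controls could be extracted from a recording and injected into a latent sequence. It assumes loudness is approximated by frame-wise RMS level in dB and brightness by the spectral centroid; pitch is left out because it needs a dedicated pitch tracker. The function names, frame sizes, and the "scale one weight vector per control and add it to the latents" step are all illustrative assumptions about what "one linear layer per control" could look like.

```python
import numpy as np

def frame_signal(audio, frame_len=1024, hop=256):
    """Slice a mono signal (at least frame_len samples long) into overlapping frames."""
    n_frames = 1 + (len(audio) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return audio[idx]

def loudness_db(frames, eps=1e-8):
    """Frame-wise RMS level in dB, used here as a stand-in for loudness."""
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return 20.0 * np.log10(rms + eps)

def brightness_hz(frames, sr, eps=1e-8):
    """Frame-wise spectral centroid in Hz, used here as a stand-in for brightness."""
    window = np.hanning(frames.shape[1])
    mag = np.abs(np.fft.rfft(frames * window, axis=1))
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sr)
    return (mag * freqs).sum(axis=1) / (mag.sum(axis=1) + eps)

def add_controls(latents, controls, weights):
    """Hypothetical conditioning step: each scalar control curve of shape (T,)
    is mapped to the latent width D by its own weight vector of shape (D,)
    and added to the latents of shape (T, D)."""
    out = latents.copy()
    for name, signal in controls.items():
        out += signal[:, None] * weights[name][None, :]
    return out

# Example input: one second of a 440 Hz tone at half amplitude.
sr = 16000
t = np.arange(sr) / sr
audio = 0.5 * np.sin(2 * np.pi * 440.0 * t)

frames = frame_signal(audio)
controls = {"loudness": loudness_db(frames), "brightness": brightness_hz(frames, sr)}
weights = {name: np.ones(4) for name in controls}  # placeholder, untrained weights
latents = np.zeros((len(frames), 4))
conditioned = add_controls(latents, controls, weights)
```

In a real system the weight vectors would be learned during fine-tuning, and the control curves would be resampled to the latent frame rate of the underlying model.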

Sketch2Sound Visit Over Time

Monthly Visits: 776
Bounce Rate: 39.50%
Pages per Visit: 1.2
Visit Duration: 00:01:37