AudioSep

AudioSep is an open-domain audio source separation model driven by natural language queries. It consists of two key components: a text encoder and a separation model. We trained AudioSep on a large-scale multimodal dataset and evaluated it extensively on a range of tasks, including audio event separation, musical instrument separation, and speech enhancement. Using audio captions or text labels as queries, AudioSep demonstrates strong separation performance and impressive zero-shot generalization, significantly outperforming previous audio-queried and language-queried sound separation models.

Common Product, Music, Audio Separation, Natural Language Queries
To ensure the reproducibility of this work, we will release the source code, evaluation benchmark, and pre-trained models.

AudioSep Visit Over Time

Monthly Visits: 19,076,186
Bounce Rate: 44.54%
Pages per Visit: 3.2
Visit Duration: 00:04:19


AudioSep Alternatives