AudioSep
AudioSep is an open-domain audio source separation model based on natural language queries. It consists of two key components: a text encoder and a separation model. We trained AudioSep on a large-scale multimodal dataset and extensively evaluated its capabilities on many tasks, including audio event separation, instrument separation, and voice enhancement. AudioSep demonstrates strong separation performance and impressive zero-shot generalization ability, significantly outperforming previous audio query and language query sound separation models when using audio titles or text labels as queries.
AudioSep Visit Over Time
Monthly Visits
21961423
Bounce Rate
43.58%
Page per Visit
3.2
Visit Duration
00:04:12