Alibaba Damo Academy's Tongyi Laboratory recently announced the open-source release of ClearerVoice-Studio, a speech processing toolkit designed to improve speech quality and intelligibility. As speech technology becomes more widely deployed, speech quality is attracting growing attention, particularly in conditions with background noise, reverberation, or poor device pickup, and demand for speech processing technology is rising accordingly.
ClearerVoice-Studio integrates speech enhancement, speech separation, and audio-visual speaker extraction. By using complex-domain deep learning algorithms, it substantially improves noise reduction and separation performance, removing as much background noise as possible while preserving speech clarity and minimizing distortion.
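The article does not describe the internals of these models. "Complex domain" generally means working on the complex-valued spectrum of the signal (magnitude and phase) rather than on magnitude alone. The toy sketch below is not ClearerVoice-Studio's code: it assumes a single frame, a naive DFT, and "oracle" masks computed from the known clean signal, whereas a real model would predict the mask (or the clean spectrum) from the noisy input. It illustrates why correcting phase matters when noise and speech share a frequency bin: a complex ratio mask recovers the clean signal exactly, while a magnitude-only mask keeps the noisy phase and leaves residual error.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT; keeps the real part because the toy signals are real-valued."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def oracle_masks(clean_spec, noisy_spec, eps=1e-9):
    """Build ideal masks from known spectra (hypothetical helper for illustration).

    A trained model would have to estimate these from the noisy input alone.
    Bins where the noisy spectrum is ~0 get a mask of 0.
    """
    complex_mask, magnitude_mask = [], []
    for s, y in zip(clean_spec, noisy_spec):
        if abs(y) < eps:
            complex_mask.append(0j)
            magnitude_mask.append(0.0)
        else:
            complex_mask.append(s / y)              # corrects magnitude AND phase
            magnitude_mask.append(abs(s) / abs(y))  # rescales magnitude, keeps noisy phase
    return complex_mask, magnitude_mask


def enhance(noisy_spec, mask):
    """Apply a per-bin mask to the noisy spectrum and return the time signal."""
    return idft([y * m for y, m in zip(noisy_spec, mask)])


def max_abs_error(a, b):
    return max(abs(u - v) for u, v in zip(a, b))


n = 16
clean = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
# Noise overlaps the "speech" in bin 2 (with a different phase) and also occupies bin 5.
noise = [0.5 * math.cos(2 * math.pi * 2 * t / n + 1.0) + 0.3 * math.cos(2 * math.pi * 5 * t / n)
         for t in range(n)]
noisy = [c + e for c, e in zip(clean, noise)]

S, Y = dft(clean), dft(noisy)
cmask, mmask = oracle_masks(S, Y)
complex_error = max_abs_error(enhance(Y, cmask), clean)
magnitude_error = max_abs_error(enhance(Y, mmask), clean)
print(f"complex mask error:   {complex_error:.2e}")
print(f"magnitude mask error: {magnitude_error:.2e}")
```

With oracle masks the complex mask's error is at floating-point level, while the magnitude-only mask leaves an error of several tenths in amplitude, because it cannot undo the phase shift the overlapping noise introduced in bin 2.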
The core models and algorithms in ClearerVoice-Studio include the FRCRN model, which took second place overall in the 2022 IEEE/INTERSPEECH DNS Challenge, and the MossFormer series, which performs strongly on speech separation tasks. A 48 kHz speech enhancement model based on MossFormer2 suppresses noise effectively while substantially reducing speech distortion.
Through ClearerVoice-Studio, Alibaba's Tongyi Laboratory aims to give developers, researchers, and enterprises capable speech processing tools for building new applications. To try the online demo, users upload a noisy speech recording to the demo page, process it with a single click, and then listen to or download the result, which delivers clearer audio with strong noise reduction.
GitHub Repository: https://github.com/modelscope/ClearerVoice-Studio
Online Demo Experience: https://huggingface.co/spaces/alibabasglab/ClearVoice