Alibaba Tongyi Lab's speech team has announced that its open-source speech generation model, CosyVoice, has been upgraded to version 2.0. The upgrade brings significant gains in accuracy, stability, and naturalness. CosyVoice 2.0 uses a unified modeling approach for offline and streaming speech generation and supports bidirectional streaming synthesis, with a first-packet synthesis latency as low as 150 ms, which greatly improves responsiveness.
In terms of pronunciation accuracy, CosyVoice 2.0 reduces the pronunciation error rate by 30% to 50% compared with the previous version and achieves the lowest character error rate on the hard subset of the Seed-TTS test set. It performs especially well on tongue twisters, polyphonic characters, and rare characters. Version 2.0 also maintains timbre consistency in zero-shot speech generation and cross-language speech synthesis, with a marked improvement in cross-language capability over version 1.0.
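To make the accuracy figures concrete, here is a minimal Python sketch of how a character error rate (CER) is commonly computed: the edit distance between a reference transcript and the transcript recognized from the synthesized audio, divided by the reference length. The function names and the sample numbers are illustrative assumptions, not the team's evaluation code, and the sketch assumes the quoted 30% to 50% is a relative reduction.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """CER = character-level edit distance / number of reference characters (spaces ignored)."""
    ref = list(reference.replace(" ", ""))
    hyp = list(hypothesis.replace(" ", ""))
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(old_rate, new_rate):
    """Fraction by which new_rate is lower than old_rate."""
    return (old_rate - new_rate) / old_rate


# Illustrative values only; these are not results reported for CosyVoice.
cer = character_error_rate("abcd", "abxd")  # one substitution in four characters
drop = relative_reduction(0.04, 0.02)       # a hypothetical 4% -> 2% error rate
```

Under this reading, a model whose error rate falls from 4% to 2% has achieved a 50% reduction, the upper end of the range quoted above.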
CosyVoice 2.0 also improves the prosody, sound quality, and emotional alignment of synthesized audio, with the Mean Opinion Score (MOS) rising from 5.4 to 5.53, close to that of a commercial speech synthesis model. In addition, version 2.0 supports finer-grained control over emotion and over dialect and accent. Supported dialects include Cantonese, Sichuanese, Zhengzhou dialect, Tianjin dialect, and Changsha dialect, and role-playing features let the model imitate a robot or speak in the style of Peppa Pig.
The CosyVoice 2.0 upgrade improves both the technology and the user experience of speech synthesis. It also supports the growth of the open-source community by encouraging more developers to build and innovate with speech processing technology.
GitHub Repository (latest updates on CosyVoice 2): https://github.com/FunAudioLLM/CosyVoice
Online Demo: https://www.modelscope.cn/studios/iic/CosyVoice2-0.5B
Open Source Code: https://github.com/FunAudioLLM/CosyVoice
Open Source Model: https://www.modelscope.cn/models/iic/CosyVoice2-0.5B