FunAudioLLM
Foundation model for natural voice interaction understanding and generation
CommonProductOthersSpeech RecognitionSpeech Synthesis
FunAudioLLM is a framework aimed at enhancing natural voice interaction between humans and Large Language Models (LLMs). It comprises two innovative models: SenseVoice, responsible for high-precision multi-lingual speech recognition, emotion recognition, and audio event detection; and CosyVoice, responsible for natural voice generation, supporting multi-lingual, timbre, and emotion control. SenseVoice supports over 50 languages with extremely low latency; CosyVoice excels in multi-lingual voice generation, zero-shot context generation, cross-lingual voice cloning, and instruction following capabilities. Relevant models are open-sourced on Modelscope and Huggingface, and corresponding training, inference, and fine-tuning codes are released on GitHub.
FunAudioLLM Visit Over Time
Monthly Visits
17142
Bounce Rate
61.20%
Page per Visit
1.4
Visit Duration
00:00:50