FunAudioLLM
Foundation model for natural voice interaction understanding and generation
CommonProductOthersSpeech RecognitionSpeech Synthesis
FunAudioLLM is a framework aimed at enhancing natural voice interaction between humans and Large Language Models (LLMs). It comprises two innovative models: SenseVoice, responsible for high-precision multi-lingual speech recognition, emotion recognition, and audio event detection; and CosyVoice, responsible for natural voice generation, supporting multi-lingual, timbre, and emotion control. SenseVoice supports over 50 languages with extremely low latency; CosyVoice excels in multi-lingual voice generation, zero-shot context generation, cross-lingual voice cloning, and instruction following capabilities. Relevant models are open-sourced on Modelscope and Huggingface, and corresponding training, inference, and fine-tuning codes are released on GitHub.
FunAudioLLM Visit Over Time
Monthly Visits
15168
Bounce Rate
58.74%
Page per Visit
1.4
Visit Duration
00:00:59