FunAudioLLM

Foundation models for voice understanding and generation in natural voice interaction

Common Product, Others, Speech Recognition, Speech Synthesis
FunAudioLLM is a framework for more natural voice interaction between humans and large language models (LLMs). It comprises two models: SenseVoice, which handles high-precision multilingual speech recognition, emotion recognition, and audio event detection; and CosyVoice, which handles natural speech generation with control over language, timbre, and emotion. SenseVoice supports over 50 languages with extremely low latency; CosyVoice excels at multilingual voice generation, zero-shot in-context generation, cross-lingual voice cloning, and instruction following. The models are open-sourced on ModelScope and Hugging Face, and the corresponding training, inference, and fine-tuning code is released on GitHub.

FunAudioLLM Visit Over Time

Monthly Visits: 15168
Bounce Rate: 58.74%
Pages per Visit: 1.4
Visit Duration: 00:00:59


FunAudioLLM Alternatives