Alibaba's Tongyi Lab recently open-sourced FunAudioLLM, a large-scale voice model project aimed at enhancing natural voice interaction between humans and large language models (LLMs). The project consists of two core models: SenseVoice and CosyVoice.

CosyVoice focuses on natural speech generation, with multilingual support and control over timbre and emotion. It excels at multilingual speech generation, zero-shot voice cloning, cross-lingual synthesis, and instruction following. Trained on 150,000 hours of data, it supports Chinese, English, Japanese, Cantonese, and Korean, can rapidly clone voice timbres, and provides fine-grained control over emotion and prosody.
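To give a sense of the interface, here is a minimal synthesis sketch following the quick-start published in the CosyVoice repository. The model directory, the built-in speaker ID '中文女', and the dict-style return value are assumptions based on the project's released examples and may differ between versions:

```python
import torchaudio
from cosyvoice.cli.cosyvoice import CosyVoice

# Model directory as laid out by the ModelScope download (path is illustrative).
cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M-SFT')

# Synthesize with a built-in speaker; the result dict holds the waveform tensor.
output = cosyvoice.inference_sft('你好，我是通义生成式语音大模型。', '中文女')
torchaudio.save('sft_output.wav', output['tts_speech'], 22050)
```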

SenseVoice is dedicated to high-accuracy multilingual speech recognition, emotion recognition, and audio event detection. Trained on 400,000 hours of data, it supports more than 50 languages and achieves recognition accuracy surpassing the Whisper model, with over 50% improvement on Chinese and Cantonese. SenseVoice also offers emotion recognition and sound event detection, along with fast inference.
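A minimal recognition sketch via the FunASR toolkit, which the SenseVoice repository builds on. The ModelScope model ID and the `generate` arguments follow the project's published example and are assumptions that may change across releases:

```python
from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

# Load SenseVoice through FunASR (model ID as listed on ModelScope).
model = AutoModel(model="iic/SenseVoiceSmall", trust_remote_code=True)

# "auto" lets the model identify the language; use_itn enables punctuation
# and inverse text normalization. Emotion and event labels appear as tags
# in the raw output; the postprocessor renders them into readable text.
res = model.generate(input="audio.wav", language="auto", use_itn=True)
print(rich_transcription_postprocess(res[0]["text"]))
```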


FunAudioLLM supports a range of human-computer interaction scenarios, such as multilingual translation, emotional voice conversations, interactive podcasts, and audiobook narration. By chaining SenseVoice, an LLM, and CosyVoice, it enables seamless speech-to-speech translation, emotional voice chat applications, and interactive podcast radio stations.
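To make the chaining concrete, here is a minimal speech-to-speech translation sketch. It reuses the SenseVoice and CosyVoice calls shown above; `translate_with_llm` is a hypothetical placeholder for whichever LLM you pair with the two models, and the '英文女' speaker ID assumes the SFT model's built-in English voice:

```python
import torchaudio
from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess
from cosyvoice.cli.cosyvoice import CosyVoice

asr = AutoModel(model="iic/SenseVoiceSmall", trust_remote_code=True)
tts = CosyVoice('pretrained_models/CosyVoice-300M-SFT')

def translate_with_llm(text: str, target_lang: str) -> str:
    """Hypothetical hook: call whichever LLM you pair with the two models."""
    raise NotImplementedError

def speech_to_speech(audio_path: str, target_lang: str = "English") -> None:
    # 1. SenseVoice: speech -> transcript (language auto-detected).
    res = asr.generate(input=audio_path, language="auto", use_itn=True)
    transcript = rich_transcription_postprocess(res[0]["text"])
    # 2. LLM: translate the transcript into the target language.
    translated = translate_with_llm(transcript, target_lang)
    # 3. CosyVoice: speak the translation with a built-in English voice.
    output = tts.inference_sft(translated, '英文女')
    torchaudio.save('translated.wav', output['tts_speech'], 22050)
```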

On the technical side, CosyVoice is built on quantized speech token encoding, which underpins its natural, fluent speech generation, while SenseVoice provides a comprehensive set of speech processing capabilities, including automatic speech recognition, language identification, emotion recognition, and audio event detection.

The models and code have been open-sourced on ModelScope and Hugging Face, with training, inference, and fine-tuning code available on GitHub. Both CosyVoice and SenseVoice offer online demos on ModelScope, making it easy for users to try these voice technologies directly.
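For example, the released checkpoints can be fetched locally with the ModelScope Python SDK. The model IDs below follow the project's ModelScope listings; verify the exact IDs on the model pages:

```python
from modelscope import snapshot_download

# Download each model into a local directory for offline use.
snapshot_download('iic/CosyVoice-300M', local_dir='pretrained_models/CosyVoice-300M')
snapshot_download('iic/SenseVoiceSmall', local_dir='pretrained_models/SenseVoiceSmall')
```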

Project Address: https://github.com/FunAudioLLM