FunAudioLLM

Foundation model for natural voice interaction understanding and generation

Common Product • Others • Speech Recognition • Speech Synthesis

FunAudioLLM is a framework for more natural voice interaction between humans and large language models (LLMs). It comprises two models. SenseVoice handles high-precision multilingual speech recognition, emotion recognition, and audio event detection. CosyVoice handles natural speech generation with control over language, timbre, and emotion. SenseVoice supports over 50 languages with extremely low latency; CosyVoice excels at multilingual voice generation, zero-shot context generation, cross-lingual voice cloning, and instruction following. The models are open-sourced on ModelScope and Hugging Face, and the corresponding training, inference, and fine-tuning code is released on GitHub.

FunAudioLLM Visit Over Time

Monthly Visits: 10,165
Bounce Rate: 57.90%
Pages per Visit: 1.1
Visit Duration: 00:00:20

FunAudioLLM Alternatives

FunAudioLLM — Foundation model for natural voice interaction understanding and generation

Others

•Speech Recognition•Speech Synthesis

882

Sesame AI — Sesame AI is an advanced text-to-speech platform that generates natural conversational speech with emotional intelligence.

Others

•Speech Synthesis•Artificial Intelligence

1170

Inkr — Inkr transcription is a fast, accurate, and smooth audio and video transcription tool.

Chinese Selection

•Transcription•Speech Recognition

294

Llasa — A TTS base model based on the Llama framework, compatible with 160,000 hours of tokenized speech data.

Productivity

•Speech Synthesis•Artificial Intelligence

360

ElevenLabs Scribe — Scribe is the world's most accurate speech-to-text model, supporting 99 languages.

Productivity

•Speech Recognition•Multilingual

486

Phi-4-multimodal-instruct — Phi-4-multimodal-instruct is a lightweight, multimodal foundational model developed by Microsoft, supporting text, image, and audio inputs.

Productivity

•Multimodal•Speech Recognition

336

FireRedASR-AED-L — An open-source industrial-grade automatic speech recognition model that excels in Mandarin, dialects, and English.

Productivity

•Speech Recognition•Open Source

390

GLM-4-Voice — An end-to-end English-Chinese voice dialogue model.

Productivity

•Speech Recognition•Speech Synthesis

510

Deepgram Voice Agent API — Real-time conversational AI with one-click API integration.

Programming

•Speech Recognition•Speech Synthesis

516

iFlytek Virtual Human — Full-Stack Virtual Human Multi-Scenario Application Services

Chinese Selection

•AI Virtual Image•Speech Recognition

630

EVI 2 — A new foundational voice-to-voice model that delivers a human-like conversation experience.

Chatting

•Artificial Intelligence•Speech Recognition

258

Mini-Omni — An open-source multimodal large language model that supports real-time voice input and streaming audio output.

Productivity

•Multimodal•Speech Recognition

798

speech-to-speech — Open-source speech-to-speech conversion module

Programming

•Speech Recognition•Natural Language Processing

732

SenseVoice — Multilingual speech understanding model providing high-precision speech recognition and sentiment analysis.

Others

•Speech Recognition•Sentiment Analysis

1884

Azure Cognitive Services Speech — Enables applications to interact intelligently through the conversion of speech to text and vice versa.

Others

•Speech Recognition•Speech Synthesis

414

ToucanTTS — Multilingual controllable text-to-speech synthesis toolkit

Education

•Text-to-Speech•Speech Synthesis

966

sherpa-onnx — Open-source project supporting various speech recognition and speech synthesis functionalities

Programming

•Speech Recognition•Speech Synthesis

1980

StreamSpeech — Real-time speech translation, bridging cross-language communication.

Productivity

•Real-time translation•Multi-task learning

1086

ChatTTS.com — Text-to-speech model for natural conversational scenarios

Others

•Speech Synthesis•Dialogue

1176

Xunfei A.I. Intelligent Customer Service Solution — A multi-channel intelligent customer service solution based on iFlytek (科大讯飞) speech technology.

Chinese Selection

•Intelligent Customer Service•Speech Recognition

4512

Neon AI — Easy-to-use conversational AI, meeting the needs of businesses and families.

Productivity

•Conversational AI•Speech Recognition

198

EaseVoice Trainer — A simple and easy-to-use speech cloning and speech model training tool.

Music

•Speech Synthesis•Machine Learning

HaiSnap — Breaking technological boundaries, unleashing the growth of creativity.

Global Trending

•Creativity•Productivity

Amazon Nova Sonic — Amazon's new foundational model understands tone, intonation, and rhythm, enhancing the naturalness of human-computer dialogue.

Productivity

•Speech Recognition•Artificial Intelligence

Any GPT — A multimodal large language model

Whisper — General-purpose Speech Recognition Model

Xfyun Open Platform — An AI-powered open platform based on voice interaction

What Would They Say — Smart language assistant, making communication easier

Speechllect — Real-time AI speech-to-text/text-to-speech solution

TTSLabs — Online Voice Synthesis and Speech Recognition Service
