Doubao Launches Real-Time Voice Large Model: Leading in Chinese, with Strong Emotional Intelligence

AIbase基地

Published in AI News · Jan 20, 2025


Recently, ByteDance's Doubao team announced the launch of its new real-time voice large model, claiming a "cliff-like" (that is, overwhelming) lead in Chinese dialogue and a significant step up in AI conversational capability. The model has been fully rolled out in the Doubao App (version 7.2.0, Spring Edition), where it aims to give users a richer and more natural voice conversation experience.

According to reports, Doubao's real-time voice large model deeply integrates speech understanding and speech generation into a single end-to-end voice dialogue system. The company says this allows the model to excel in vocal expressiveness, controllability, and emotional continuity, while keeping latency low and letting users interrupt the conversation at any time. According to the official statement, the technology improves not only the model's "IQ" but also its emotional intelligence, so that it can better understand and express emotion.

The update also includes a real-time voice call feature which, powered by Doubao's latest large model, can adjust speaking pace, erhua (the retroflex "r" endings of northern Mandarin), volume, and breathiness to suit different scenarios. The new voice function can also imitate a variety of voices, converse in several dialects and in English, and even sing some songs. Together, these features push the realism of human-machine dialogue to a new level, close to the point where "it's hard to tell human from machine."

Doubao's R&D team stated that the new technology is built on an end-to-end framework that natively fuses the speech and text modalities for unified modeling. According to the team, this design not only streamlines speech recognition and generation but also gives the AI a richer "soul," helping it communicate more naturally with people.

With this launch in Chinese voice dialogue, Doubao's real-time voice large model promises users an unprecedented interactive experience and should help advance intelligent voice technology.


This article is from AIbase Daily

Welcome to the [AI Daily] column! This is your daily guide to exploring the world of artificial intelligence. Every day, we bring you the hottest topics in AI with a focus on developers, helping you understand technical trends and learn about innovative AI product applications.

Created by the AIbase Daily Team

