Doubao recently announced the launch of its new real-time voice large model, claiming a "cliff-like lead" in Chinese dialogue and a significant step forward in AI conversational capability. The model has been fully rolled out in the Doubao App (version 7.2.0 Spring Edition), giving users a richer and more natural voice communication experience.

According to reports, Doubao's real-time voice large model integrates speech understanding and speech generation into a single end-to-end voice dialogue system. As a result, the model is said to excel in vocal expressiveness, controllability, and emotional continuity, while offering low latency and letting users interrupt it at any point in the conversation. According to the official statement, the technology improves not only the model's "IQ" but also its emotional intelligence, enabling it to better understand and express emotions.

The update also includes a real-time voice call feature, which uses Doubao's latest large model to adjust dialogue pace, retroflex sounds, volume, and breathiness to suit different scenarios. In addition, the new voice function can mimic various vocal tones, supports multiple dialects as well as English conversation, and can even sing certain songs. Together, these capabilities raise the realism of human-machine dialogue to a new level, approaching a point where "it’s hard to distinguish between human and machine."

Doubao's research and development team stated that the new technology is built on an end-to-end framework that natively models the speech and text modalities together in a unified way. According to the team, this design not only streamlines speech recognition and generation but also gives the AI a richer "soul," enabling it to communicate more naturally with humans.

With this launch, Doubao's real-time voice large model is expected to give users a new kind of interactive experience in Chinese voice dialogue and to promote the further development of intelligent voice technology.