At the Shanghai stop of the ByteDance AI Innovation Tour, held on August 21, 2024, ByteDance showcased a comprehensive upgrade of its Doubao large model and enhancements to its conversational AI real-time interaction solution.

Since its release on May 15, the Doubao large model's average daily token usage has exceeded 500 billion, and enterprise customer usage has grown 22-fold. The new version of the Doubao large language model improves overall capabilities by 20.3%, with role-playing abilities up 38.3% and language comprehension up 33.3%.

The Doubao text-to-image model excels at accurately matching long text prompts to images, generates complex scenes with multiple subjects and spatial arrangements more reliably, and better understands Chinese elements, producing more aesthetically pleasing Chinese-style images.

The Doubao speech recognition model draws on the knowledge and reasoning capabilities of the large language model and uses contextual awareness to improve recognition accuracy, reducing error rates by up to 40% on multiple public test sets compared with other publicly released speech recognition models in China. It recognizes Mandarin as well as dialects including Cantonese, Shanghainese, Sichuanese, Xi'an dialect, and Minnan.

The Doubao speech synthesis model has upgraded its streaming speech synthesis capabilities, enabling real-time response and precise sentence breaks and supporting "thinking while speaking."

Additionally, ByteDance released a conversational AI real-time interaction solution that integrates the Doubao large model with real-time communication (RTC) technology to provide end-to-end real-time dialogue with the large model. Enterprises can embed this real-time voice function into their AI applications, so that users can not only hold voice conversations with the AI but also interrupt or interject at any point, as they would in normal speech. The upgraded AI voice is more expressive and emotionally nuanced, making the dialogue more natural, authentic, and fluent and improving the experience of interacting with the large model.

ByteDance also announced a strategic partnership with DMALL to establish the Retail Large Model Ecosystem Alliance, which aims to promote the intelligent upgrade of the retail industry and accelerate industry innovation. Eighteen founding members took part in the inaugural ceremony. Furthermore, the Automotive Large Model Ecosystem Alliance welcomed new members, and ByteDance is collaborating with alliance members on definitions for AI in the automotive industry and on the release of evaluation standards.