At the launch of the Volcano Engine AI Innovation Tour in Hangzhou on April 17th, Tan Dai, president of ByteDance subsidiary Volcano Engine, officially announced Doubao 1.5, the company's latest deep reasoning model. The event drew significant attention from industry professionals, and Tan Dai used it to showcase the model's performance across a range of domains.
Reportedly, Doubao 1.5 performs strongly in professional fields such as mathematics, programming, and scientific reasoning, as well as in creative writing. The model uses a Mixture of Experts (MoE) architecture with 200 billion total parameters, of which only 20 billion are activated during inference. That is significantly fewer than comparable industry models, giving it a clear advantage in inference cost.
Tan Dai also detailed the features of the Doubao 1.5 deep reasoning model, including several applications built on visual understanding. The model can not only analyze scenery in photos but also help users order food while traveling and even help businesses complete project management workflows.
In addition, Volcano Engine released version 3.0 of the Doubao text-to-image model. The update brings improved text layout, more refined image generation, and the ability to output 2K images directly, giving users a richer visual experience.
Also noteworthy is the improvement in the new model's video search capability. Given a query, the model can quickly locate the relevant answer within a video, making it considerably more convenient for users to find information.
According to Tan Dai, usage of the Doubao model is growing rapidly. In March 2025, its daily token usage exceeded 12.7 trillion, more than 106 times the level at its initial launch. These figures reflect the model's popularity in the market.
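As a rough sanity check on the usage figures, the short Python sketch below computes the daily token volume at launch that the two reported numbers imply. Both inputs are "more than" figures from the announcement, so the result is only an approximation; the function and constant names are illustrative and do not come from Volcano Engine.

```python
# Back-of-the-envelope check of the reported growth figures.
# Both inputs are lower bounds ("more than"), so the result is approximate.
DAILY_TOKENS_MARCH_2025 = 12.7e12  # reported: over 12.7 trillion tokens per day
GROWTH_FACTOR = 106                # reported: over 106-fold growth since launch


def implied_launch_daily_tokens(current: float, growth: float) -> float:
    """Return the daily token volume at launch implied by the two figures."""
    return current / growth


baseline = implied_launch_daily_tokens(DAILY_TOKENS_MARCH_2025, GROWTH_FACTOR)
print(f"Implied daily tokens at launch: about {baseline / 1e9:.0f} billion")
```

In other words, the reported figures imply a starting point on the order of 120 billion tokens per day.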
Highlights:
📈 The Doubao 1.5 model performs strongly across professional fields and creative writing, using an MoE architecture that activates only a fraction of its total parameters.
🌍 Combined with visual understanding technology, the new model can analyze photos, assist with travel, and support project management.
🎥 Video search capabilities have been significantly enhanced, allowing users to quickly find relevant information within videos, and overall usage of the Doubao model continues to grow.