On July 25th, Volcano Engine held its 2024 AI Innovation Tour in Chengdu. At the event, the company announced that daily token usage of its Doubao large model has exceeded 500 billion, and that average daily token usage per enterprise customer has grown 22-fold since the model's release on May 15th. Zhang Xin, Vice President of Volcano Engine, said the company is moving toward greater intelligence, industry specialization, and regionalization, and is helping enterprises innovate through industry solutions, products, and optimized services.

At the event, Volcano Engine unveiled the latest capabilities of the Doubao large model, with upgrades to image generation, voice synthesis, and voice replication. The Doubao Image-to-Image model and the Doubao Text-to-Image model are now better at preserving the features of the original image and at improving image quality. The Doubao Voice Synthesis model and the Doubao Voice Replication model are now better at expressing emotion and at reproducing a speaker's vocal characteristics.

  1. Doubao Image-to-Image model: Closely preserves multi-dimensional features of the original image, such as character outline, expression, and spatial structure. It supports over 50 styles, as well as image expansion, local redrawing, and smudge-based editing, allowing creative extensions of the image. It is already used in apps such as Douyin, Jianying, Doubao, and Xinghui, and serves companies such as Samsung and Nubia, covering fields including mobile photo albums, tool assistants, e-commerce marketing, and advertising placement.

  2. Doubao Text-to-Image model: Understands details such as the number of subjects, subject-object relationships, character composition, and spatial composition, so that images match the text more accurately. It improves image quality in three respects: light and shadow, color and atmosphere, and character aesthetics. It is also optimized for Chinese content, with a detailed understanding of Chinese figures, objects, dynasties, geography, food, and festivals.

  3. Doubao Voice Synthesis model: Understands plot and characters well enough to express emotions correctly. It retains natural speech habits such as swallowed syllables and accents, making the voice sound closer to a real human speaker. It offers 26 premium hyper-natural voices to suit a range of scenarios.

  4. Doubao Voice Replication model: Needs only 5 seconds to replicate a voice with high fidelity, closely reproducing the speaker's vocal characteristics and accent. It supports cross-language transfer across six major languages, with pronunciation that is closer to how local speakers talk.

Additionally, Volcano Ark provides core plug-ins and agent capabilities, as well as a full-cycle data security and trust solution, helping enterprises put large models into production more easily. The three plug-ins that are also used by Toutiao and Douyin have been upgraded, and new web parsing and calculator plug-ins have been added to support a wider range of enterprise applications. Volcano Engine also offers Coze Professional Edition, which supports low-code construction of expert-level "AI Bots" tailored to enterprise business scenarios.

Volcano Engine has also built HiAgent, an enterprise-exclusive AI application innovation platform that helps enterprises cover the last mile in adopting large models. HiAgent supports enterprise AI applications along three dimensions (speed, density, and thickness), enabling rapid deployment and continuous optimization. Volcano Engine's full-stack AI cloud draws on ByteDance's massive resource pool. It supports multi-chip and multi-cloud architectures, provides ultra-large-scale computing power, and supports ten-thousand-card cluster networking as well as trillion-parameter MoE large models.