At the Volcano Engine FORCE Power Conference on December 18, 2024, Volcano Engine announced a comprehensive upgrade to its Doubao large model family and launched a brand-new visual understanding model.
Tan Dai, president of Volcano Engine, said that daily token usage of the Doubao large model has surged over the past few months to more than 4 trillion tokens, a 33-fold increase since the model's release in May. This growth reflects how widely the Doubao large model is being used across application scenarios.
With the launch of the visual understanding model, users can submit text and images together in a single query, and the model interprets both to give accurate answers. This is expected to significantly simplify application development and unlock the potential of large models in more scenarios.
The visual understanding model has stronger content recognition: it can identify basic elements in an image, such as object categories and shapes, and also understand the relationships between objects, their spatial layout, and the overall meaning of a scene. For example, it can recognize shadows and apply common-sense knowledge about the natural world.
The model also has stronger understanding and reasoning abilities: it recognizes content more reliably and can carry out complex logical reasoning and calculations on the text and image information it identifies, as in chart reasoning and physical reasoning.
In addition, it offers more fine-grained visual description, producing detailed accounts of what an image shows and supporting various forms of creative writing, such as writing based on an image or composing poetry about it.
The Doubao visual understanding model has broad application prospects in fields such as education, tourism, and e-commerce. In education, it can help students improve their essays and learn scientific knowledge; in tourism, it can translate foreign-language menus and explain the background of buildings for travelers; in e-commerce marketing, it can help merchants describe product features in detail, improving advertising effectiveness.
The visual understanding model is also inexpensive to use: it is priced at 0.003 yuan per thousand tokens, 85% below the industry average. At this price, one yuan covers up to 284 images at 720P, marking the entry of visual understanding technology into the "cent era." Volcano Engine also offers enterprises and developers initial traffic support of up to 15,000 to help them make better use of the technology.
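The pricing figures can be cross-checked with simple arithmetic. The article gives only the price (0.003 yuan per thousand tokens) and the "284 images per yuan" figure; it does not state how many tokens a 720P image consumes. The sketch below assumes the price applies uniformly to image tokens and ignores text prompts and output tokens, and derives an implied cost of roughly 1,174 tokens per 720P image. That per-image figure is an inference, not a published specification, and the function names are hypothetical.

```python
# Price quoted in the article for the Doubao visual understanding model.
PRICE_YUAN_PER_1K_TOKENS = 0.003


def tokens_per_yuan(price_per_1k: float = PRICE_YUAN_PER_1K_TOKENS) -> float:
    """Number of tokens one yuan buys at the given per-thousand-token price."""
    return 1000 / price_per_1k


def implied_tokens_per_image(images_per_yuan: int,
                             price_per_1k: float = PRICE_YUAN_PER_1K_TOKENS) -> float:
    """Tokens per image implied by an 'images per yuan' figure (an assumption-based estimate)."""
    return tokens_per_yuan(price_per_1k) / images_per_yuan


def images_per_yuan(tokens_per_image: float,
                    price_per_1k: float = PRICE_YUAN_PER_1K_TOKENS) -> int:
    """Whole images one yuan covers, assuming a fixed token cost per image."""
    return int(tokens_per_yuan(price_per_1k) // tokens_per_image)


if __name__ == "__main__":
    print(f"{tokens_per_yuan():,.0f} tokens per yuan")
    print(f"{implied_tokens_per_image(284):,.0f} tokens per 720P image (implied)")
    print(f"{images_per_yuan(1173)} images per yuan at 1,173 tokens each")
```

One yuan buys about 333,333 tokens, so 284 images per yuan corresponds to an average of roughly 1,174 tokens per image; if the real per-image token count differs, the images-per-yuan figure scales inversely.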
Beyond the visual understanding model, Volcano Engine upgraded several other models at the conference. The overall task-handling capability of the Doubao general model Pro has improved by 32% since May, with significant gains in reasoning, instruction following, coding, and mathematics. The Doubao video generation model will open to external customers in January 2025, and enterprises can now reserve access.
To strengthen enterprises' information retrieval and search-and-recommendation capabilities, Volcano Engine also launched a comprehensive AI search service that helps businesses match information to user needs and supports the intelligent transformation of various industries.
Key Points:
🔍 The daily token usage of the Doubao large model has exceeded 4 trillion, a 33-fold increase since May.
💡 The newly launched visual understanding model supports simultaneous input of text and images, applicable in education, tourism, and e-commerce.
💰 The usage cost is only 0.003 yuan per thousand tokens, significantly lower than the industry average.