Today, at the Create 2025 AI Developer Conference held in Wuhan, Baidu founder Robin Li delivered a nearly 60-minute keynote speech themed "The World of Models, the Reign of Applications." He officially launched two new versions of the Wenxin large model, 4.5 Turbo and X1 Turbo, and described the deployment progress of the DeepSeek model within the Baidu ecosystem, along with the challenges it still faces.
Li revealed that core Baidu products, including Wenxiaoyan, Baidu Search, and Baidu Maps, have integrated the full-size DeepSeek model, improving efficiency in intelligent customer service and search enhancement. However, he also candidly acknowledged DeepSeek's current technical limitations: "The model only supports text processing and cannot generate multi-modal content such as images, audio, and video. Over 60% of Baidu Smart Cloud's enterprise clients have a clear demand for multi-modal capabilities." Using e-commerce live streaming as an example, he pointed out that DeepSeek's high hallucination rate (its tendency to generate incorrect information) makes it difficult to apply directly in high-stakes fields such as finance and medicine: "If it incorrectly announces a 'buy one, get ten free' offer during a live stream, the merchant faces uncontrollable losses." Its slow response speed and high call cost further limit large-scale deployment: most large models on the Chinese market currently offer lower API call costs and faster responses than the full-size DeepSeek version.
To address these pain points, the two versions of the Wenxin large model released by Baidu propose three major upgrade directions: multi-modality, strong reasoning, and low cost. Li emphasized that multi-modality has become the "entry ticket" for the next generation of AI models. "The market share of pure text models will shrink rapidly; enterprises need general-purpose models that can handle text, images, audio, and video simultaneously."
In terms of pricing and performance, Wenxin 4.5 Turbo now costs 0.8 yuan per million tokens for input and 3.2 yuan per million tokens for output, with inference speed doubled and cost reduced by 80% compared to the previous generation. X1 Turbo is priced at 1 yuan per million input tokens and 4 yuan per million output tokens, a further 50% price reduction alongside improved performance, with bilingual Chinese-English support and optimizations for vertical fields such as law and medicine.
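The per-call economics implied by these prices are easy to estimate. The sketch below is purely illustrative (the helper function and token counts are not part of any Baidu SDK); only the per-million-token prices come from the keynote:

```python
# Estimate the cost of one model call from the announced per-million-token
# prices (yuan). The PRICES table reflects the keynote figures; call_cost()
# is an illustrative helper, not a Baidu API.

PRICES = {
    "wenxin-4.5-turbo": {"input": 0.8, "output": 3.2},
    "wenxin-x1-turbo": {"input": 1.0, "output": 4.0},
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in yuan for a single call to the given model."""
    p = PRICES[model]
    raw = (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000
    return round(raw, 6)

# Example: a call consuming 200k input tokens and producing 50k output tokens.
print(call_cost("wenxin-4.5-turbo", 200_000, 50_000))  # → 0.32 yuan
print(call_cost("wenxin-x1-turbo", 200_000, 50_000))   # → 0.4 yuan
```

At this scale, even a heavy workload of thousands of such calls per day stays in the low thousands of yuan, which is the cost profile Li argues is needed for SMEs to adopt large models.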
"Cost is the core constraint on the explosion of AI applications." Li stated frankly that the biggest obstacle for developers building AI applications is the high cost of computing power, and the essence of technological innovation often stems from breakthroughs in cost structure. He used autonomous driving as an example, pointing out that the price of lidar was over $100,000 in 2015, but has now dropped to the level of thousands of dollars, directly promoting the popularization of L4-level technology. The price reduction of the Wenxin large model aims to lower the deployment threshold for enterprises, "allowing SMEs to afford large models, allowing entrepreneurs to dare to experiment, and ultimately giving birth to AI-native applications in all walks of life."
According to Baidu's technical team, Wenxin 4.5 Turbo and X1 Turbo have been simultaneously opened for public cloud and private deployment, supporting on-demand calls and customized training. Market analysis suggests that Baidu is attempting to find a balance between model capabilities and commercialization through technological iteration and cost restructuring, and multi-modal capabilities and price advantages may become its key bargaining chips in competing for enterprise customers.