At the recently concluded Baidu AI DAY, Wen Xiaoyan officially announced a brand refresh and a set of feature upgrades. Beyond a new visual identity, the upgrade introduces multi-model fusion scheduling technology and significantly enhances the app's speech recognition and image question-answering capabilities.

Multi-model fusion scheduling is the core highlight of this upgrade. Wen Xiaoyan integrates Baidu's self-developed models, such as Wenxin X1 and Wenxin 4.5, alongside high-quality third-party models such as DeepSeek-R1 and Keling, so users can choose the model best suited to their needs. With a single click on "automatic mode," the system selects the optimal model combination on its own, which improves response speed and task-handling capability and brings the experience closer to one-click problem solving.

On the voice side, the newly upgraded large voice model supports conversations in multiple dialects, complex knowledge question answering, and interruption mid-conversation. Users can get answers by voice and also engage in role-playing, which makes for a richer interactive experience. Jia Lei, Baidu's chief voice architect, said the model is the industry's first end-to-end speech-language large model based on a novel cross-attention technique, and that its invocation cost is 50% to 90% lower than the industry average. Inference is also fast: waiting time is reduced to around 1 second, making interaction more fluid.

In addition, Wen Xiaoyan has launched an image question-answering function. Users can take a photo or upload an image and ask questions by text or voice to get a detailed analysis. For example, they can photograph a math problem and receive a real-time solution with a video explanation, or upload product images to compare specifications and prices when making shopping decisions. The newly added "Fun Fact from a Picture" feature lets users preset a perspective such as "history scholar" or "tech expert" to interpret the same image from different angles, adding to the fun of the interaction.

This upgrade gives Baidu Wen Xiaoyan users a more intelligent and convenient experience, and points toward more diverse forms of interaction in the future.