Chinese large models are showing real strength in multimodal AI. The latest SuperCLUE-V ranking, which evaluates Chinese multimodal large models, shows that Tencent's hunyuan-vision and Shanghai AI Lab's InternVL2-40B lead the domestic closed-source and open-source categories, respectively, ahead of internationally known models such as Claude-3.5-Sonnet and Google's Gemini-1.5-Pro.

hunyuan-vision, the multimodal version of Tencent's Hunyuan large model, is used by developers through its API and can also be tried for free in the Tencent Yuanbao app. Positioned as a "practical AI companion," the Yuanbao app emphasizes practicality and ease of use, and its breakthrough in multimodal capabilities has earned it the top spot in domestic evaluations.

To illustrate the progress of domestic multimodal large models more concretely, we ran a series of tests on Tencent Yuanbao. From understanding memes and stickers and recognizing photo content to handling visual illusions, Yuanbao performed well. In practical scenarios, whether summarizing financial reports, reading academic charts, or solving pattern-finding questions from aptitude tests, Yuanbao understood the input accurately and gave reasonable answers.

▲ Image source: CLUE Chinese Language Understanding Evaluation Benchmark

Notably, in a bonus question that tested understanding of Chinese cultural context, Tencent Yuanbao correctly identified a screenshot from "Calabash Brothers" and answered the related questions correctly, showing its strength in Chinese-language contexts.

Tencent's Hunyuan large model is a familiar name. Since its debut in September last year, it has iterated rapidly and has now scaled to the trillion-parameter level, covering text, multimodal understanding, and generation. Among domestic large models, Tencent Hunyuan was the first to complete the upgrade to a Mixture of Experts (MoE) architecture, moving from a single dense model to a sparse model composed of multiple experts.
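For readers unfamiliar with the idea, the move from a dense model to a sparse model built from multiple experts can be sketched in a few lines of Python. This is a generic toy illustration of top-k expert routing, not Tencent Hunyuan's actual implementation; every name here (`moe_forward`, `make_expert`, `gate_weights`) and the choice of `top_k=2` is an assumption made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Toy 'expert': multiplies every input element by a fixed scale."""
    return lambda x: [scale * v for v in x]


def moe_forward(x, gate_weights, experts, top_k=2):
    """Route input x to the top_k highest-scoring experts and mix their outputs.

    gate_weights holds one weight row per expert (hypothetical router parameters).
    Only the selected experts are evaluated, which is what makes the layer sparse.
    """
    # Router score for each expert: dot product of its weight row with the input.
    logits = [sum(w * v for w, v in zip(row, x)) for row in gate_weights]
    # Indices of the top_k experts, highest score first.
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Normalize the selected scores into mixing weights.
    probs = softmax([logits[i] for i in chosen])
    out = [0.0] * len(x)
    for p, i in zip(probs, chosen):
        y = experts[i](x)  # experts that were not chosen are never run
        out = [o + p * v for o, v in zip(out, y)]
    return out, chosen


if __name__ == "__main__":
    experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
    gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
    output, used = moe_forward([2.0, 1.0], gate_weights, experts)
    print("experts used:", used)
    print("output:", output)
```

In a dense model, every parameter participates in every input. In this sketch only two of the four experts run per input, so total capacity can grow with the number of experts while the compute per input stays roughly constant.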

The Tencent Yuanbao app, positioned as a "practical AI companion," not only handles multi-device and chat-history synchronization well but also demonstrates strong multimodal understanding. Whether the image is a document screenshot, a portrait, a landscape, a receipt, or any other photo, Yuanbao can offer its own interpretation and analysis based on the image's content.

The Tencent Yuanbao team said it will focus more on integrating the model's multimodal capabilities to further improve the user experience. Meanwhile, Tencent has also updated its deep search and deep long-form reading features, exposing fewer technical details and simplifying operation for users.