Tencent Hunyuan Leads in Multimodal AI: Comprehensive Advantage Over GPT-4/Claude-3.5

AIbase基地

Published in AI News · 4 min read · Aug 8, 2024


Chinese large models are showing real strength in multimodal AI. The latest SuperCLUE-V ranking, an evaluation of Chinese multimodal large models, places Tencent's hunyuan-vision and Shanghai AI Lab's InternVL2-40B first among domestic closed-source and open-source models, respectively, ahead of even internationally renowned models such as Claude-3.5-Sonnet and Google's Gemini-1.5-Pro.

hunyuan-vision, the multimodal version of Tencent's Hunyuan large model, is popular with developers through its API and is also free for users to try in Tencent's Yuanbao app. Billed as a "practical AI companion," the Yuanbao app emphasizes practicality and ease of use, and its breakthrough in multimodal capabilities has earned it the top spot in domestic evaluations.

To illustrate the progress of domestic multimodal large models more concretely, we ran a series of tests on Tencent Yuanbao. From understanding meme stickers and recognizing photo content to handling visual illusions, Yuanbao performed well. In practical scenarios, such as summarizing financial reports, reading academic charts, or solving pattern-finding questions from aptitude tests, Yuanbao understood the input accurately and gave reasonable answers.

▲ Image source: CLUE Chinese Language Understanding Evaluation Benchmark

In particular, on an additional question that tested understanding of Chinese cultural context, Tencent Yuanbao accurately identified a screenshot from "Calabash Brothers" and correctly answered the related questions, showing its advantage in Chinese-language contexts.

Tencent's Hunyuan large model is by now an "old friend." It has iterated rapidly since its debut in September 2023 and has now expanded to a trillion-parameter scale, covering text, multimodal understanding, and generation. Among domestic large models, Tencent Hunyuan was the first to complete the upgrade to a Mixture of Experts (MoE) architecture, moving from a single dense model to a sparse model composed of multiple experts.
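For readers unfamiliar with the dense-versus-sparse distinction, the following is a minimal sketch of top-k expert routing, the general idea behind MoE layers. It is an illustration only and not Tencent's implementation: the toy experts, the hard-coded gate scores, and the function names `top_k_route` and `sparse_moe` are all assumptions made up for this example.

```python
import math

def top_k_route(gate_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    # Softmax over the selected logits only, so the k weights sum to 1.
    peak = max(gate_logits[i] for i in top)
    exps = [math.exp(gate_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def sparse_moe(x, experts, gate_logits, k=2):
    """Weighted sum of the outputs of the k routed experts; the rest are skipped."""
    routes = top_k_route(gate_logits, k)
    output = sum(weight * experts[i](x) for i, weight in routes)
    return output, [i for i, _ in routes]

# Four toy "experts" (hypothetical); a dense model would evaluate all of its
# parameters for every input, while the sparse model runs only k experts.
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: x * x, lambda x: -x]

# Hypothetical gate scores; in a real model a learned router produces these.
gate_logits = [0.1, 2.0, 2.0, -1.0]

output, used = sparse_moe(3.0, experts, gate_logits, k=2)
print(used, output)
```

Only two of the four experts run for this input, which is why a sparse model can carry far more total parameters than it activates on any single token.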

The Tencent Yuanbao app, positioned as a "practical AI companion," not only syncs sessions and chat history across devices but also demonstrates strong multimodal understanding. Given a document screenshot, a portrait or landscape, a receipt, or any other photo, Yuanbao can offer its own understanding and analysis based on the content of the image.

The Tencent Yuanbao team stated that it will focus more on integrating the model's multimodal capabilities to further improve the user experience. Meanwhile, Tencent has also updated its deep search and deep long-form reading features, hiding more of the technical detail from users and simplifying operation.


This article is from AIbase Daily


