Tencent's Hunyuan Large Model has excelled in SuperCLUE-V, the evaluation benchmark for Chinese multimodal large models, ranking first among domestic large models in the August edition and placing in the Outstanding Leader quadrant. Multimodal understanding, which requires a model to accurately identify image elements, understand the relationships among them, and generate natural language descriptions, tests the model's precision in image recognition and its grasp of the complex real world.

This evaluation included 12 representative multimodal understanding large models from both domestic and international sources, assessing them along two dimensions: basic capabilities and application capabilities. Tencent's Hunyuan Large Model demonstrated comprehensive advantages in both, scoring 71.95. The SuperCLUE evaluation criteria cover aspects such as understanding accuracy, response relevance, and depth of reasoning, ensuring a scientific and impartial assessment.


The evaluation results indicate that domestic large models have nearly reached the level of top overseas models in basic multimodal understanding capabilities. Tencent's Hunyuan Large Model particularly stood out in application capabilities, benefiting from a deep understanding of the Chinese context and comprehensive abilities across multiple domains.

The technical foundation of Tencent's Hunyuan Large Model supports the AI-native application Tencent Yuanbao, giving it multimodal understanding capabilities to comprehend and analyze a wide range of image types. Additionally, the Tencent Hunyuan Multimodal Model is now live on Tencent Cloud, offering capabilities such as image-to-text generation to enterprise and individual developers.
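For readers unfamiliar with how such an image-to-text capability is typically consumed, the sketch below shows the general shape of a developer calling a cloud-hosted multimodal endpoint. The URL, request fields, model identifier, and authentication scheme here are placeholders for illustration only and are not Tencent Cloud's actual API; the official Tencent Cloud documentation defines the real endpoint, signing method, and parameter names.

```python
# Hypothetical sketch: sending an image URL plus a prompt to an image-to-text
# endpoint and reading back the generated description. All names below
# (endpoint, fields, model id) are assumptions, not Tencent Cloud's real API.
import requests

API_URL = "https://example-hunyuan-endpoint.invalid/v1/image-to-text"  # placeholder URL
API_KEY = "YOUR_API_KEY"  # placeholder credential

def describe_image(image_url: str, prompt: str = "Describe this image.") -> str:
    """Send an image URL and a text prompt; return the model's description."""
    payload = {
        "model": "hunyuan-vision",  # assumed model identifier
        "image_url": image_url,
        "prompt": prompt,
    }
    headers = {"Authorization": f"Bearer {API_KEY}"}
    resp = requests.post(API_URL, json=payload, headers=headers, timeout=30)
    resp.raise_for_status()
    return resp.json().get("text", "")

if __name__ == "__main__":
    print(describe_image("https://example.com/photo.jpg"))
```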

Jiang Jie, Vice President of Tencent, stated that the Hunyuan Large Model is evolving towards full-modal technology. Users will soon be able to experience it in the Tencent Yuanbao App and in Tencent's internal services, and it will be available to external applications through Tencent Cloud. Currently, the Tencent Hunyuan Large Model has been scaled to the trillion-parameter level using a Mixture of Experts (MoE) architecture, and its multimodal understanding capabilities are at a domestically leading level.
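As context for the MoE mention above, the following is a minimal, generic sketch of top-k Mixture of Experts routing, the general architecture family referenced in the article. The layer sizes, expert count, and routing scheme are assumptions chosen for illustration and do not describe Hunyuan's actual implementation.

```python
# Minimal top-k MoE layer: a router scores experts per token, and each token is
# processed only by its top-k experts, so total parameters can grow without a
# proportional increase in per-token compute. Illustrative only; sizes and the
# routing scheme are assumptions, not Hunyuan's design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model: int = 64, d_ff: int = 256,
                 num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts)  # router over experts
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Pick top-k experts per token and mix their outputs.
        scores = self.gate(x)                           # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # per-token expert choices
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

if __name__ == "__main__":
    layer = TopKMoE()
    tokens = torch.randn(10, 64)
    print(layer(tokens).shape)  # torch.Size([10, 64])
```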