The Qwen team recently announced the open-source release of their latest multimodal reasoning model, QVQ, marking a significant step forward in AI's visual understanding and complex problem-solving capabilities. Built on Qwen2-VL-72B, the model aims to strengthen reasoning by integrating language and visual information. On the MMMU evaluation, QVQ scored 70.3, a substantial improvement over Qwen2-VL-72B-Instruct across various math-related benchmarks.

QVQ shows particular strength in visual reasoning tasks, especially those requiring complex analytical thinking. While QVQ-72B-Preview performs excellently, the team also noted several limitations: language mixing and code-switching, a tendency to fall into circular reasoning patterns, safety and ethical considerations, and performance and benchmark limitations. The team emphasized that although the model has improved in visual reasoning, it cannot fully replace the capabilities of Qwen2-VL-72B; during multi-step visual reasoning, the model may gradually lose focus on the image content, leading to hallucinations.


The Qwen team evaluated QVQ-72B-Preview on four datasets — MMMU, MathVista, MathVision, and OlympiadBench — designed to assess the model's comprehensive understanding and reasoning over visual information. QVQ-72B-Preview performed strongly across these benchmarks, effectively narrowing the gap with leading models.

To further demonstrate QVQ's use in visual reasoning tasks, the Qwen team provided several examples and shared a link to their technical blog. They also offered code examples for model inference, along with guidance on calling QVQ-72B-Preview directly through ModelScope's API-Inference service via API calls.
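As a rough illustration of what such an API call might look like, below is a minimal sketch using an OpenAI-compatible chat client. The base URL, model identifier string, and message schema here are assumptions on my part, not taken from the original post — consult the linked blog for the official inference examples.

```python
# Sketch: querying QVQ-72B-Preview through an OpenAI-compatible endpoint.
# The base_url and image URL below are placeholders/assumptions.

def build_vision_messages(image_url: str, question: str) -> list[dict]:
    """Build a multimodal chat message: one image plus a text question."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": question},
            ],
        }
    ]


def ask_qvq(
    image_url: str,
    question: str,
    api_key: str,
    base_url: str = "https://api-inference.modelscope.cn/v1",  # assumed endpoint
) -> str:
    """Send one image + question to QVQ-72B-Preview and return the reply text."""
    from openai import OpenAI  # assumes the `openai` client package is installed

    client = OpenAI(api_key=api_key, base_url=base_url)
    resp = client.chat.completions.create(
        model="Qwen/QVQ-72B-Preview",
        messages=build_vision_messages(image_url, question),
    )
    return resp.choices[0].message.content
```

The message-construction step is separated from the network call so the payload shape can be inspected or reused independently of any particular API key or endpoint.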

Model Link:

https://modelscope.cn/models/Qwen/QVQ-72B-Preview

Experience Link:

https://modelscope.cn/studios/Qwen/QVQ-72B-preview

Chinese Blog:

https://qwenlm.github.io/zh/blog/qvq-72b-preview