Alibaba Releases Multimodal Reasoning Model QVQ-72B! Enhanced Visual and Language Capabilities for Solving Complex Problems

AIbase基地

Published in AI News · 4 min read · Dec 25, 2024


Alibaba's Qwen team recently released QVQ-72B (published as QVQ-72B-Preview), a new multimodal reasoning model built on Qwen2-VL-72B. By combining strong language and visual capabilities, it is designed to handle more complex reasoning and analytical tasks, marking a new breakthrough for Alibaba in multimodal AI.

QVQ-72B shows significant improvements in visual reasoning, mathematics, and scientific problem-solving, particularly on tasks that require multi-step reasoning. In other words, the model interprets images as well as text and can work through a complex problem step by step, something traditional AI models have found difficult.

A major highlight of the model is its ability to combine textual and visual information in physics problems. Given an image of a physical scene and an accompanying text description, for example, it can infer the cause-and-effect relationships between events, demonstrating a deeper level of understanding.

In mathematical reasoning tasks such as algebra and calculus, QVQ-72B substantially reduces its error rate by reasoning step by step. Rather than only performing simple calculations, it can carry out multi-step mathematical reasoning and lay out clear solution steps, offering a new tool for tackling complicated mathematical problems.

QVQ-72B is also accurate and efficient at extracting key information from technical reports and complex charts, which makes it a useful tool for researchers, analysts, and other professionals who work with dense documents.

For image recognition, QVQ-72B can accurately identify details within images, such as object positions, colors, spatial relationships, and complex scenes. This suggests the model could be applied in a broader range of settings, such as intelligent surveillance and autonomous driving.

In summary, Alibaba's QVQ-72B multimodal reasoning model, with its strong visual, linguistic, and reasoning capabilities, offers new ideas and tools for solving complex problems. Its arrival is likely to accelerate the application of artificial intelligence across many fields and give fresh momentum to the intelligent upgrading of industries.

Try it online: https://huggingface.co/spaces/Qwen/QVQ-72B-preview

For more details: https://qwenlm.github.io/blog/qvq-72b-preview/


This article is from AIbase Daily

Welcome to the [AI Daily] column! This is your daily guide to the world of artificial intelligence. Every day we bring you the hottest topics in AI, with a focus on developers, to help you follow technical trends and discover innovative AI product applications.

—— Created by the AIbase Daily Team
