Alibaba Tongyi Qianwen Team Launches Qwen2-VL Model to Support Real-time Analysis of Dynamic Videos

AIbase基地

Published in AI News · 3 min read · Aug 30, 2024


The Tongyi Qianwen team at Alibaba DAMO Academy announced a significant update to its Qwen2-VL model on August 30, 2024. According to the team, the update brings notable improvements in image understanding, video processing, and multilingual support, and sets new highs on key performance metrics.

New features of Qwen2-VL include enhanced image understanding, for more accurate interpretation of visual information; advanced video understanding, enabling real-time analysis of dynamic video content; integrated visual agent functionality, which lets the model act as an agent capable of complex reasoning and decision-making; and expanded multilingual support, making it more accessible and effective across language environments.


In terms of technical architecture, Qwen2-VL supports dynamic resolution: it can process images of any resolution without dividing them into blocks, keeping the model input consistent with the information inherent in the image. In addition, Multimodal Rotary Position Embedding (M-RoPE) allows the model to capture and integrate 1D text, 2D visual, and 3D video positional information at the same time.
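To make the idea of combined 1D, 2D, and 3D positions more concrete, here is a minimal Python sketch of how a three-component position ID (temporal, height, width) could be assigned to text tokens, image patches, and video frames. It is a simplified illustration, not the team's implementation: the function names, the segment format, and the rule that each segment starts one past the largest ID used so far are assumptions made for this example.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: one (temporal, height, width) triple per token.
Position = Tuple[int, int, int]


def text_positions(n_tokens: int, start: int = 0) -> List[Position]:
    """Text tokens: all three components are equal, so this behaves like a 1D index."""
    return [(start + i, start + i, start + i) for i in range(n_tokens)]


def visual_positions(frames: int, rows: int, cols: int, start: int = 0) -> List[Position]:
    """Image (frames=1) or video patches.

    The temporal component advances once per frame; the height and width
    components follow the patch's row and column inside the frame.
    """
    return [
        (start + t, start + r, start + c)
        for t in range(frames)
        for r in range(rows)
        for c in range(cols)
    ]


def build_positions(segments: Sequence[tuple]) -> List[Position]:
    """Assign positions to a mixed sequence of segments.

    Each segment is ("text", n_tokens) or ("visual", frames, rows, cols).
    Assumption for this sketch: a new segment starts at one plus the
    largest position component used by the previous segment.
    """
    positions: List[Position] = []
    start = 0
    for seg in segments:
        if seg[0] == "text":
            chunk = text_positions(seg[1], start)
        else:
            _, frames, rows, cols = seg
            chunk = visual_positions(frames, rows, cols, start)
        if chunk:
            positions.extend(chunk)
            start = max(max(p) for p in chunk) + 1
    return positions


if __name__ == "__main__":
    # Two text tokens, one 2x2 image, then one more text token.
    for pos in build_positions([("text", 2), ("visual", 1, 2, 2), ("text", 1)]):
        print(pos)
```

In this sketch a text-only sequence reduces to ordinary 1D positions, a single image varies only in its height and width components, and a video additionally advances the temporal component from frame to frame.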

The Qwen2-VL-7B model retains support for image, multi-image, and video inputs at the 7B scale, and performs exceptionally well in document understanding tasks and in understanding multilingual text within images.

The team has also released a 2B model optimized for mobile deployment which, despite having only 2B parameters, performs strongly in image, video, and multilingual understanding.

Model Links:

Qwen2-VL-2B-Instruct: https://www.modelscope.cn/models/qwen/Qwen2-VL-2B-Instruct

Qwen2-VL-7B-Instruct: https://www.modelscope.cn/models/qwen/Qwen2-VL-7B-Instruct


This article is from AIbase Daily



