Tongyi Qianwen Can Now Process Images! Alibaba Cloud Open Sources Visual Language Model Qwen-VL, Supporting Multi-Modal Input of Text and Images

AI前线

Published in AI News · 2 min read · Aug 25, 2023

Alibaba Cloud has open-sourced the visual-language model Qwen-VL, following its release of the general-purpose model Qwen-7B and the conversational model Qwen-7B-Chat in August. Qwen-VL supports both Chinese and English and can be used for applications such as knowledge-based question answering, image captioning, and visual question answering. It also sets itself apart from other models by supporting Chinese open-domain localization: it can accurately mark detection boxes on the objects in an image. Built on Qwen-7B, Qwen-VL adds a visual encoder so that it can accept image input. It has achieved the best results among models of comparable size on multiple visual-language benchmarks. Qwen-VL is available on platforms such as ModelScope. Multi-modal large models are an important direction for development, though they still face technical challenges.


This article is from AIbase Daily


