Tongyi Qianwen Can Now Process Images! Alibaba Cloud Open Sources Visual Language Model Qwen-VL, Supporting Multi-Modal Input of Text and Images

Alibaba Cloud has open-sourced the visual-language model Qwen-VL, following its release of the general-purpose model Qwen-7B and the conversational model Qwen-7B-Chat in August. Qwen-VL supports both Chinese and English and can be used for applications such as knowledge-based question answering, image caption generation, and visual question answering. Unlike comparable models, Qwen-VL can perform open-domain visual grounding in Chinese, accurately marking bounding boxes for objects in images. Built on Qwen-7B, Qwen-VL adds a visual encoder and supports image input, and it has achieved the best results among models of comparable scale on multiple visual-language benchmarks. Qwen-VL has been open-sourced on platforms such as ModelScope. Multi-modal large models are a significant direction for the field, though they still face certain technical challenges.

AI前线
This article is from AIbase Daily