On August 25, Alibaba Cloud released Qwen-VL, a large-scale vision-language model that supports multiple languages, including Chinese and English, and can jointly understand text and images. Qwen-VL is built on Qwen-7B, the general-purpose language model that Alibaba Cloud open-sourced earlier. Compared with other vision-language models, it adds capabilities such as visual grounding and reading text within images. Qwen-VL has earned more than 3,400 stars on GitHub and has been downloaded more than 400,000 times. Vision-language models are regarded as an important direction in the evolution of general AI: the industry view is that models accepting multimodal input can understand the world better and serve a wider range of applications. By open-sourcing Qwen-VL, Alibaba Cloud aims to further advance the development of general AI technology.