CogAgent is a GUI agent built on a visual language model (VLM) that supports bilingual (Chinese and English) cloud interaction through screenshots and natural language. It makes significant advances in GUI perception, reasoning and prediction accuracy, completeness of the action space, and task generalization. The model is already used in ZhipuAI's GLM-PC product, and it is intended to help researchers and developers advance the research and application of VLM-based GUI agents.