CogAgent

An open-source end-to-end visual language model (VLM) based GUI agent

Common Product, Programming, Visual Language Model, GUI Agent
CogAgent is a GUI agent built on a visual language model (VLM). It supports bilingual (Chinese and English) cloud interaction driven by screenshots and natural-language instructions. CogAgent reports significant improvements in GUI perception, accuracy of predicted actions, completeness of the action space, and task generalization. The model is used in ZhipuAI's GLM-PC product and is released as open source to help researchers and developers advance the research and application of VLM-based GUI agents.

CogAgent Visit Over Time

Monthly Visits: 494,758,773
Bounce Rate: 37.69%
Pages per Visit: 5.7
Visit Duration: 00:06:29

CogAgent Alternatives