Zhipu AI has open-sourced CogAgent, an 18-billion-parameter vision-language model specialized in GUI understanding and navigation, which achieves state-of-the-art (SOTA) generalist performance on multiple benchmarks. The model accepts high-resolution visual input, supports conversational question answering, and can answer questions about any GUI screenshot. CogAgent also handles OCR-related tasks, with these capabilities substantially strengthened through pre-training and fine-tuning. Users can upload a screenshot for task inference and receive a plan, the next action to take, and the specific screen coordinates for that operation.
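As a rough illustration of the screenshot-question-answering workflow, the sketch below follows the usage pattern published with the CogAgent release on Hugging Face; the checkpoint name THUDM/cogagent-chat-hf, the lmsys/vicuna-7b-v1.5 tokenizer, and the build_conversation_input_ids helper are assumptions taken from that repository's custom code and may differ in your environment.

```python
# Minimal sketch: ask CogAgent about a GUI screenshot and request grounded
# actions. Assumes the THUDM/cogagent-chat-hf checkpoint and its custom
# modeling code (trust_remote_code=True); helper names may differ.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("lmsys/vicuna-7b-v1.5")
model = AutoModelForCausalLM.from_pretrained(
    "THUDM/cogagent-chat-hf",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).to("cuda").eval()

image = Image.open("screenshot.png").convert("RGB")
# Appending "(with grounding)" asks the model to include screen coordinates
# for the proposed operation in its answer.
query = "How do I search for CogAgent on this page? (with grounding)"

# The repository's custom code builds the multimodal inputs
# (text tokens plus image features) for generation.
built = model.build_conversation_input_ids(
    tokenizer, query=query, history=[], images=[image]
)
inputs = {
    "input_ids": built["input_ids"].unsqueeze(0).to("cuda"),
    "token_type_ids": built["token_type_ids"].unsqueeze(0).to("cuda"),
    "attention_mask": built["attention_mask"].unsqueeze(0).to("cuda"),
    "images": [[built["images"][0].to("cuda").to(torch.bfloat16)]],
}
# The high-resolution branch supplies an extra set of image features.
if built.get("cross_images"):
    inputs["cross_images"] = [[built["cross_images"][0].to("cuda").to(torch.bfloat16)]]

with torch.no_grad():
    outputs = model.generate(**inputs, max_length=2048, do_sample=False)
    outputs = outputs[:, inputs["input_ids"].shape[1]:]  # keep only the new tokens
    print(tokenizer.decode(outputs[0]))
```

The response typically contains a short plan, the recommended next action, and, when grounding is requested, the coordinates on the screenshot where that action should be performed.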