CogAgent-9B, the foundational model behind GLM-PC from Zhipu AI, has now been open-sourced to promote the development of the large-model Agent ecosystem. CogAgent-9B is a specialized Agent task model trained on top of GLM-4V-9B: given any task specified by the user, a screenshot, and the history of previous actions, it predicts the next GUI operation. This generality allows the model to be applied across a wide range of GUI interaction scenarios, including personal computers, mobile phones, and in-car devices.
Compared to the first version of CogAgent open-sourced in December 2023, CogAgent-9B-20241220 shows significant improvements in GUI perception, reasoning and prediction accuracy, completeness of the action space, task generality, and generalization, and it supports bilingual (Chinese and English) interaction through screenshots. The model's input consists solely of the user's natural-language instruction, the record of previously executed actions, and GUI screenshots; no textual layout information or additional element tags are required. Its output comprises four parts: the thought process, a natural-language description of the next action, a structured description of the next action, and a sensitivity assessment of that action.
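To make this input/output contract concrete, below is a minimal, hypothetical inference sketch. It assumes CogAgent-9B-20241220 can be loaded through transformers' trust_remote_code path in the same way as GLM-4V-9B; the exact prompt template (task, platform tag, action history, and answer-format directive) is defined in the official repository and may differ from this simplified example.

```python
# Hypothetical sketch of single-step CogAgent inference, assuming the
# GLM-4V-9B-style loading pattern; consult the official repo for the
# authoritative prompt template and generation settings.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "THUDM/cogagent-9b-20241220"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
).eval()

# Inputs: the user's task in natural language, the executed action history,
# and a GUI screenshot -- no layout metadata or element tags are needed.
task = "Open the settings page and enable dark mode."
history = ""  # previously executed actions; empty on the first step
screenshot = Image.open("screenshot.png").convert("RGB")

# Assumed prompt assembly; the repo documents the exact template.
query = (
    f"Task: {task}\n"
    f"History: {history}\n"
    "(Platform: WIN)\n"
    "(Answer in Action-Operation-Sensitive format.)"
)

inputs = tokenizer.apply_chat_template(
    [{"role": "user", "image": screenshot, "content": query}],
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=512)

# The decoded response carries the four output parts described above:
# thought process, natural-language action, structured action, sensitivity.
response = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(response)
```

In a multi-step setting, the structured action from each response would be executed on the device, appended to the history string, and a fresh screenshot would be captured before the next call.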
In performance testing, CogAgent-9B-20241220 achieved leading results across multiple benchmarks, demonstrating its strengths in GUI grounding, single-step operations, Chinese step-wise rankings, and multi-step operations. This release by Zhipu AI not only advances large-model technology but also offers new tools and possibilities for visually impaired IT practitioners.
Code:
https://github.com/THUDM/CogAgent
Model:
Huggingface: https://huggingface.co/THUDM/cogagent-9b-20241220
ModelScope Community: https://modelscope.cn/models/ZhipuAI/cogagent-9b-20241220