Zhipu AI Launches AutoGLM Agent: Simulate Human Phone Operations with Simple Commands

AIbase基地

Published in AI News · 4 min read · Oct 26, 2024


Zhipu AI has launched AutoGLM, an intelligent agent built on the research of its GLM technology team that simulates human operation of a mobile phone to carry out a variety of tasks. Its release marks a significant advance for artificial intelligence in the "Phone Use" domain and brings AI applications closer to people's daily lives.


AutoGLM can carry out tasks such as liking and commenting on WeChat Moments, reordering previously purchased items on Taobao, booking hotels on Ctrip, buying train tickets on 12306, and ordering takeout on Meituan. Its application scenarios are not limited to these: in theory, AutoGLM can do anything a human can do on an electronic device with a visual interface, following operation logic similar to a human's and requiring no complex workflow setup.

Currently, users can try AutoGLM-Web by installing the "Zhipu Qingyan" plugin, a browser assistant that simulates a user browsing and clicking on web pages and can automatically complete advanced searches, summarization, and content generation on websites. In addition, AutoGLM has opened applications for beta testing on Android and is collaborating closely with phone manufacturers such as Honor.


AutoGLM's technology is based on two designs developed in-house at Zhipu: the "Basic Agent Decoupling Intermediate Interface" and the "Self-Evolving Online Curriculum Reinforcement Learning Framework." Together they address several problems facing large model agents: the conflicting capability demands of task planning and action execution, the scarcity of training tasks and data, sparse feedback signals, and policy distribution drift. AutoGLM can continuously improve itself and steadily raise its performance, much as humans acquire new skills as they grow.

On the technical side, AutoGLM tackles two problems: imprecise "action execution" and inflexible "task planning." The "Basic Agent Decoupling Intermediate Interface" separates the "task planning" and "action execution" phases by connecting them through a natural language interface, which significantly enhances agent capabilities. In addition, AutoGLM uses the "Self-Evolving Online Curriculum Reinforcement Learning Framework" to learn and improve the capabilities of large model agents in real online environments.
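To make the idea of decoupling planning from execution more concrete, here is a minimal sketch in Python. It is not AutoGLM's implementation. The names `plan`, `ground`, `run`, `UIElement`, and `Action` are hypothetical, and both the planner and the grounder are simple stand-ins for what would be learned models in a real system. The only point illustrated is that the planner emits natural-language steps, and a separate component turns each step into a concrete on-screen action.

```python
from dataclasses import dataclass


@dataclass
class UIElement:
    """A tappable element on a hypothetical screen."""
    label: str
    x: int
    y: int


@dataclass
class Action:
    """A concrete, executable action with screen coordinates."""
    kind: str
    x: int
    y: int


def plan(task: str) -> list[str]:
    """Planner stand-in: return natural-language steps for a task.

    A real planner would be a language model; this lookup table is
    only a placeholder for illustration.
    """
    known_plans = {
        "order takeout": [
            'Tap the "Search" button',
            'Tap the "Order" button',
        ],
    }
    return known_plans.get(task, [])


def ground(step: str, screen: list[UIElement]) -> Action:
    """Grounder stand-in: resolve one natural-language step to coordinates.

    It sees only the step text and the current screen, never the task,
    which is what keeps the two phases decoupled.
    """
    for element in screen:
        if f'"{element.label}"' in step:
            return Action(kind="tap", x=element.x, y=element.y)
    raise LookupError(f"no element on screen matches step: {step}")


def run(task: str, screen: list[UIElement]) -> list[Action]:
    """Connect the two phases through the natural-language steps."""
    return [ground(step, screen) for step in plan(task)]


# Hypothetical screen layout used only for this example.
screen = [UIElement("Search", 540, 120), UIElement("Order", 540, 1800)]
actions = run("order takeout", screen)
```

Because the two functions communicate only through plain-text steps, either one could in principle be replaced or retrained without changing the other, which is the benefit the decoupled design is described as providing.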

AutoGLM has achieved notable performance improvements in both Phone Use and Web Browser Use. On the AndroidLab benchmark it surpasses GPT-4o and Claude-3.5-Sonnet, and on the WebArena-Lite benchmark it achieves an approximately 200% performance improvement over GPT-4o, narrowing the gap in success rates between humans and large model agents in GUI manipulation.

Project Link: https://xiao9905.github.io/AutoGLM

This article is from AIbase Daily

Welcome to the [AI Daily] column! This is your daily guide to the world of artificial intelligence. Every day, we bring you the hot topics in AI with a focus on developers, helping you understand technical trends and learn about innovative AI product applications.

—— Created by the AIbase Daily Team
