The Zhipu Technology team has recently launched AutoGLM, a new product built on the research of its GLM technology team. AutoGLM is an intelligent agent that simulates human operations on a mobile phone to carry out a variety of tasks. Its release marks a significant advance for artificial intelligence in the "Phone Use" domain and brings AI applications closer to people's daily lives.


AutoGLM can carry out tasks such as liking and commenting on WeChat Moments, reordering previously purchased items on Taobao, booking hotels on Ctrip, buying train tickets on 12306, and ordering takeout on Meituan. Its application scenarios are not limited to these. In theory, AutoGLM can accomplish anything a human can do on an electronic device with a visual interface, using operation logic similar to a human's and without complex workflow setup.

Currently, users can try AutoGLM-Web by installing the "Zhipu Qingyan" plugin, a browser assistant that simulates a user browsing and clicking on web pages and automatically completes advanced searches, summarization, and content generation on websites. In addition, AutoGLM has opened applications for beta testing on Android and has entered into deep collaborations with mobile manufacturers such as Honor.


AutoGLM's technology builds on two techniques developed in-house at Zhipu: the "Basic Agent Decoupling Middle Interface" and the "Self-Evolutionary Online Curriculum Reinforcement Learning Framework." Together they address several problems in large model agents: the competing demands of task planning and action execution, the scarcity of training tasks and data, sparse feedback signals, and policy distribution drift. As a result, AutoGLM can keep improving itself and steadily raise its performance, much as humans acquire new skills as they grow.

On the technical side, AutoGLM tackles two weaknesses: imprecise "action execution" and inflexible "task planning." The "Basic Agent Decoupling Middle Interface" decouples the "task planning" and "action execution" phases through a natural-language intermediate interface, which significantly enhances agent capabilities. AutoGLM also uses the "Self-Evolutionary Online Curriculum Reinforcement Learning Framework" to learn and improve the capabilities of large model agents in real online environments.
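The decoupling idea can be illustrated with a minimal Python sketch. It is not AutoGLM's actual implementation: the names `plan`, `ground`, `run`, `UIElement`, and `Action` are hypothetical, the planner is a lookup table standing in for a language model, and the grounder only matches quoted labels against a fixed list of screen elements. The point is only that the planner emits plain natural-language steps and a separate component turns each step into a concrete action.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class UIElement:
    """A hypothetical on-screen element: a label plus tap coordinates."""
    label: str
    x: int
    y: int


@dataclass(frozen=True)
class Action:
    """A concrete GUI action that an executor could perform."""
    kind: str
    x: int
    y: int


def plan(task: str) -> list[str]:
    """Toy planner: map a task to natural-language steps.

    Assumption: a real planner would be a language model; this lookup
    table is only a stand-in. Unknown tasks yield no steps.
    """
    playbook = {
        "order takeout": [
            "tap the 'Search' box",
            "tap the 'Reorder' button",
            "tap the 'Pay' button",
        ],
    }
    return playbook.get(task, [])


def ground(step: str, screen: list[UIElement]) -> Action:
    """Toy grounder: resolve one natural-language step to coordinates.

    It looks for a quoted element label inside the step text; a real
    grounder would interpret the actual screen contents.
    """
    for element in screen:
        if f"'{element.label}'" in step:
            return Action("tap", element.x, element.y)
    raise LookupError(f"no element on screen matches step: {step}")


def run(task: str, screen: list[UIElement]) -> list[Action]:
    """Plan first, then ground each step; text is the only interface."""
    return [ground(step, screen) for step in plan(task)]


if __name__ == "__main__":
    screen = [
        UIElement("Search", 50, 40),
        UIElement("Reorder", 200, 600),
        UIElement("Pay", 300, 900),
    ]
    for action in run("order takeout", screen):
        print(action)
```

Because the two functions communicate only through text, either one could in principle be replaced or retrained without touching the other, which is the property the middle interface is described as providing.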

AutoGLM shows notable performance gains in both Phone Use and Web Browser Use. On the AndroidLab evaluation benchmark, it surpasses GPT-4o and Claude-3.5-Sonnet. On the WebArena-Lite evaluation benchmark, it achieves an improvement of approximately 200% over GPT-4o, narrowing the gap in GUI manipulation success rates between humans and large model agents.

Project Link: https://xiao9905.github.io/AutoGLM