Kunlun Tech, the Beijing Academy of Artificial Intelligence (BAAI), Nanyang Technological University in Singapore, Peking University, and other institutions have jointly released Cradle, a framework for general computer control. The framework lets AI agents operate the keyboard and mouse directly, as a human would, without special training, so they can interact with any open-source or closed-source software without relying on internal APIs. Cradle is the first AI framework that can both play multiple commercial games and operate a variety of software applications, and its paper, project page, and code are all publicly available.

Cradle demonstrates its abilities across several games: completing a 40-minute main quest in "Red Dead Redemption 2," clearing the farm and shopping in "Stardew Valley," building a town of a thousand residents in "Cities: Skylines," and haggling with customers in "Dealer's Life 2." It also handles everyday software such as Chrome, Outlook, and Feishu, and can carry out photo editing and video editing, making it a versatile AI agent.


Cradle consists of six modules: information gathering, self-reflection, task inference, skill management, action planning, and memory. It interacts with the computer through a thin abstraction over raw input and output: it takes the images displayed on the screen as input, extracts text and visual information from them for decision-making, and outputs signals that control the keyboard and mouse. Its reasoning modules let it interact with software and complete tasks on its own by reflecting on past actions, summarizing the current state, and planning what to do next.
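The loop described above can be illustrated with a toy sketch. This is not Cradle's actual code or API: every name here (Observation, Action, Memory, SKILLS, agent_step, and the rule in infer_task) is hypothetical, the "screen" is a plain string instead of a real screenshot, and the actions are returned as data rather than sent to a real keyboard or mouse. It only shows how the six modules could hand information to one another in a single step.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Observation:
    """Text extracted from the screen (a stand-in for OCR and vision output)."""
    text: str


@dataclass(frozen=True)
class Action:
    """A low-level input signal, e.g. a key press or a mouse click."""
    device: str
    command: str


@dataclass
class Memory:
    """Keeps the previous observation and a log of past steps."""
    last_observation: Optional[Observation] = None
    log: List[str] = field(default_factory=list)


Skill = Callable[[Observation], Action]

# Skill management: a hypothetical library mapping task names to skills.
SKILLS: Dict[str, Skill] = {
    "close_menu": lambda obs: Action("keyboard", "press:esc"),
    "move_forward": lambda obs: Action("keyboard", "press:w"),
}


def gather_information(frame: str) -> Observation:
    # Information gathering: normalise the (simulated) screen content.
    return Observation(text=frame.strip().lower())


def self_reflect(memory: Memory, obs: Observation) -> str:
    # Self-reflection: did the screen change since the previous step?
    if memory.last_observation is None:
        return "first step"
    if memory.last_observation == obs:
        return "screen unchanged"
    return "screen changed"


def infer_task(obs: Observation) -> str:
    # Task inference: a toy rule; a real agent would also use the goal
    # and the reflection, typically through a large multimodal model.
    return "close_menu" if "menu" in obs.text else "move_forward"


def agent_step(frame: str, goal: str, memory: Memory,
               skills: Dict[str, Skill] = SKILLS) -> Action:
    obs = gather_information(frame)
    reflection = self_reflect(memory, obs)
    task = infer_task(obs)
    action = skills[task](obs)  # action planning: run the selected skill
    # Memory: record this step so the next reflection can use it.
    memory.last_observation = obs
    memory.log.append(f"{goal} | {reflection} | {task} | {action.command}")
    return action


if __name__ == "__main__":
    mem = Memory()
    for screen in ["Main Menu", "Open field"]:
        print(agent_step(screen, "reach the town", mem))
```

In this sketch the only interface between the agent and the software is a screen description going in and a keyboard or mouse command coming out, which mirrors the unified input and output idea the article describes.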

Cradle's results in games and software applications demonstrate its versatility. It can complete complex tasks in games with very different styles and control schemes, and carry out everyday tasks in commonly used software, such as downloading papers, sending emails, editing photos, and editing videos. On the challenging OSWorld benchmark, Cradle also outperformed baseline methods that use ground-truth labels.

The release of Cradle opens up new possibilities for building general computer control agents (GCC agents). It promotes unified input and output interfaces, lays a foundation for agents to interact and improve themselves across different environments, and is an important step toward Artificial General Intelligence (AGI).

Project Homepage: https://baai-agents.github.io/Cradle

Code Link: https://github.com/BAAI-Agents/Cradle