Microsoft has announced a major upgrade to its open-source agent project UFO, releasing a new version called UFO². UFO² adds operating-system-level capabilities and integrates deeply with Windows. The upgrade aims to make automated tasks more efficient and to let users carry out complex operations with less effort.
A key feature of UFO² is that it can call Windows native APIs and COM interfaces directly. Traditional Robotic Process Automation (RPA) drives applications by simulating user input, so for complex business processes UFO²'s approach is considerably more efficient and accurate. For example, turning tabular data into a chart in Excel takes a sequence of simulated mouse clicks with traditional RPA, whereas UFO² can do it with a single API call and avoids having to locate elements visually and simulate mouse movement.
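To make the contrast concrete, here is a minimal, hypothetical Python sketch of an "API-first, GUI-fallback" executor. It is not UFO²'s code: `ActionExecutor`, `insert_chart_api`, the action name, and the click steps are invented for illustration, and the API handler only stands in for a real native or COM call, which would require Windows and Excel.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ActionExecutor:
    """Hypothetical executor: prefer one direct API call, fall back to simulated GUI steps."""

    # action name -> function that performs the whole action in a single call
    api_actions: Dict[str, Callable[..., str]] = field(default_factory=dict)
    # action name -> ordered list of simulated GUI steps (clicks, key presses)
    gui_steps: Dict[str, List[str]] = field(default_factory=dict)
    # record of what was actually executed, so the two paths can be compared
    log: List[str] = field(default_factory=list)

    def run(self, action: str, **kwargs) -> str:
        if action in self.api_actions:
            # API path: one call replaces the whole click sequence.
            self.log.append(f"api:{action}")
            return self.api_actions[action](**kwargs)
        if action not in self.gui_steps:
            raise KeyError(f"no way to perform {action!r}")
        # RPA-style path: replay each simulated step in order.
        for step in self.gui_steps[action]:
            self.log.append(f"gui:{step}")
        return f"{action} done via {len(self.gui_steps[action])} GUI steps"


def insert_chart_api(data_range: str) -> str:
    # Hypothetical stand-in for a native/COM call into Excel's object model.
    return f"chart created from {data_range}"


chart_clicks = ["select A1:B5", "click Insert tab", "click Chart", "choose Column", "click OK"]

api_first = ActionExecutor(
    api_actions={"insert_chart": insert_chart_api},
    gui_steps={"insert_chart": chart_clicks},
)
gui_only = ActionExecutor(gui_steps={"insert_chart": chart_clicks})

print(api_first.run("insert_chart", data_range="A1:B5"))  # chart created from A1:B5
print(gui_only.run("insert_chart"))  # insert_chart done via 5 GUI steps
print(len(api_first.log), len(gui_only.log))  # 1 5
```

The log makes the design point visible: the API path records one action, while the RPA-style path records every simulated click, and each of those clicks is another place where a mis-located element can derail the task.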
Reported test results show UFO² achieving markedly higher task success rates than OpenAI's Operator. In two test scenarios, UFO² reached 30.5% and 32.7%, while Operator reached only 20.8% and 14.3%, gaps of roughly 10 and 18 percentage points. UFO² also performed better on complex tasks and cross-application operations and adapted better to non-standard interfaces.
The core control component, HostAgent, parses user instructions, manages application lifecycles, and coordinates the AppAgents. When a user issues an automation command in natural language, HostAgent decomposes the task into a series of sub-tasks and assigns each one to the appropriate AppAgent for execution.
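The coordination pattern described above can be sketched in a few lines of Python. This is a hypothetical illustration, not UFO²'s code: `SubTask`, `toy_planner`, `agent_for`, and `handle` are invented names, and the fixed plan stands in for the model-driven decomposition a real system would perform.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SubTask:
    app: str          # target application, e.g. "Excel"
    instruction: str  # what the AppAgent should do in that application


class AppAgent:
    """Hypothetical per-application worker; a real one would drive the app's UI or APIs."""

    def __init__(self, app: str) -> None:
        self.app = app

    def execute(self, instruction: str) -> str:
        return f"[{self.app}] {instruction}"


class HostAgent:
    """Hypothetical coordinator: decompose a request, then dispatch sub-tasks to AppAgents."""

    def __init__(self, planner: Callable[[str], List[SubTask]]) -> None:
        # Assumption: a real system would use a language model here, not a fixed function.
        self.planner = planner
        self.agents: Dict[str, AppAgent] = {}

    def agent_for(self, app: str) -> AppAgent:
        # Create one AppAgent per application on first use (a stand-in for lifecycle management).
        if app not in self.agents:
            self.agents[app] = AppAgent(app)
        return self.agents[app]

    def handle(self, request: str) -> List[str]:
        subtasks = self.planner(request)
        return [self.agent_for(t.app).execute(t.instruction) for t in subtasks]


def toy_planner(request: str) -> List[SubTask]:
    # Fixed decomposition for the example; a real planner would derive it from `request`.
    return [
        SubTask("Excel", "build a chart from the sales table"),
        SubTask("Outlook", "attach the chart and email it to the team"),
    ]


host = HostAgent(toy_planner)
for line in host.handle("Chart this quarter's sales and email it to the team"):
    print(line)
# [Excel] build a chart from the sales table
# [Outlook] attach the chart and email it to the team
```

Keeping one agent per application means each AppAgent only needs to understand a single program, which matches the division of labour described in the next paragraph.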
Each AppAgent specializes in one Windows application, which makes task execution more efficient. UFO² also introduces a hybrid control detection mechanism that combines visual input with application metadata to improve the system's perception of GUI elements. This allows AppAgents to work reliably with both standard and non-standard interfaces.
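One plausible way to combine the two sources is to keep every control reported by application metadata (for example, from an accessibility tree) and add a visually detected control only when no metadata control already covers the same screen region. The Python sketch below does this with an intersection-over-union (IoU) test; the `Control` type, the 0.5 threshold, and the merge rule are assumptions for illustration, not UFO²'s published implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom) in screen pixels


@dataclass
class Control:
    name: str
    box: Box
    source: str  # "metadata" (e.g. accessibility tree) or "vision" (screenshot detector)


def area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes; 0.0 when they do not overlap."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def merge_controls(metadata: List[Control], visual: List[Control],
                   threshold: float = 0.5) -> List[Control]:
    """Keep all metadata controls; add a vision control only if it overlaps none of them."""
    merged = list(metadata)
    for v in visual:
        if all(iou(v.box, m.box) < threshold for m in metadata):
            merged.append(v)
    return merged


from_metadata = [Control("OK button", (0, 0, 10, 10), "metadata")]
from_vision = [
    Control("ok-like box", (1, 1, 10, 10), "vision"),            # duplicate of the OK button
    Control("custom canvas icon", (50, 50, 60, 60), "vision"),  # missing from metadata
]
for c in merge_controls(from_metadata, from_vision):
    print(c.source, c.name)
# metadata OK button
# vision custom canvas icon
```

Metadata is preferred here because it usually carries reliable names and control types, while vision fills the gaps left by custom-drawn or non-standard widgets that expose no metadata.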
Another noteworthy innovation is UFO²'s picture-in-picture mode. This feature isolates automated tasks from the user's main desktop: automation runs in a separate virtual desktop while the user keeps working normally on the primary one. This design improves the user experience, keeps automation from interfering with the user's own work, and mitigates potential security risks.
These new features in UFO² showcase Microsoft's latest advancements in automation, providing users with a more efficient and flexible work environment.
Open-source repository: https://github.com/microsoft/UFO?tab=readme-ov-file
Key Highlights:
1. 🚀 UFO² deeply integrates with Windows, directly calling native APIs to boost automation efficiency.
2. 📊 UFO² demonstrates significantly higher success rates for automated tasks compared to OpenAI's Operator.
3. 🖥️ The new picture-in-picture mode runs automated tasks on a separate virtual desktop, isolating them from the user's own work and improving the user experience.