On December 24, 2024, Tsinghua University's Institute for AI Industry Research (AIR) released AutoDroid-V2, an AI model aimed at improving automated control of mobile devices. The model uses small language models to let users operate their devices through natural language more efficiently.
AutoDroid-V2 adopts a script-based approach, in contrast to traditional methods that rely on large language models (LLMs) hosted in the cloud. Because user commands can be executed efficiently on the device itself, dependence on cloud services drops, which improves privacy and security. The approach also lowers data consumption on the user side and operating costs on the server side, which could make mobile device automation practical for wider deployment.
As background, the rise of large language models and vision-language models in recent years has made it possible to control mobile devices through natural language commands, opening new ways to handle complex user tasks. However, the traditional "step-by-step GUI agent" approach suffers from high data consumption and from privacy and security risks, both of which hinder large-scale deployment.
AutoDroid-V2's key innovation is generating a multi-step script from a user command, so that multiple GUI operations can be executed in one go. This significantly reduces how often the model is queried, lowers resource consumption, and allows task scripts to be generated and executed directly on the user's device. In an offline stage, the system builds application documentation, which lays the groundwork for later script generation.
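The difference in query frequency can be illustrated with a toy sketch. This is not AutoDroid-V2's actual code: `CountingModel`, `FakeDevice`, `PLAN`, and the action strings are all hypothetical stand-ins, and the "model" simply reads from a fixed plan. The sketch only shows why one script-generating query can replace one query per GUI action.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical fixed plan; a real model would produce these actions itself.
PLAN: Dict[str, List[str]] = {
    "send 'hello' to Alice": [
        "tap('Messages')",
        "tap('Alice')",
        "input('hello')",
        "tap('Send')",
    ],
}


@dataclass
class CountingModel:
    """Stand-in for a language model that counts how often it is queried."""
    queries: int = 0

    def next_action(self, task: str, step: int) -> str:
        # Step-by-step style: each query returns a single GUI action.
        self.queries += 1
        return PLAN[task][step]

    def generate_script(self, task: str) -> List[str]:
        # Script-based style: one query returns the whole multi-step script.
        self.queries += 1
        return list(PLAN[task])


@dataclass
class FakeDevice:
    """Stand-in for a phone; it only records the actions it is asked to run."""
    log: List[str] = field(default_factory=list)

    def execute(self, action: str) -> None:
        self.log.append(action)


def run_step_by_step(model: CountingModel, task: str, device: FakeDevice) -> None:
    # Query the model once per GUI action.
    for step in range(len(PLAN[task])):
        device.execute(model.next_action(task, step))


def run_script(model: CountingModel, task: str, device: FakeDevice) -> None:
    # Query the model once, then execute every action in the returned script.
    for action in model.generate_script(task):
        device.execute(action)


TASK = "send 'hello' to Alice"

step_model, step_device = CountingModel(), FakeDevice()
run_step_by_step(step_model, TASK, step_device)

script_model, script_device = CountingModel(), FakeDevice()
run_script(script_model, TASK, script_device)

print(step_model.queries, script_model.queries)  # prints "4 1"
```

Both runs perform the same four GUI actions, but the script-based run needs a single model query instead of four.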
In benchmark testing on 226 tasks across 23 mobile applications, AutoDroid-V2 improved the task completion rate by 10.5% to 51.7% over earlier approaches such as AutoDroid and SeeClick. Its input and output token consumption fell to 1/43.5 and 1/5.8 of the baseline levels, respectively, and model inference latency fell to between 1/5.7 and 1/13.4 of the baselines' latency. These results point to AutoDroid-V2's efficiency and reliability in practical use.
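To put those fractions in more familiar terms, the short calculation below converts each "1/k of the baseline" figure into a percentage saved. It assumes each figure means the new cost is the baseline cost divided by k; the helper name `percent_saved` is purely illustrative.

```python
def percent_saved(reduction_factor: float) -> float:
    """Percentage saved when a cost drops to 1/reduction_factor of its baseline."""
    return round((1 - 1 / reduction_factor) * 100, 1)


# Reduction factors as reported in the article.
reported = {
    "input tokens": 43.5,
    "output tokens": 5.8,
    "latency (low end)": 5.7,
    "latency (high end)": 13.4,
}

savings = {name: percent_saved(k) for name, k in reported.items()}
for name, pct in savings.items():
    print(f"{name}: about {pct}% lower")
```

Under that reading, input tokens fall by roughly 97.7%, output tokens by roughly 82.8%, and inference latency by roughly 82.5% to 92.5%.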
Key Points:
🌟 AutoDroid-V2 is a new AI model launched by Tsinghua University, enhancing the efficiency of natural language control for mobile devices.
🔒 The model reduces dependence on cloud services through small language models, enhancing user privacy and security.
📈 In benchmark tests, AutoDroid-V2 achieved higher task completion rates with much lower token consumption and inference latency than earlier approaches, suggesting strong potential for practical use.