OmniParser V2
OmniParser V2 is a technology that transforms any LLM into a computer-using agent.
International Selection, Programming, Artificial Intelligence, GUI Automation
OmniParser V2 is a screen-parsing model developed by Microsoft Research. It is designed to turn large language models (LLMs) into agents that can understand and operate graphical user interfaces (GUIs). By converting interface screenshots from pixel space into interpretable structured elements, OmniParser V2 lets an LLM identify interactive icons more accurately and associate each intended action with the correct region of the screen. Compared with the previous version, OmniParser V2 is more accurate at detecting small icons and faster at inference. Combined with GPT-4o, it achieved an average accuracy of 39.6% on the ScreenSpot Pro benchmark, far above the 0.8% that GPT-4o scores on its own. In addition, OmniParser V2 is accompanied by OmniTool, which supports integration with various LLMs and is intended to further the development of GUI automation.
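To make the idea of "structured elements" more concrete, here is a minimal Python sketch of how a parsed screen might be represented and used. The `UIElement` schema, the helper names, and the sample values are all hypothetical and are not OmniParser's actual output format or API. The sketch only shows the general pattern: list the interactive elements as text for an LLM, then map the element it chooses back to a pixel location to click.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class UIElement:
    """One parsed screen element (hypothetical schema, for illustration only)."""
    element_id: int
    label: str                                # short description of the element
    interactable: bool                        # whether the element can be clicked or typed into
    bbox: Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max), normalized to 0..1


def describe_elements(elements: List[UIElement]) -> str:
    """Render the interactable elements as text lines that an LLM prompt could include."""
    return "\n".join(
        f"[{e.element_id}] {e.label}" for e in elements if e.interactable
    )


def click_point(element: UIElement, screen_width: int, screen_height: int) -> Tuple[int, int]:
    """Convert an element's normalized bounding box to the pixel at its center."""
    x_min, y_min, x_max, y_max = element.bbox
    cx = (x_min + x_max) / 2 * screen_width
    cy = (y_min + y_max) / 2 * screen_height
    return round(cx), round(cy)


# Made-up sample data standing in for the result of parsing one screenshot.
elements = [
    UIElement(0, "Search text field", True, (0.25, 0.05, 0.75, 0.10)),
    UIElement(1, "Page title", False, (0.10, 0.15, 0.90, 0.20)),
    UIElement(2, "Submit button", True, (0.40, 0.50, 0.60, 0.56)),
]

prompt_listing = describe_elements(elements)
chosen = elements[2]  # stand-in for the element ID an LLM would return
target = click_point(chosen, 1920, 1080)
```

In this sketch the LLM never sees raw pixels. It only sees the short text listing and answers with an element ID, which is then translated into screen coordinates.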
OmniParser V2 Visit Over Time
Monthly Visits: 1,231,713,766
Bounce Rate: 44.60%
Pages per Visit: 3.4
Visit Duration: 00:03:27