WebVoyager
An end-to-end web agent built on a large multimodal model
Tags: Common Product, Productivity, Web Agent, Multimodal Model
WebVoyager is a web agent powered by a large multimodal model (LMM) that completes user instructions end to end by interacting with real-world websites. Automatically evaluating open-world agent tasks is difficult, so we propose a new evaluation protocol that leverages the multimodal understanding capabilities of GPT-4V. We evaluate the agent on real-world tasks collected from 15 widely used websites. WebVoyager achieves a 55.7% task success rate, significantly outperforming both GPT-4 (All Tools) and a text-only variant of WebVoyager, which highlights its capabilities in practical applications. Our proposed automatic evaluation agrees with human judgment 85.3% of the time, paving the way for further development of web agents in real-world environments.
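The two headline numbers above are a task success rate and an evaluator-human consistency rate. The sketch below shows one way such figures could be computed from per-task verdicts. It assumes that "consistency" means the simple fraction of tasks on which the automatic (GPT-4V-based) verdict matches the human verdict; the description does not spell out the exact formula. The `TaskResult` record, its field names, and the helper functions are hypothetical and are not part of any WebVoyager codebase.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """One evaluated task (hypothetical record format, not WebVoyager's)."""
    task_id: str
    auto_success: bool   # verdict from the automatic LMM-based evaluator
    human_success: bool  # verdict from a human annotator


def success_rate(results, use_human=True):
    """Fraction of tasks judged successful, by human or automatic verdicts."""
    if not results:
        raise ValueError("no results to score")
    verdicts = [r.human_success if use_human else r.auto_success for r in results]
    return sum(verdicts) / len(verdicts)


def agreement_rate(results):
    """Assumed consistency metric: share of tasks where both verdicts match."""
    if not results:
        raise ValueError("no results to score")
    matches = sum(r.auto_success == r.human_success for r in results)
    return matches / len(results)


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the WebVoyager benchmark.
    demo = [
        TaskResult("t1", True, True),
        TaskResult("t2", False, False),
        TaskResult("t3", True, False),
        TaskResult("t4", False, False),
    ]
    print(f"success rate (human): {success_rate(demo):.1%}")  # 25.0%
    print(f"evaluator-human agreement: {agreement_rate(demo):.1%}")  # 75.0%
```

Raw agreement is the simplest choice; a chance-corrected statistic such as Cohen's kappa would be a stricter alternative if the verdicts are heavily skewed toward success or failure.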
WebVoyager Visits Over Time
Monthly visits: 17,788,201
Bounce rate: 44.87%
Pages per visit: 5.4
Visit duration: 00:05:32