WebVoyager

An end-to-end web agent built on a large multimodal model

Common Product•Productivity•Web Agent•Multimodal Model

WebVoyager is a web agent powered by a large multimodal model (LMM) that completes user instructions end-to-end by interacting with real-world websites. Its authors also propose an evaluation protocol that uses the multimodal understanding of GPT-4V to address the difficulty of automatically evaluating open-world agent tasks. The agent is evaluated on real-world tasks collected from 15 widely used websites, where it achieves a 55.7% task success rate, significantly higher than both GPT-4 (with all tools) and a text-only WebVoyager setting. The proposed automatic evaluation agrees with human judgment 85.3% of the time, which supports its use in further development of web agents for real-world environments.

WebVoyager Visit Over Time

Monthly Visits

27,175,375

Bounce Rate

44.30%

Pages per Visit

5.8

Visit Duration

00:04:57

WebVoyager Alternatives

WebVoyager — An end-to-end web agent built on a large multimodal model

Productivity

•Web Agent•Multimodal Model

300

ultravox-v0_4_1-llama-3_1-8b — Multimodal speech large language model

Productivity

•Speech Recognition•Speech Translation

180

Awesome GPT-4o Images — Showcases a diverse collection of AI art images and prompts generated by OpenAI's GPT-4o.

Image

•AI Art•Image Generation

WeClone — Fine-tune a large language model using WeChat chat logs to achieve high-quality voice cloning.

Productivity

•Digital Cloning•Voice Cloning

Dream 7B — Dream 7B is a state-of-the-art open diffusion large language model.

Productivity

•Diffusion Model•Large Language Model

StarVector — A foundational model for generating high-quality SVG code.

International Selection

•SVG Generation•Image Processing

606

Argo — Easily build your own large language model. Your own private intelligence, running entirely locally.

Chinese Selection

•Large Language Model•Local Deployment

1446

NotaGen — NotaGen is a model for symbolic music generation, employing a large language model training paradigm and focusing on generating high-quality classical music scores.

Music

•Music Generation•Large Language Model

1620

AoT — Atom of Thoughts (AoT) is a framework for improving the reasoning performance of large language models.

Programming

•Large Language Model•Reasoning Framework

624

Spark-TTS — Spark-TTS is a highly efficient single-stream decoupled speech synthesis model based on large language models.

Productivity

•Speech Synthesis•Large Language Model

1434

Level-Navi Agent-Search — Level-Navi Agent is a ready-to-use framework that utilizes large language models for in-depth query understanding and precise search.

Programming

•Large Language Model•Web Search

252

M2RAG — A benchmark codebase for retrieval-augmented generation in multimodal contexts.

Programming

•Multimodal•Retrieval-Augmented Generation

294

SWE-RL — Enhancing the reasoning capabilities of large language models in open-source software evolution through reinforcement learning.

Programming

•Reinforcement Learning•Large Language Model

300

TableGPT2-7B — TableGPT2-7B is a large language model specializing in tabular data processing, suitable for data analysis and business intelligence tasks.

Productivity

•Tabular Data•Data Analysis

348

TableGPT-agent — A pre-built agent based on TableGPT2 for table-based question answering tasks.

Programming

•Artificial Intelligence•Natural Language Processing

342

Coding-Tutor — Explores the potential of large language models as programming tutoring tools and proposes the Trace-and-Verify workflow.

Education

•Programming Education•Large Language Model

360

Tbox - AI Powered Intelligent Agent Builder — Leveraging Alipay's lifestyle scenarios and leading large language model technology, Tbox enables businesses to quickly build professional-grade intelligent agents.

Chinese Selection

•Large Language Model•Intelligent Agent

864

MoBA — MoBA (Mixture of Block Attention) is an attention mechanism for long-context inputs, designed to improve the efficiency of large language models.

Productivity

•Large Language Model•Attention Mechanism

288

Goedel-Prover — Goedel-Prover is an open-source automated theorem proving model focused on the formal verification of mathematical problems.

Programming

•Automated Theorem Proving•Mathematics

318

OmniParser-v2.0 — OmniParser is a versatile screen parsing tool that converts UI screenshots into a structured format, improving the performance of LLM-based UI agents.

Image

•Screen Parsing•Image Recognition

1122

Mistral-Small-24B-Instruct-2501 — Mistral Small 24B is a multilingual, high-performance instruction-tuned large language model suitable for various application scenarios.

Productivity

•Large Language Model•Multilingual

378

MNN Large Model Android App — A fully functional Android app supporting multimodal capabilities with a large language model.

Productivity

•Large Language Model•Multimodal

2802

Baichuan-M1-14B — An open-source large language model optimized specifically for medical scenarios, developed by Baichuan Intelligent. It demonstrates exceptional general capabilities and performance in the healthcare domain.

Productivity

•Large Language Model•Healthcare

786

Doubao-1.5-pro — Doubao-1.5-pro is a high-performance sparse Mixture of Experts (MoE) large language model that focuses on achieving an optimal balance between inference performance and model capability.

Chinese Selection

•Large Language Model•Multimodal

9018

DeepSeek-R1-Distill-Llama-70B — DeepSeek-R1-Distill-Llama-70B is a large language model optimized using reinforcement learning, focusing on reasoning and conversational capabilities.

Programming

•Large Language Model•Reinforcement Learning

984

InternVL2_5-78B-MPO — This is an advanced series of multimodal large language models that demonstrate outstanding overall performance.

Productivity

•Multimodal•Large Language Model

372

InternLM3-8B-Instruct — InternLM3-8B-Instruct is an open-source instruction model with 8 billion parameters designed for general-purpose use and advanced reasoning.

Programming

•Large Language Model•Open Source

276

Dria-Agent-a-3B — A large language model based on the Qwen2.5-Coder series, focused on agent applications.

Programming

•Large Language Model•Agent Applications

192

Dria-Agent-a-7B — A large language model trained on the Qwen2.5-Coder series, focusing on agent applications.

Programming

•Large Language Model•Programming Assistance

252

Dria-Agent-α — Dria-Agent-α is a Python-based framework for large language model tool interaction.

Programming

•Large Language Model•Python

300