
MMStar

An elite benchmark dataset for evaluating large visual language models

Common Product • Productivity • Visual Language Models • Benchmark


MMStar is a benchmark dataset for assessing the multimodal capabilities of large visual language models. It comprises 1,500 carefully selected visual language samples covering 6 core abilities and 18 sub-dimensions. Each sample has undergone human review to ensure that it depends on the visual input, shows minimal data leakage, and requires advanced multimodal capabilities to solve. In addition to traditional accuracy, MMStar proposes two new metrics: one measures data leakage, and the other measures the actual performance gain from multimodal training. Researchers can use MMStar to evaluate visual language models across multiple tasks and use the new metrics to uncover potential issues in those models.
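The description above does not define the two metrics, so the following Python sketch rests on an assumption about how they are usually formulated for MMStar: the gain is the score with images minus the score of the same model without images, and the leakage is the amount by which the image-free score exceeds the score of the underlying text-only LLM, floored at zero. The names `Scores`, `accuracy`, `multimodal_gain`, and `multimodal_leakage` are hypothetical and chosen for illustration only, and the numbers in the usage lines are made up.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    """Benchmark accuracies in percent (0-100) for one model (hypothetical container)."""
    with_images: float         # visual language model, images provided
    without_images: float      # same model, images withheld
    base_llm_text_only: float  # underlying text-only LLM, no images


def accuracy(predictions, answers) -> float:
    """Plain accuracy in percent over multiple-choice answers."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one sample is required")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


def multimodal_gain(s: Scores) -> float:
    """Assumed definition: how much the model improves when it can see the images."""
    return s.with_images - s.without_images


def multimodal_leakage(s: Scores) -> float:
    """Assumed definition: image-free score above the base LLM, floored at zero.

    A positive value suggests samples were seen during multimodal training.
    """
    return max(0.0, s.without_images - s.base_llm_text_only)


if __name__ == "__main__":
    # Illustrative numbers only, not real benchmark results.
    example = Scores(with_images=57.0, without_images=30.0, base_llm_text_only=25.0)
    print("gain:", multimodal_gain(example))
    print("leakage:", multimodal_leakage(example))
```

A high gain with low leakage would suggest that a model's score comes from actually using the images rather than from memorized samples; under these assumed definitions, the two numbers are read together with plain accuracy rather than in place of it.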


MMStar Visit Over Time

Monthly Visits: 712

Bounce Rate: 47.43%

Pages per Visit: 1.4

Visit Duration: 00:00:24


MMStar Alternatives

MMStar — An elite benchmark dataset for evaluating large visual language models

Productivity

•Visual Language Models•Benchmark

312

Visual Sketchpad — A visual reasoning tool for multimodal large language models (LLMs)

Productivity

•Multimodal•Visual Reasoning

336

ZeroBench — ZeroBench is a challenging visual benchmark designed for contemporary large multimodal models.

Image

•Multimodal•Benchmark

348

Humanity's Last Exam — Humanity's Last Exam is a multimodal benchmark test designed to assess large language models' capabilities.

Others

•Artificial Intelligence•Benchmark Testing

294

FACTS Grounding — A cutting-edge benchmark for assessing the factual accuracy of large language models.

Others

•Language Models•Benchmark Testing

240

KnowEdit — A knowledge editing benchmark for evaluating the knowledge editing capabilities of large language models.

Others

•Knowledge Editing•Large Language Models

108

Florence-VL — Enhancement tool for visual language models, combining a generative visual encoder with depth-breadth fusion.

Programming

•Visual Language Models•Multimodal Learning

246

Cantor — Innovative multimodal chain-of-thought framework that enhances visual reasoning capabilities

Productivity

•Multimodal•Visual Reasoning

300

LVBench — Long Video Understanding Benchmark

Video

•Video Understanding•Benchmark

192

DriveVLM — Fusion of Autonomous Driving and Visual Language Models

Others

•Autonomous Driving•Visual Language Models

306

CuMo — An advanced architecture for extending multimodal large language models (LLMs).

Programming

•Multimodal Learning•Large Language Models

270

MATHVERSE — Exploring the capabilities of multimodal large language models in solving visual math problems.

Productivity

•Multimodal Learning•Visual Information Processing

378

InternVL2_5-1B-MPO — A multimodal large language model that enhances integrated understanding of visual and language data.

Productivity

•Multimodal•Large Language Model

396

InternVL2_5-8B-MPO-AWQ — A multimodal large language model enhancing visual and linguistic interaction capabilities.

Image

•Multimodal•Large Language Model

366

EAGLE — Exploration of the design space for multimodal large language models

Programming

•Multimodal Learning•Large Language Models

474

VSP-LLM — A framework that combines Visual Speech Processing with Large Language Models

Programming

•Visual Speech Processing•Large Language Models

2706

DeepSeek-VL2 — An advanced multimodal understanding model that integrates visual and linguistic capabilities.

Image

•Visual Language Models•Multimodal Understanding

624

SimpleQA — A benchmark test for measuring the ability of language models to answer factual questions.

Others

•Benchmark•Language Model

294

TOFU — The TOFU dataset provides a benchmark for unlearning fictitious data in large language models.

Productivity

•Language Model•Forgetting

348

M2RAG — A benchmark codebase for retrieval-augmented generation in multimodal contexts.

Programming

•Multimodal•Retrieval-Augmented Generation

294

Qwen-VL — General-purpose Visual Language Model

Productivity

•Visual•Language Model

2592

InternVL2_5-26B — A large multimodal language model that integrates visual and linguistic understanding.

Image

•Multimodal•Large Language Model

354

Large World Models — Large World Models: Understanding Video and Language

Productivity

•Artificial Intelligence•Machine Learning

1014

ColPali — Efficient document retrieval tool based on visual language models

Productivity

•Document Retrieval•Visual Language Models

180

vision-parse — Utilizes visual language models to parse PDFs into Markdown.

Productivity

•PDF Parsing•Markdown Conversion

456

LaVi-Bridge — Connects different language models and generative visual models for text-to-image generation

Image

•Text-to-Image Generation•Language Models

816

Models Table — A comprehensive list and information about large language models

Others

•Large Language Models•Machine Learning

366


POINTS-Qwen-2-5-7B-Chat — Latest advancements in visual language models

MouSi — Multimodal Visual Language Model

MM1.5 — Optimization and analysis of multimodal large language models
