Video-MME

The first comprehensive benchmark for evaluating the performance of Multi-Modal Large Language Models (MLLMs) in video analysis.

Common Product • Video • Multi-modal • Video Analysis

Video-MME is a benchmark for evaluating the performance of Multi-Modal Large Language Models (MLLMs) in video analysis. It addresses a gap in existing evaluation methods, which do not adequately test how well MLLMs process continuous visual data, and gives researchers a high-quality, comprehensive evaluation platform. The benchmark covers videos of different lengths and evaluates core MLLM capabilities.

Video-MME Visit Over Time

Monthly Visits: 7187

Bounce Rate: 38.35%

Pages per Visit: 2.0

Visit Duration: 00:00:07

Video-MME Alternatives

OpenCompass Multi-modal Leaderboard — Real-time updated leaderboard of multi-modal model performance

Productivity

•Multi-modal•Performance Evaluation

1788

Kimi-VL — A highly efficient open-source Mixture-of-Experts (MoE) vision-language model with multi-modal reasoning capabilities.

Chinese Selection

•Multi-modal•Reasoning

SmolVLM2 — SmolVLM2 is a lightweight language model focused on video content analysis and generation.

Video

•Video Analysis•Text Generation

654

EgoLife — EgoLife is a long-term, multi-modal, multi-view daily life AI assistant project aimed at advancing research in long-term context understanding.

Productivity

•Multi-modal•Multi-view

246

Migician — Migician is a multi-modal large language model focusing on multi-image localization, capable of achieving free-form, precise multi-image localization.

Image

•Multi-modal•Image localization

234

Magma-8B — Magma-8B is a multi-modal AI model developed by Microsoft that processes image and text inputs to generate text outputs.

Image

•Multi-modal•Image

426

FirstHR — FirstHR is an intelligent HR management platform focused on recruitment and team development.

Business

•Human Resources•Recruitment

348

MILS — LLMs can see and hear without any training.

Image

•Artificial Intelligence•Multi-modal

210

Janus-Pro-1B — Janus-Pro-1B is an autoregressive framework for unified multi-modal understanding and generation.

Image

•Multi-modal•Image Generation

822

Doubao-1.5-pro — Doubao-1.5-pro is a high-performance sparse Mixture of Experts (MoE) large language model that focuses on achieving an optimal balance between inference performance and model capability.

Chinese Selection

•Large Language Model•Multi-modal

9018

Procyon AI Computer Vision Benchmark — A benchmarking tool for evaluating the performance of AI inference engines on Windows PCs or Apple Macs.

Others

•AI Benchmarking•Performance Evaluation

294

Procyon AI Image Generation Benchmark — A benchmarking tool used to measure the AI accelerator inference performance of devices.

Others

•Image Generation•Benchmarking

558

InternVL2_5-38B-MPO — The InternVL2.5-MPO series models are built on InternVL2.5 and trained with Mixed Preference Optimization (MPO), showcasing strong performance.

Chatting

•Multimodal•Large Language Model

462

Valley-Eagle-7B — A multimodal large model that processes text, image, and video data.

Productivity

•Multimodal•Large Model

420

Valley — A large multimodal model that processes text, image, and video data.

Image

•Multimodal•Large Model

420

FlagAI — A comprehensive open-source project for large model algorithms, models, and optimization tools.

Programming

•Artificial Intelligence•Large Models

222

video-analyzer — A video analysis tool that combines a Llama vision model with OpenAI Whisper to generate video descriptions locally.

Video

•Video Analysis•Computer Vision

1518

P-MMEval — A multilingual multi-task benchmark for evaluating large language models (LLMs).

Others

•Multilingual•Benchmarking

156

InternVL2_5-38B — Advanced Multimodal Large Language Model Series

Image

•Multimodal•Large Language Models

432

NVIDIA AI Blueprint — Utilize NVIDIA AI to build video search and summarization agents.

Video

•Computer Vision•Video Analysis

306

NVIDIA Video Search and Summarization — Build a video search and summarization agent to extract insights from videos.

Video

•Video Analysis•Artificial Intelligence

612

HOVER — Multi-functional neural full-body controller for humanoid robots

Programming

•Humanoid Robot•Neural Networks

294

stable-diffusion-3.5-large-turbo — High-performance text-to-image generation model.

Image

•Text-to-image•Generation model

684

stable-diffusion-3.5-large — High-performance text-to-image generation model

Image

•Image Generation•Text-to-Image

468

Youtube-Whisper — Transcribe YouTube videos utilizing OpenAI's Whisper model.

Productivity

•Artificial Intelligence•Audio Transcription

498

Llama 3.2 — Open-source AI model that can be fine-tuned, distilled, and deployed.

Productivity

•Machine Learning•Open Source

330

MyLens.ai — AI helps you gain deeper understanding of YouTube videos

OpenCV — Open Source Computer Vision Library

doesVideoContain — Automatically detect video content in the browser using AI.