AI News

Don't miss any moment of global AI innovation

AI Daily

Daily three-minute AI industry trends

AI Timeline

AI industry milestones

AI Monetization Guide

Latest Cases

AI monetization case sharing

Image Collection

AI image creation monetization cases

Video Collection

AI video creation monetization cases

Audio Collection

AI audio creation monetization cases

Content Collection

AI content writing monetization cases

AI Tutorials

Latest Tutorials

Free sharing of the latest AI tutorials

AI Product Rankings

AI Product Ranking

Shows total visits ranking of AI websites

AI Traffic Growth Ranking

Track fastest growing AI websites by traffic

AI Traffic Decline Ranking

Focus on AI websites with significant traffic drops

AI Weekly Ranking

Shows weekly visits ranking of AI websites

Popular Country Rankings

United States

AI websites most popular with US users

China

AI websites most popular with Chinese users

India

AI websites most popular with Indian users

Brazil

AI websites most popular with Brazilian users

Popular Category Rankings

Image Generation

Total visits ranking of AI image generation websites

Personal Assistant

Total visits ranking of AI personal assistant websites

Character Generation

Total visits ranking of AI character generation websites

Video Generation

Total visits ranking of AI video generation websites

Popular Open Source Data Rankings

AI Project Ranking

GitHub popular AI projects by total stars

AI Project Growth Ranking

GitHub popular AI projects by growth rate

AI Developer Ranking

GitHub popular AI developer ranking

AI Organization Ranking

GitHub popular AI organization ranking

Popular Open Source Categories

Deepseek

GitHub popular deepseek open source projects

TTS

GitHub popular TTS open source projects

LLM

GitHub popular LLM open source projects

ChatGPT

GitHub popular ChatGPT open source projects

AI Open Source Project Library

Overview

Overview of GitHub popular AI open source projects

Product Library Tool Navigation

ml-ferret

End-to-end MLLM, enabling precise referencing and localization.

Common Product • Programming • Machine Learning • Language Model

ml-ferret is an end-to-end multimodal large language model (MLLM) that accepts references in various forms and grounds its responses precisely in the image. It combines a hybrid region representation with a spatial-aware visual sampler, supporting fine-grained, open-vocabulary referring and grounding. The project also includes the GRIT dataset (approximately 1.1 million samples) and the Ferret-Bench evaluation benchmark.

ml-ferret Visit Over Time

Monthly Visits: 521149929

Bounce Rate: 35.96%

Pages per Visit: 6.1

Visit Duration: 00:06:29

ml-ferret Alternatives

ml-ferret — End-to-end MLLM, enabling precise referencing and localization.

Programming

•Machine Learning•Language Model

1104

Inception Labs — Inception Labs launches a new generation of diffusion-based large language models, offering extremely fast, efficient, and high-quality language generation capabilities.

International Selection

•Artificial Intelligence•Language Model

648

DeepSeek Japanese — DeepSeek is an advanced AI language model excelling in logical reasoning, mathematics, and programming tasks. It is available for free.

Productivity

•Language Model•Programming Assistance

384

MiniCPM-o-2_6 — MiniCPM-o 2.6 is a powerful multimodal large language model designed for visual, speech, and multimodal live applications.

Others

•Multimodal•Language Model

690

MiniCPM-o — MiniCPM-o 2.6: An MLLM capable of delivering visual, voice, and multimodal interactions at GPT-4o level on mobile devices.

Others

•Multimodal•Language Model

558

InternVL2_5-8B-MPO — A large multimodal language model showcasing exceptional overall performance.

Image

•Multimodal•Large Language Model

630

InternVL2_5-4B-MPO-AWQ — A multimodal large language model designed to enhance image and text interaction capabilities.

Image

•Multimodal•Large Language Model

222

Valley 2.0 — A multimodal large language model that enhances the ability to process text, image, and video data.

Others

•Multimodal•Large Language Model

420

The Language of Motion — A unified model for verbal and non-verbal communication in 3D human motion.

Others

•3D Human Motion•Multimodal

222

Phi-4 — Microsoft's latest small language model focused on complex reasoning.

International Selection

•Machine Learning•Language Model

1110

InternVL 2.5 — Open-source multimodal large language model series

Productivity

•Multimodal•Large Language Model

396

InternVL2_5-1B — A large multimodal language model that supports image and text understanding.

Image

•Multimodal•Large Language Model

282

InternVL2_5-78B — Advanced multimodal large language model series

Image

•Multimodal•Large Language Model

462

Amazon Nova — Amazon Nova is Amazon's next-generation foundation model, offering cutting-edge intelligence and industry-leading cost-effectiveness.

International Selection

•AWS•Artificial Intelligence

348

OLMo-2-1124-13B-DPO — High-performance English language model suitable for diverse tasks.

Programming

•Language Model•Natural Language Processing

240

OpenScholar — A retrieval-augmented language model for synthesizing scientific literature.

Education

•Scientific Literature•Retrieval Augmentation

258

OLMo 2 — State-of-the-art fully open language model

Programming

•Language Model•Natural Language Processing

372

DataChain — A modern Python data frame library designed specifically for artificial intelligence.

Productivity

•Machine Learning•Artificial Intelligence

282

Aquila-VL-2B-llava-qwen — A visual-language model that intelligently processes both image and text information.

Image

•Visual Language Model•Multimodal

306

Pixtral 12B — The first multimodal Mistral model, supporting hybrid task processing for images and text.

Productivity

•Multimodal•AI Model

180

pixtral-12b-240910 — A multimodal large language model that supports understanding of both images and text.

Image

•Multimodal•Image Processing

324

ell — A lightweight programming library for language models, treating prompts as functions.

International Selection

•Language Model•Programming Library

330

West Lake AI Model — A multimodal model with high emotional and intellectual intelligence

Chinese Selection

•Artificial Intelligence•Multimodal

540

MiniCPM3-4B — High-performance third-generation MiniCPM series model.

Productivity

•Language Model•Text Generation

378

Phi-3.5-vision — An advanced multimodal model that supports image and text understanding.

Programming

•Multimodal•Image Understanding

390

Spirit LM — Multimodal language model that integrates text and speech

Zamba2-7B — High-performance small language model

UniMuMo — Unified model for text, music, and motion generation.

AMD-Llama-135m — A high-performance language model trained by AMD

Molmo — Advanced Multimodal AI Model Family
