The FineWeb dataset contains over 15 trillion tokens of cleaned and deduplicated English web text sourced from CommonCrawl. Designed specifically for pre-training large language models, it aims to advance the development of open-source models. The dataset has been carefully processed and filtered for quality, making it suitable for pre-training as well as a variety of other natural language processing tasks.
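
For reference, the corpus can be accessed through the Hugging Face `datasets` library. The sketch below streams a small sample rather than downloading the full dataset; the repository id `HuggingFaceFW/fineweb`, the `sample-10BT` subset name, and the `text` field are assumptions based on the dataset's Hub listing, so check the dataset card for the exact identifiers.

```python
# A minimal sketch of streaming FineWeb with the Hugging Face `datasets` library.
# The repo id ("HuggingFaceFW/fineweb"), the "sample-10BT" subset, and the "text"
# field name are assumptions; consult the dataset card for the exact values.
from datasets import load_dataset

# Stream instead of downloading, since the full corpus is far too large to fetch locally.
fineweb = load_dataset(
    "HuggingFaceFW/fineweb",
    name="sample-10BT",   # assumed name of a small sample subset
    split="train",
    streaming=True,
)

# Inspect a few documents: each record is expected to carry the cleaned text plus metadata.
for i, record in enumerate(fineweb):
    print(record["text"][:200])
    if i == 2:
        break
```

Streaming keeps memory use flat and avoids storing the full dump on disk, which is usually the practical way to sample or inspect a corpus of this size before committing to a full download.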