Crawl4LLM

An efficient web crawler for LLM pre-training, focused on crawling high-quality web data effectively.

Programming

•LLM•Web Crawler

Crawl4LLM is an open-source web crawling project designed to provide an efficient data crawling solution for the pre-training of Large Language Models (LLMs). It helps researchers and developers obtain high-quality training corpora through intelligent selection and crawling of web data. The tool supports various document scoring methods and allows flexible adjustment of crawling strategies based on configurations to meet different pre-training needs. Developed in Python, the project boasts good scalability and ease of use, making it suitable for both academic research and industrial applications.
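The idea of crawling guided by a pluggable document scoring method can be illustrated with a small sketch. This is not Crawl4LLM's actual API or algorithm: the names `crawl`, `length_score`, `make_keyword_scorer`, and the in-memory `DEMO_WEB` graph are hypothetical, and the sketch assumes each page's text is already available for scoring when it is enqueued (for example, from an offline snapshot) rather than fetched over the network.

```python
import heapq
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
Web = Dict[str, Tuple[str, List[str]]]
Scorer = Callable[[str], float]


def length_score(text: str) -> float:
    """Toy scorer: documents with more words score higher."""
    return float(len(text.split()))


def make_keyword_scorer(keywords: Set[str]) -> Scorer:
    """Toy scorer factory: count how many words match the given keywords."""
    def score(text: str) -> float:
        return float(sum(1 for word in text.lower().split() if word in keywords))
    return score


def crawl(web: Web, seeds: List[str], scorer: Scorer, budget: int) -> List[str]:
    """Visit up to `budget` pages, always expanding the highest-scoring known page."""
    frontier: List[Tuple[float, str]] = []  # min-heap of (-score, url)
    seen: Set[str] = set()

    def enqueue(url: str) -> None:
        # Skip unknown or already-queued URLs; negate the score for max-first order.
        if url in web and url not in seen:
            seen.add(url)
            heapq.heappush(frontier, (-scorer(web[url][0]), url))

    for url in seeds:
        enqueue(url)

    crawled: List[str] = []
    while frontier and len(crawled) < budget:
        _, url = heapq.heappop(frontier)
        crawled.append(url)
        for link in web[url][1]:
            enqueue(link)
    return crawled


DEMO_WEB: Web = {
    "a": ("seed page about crawling", ["b", "c"]),
    "b": ("llm llm data", ["d"]),
    "c": ("a much longer page with many generic words", ["d"]),
    "d": ("medium length page here", []),
}

if __name__ == "__main__":
    print(crawl(DEMO_WEB, ["a"], length_score, 3))
    print(crawl(DEMO_WEB, ["a"], make_keyword_scorer({"llm"}), 3))
```

Swapping the scorer changes which pages are visited within the same budget, which is the kind of configurable strategy the description refers to. A real crawler would also need fetching, deduplication, and politeness controls, none of which are shown here.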

Crawl4LLM Visit Over Time

Monthly Visits: 521149929

Bounce Rate: 35.96%

Pages per Visit: 6.1

Visit Duration: 00:06:29

Crawl4LLM Alternatives

Crawl4LLM — An efficient web crawler for LLM pre-training, focused on crawling high-quality web data effectively.

Programming

•LLM•Web Crawler

456

Data-Juicer — A one-stop data processing system that provides high-quality data for large language models.

Productivity

•Machine Learning•Data Science

492

mcp-use — mcp-use is the simplest way to interact with MCP tools and supports custom agents.

Productivity

•Open Source•MCP

Basic Memory — Build persistent knowledge through conversations with LLMs, stored in local Markdown files.

Productivity

•Knowledge Management•LLM

132

openai-agents-python — A lightweight and powerful multi-agent workflow framework

Programming

•Artificial Intelligence•Multi-agent

612

Awesome-LLM-Post-training — A repository of tutorials, surveys, and guides on post-training methods for large language models (LLMs).

Productivity

•LLM•Post-training

300

l1m — A proxy API for extracting structured data from text and images, implemented based on LLMs.

Programming

•Data Extraction•LLM

474

Firecrawl LLMs.txt generator — A tool that consolidates a website's content into text files for LLM training and inference.

Productivity

•LLM•Text Generation

438

Hugo Translator — An LLM-based article translation tool that automatically translates and creates multilingual Markdown files.

Productivity

•LLM•Translation

462

Aviator Agents — An LLM-based agent framework for performing large-scale code migration within codebases.

Programming

•Code Migration•LLM

324

llm-commit — A plugin for generating Git commit messages with an LLM.

Programming

•LLM•Git

150

hallucination-leaderboard — A leaderboard for comparing the hallucination rates of large language models when summarizing short documents.

Others

•LLM•Hallucination Detection

546

VisionAgent — VisionAgent is a library for generating code to solve vision tasks, supporting multiple LLM providers.

Image

•Artificial Intelligence•Vision Tasks

372

OmniParser V2 — OmniParser V2 is a technology that transforms any LLM into a computer-using agent.

International Selection

•Artificial Intelligence•GUI Automation

960

Supametas.AI — A platform for unstructured data processing that helps businesses quickly build industry datasets and integrate them into LLM RAG knowledge bases

Productivity

•Data Processing•LLM

336

stocks-insights-ai-agent — A full-stack application based on LLM and LangChain for retrieving stock data and news.

Business

•LLM•LangChain

510

OpenDeepResearcher — An AI-based deep research tool that continuously searches for information until it meets user query needs.

Programming

•Research Tools•Iterative Search

546

DocETL — A data processing system driven by LLM.

Productivity

•Data Processing•LLM

210

DocWrangler — An open-source interactive development environment for building and optimizing LLM-based data processing pipelines.

Productivity

•LLM•Data Processing

228

Nemotron-CC — Transforms Common Crawl into a refined long-horizon pre-training dataset.

Programming

•Artificial Intelligence•Dataset

270

Chinese Internet Corpus Resource Platform — Providing high-quality Chinese language corpus resources to assist in the pre-training of large AI models.

Others

•Artificial Intelligence•Corpus

1386

FlashInfer — FlashInfer is a high-performance GPU kernel library designed for serving large language models.

Programming

•LLM•GPU

618

llmstxt-generator — A tool for generating text files that consolidate web content for LLM training and inference.

Programming

•LLM•Text Generation

426

CodebaseToPrompt — A tool that converts local files into structured prompts for large language models.

Programming

•Programming•LLM

294

Document Inlining — Leveraging composite AI technologies, Document Inlining bridges the modality gap.

Productivity

•LLM•Visual Model

306

IdentityRAG — A powerful LLM tool for searching, unifying, and retrieving customer data.

Productivity

•Customer Data•LLM

288

LangWatch — Monitor, evaluate, and optimize your LLM applications

Programming

•LLM•Optimization

258

PromptWizard — Task-aware prompt optimization framework

Programming

•Microsoft•LLM

696

POINTS-Yi-1.5-9B-Chat — Latest advancements in visual language models, integrating new technologies from WeChat AI.

Productivity

•Visual Language Model•WeChat AI

162

GraphRAG Visualizer — A web-based tool for visualizing and exploring Microsoft's GraphRAG framework.

Programming

•gpt•graph-visualization

864