Together AI Releases RedPajama v2 Dataset for Large Language Model Training

Source: 站长之家 (ChinaZ)

Published in AI News · 1 min read · Nov 6, 2023

Together AI has released the RedPajama v2 dataset, which comprises 30 trillion tokens and is designed for training large language models. The dataset aims to support the development of large language models by providing high-quality data resources. It is sourced from CommonCrawl and other public web data, and it ships with more than 40 quality annotations along with deduplication. RedPajama v2 undergoes only minimal processing, preserving the original data so that model builders can apply their own filtering downstream. The release gives language model developers and researchers more resources to work with and is expected to further advance the field of AI.
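To illustrate what "subsequent processing by model builders" could look like, here is a minimal Python sketch that filters documents on precomputed quality annotations and then drops exact duplicates. The record layout, the signal names (`word_count`, `dup_line_frac`), and the thresholds are assumptions made for this example; they are not the actual RedPajama v2 schema or a pipeline recommended by Together AI.

```python
import hashlib
from typing import Dict, Iterable, Iterator, Set

# Hypothetical record layout: raw text plus a dict of precomputed quality
# annotations. Field names and values are illustrative only.
SAMPLE_DOCS = [
    {"text": "Large language models need clean training data.",
     "quality_signals": {"word_count": 7, "dup_line_frac": 0.0}},
    # Same content with different casing and spacing: an exact duplicate
    # once the text is normalized.
    {"text": "large   language models need clean training data.",
     "quality_signals": {"word_count": 7, "dup_line_frac": 0.0}},
    # Too short to be useful.
    {"text": "click here",
     "quality_signals": {"word_count": 2, "dup_line_frac": 0.0}},
    # Mostly repeated lines.
    {"text": "Buy now. Buy now. Buy now. Limited offer today.",
     "quality_signals": {"word_count": 9, "dup_line_frac": 0.75}},
]


def content_hash(text: str) -> str:
    """Hash the text after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def filter_and_dedup(
    docs: Iterable[Dict],
    min_words: int = 5,
    max_dup_line_frac: float = 0.3,
) -> Iterator[Dict]:
    """Yield documents that pass the quality thresholds, skipping duplicates."""
    seen: Set[str] = set()
    for doc in docs:
        signals = doc["quality_signals"]
        if signals["word_count"] < min_words:
            continue  # drop very short documents
        if signals["dup_line_frac"] > max_dup_line_frac:
            continue  # drop documents dominated by repeated lines
        digest = content_hash(doc["text"])
        if digest in seen:
            continue  # drop exact duplicates of a document already kept
        seen.add(digest)
        yield doc


if __name__ == "__main__":
    for kept in filter_and_dedup(SAMPLE_DOCS):
        print(kept["text"])
```

Because the annotations travel with each document, the thresholds remain a choice for the model builder: the same raw data can be filtered strictly for one training run and loosely for another without reprocessing the source text.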

Tags: AI Dataset, Large Language Model, Together AI

This article is from AIbase Daily

Welcome to the [AI Daily] column, your daily guide to the world of artificial intelligence. Every day we cover hot topics in the AI field with a focus on developers, helping you follow technical trends and learn about innovative AI product applications.

—— Created by the AIbase Daily Team
