AI2 Releases Open Dataset Dolma: Breaking Down Data Barriers for AI Language Models

站长之家

Published in AI News · 2 min read · Aug 21, 2023

The Allen Institute for AI (AI2) has released Dolma, an open text dataset intended to make AI language models more transparent and to encourage innovation. Dolma is a centerpiece of AI2's Open Language Model (OLMo) initiative and gives researchers and developers free access to the data, supporting a broader range of AI research. At 3 trillion tokens, Dolma is a very large open dataset, and its usage and licensing terms are straightforward. AI2 has released it under the "ImpACT License for Medium-Risk Artifacts", which asks users to provide contact information and describe their intended use. By opening the dataset, AI2 aims to move the AI field toward a more transparent and collaborative future.

AI2 Dolma Language Model

This article is from AIbase Daily



