dolmino-mix-1124
A high-quality dataset for the second phase of OLMo2 training.
Common Product, Programming, Dataset, Natural Language Processing
The DOLMino dataset mix for OLMo2 stage 2 annealing training is a compilation of high-quality data sources, designed for the second phase of training the OLMo2 model. It covers diverse data types such as web pages, STEM papers, and encyclopedic entries, with the aim of improving model performance on text generation tasks. It serves as a curated training resource for developing more accurate NLP models.
dolmino-mix-1124 Visits Over Time
Monthly Visits: 19,075,321
Bounce Rate: 45.07%
Pages per Visit: 5.5
Visit Duration: 00:05:32