dolmino-mix-1124

A high-quality dataset for the second phase of OLMo2 training.

Common Product, Programming, Dataset, Natural Language Processing
dolmino-mix-1124 is the DOLMino data mix used for stage 2 (annealing) training of the OLMo2 model. It compiles high-quality sources of several kinds, including web pages, STEM papers, and encyclopedic entries, with the aim of improving model performance on text generation tasks. For developers, it offers a curated training resource for building more accurate NLP models.

dolmino-mix-1124 Visit Over Time

Monthly Visits: 19,075,321
Bounce Rate: 45.07%
Pages per Visit: 5.5
Visit Duration: 00:05:32

dolmino-mix-1124 Alternatives