c4-dataset-script

Public

Inspired by Google's C4 (Colossal Clean Crawled Corpus), this is a collection of data-cleaning scripts for processing CommonCrawl data. It also covers Chinese data processing and the cleaning methods used in MassiveText.

Created: 2022-05-27T18:15:11
Updated: 2025-03-24T14:30:29
Stars: 122
Stars increase: 0