MNBVC

MNBVC is a massive Chinese corpus comparable to the 40T data used to train ChatGPT.

Common Product, Open Source, Natural Language Processing, Chinese Language Dataset
MNBVC (Massive Never-ending BT Vast Chinese corpus) is a project that aims to provide rich Chinese-language data for AI. It covers not only mainstream culture but also niche subcultures and internet slang. The dataset includes many kinds of plain-text Chinese data: news, essays, novels, books, magazines, papers, dialogues, posts, wikis, ancient poems, lyrics, product descriptions, jokes, anecdotes, and chat logs.

MNBVC Visit Over Time

Monthly Visits: 499,904,316
Bounce Rate: 37.31%
Pages per Visit: 5.8
Visit Duration: 00:06:52


MNBVC Alternatives