MNBVC
MNBVC is a massive Chinese corpus comparable to the 40T data used to train ChatGPT.
CommonProductOpenSourceNatural Language ProcessingChinese Language Dataset
MNBVC (Massive Never-ending BT Vast Chinese corpus) is a project aimed at providing rich Chinese data for AI. It includes not only mainstream cultural content but also niche cultures and internet slang. The dataset encompasses various forms of pure text Chinese data, such as news, essays, novels, books, magazines, papers, dialogues, posts, wikis, ancient poems, lyrics, product descriptions, jokes, anecdotes, and chat logs.
MNBVC Visit Over Time
Monthly Visits
494758773
Bounce Rate
37.69%
Page per Visit
5.7
Visit Duration
00:06:29