BAAI (Zhiyuan Research Institute) Jointly Builds Chinese Internet Corpus CCI to Provide Resources for the Big Data and Artificial Intelligence Industries
ChinaZ (站长之家)
The Beijing Academy of Artificial Intelligence (BAAI), in collaboration with TALIS and iFLYTEK, has established the "Chinese Internet Corpus" (CCI). The corpus has been screened and cleaned, and its first release totals 104 GB of data spanning 2001 to 2023. BAAI says it plans to expand the data sources, refine its data processing procedures, and open additional high-quality Chinese datasets such as the WuDao Corpora, COIG, and MTP. The initiative aims to provide the big data and artificial intelligence industries with secure and reliable corpus resources.
© Copyright AIbase 2024. Source: https://www.aibase.com/news/3677