BAAI (Zhiyuan) Releases MTP, the World's Largest Chinese-English Semantic Vector Model Training Dataset
Chinaz (站长之家)
The Beijing Academy of Artificial Intelligence (BAAI) has released MTP, the world's largest training dataset for Chinese-English semantic vector models, comprising 300 million text pairs. The dataset draws its Chinese-English text pairs from a variety of sources and provides a crucial foundation for training such models. BAAI said that data plays a vital role in training large models and that it will promote collaborative innovation in artificial intelligence. The release is expected to help address the scarcity of training datasets for Chinese models.
© Copyright AIbase Base 2024, Click to View Source - https://www.aibase.com/news/1393