MINT-1T
A multimodal dataset comprising one trillion tokens and 3.4 billion images.
Premium, New Product, Open Source, Multimodal, Dataset
MINT-1T is a multimodal dataset open-sourced by Salesforce AI. It contains one trillion text tokens and 3.4 billion images, making it roughly ten times larger than previously available open-source multimodal datasets. In addition to HTML documents, it draws on PDF documents and ArXiv papers, which broadens the range of content it covers. MINT-1T is built through several stages of data collection, processing, and filtering, intended to keep the data both high in quality and diverse.
MINT-1T Visit Over Time
Monthly Visits: 8724
Bounce Rate: 53.42%
Pages per Visit: 1.4
Visit Duration: 00:02:06