MM1.5
Optimization and analysis of multimodal large language models
Common Product | Productivity | Multimodal | Large Language Models
MM1.5 is a family of multimodal large language models (MLLMs) designed to improve understanding of text-rich images, visual referring and grounding, and multi-image reasoning. Built on the MM1 architecture, it adopts a data-centric training approach and systematically studies the impact of different data mixtures across the full training lifecycle. MM1.5 models range from 1B to 30B parameters and include both dense and mixture-of-experts (MoE) variants. Extensive empirical and ablation studies documenting the training process and design decisions offer guidance for future MLLM research.
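As a rough illustration of what a stage-wise data mixture can look like in practice, the sketch below samples training examples by category using per-stage weights. The stage names, categories, and ratios here are hypothetical assumptions chosen to mirror the capabilities mentioned above (text-rich images, grounding, multi-image reasoning); they are not MM1.5's published recipe.

import random

# Hypothetical stage-wise data mixtures (category -> sampling weight).
# Categories and ratios are illustrative assumptions, not MM1.5's actual mix.
STAGE_MIXES = {
    "continual_pretrain": {
        "text_rich_ocr": 0.45,
        "interleaved_image_text": 0.35,
        "text_only": 0.20,
    },
    "sft": {
        "text_rich_ocr": 0.30,
        "grounding_refexp": 0.30,
        "multi_image": 0.20,
        "text_only": 0.20,
    },
}

def sample_category(stage: str, rng: random.Random) -> str:
    """Draw one data category according to the given stage's mixture weights."""
    mix = STAGE_MIXES[stage]
    categories, weights = zip(*mix.items())
    return rng.choices(categories, weights=weights, k=1)[0]

# Sanity check: empirical proportions approximate the configured mixture.
rng = random.Random(0)
counts = {}
for _ in range(10_000):
    category = sample_category("sft", rng)
    counts[category] = counts.get(category, 0) + 1
print(counts)

Varying these per-stage weights and re-evaluating the model is, in essence, the kind of data-mix ablation the description refers to.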
MM1.5 Visits Over Time
Monthly Visits: 19,075,321
Bounce Rate: 45.07%
Pages per Visit: 5.5
Visit Duration: 00:05:32