MM1.5

Optimization and analysis of multimodal large language models

Categories: Common Product, Productivity, Multimodal, Large Language Models
MM1.5 is a family of multimodal large language models (MLLMs) designed to enhance capabilities in text-rich image understanding, visual referring and grounding, and multi-image reasoning. Built on the MM1 architecture, MM1.5 adopts a data-centric approach to training and systematically studies the impact of different data mixtures across the entire model training lifecycle. The models range from 1B to 30B parameters and include both dense and mixture-of-experts (MoE) variants. Through extensive empirical and ablation studies that document the training process and the reasoning behind design decisions, the work offers valuable guidance for future MLLM research.
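The listing does not disclose the actual MM1.5 training recipe, but the idea of a data-centric study of data mixes across training stages can be illustrated with a small sketch. The stage names, data categories, and weights below are assumptions made up for illustration, not the model's real configuration; the sketch only shows how one might define a baseline mixture schedule and derive an ablated variant from it.

# Illustrative sketch only: stage names, categories, and weights are hypothetical,
# not the published MM1.5 recipe. It demonstrates expressing per-stage data mixes
# and generating an ablation variant by shifting one category's sampling weight.
from dataclasses import dataclass
from typing import Dict

@dataclass
class StageMix:
    """Sampling weights for data categories within one training stage."""
    weights: Dict[str, float]

    def normalized(self) -> "StageMix":
        total = sum(self.weights.values())
        return StageMix({k: v / total for k, v in self.weights.items()})

# Hypothetical training lifecycle: pre-training, continued pre-training, supervised fine-tuning.
baseline = {
    "pretrain": StageMix({"image_caption": 0.45, "interleaved_image_text": 0.45, "text_only": 0.10}),
    "continued_pretrain": StageMix({"ocr_text_rich": 0.50, "synthetic_caption": 0.50}),
    "sft": StageMix({"general_vqa": 0.40, "text_rich": 0.30, "refer_ground": 0.20, "multi_image": 0.10}),
}

def ablate(mixes: Dict[str, StageMix], stage: str, category: str, delta: float) -> Dict[str, StageMix]:
    """Return a copy of the mix schedule with one category's weight shifted and renormalized."""
    new = dict(mixes)
    w = dict(new[stage].weights)
    w[category] = max(0.0, w[category] + delta)
    new[stage] = StageMix(w).normalized()
    return new

if __name__ == "__main__":
    # One ablation run: upweight text-rich data in the SFT stage by 0.1 and inspect the result.
    variant = ablate(baseline, "sft", "text_rich", +0.1)
    for stage, mix in variant.items():
        print(stage, mix.normalized().weights)

In a real study, each such variant would be trained and evaluated on the target capabilities (text-rich understanding, grounding, multi-image reasoning) to measure the contribution of each data category per stage.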

MM1.5 Visit Over Time

Monthly Visits: 17,788,201
Bounce Rate: 44.87%
Pages per Visit: 5.4
Visit Duration: 00:05:32

[Charts not reproduced: MM1.5 Visit Trend, Visit Geography, Traffic Sources, Alternatives]