CuMo

An architecture for scaling multimodal large language models (LLMs) with co-upcycled Mixture-of-Experts blocks.

Tags: Programming, Multimodal Learning, Large Language Models
CuMo is a Mixture-of-Experts (MoE) extension architecture for multimodal large language models (LLMs). It improves model scalability by incorporating sparse Top-K gated MoE blocks into both the vision encoder and the MLP connector, while adding negligible activated parameters during inference. CuMo first pre-trains the MLP blocks, then initializes each expert in the MoE blocks from the pre-trained MLP, and applies an auxiliary loss during the visual instruction tuning stage to keep the load across experts balanced. Trained entirely on open-source datasets, CuMo outperforms comparable models on various VQA and visual instruction-following benchmarks.
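To make the upcycling idea concrete, here is a minimal PyTorch sketch, not the official CuMo code: a Top-K sparsely gated MoE block whose experts are all initialized from a single pre-trained MLP, plus a Switch-Transformer-style load-balancing auxiliary loss. All class names, sizes, and the exact loss formulation here are illustrative assumptions.

```python
# Minimal sketch of a co-upcycled Top-K gated MoE block (PyTorch).
# Not the official CuMo implementation; names and sizes are illustrative.
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F


class MLP(nn.Module):
    """Plain two-layer MLP, standing in for a pre-trained connector block."""

    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden)
        self.fc2 = nn.Linear(hidden, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc2(F.gelu(self.fc1(x)))


class TopKMoE(nn.Module):
    """Sparse Top-K gated MoE whose experts are upcycled from one MLP."""

    def __init__(self, pretrained_mlp: MLP, num_experts: int = 4, k: int = 2):
        super().__init__()
        self.k = k
        # Upcycling: every expert starts as a copy of the pre-trained MLP,
        # so at initialization the MoE block behaves like the dense model.
        self.experts = nn.ModuleList(
            [copy.deepcopy(pretrained_mlp) for _ in range(num_experts)]
        )
        self.router = nn.Linear(pretrained_mlp.fc1.in_features, num_experts)

    def forward(self, x: torch.Tensor):
        # x: (tokens, dim). Only k experts run per token, so the number of
        # activated parameters stays close to that of the dense MLP.
        probs = self.router(x).softmax(dim=-1)        # (tokens, experts)
        gate, idx = probs.topk(self.k, dim=-1)        # top-k routing
        gate = gate / gate.sum(dim=-1, keepdim=True)  # renormalize gates

        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += gate[mask, slot, None] * expert(x[mask])

        # Auxiliary load-balancing loss (Switch-Transformer style):
        # top-1 dispatch fraction times mean router probability, summed
        # over experts, penalizing routers that favor a few experts.
        n = len(self.experts)
        frac_tokens = F.one_hot(idx[:, 0], n).float().mean(dim=0)
        aux_loss = n * (frac_tokens * probs.mean(dim=0)).sum()
        return out, aux_loss


# Toy usage: route 16 visual tokens through the upcycled MoE connector.
mlp = MLP(dim=1024, hidden=4096)
moe = TopKMoE(mlp, num_experts=4, k=2)
y, aux = moe(torch.randn(16, 1024))  # add a scaled `aux` to the training loss
```

In this sketch the MoE block matches the dense MLP exactly at initialization, since every expert is a copy of it; the `aux` term would be scaled and added to the visual instruction tuning loss to discourage routing collapse onto a few experts.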

CuMo Visit Over Time

Monthly Visits: 1,030
Bounce Rate: 52.96%
Pages per Visit: 1.2
Visit Duration: 00:00:00
