CuMo
A sparse mixture-of-experts architecture for scaling multimodal large language models (LLMs).
Common Product · Programming · Multimodal Learning · Large Language Models
CuMo is an extension architecture for multimodal large language models (LLMs). It improves scalability by incorporating sparse, Top-K gated mixture-of-experts (MoE) blocks into both the vision encoder and the MLP connector, while adding virtually no extra activated parameters at inference time. CuMo first pre-trains the dense MLP blocks, then initializes each expert in an MoE block from the corresponding pre-trained MLP (upcycling), and applies an auxiliary loss during the visual instruction fine-tuning stage to keep the load across experts balanced. Trained entirely on open-source datasets, CuMo outperforms comparable models on a range of VQA and visual instruction-following benchmarks.
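To make the description concrete, below is a minimal PyTorch sketch of a Top-K sparsely gated MoE block whose experts are upcycled from a single pre-trained dense MLP, paired with a Switch-Transformer-style auxiliary load-balancing loss. All names here (`SparseMoEBlock`, `make_mlp`, the hyperparameters, the exact loss form) are illustrative assumptions, not CuMo's actual implementation.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


def make_mlp(dim: int, hidden: int) -> nn.Sequential:
    # Stand-in for a pre-trained dense MLP (e.g. the MLP connector).
    return nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))


class SparseMoEBlock(nn.Module):
    """Top-K sparsely gated MoE block with experts upcycled from one
    pre-trained dense MLP. A sketch in the spirit of CuMo, not its code."""

    def __init__(self, dense_mlp: nn.Module, dim: int,
                 num_experts: int = 4, top_k: int = 2):
        super().__init__()
        # Upcycling: each expert starts as a copy of the pre-trained MLP,
        # so MoE training begins from a well-initialized point.
        self.experts = nn.ModuleList(
            copy.deepcopy(dense_mlp) for _ in range(num_experts)
        )
        self.router = nn.Linear(dim, num_experts, bias=False)
        self.num_experts, self.top_k = num_experts, top_k

    def forward(self, x: torch.Tensor):
        # x: (num_tokens, dim). Route each token to its top-k experts only,
        # so the activated parameter count stays close to the dense MLP's.
        probs = self.router(x).softmax(dim=-1)            # (tokens, experts)
        topk_p, topk_i = probs.topk(self.top_k, dim=-1)   # (tokens, k)
        topk_p = topk_p / topk_p.sum(-1, keepdim=True)    # renormalize gates

        out = torch.zeros_like(x)
        for e in range(self.num_experts):
            hit = (topk_i == e)                           # (tokens, k)
            rows = hit.any(-1)
            if rows.any():
                gate = (topk_p * hit.float()).sum(-1, keepdim=True)[rows]
                out[rows] += gate * self.experts[e](x[rows])

        # Auxiliary load-balancing loss: pushes the fraction of tokens
        # dispatched to each expert and its mean routing probability
        # toward uniformity, preventing expert collapse.
        dispatch = F.one_hot(topk_i, self.num_experts).float().mean(dim=(0, 1))
        importance = probs.mean(dim=0)
        aux_loss = self.num_experts * (dispatch * importance).sum()
        return out, aux_loss
```

During visual instruction fine-tuning, the returned `aux_loss` would be added to the task loss with a small weight so tokens spread evenly across experts; the exact loss formulation and coefficient used by CuMo may differ.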
CuMo Visit Over Time
Monthly Visits: 1,030 · Bounce Rate: 52.96% · Pages per Visit: 1.2 · Avg. Visit Duration: 00:00:00