MG-LLaVA
Innovative MLLM with Multi-Granularity Visual Instruction Tuning
CommonProductProgrammingMachine LearningVisual Processing
MG-LLaVA is a machine learning language model (MLLM) designed to enhance the visual processing capabilities of models. It achieves this by incorporating a multi-granularity visual pipeline, encompassing low-resolution, high-resolution, and object-centric features. An additional high-resolution visual encoder is introduced to capture finer details, and a Conv-Gate fusion network is used to integrate these high-resolution features with the base visual features. Furthermore, object-level features derived from offline detector bounding boxes are integrated to further refine the model's object recognition abilities. Trained via instruction tuning on publicly available multimodal data, MG-LLaVA exhibits exceptional perceptual skills.
MG-LLaVA Visit Over Time
Monthly Visits
494758773
Bounce Rate
37.69%
Page per Visit
5.7
Visit Duration
00:06:29