MG-LLaVA

Innovative MLLM with Multi-Granularity Visual Instruction Tuning

Common Product, Programming, Machine Learning, Visual Processing
MG-LLaVA is a multimodal large language model (MLLM) designed to improve visual processing. It uses a multi-granularity visual pipeline that combines low-resolution, high-resolution, and object-centric features. An additional high-resolution visual encoder captures fine-grained detail, and a Conv-Gate fusion network integrates these high-resolution features with the base visual features. Object-level features, derived from bounding boxes produced by an offline detector, are also integrated to improve object recognition. MG-LLaVA is trained via instruction tuning on publicly available multimodal data and is reported to show strong perception capabilities.
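The fusion and object-feature steps described above can be illustrated with a small, dependency-free sketch. It is not MG-LLaVA's implementation: the description does not give the Conv-Gate formula, so the code assumes a common gated form (fused = low + gate * high, with the gate computed by a 1x1 convolution and a sigmoid), and it uses plain average pooling over a box as a stand-in for object-level feature extraction. The names `gated_fusion` and `box_average_pool` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
FeatureMap = List[List[Vector]]  # indexed as [row][column][channel]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(low: FeatureMap, high: FeatureMap,
                 gate_weights: Sequence[Sequence[float]],
                 gate_bias: Sequence[float]) -> FeatureMap:
    """Fuse two spatially aligned feature maps (assumed form, see lead-in).

    The gate is a 1x1 convolution, i.e. a per-position linear map over the
    concatenated channels, followed by a sigmoid.
    gate_weights has shape [channels][2 * channels].
    """
    fused = []
    for low_row, high_row in zip(low, high):
        fused_row = []
        for low_vec, high_vec in zip(low_row, high_row):
            concat = low_vec + high_vec  # channel-wise concatenation
            gate = [sigmoid(sum(w * x for w, x in zip(w_row, concat)) + b)
                    for w_row, b in zip(gate_weights, gate_bias)]
            fused_row.append([l + g * h
                              for l, g, h in zip(low_vec, gate, high_vec)])
        fused.append(fused_row)
    return fused


def box_average_pool(features: FeatureMap,
                     box: Tuple[int, int, int, int]) -> Vector:
    """Average the feature vectors inside box = (x0, y0, x1, y1).

    Coordinates are in grid cells and the upper bounds are exclusive.
    """
    x0, y0, x1, y1 = box
    cells = [features[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    channels = len(cells[0])
    return [sum(c[k] for c in cells) / len(cells) for k in range(channels)]


# Toy 2x2 feature maps with 2 channels each.
low = [[[1.0, 2.0], [3.0, 4.0]],
       [[5.0, 6.0], [7.0, 8.0]]]
high = [[[2.0, 2.0], [2.0, 2.0]],
        [[2.0, 2.0], [2.0, 2.0]]]

# Zero weights and bias give a constant gate of sigmoid(0) = 0.5.
weights = [[0.0] * 4, [0.0] * 4]
bias = [0.0, 0.0]

fused = gated_fusion(low, high, weights, bias)
# Hypothetical detector box covering the top row of the grid.
object_feature = box_average_pool(fused, (0, 0, 2, 1))
print(fused[0][0], object_feature)
```

With a learned gate, each position and channel decides how much high-resolution detail to add to the base feature, while the pooled box vector gives one compact feature per detected object.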

MG-LLaVA Visit Over Time

Monthly Visits: 499,904,316
Bounce Rate: 37.31%
Pages per Visit: 5.8
Visit Duration: 00:06:52
