VCoder

VCoder is a visual perception model that can improve the performance of multi-modal large language models on object-level visual tasks.

Common Product, Image, Computer Vision, Natural Language Processing
VCoder is an adapter that improves the performance of multi-modal large language models on object-level visual tasks by taking auxiliary perception modalities, such as segmentation or depth maps, as control inputs. VCoder LLaVA is built on LLaVA-1.5. Because VCoder does not fine-tune any LLaVA-1.5 parameters, its performance on general question-answering benchmarks is the same as that of LLaVA-1.5. VCoder has been benchmarked on the COST dataset, where it performs well on semantic, instance, and panoptic segmentation tasks. The authors have also released the model's detection results and pre-trained models.
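The adapter idea described above can be illustrated with a small, dependency-free sketch. This is not the VCoder implementation: the class names, the toy one-number "embeddings", and the order in which tokens are concatenated are all assumptions made for illustration. It shows only the general pattern of a frozen base model plus a separate trainable component that turns a control input (for example, a segmentation map) into extra input tokens.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]


@dataclass
class FrozenBase:
    """Hypothetical stand-in for the frozen base model (LLaVA-1.5 in the text)."""
    scale: float = 1.0
    trainable: bool = False  # base parameters are never updated

    def embed_image(self, pixels: Vector) -> List[Vector]:
        # Toy embedding: one single-number token per pixel value.
        return [[p * self.scale] for p in pixels]


@dataclass
class Adapter:
    """Hypothetical adapter that encodes a control input into extra tokens."""
    weight: float = 0.5
    trainable: bool = True  # only the adapter would be trained

    def encode_control(self, control: Vector) -> List[Vector]:
        # Toy projection of the control input (e.g. a flattened segmentation map).
        return [[c * self.weight] for c in control]


def build_input_tokens(base: FrozenBase, adapter: Adapter, pixels: Vector,
                       control: Vector, text_tokens: List[Vector]) -> List[Vector]:
    # Assumed ordering: control tokens, then image tokens, then text tokens.
    return adapter.encode_control(control) + base.embed_image(pixels) + text_tokens


def trainable_parts(*modules) -> List[str]:
    # Names of the modules whose parameters would receive updates.
    return [type(m).__name__ for m in modules if m.trainable]


base = FrozenBase()
adapter = Adapter()
tokens = build_input_tokens(base, adapter, pixels=[0.1, 0.2, 0.3],
                            control=[1.0, 2.0], text_tokens=[[9.0]])
```

Because the base model is left untouched in this pattern, its behavior on inputs without a control signal is unchanged, which is consistent with the statement that general question-answering performance matches LLaVA-1.5.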
VCoder Visit Over Time

Monthly Visits: 494,758,773
Bounce Rate: 37.69%
Pages per Visit: 5.7
Visit Duration: 00:06:29
