VCoder
VCoder is a visual perception adapter that improves the performance of multimodal large language models on object-level visual tasks.
Common Product · Image · Computer Vision · Natural Language Processing
VCoder is an adapter that improves the performance of multimodal large language models on object-level visual tasks by feeding auxiliary perception modalities, such as segmentation or depth maps, to the model as control inputs. VCoder-LLaVA is built on LLaVA-1.5. Because VCoder does not fine-tune the parameters of LLaVA-1.5, its performance on general question-answering benchmarks is the same as that of LLaVA-1.5. VCoder has been evaluated on the COST (COCO Segmentation Text) dataset, where it achieves strong results on the semantic, instance, and panoptic segmentation tasks. The authors have also released the model's detection results and pretrained models.
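The adapter idea above can be illustrated with a minimal, pure-Python sketch. It assumes that features from the control input (for example a segmentation map) pass through their own trainable projector, are placed in front of the frozen LLaVA-1.5 image tokens, and are followed by the text embeddings; the token ordering, the dimensions, and all names (Projector, build_input_sequence, control_proj) are hypothetical illustrations, not VCoder's actual code. The point it shows is why general QA behavior is unchanged: only the control branch is marked trainable.

```python
import random
from dataclasses import dataclass


def linear(x, weight):
    """Bias-free linear layer: vector x (d_in) times matrix weight (d_in x d_out)."""
    d_out = len(weight[0])
    return [sum(x[i] * weight[i][j] for i in range(len(x))) for j in range(d_out)]


def random_matrix(rows, cols, seed):
    """Small random weight matrix, seeded for reproducibility."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


@dataclass
class Projector:
    """Hypothetical projector mapping vision features into the LLM embedding space."""
    weight: list
    trainable: bool

    def __call__(self, feats):
        return [linear(f, self.weight) for f in feats]


def build_input_sequence(control_feats, image_feats, text_embeds, control_proj, image_proj):
    # Assumed ordering: control tokens, then image tokens, then text tokens.
    control_tokens = control_proj(control_feats)
    image_tokens = image_proj(image_feats)
    return control_tokens + image_tokens + text_embeds


D_VIS, D_LLM = 4, 6
# Frozen projector standing in for LLaVA-1.5's own image projector.
image_proj = Projector(random_matrix(D_VIS, D_LLM, seed=0), trainable=False)
# Trainable projector for the control input (assumed to be the only trained part).
control_proj = Projector(random_matrix(D_VIS, D_LLM, seed=1), trainable=True)

seg_feats = [[1.0, 0.0, 0.0, 0.0]] * 3  # stand-in segmentation-map features (3 patches)
img_feats = [[0.0, 1.0, 0.0, 0.0]] * 5  # stand-in RGB image features (5 patches)
text = [[0.0] * D_LLM] * 2              # stand-in text embeddings (2 tokens)

seq = build_input_sequence(seg_feats, img_feats, text, control_proj, image_proj)
print(len(seq), len(seq[0]))  # 10 6

trainable = [name for name, p in [("control_proj", control_proj), ("image_proj", image_proj)]
             if p.trainable]
print(trainable)  # ['control_proj']
```

Keeping the control branch separate means that dropping the control tokens leaves the frozen image-and-text path exactly as in the base model, which is consistent with the unchanged general QA scores described above.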
VCoder Visits Over Time
Monthly Visits: 515,580,771
Bounce Rate: 37.20%
Pages per Visit: 5.8
Visit Duration: 00:06:42