VCoder
VCoder is a visual perception adapter that improves the performance of multimodal large language models on object-level visual tasks.
Common, Product, Image, Computer Vision, Natural Language Processing
VCoder is an adapter that improves the performance of multimodal large language models on object-level visual tasks by feeding them auxiliary perception modalities, such as segmentation maps, as control inputs. VCoder LLaVA is built on LLaVA-1.5. Because VCoder keeps the parameters of LLaVA-1.5 frozen, its performance on general question-answering benchmarks is the same as that of LLaVA-1.5. VCoder has been benchmarked on the COST dataset, where it achieved good performance on the semantic, instance, and panoptic segmentation tasks. The authors have also released the model's detection results and pre-trained models.
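The adapter idea above can be illustrated with a small, dependency-free sketch. It is not the official VCoder code: the component names, the `<seg>`/`</seg>` marker tokens, and the placement of control tokens before the image tokens are all illustrative assumptions. What it shows is the overall shape: only the adapter branch is trainable, and when no control input is supplied the model receives exactly the plain LLaVA-1.5 input, which is why general question-answering behavior is unchanged.

```python
from typing import Dict, List, Optional

# Hypothetical component registry (names are illustrative, not VCoder's).
# True means the component is updated during training.
COMPONENTS: Dict[str, bool] = {
    "llava.vision_encoder": False,     # LLaVA-1.5 weights, kept frozen
    "llava.projector": False,          # LLaVA-1.5 weights, kept frozen
    "llava.language_model": False,     # LLaVA-1.5 weights, kept frozen
    "vcoder.control_projector": True,  # adapter for the control input, trained
}


def trainable(components: Dict[str, bool]) -> List[str]:
    """Return the names of components that receive gradient updates."""
    return [name for name, is_trainable in components.items() if is_trainable]


def build_input(
    image_tokens: List[str],
    text_tokens: List[str],
    control_tokens: Optional[List[str]] = None,
) -> List[str]:
    """Assemble the token sequence fed to the (frozen) language model.

    Assumption: encoded control tokens (e.g. from a segmentation map) are
    wrapped in hypothetical <seg> markers and placed before the image tokens.
    Without a control input, the sequence is the plain LLaVA-1.5 one.
    """
    sequence: List[str] = []
    if control_tokens is not None:
        sequence += ["<seg>"] + control_tokens + ["</seg>"]
    sequence += image_tokens + text_tokens
    return sequence


if __name__ == "__main__":
    question = ["How", "many", "dogs?"]
    print(trainable(COMPONENTS))  # ['vcoder.control_projector']
    print(build_input(["img0", "img1"], question))
    print(build_input(["img0", "img1"], question, control_tokens=["seg0", "seg1"]))
```

Keeping the base model frozen and routing the extra modality through a separate trainable branch is a common adapter design; the trade-off is that the base model's existing abilities are preserved exactly, at the cost of relying on the small adapter to make the control input useful.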
VCoder Visits Over Time
Monthly Visits: 488,643,166
Bounce Rate: 37.28%
Pages per Visit: 5.7
Visit Duration: 00:06:37