
VisionQA-Llama2-OWLViT

Public

This is a multimodal model designed for the Visual Question Answering (VQA) task. It integrates the Llama2 13B, OWL-ViT, and YOLOv8 models.

Created: 2024-06-06T19:00:05
Updated: 2025-02-19T17:24:03
Stars: 4
Stars Increase: 0