LLaVA
Large Language and Vision Assistant, enabling multimodal chat and scientific question answering
Common Product · Image · Multimodal Chat
LLaVA is a novel end-to-end trained large multimodal model that connects a vision encoder with Vicuna, delivering impressive chat capabilities in the spirit of multimodal GPT-4 and setting a new state-of-the-art accuracy on scientific question answering. Its use cases include multimodal chat in everyday user applications and multimodal reasoning in the scientific domain. LLaVA's data, code, and checkpoints are limited to research use and follow the licenses of CLIP, LLaMA, Vicuna, and GPT-4.
LLaVA Visits Over Time
Monthly Visits: 74,242
Bounce Rate: 57.36%
Pages per Visit: 1.3
Visit Duration: 00:00:33