LLaVA

Large Language and Vision Assistant, enabling multimodal chat and scientific question answering

LLaVA (Large Language and Vision Assistant) is an end-to-end trained large multimodal model that connects a vision encoder with Vicuna, delivering impressive chat capabilities in the spirit of multimodal GPT-4 and achieving new state-of-the-art accuracy in scientific question answering. Its use cases include multimodal chat in everyday user applications and multimodal reasoning in the scientific domain. LLaVA's data, code, and checkpoints are intended for research use only and are subject to the licenses of CLIP, LLaMA, Vicuna, and GPT-4.

LLaVA Visit Over Time

Monthly Visits: 74,242
Bounce Rate: 57.36%
Pages per Visit: 1.3
Visit Duration: 00:00:33

Additional analytics for LLaVA (visit trend, visit geography, traffic sources, and alternatives) are presented as interactive charts on the original page.