LLaVA-Video
Research on video instruction tuning and synthetic data.
Tags: Common Product, Video, Video Understanding, Multimodal Learning
LLaVA-Video is a large multimodal model (LMM) focused on video instruction tuning. To address the difficulty of acquiring high-quality raw video data from the internet, the project built a high-quality synthetic dataset, LLaVA-Video-178K, containing detailed video descriptions, open-ended questions, and multiple-choice questions designed to strengthen the understanding and reasoning capabilities of video language models. LLaVA-Video has demonstrated strong performance across a range of video benchmarks, validating the effectiveness of its dataset.
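The three annotation types in the dataset (detailed captions, open-ended QA, multiple-choice QA) can be pictured as records like the following. This is an illustrative sketch only: the class and field names are assumptions for clarity, not LLaVA-Video-178K's actual schema.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical record type illustrating the three annotation kinds
# described above; not the dataset's real schema.
@dataclass
class VideoAnnotation:
    video_id: str
    kind: str                       # "caption" | "open_ended" | "multiple_choice"
    question: Optional[str] = None  # absent for detailed captions
    answer: str = ""
    options: List[str] = field(default_factory=list)  # only for multiple choice

samples = [
    VideoAnnotation("vid_001", "caption",
                    answer="A detailed description of the whole clip..."),
    VideoAnnotation("vid_001", "open_ended",
                    question="What happens after the person sits down?",
                    answer="They open a laptop."),
    VideoAnnotation("vid_001", "multiple_choice",
                    question="What object is picked up?",
                    answer="B",
                    options=["A. Cup", "B. Laptop", "C. Book"]),
]

# Count how many annotations of each kind we have.
print(Counter(s.kind for s in samples))
```

A real training pipeline would pair each record with sampled video frames; the sketch only shows how the three question formats can coexist in one annotation stream.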
LLaVA-Video Visit Over Time
Monthly Visits: 74,242
Bounce Rate: 57.36%
Pages per Visit: 1.3
Visit Duration: 00:00:33