Video-LLaVA
Learns a unified visual representation by aligning before projection.
Common Product, Video, Machine Learning, Visual Understanding
Video-LLaVA is a model that learns a unified visual representation by aligning image and video features in a shared space before projecting them into the language model. Because both modalities share one aligned representation, the model's visual understanding improves. It is efficient to train and run, which makes it suitable for video processing and other visual tasks.
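The core idea of "alignment before projection" is that image and video features already live in one aligned feature space, so a single shared projection can map both into the language model's token space. The toy sketch below illustrates that idea with plain Python lists. It is not Video-LLaVA's actual code: the function names (`project`, `image_tokens`, `video_tokens`, `build_llm_input`), the dimensions, and the choice to simply prepend visual tokens to the text are all hypothetical, chosen for illustration.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def project(features: List[Vector], weight: Matrix) -> List[Vector]:
    """Map each visual feature (length len(weight)) to an LLM-sized vector
    (length len(weight[0])) using ONE shared linear projection."""
    in_dim, out_dim = len(weight), len(weight[0])
    out = []
    for f in features:
        if len(f) != in_dim:
            raise ValueError(f"expected feature of length {in_dim}, got {len(f)}")
        out.append([sum(f[i] * weight[i][j] for i in range(in_dim)) for j in range(out_dim)])
    return out


def image_tokens(patch_feats: List[Vector], weight: Matrix) -> List[Vector]:
    # An image contributes one token per patch feature.
    return project(patch_feats, weight)


def video_tokens(frame_patch_feats: List[List[Vector]], weight: Matrix) -> List[Vector]:
    # A video contributes the patch tokens of every sampled frame, in frame order.
    # Assumption: the video encoder is pre-aligned with the image encoder, so the
    # SAME weight is reused; there is no separate video-specific projection.
    flat = [p for frame in frame_patch_feats for p in frame]
    return project(flat, weight)


def build_llm_input(visual: List[Vector], text: List[Vector]) -> List[Vector]:
    # Hypothetical simplification: visual tokens are placed before text embeddings.
    return visual + text


if __name__ == "__main__":
    # Hypothetical 3 -> 2 projection shared by images and videos.
    W = [[1.0, 0.0],
         [0.0, 1.0],
         [1.0, 1.0]]
    img = image_tokens([[1.0, 2.0, 3.0]], W)
    vid = video_tokens([[[1.0, 0.0, 0.0]], [[0.0, 1.0, 0.0]]], W)
    print(img)  # [[4.0, 5.0]]
    print(build_llm_input(vid, [[0.5, 0.5]]))
```

Reusing one projection for both modalities is the point of the sketch: if the encoders were not aligned first, each modality would need its own projection and the language model would see tokens from two unrelated spaces.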
Video-LLaVA visits over time
Monthly visits: 2,110,044
Bounce rate: 35.99%
Pages per visit: 6.5
Visit duration: 00:07:04