LLaVA-OneVision
An efficient multimodal model for transferring vision tasks across scenarios.
Common, Product, Image, Multimodal, Visual Recognition
LLaVA-OneVision is a large multimodal model (LMM) developed by ByteDance in collaboration with several universities. It pushes the performance boundaries of open LMMs in three scenarios: single images, multiple images, and video. Its design enables strong transfer learning across modalities and scenarios, which gives rise to new emergent capabilities. In particular, it shows strong video understanding and cross-scenario abilities, demonstrated by transferring tasks learned on images to video.
LLaVA-OneVision Visits Over Time
Monthly Visits: 74242
Bounce Rate: 57.36%
Pages per Visit: 1.3
Visit Duration: 00:00:33