LLaVA-OneVision

An efficient multimodal model for transferring visual tasks across scenarios.

Common Product, Image, Multimodal, Visual Recognition
LLaVA-OneVision is a large multimodal model (LMM) developed jointly by ByteDance and several universities. It pushes the performance boundaries of open LMMs in single-image, multi-image, and video scenarios. Its design enables strong transfer learning across modalities and scenarios, which yields new emerging capabilities, most notably video understanding and cross-scenario abilities demonstrated through task transfer from images to videos.

LLaVA-OneVision Visit Over Time

Monthly Visits: 74242
Bounce Rate: 57.36%
Pages per Visit: 1.3
Visit Duration: 00:00:33
