LLaVA-NeXT
A large multimodal model that processes multi-image, video, and 3D data.
Common Product, Image, Multimodal, Image recognition
LLaVA-NeXT is a large multimodal model that handles multi-image, video, 3D, and single-image data through a unified interleaved data format, showing that a single model can be trained jointly across these visual modalities. It achieves leading results on multi-image benchmarks and, through appropriate data mixing across scenarios, improves or maintains performance on tasks that were previously handled on their own.
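The listing does not describe the interleaved format in detail. The following is a minimal sketch, assuming (as is common for LLaVA-style models) that each image appears as an `<image>` placeholder inside the text prompt, and that video and 3D inputs are reduced to ordered sequences of frames or views, so that every modality becomes a multi-image sample. All names here (`IMAGE_TOKEN`, `ImageRef`, `InterleavedSample`, `video`, `scene_3d`, `num_frames`) are hypothetical and are not taken from the LLaVA-NeXT codebase.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Hypothetical placeholder token marking where an image's features are inserted.
IMAGE_TOKEN = "<image>"


@dataclass
class ImageRef:
    """Reference to one visual input (a photo, a video frame, or a 3D view)."""
    path: str


Segment = Union[str, ImageRef]


@dataclass
class InterleavedSample:
    """An ordered mix of text and image segments (assumed unified format)."""
    segments: List[Segment] = field(default_factory=list)

    def prompt(self) -> str:
        # Keep text in order and replace each image with the placeholder token.
        return "".join(
            IMAGE_TOKEN if isinstance(s, ImageRef) else s for s in self.segments
        )

    def images(self) -> List[str]:
        # Image paths in the same order as their placeholders in prompt().
        return [s.path for s in self.segments if isinstance(s, ImageRef)]


def single_image(path: str, question: str) -> InterleavedSample:
    return InterleavedSample([ImageRef(path), "\n" + question])


def multi_image(paths: List[str], question: str) -> InterleavedSample:
    segments: List[Segment] = [ImageRef(p) for p in paths]
    segments.append("\n" + question)
    return InterleavedSample(segments)


def video(frame_paths: List[str], question: str, num_frames: int = 8) -> InterleavedSample:
    # Assumption: a video is uniformly subsampled and treated as a multi-image sample.
    step = max(1, len(frame_paths) // num_frames)
    sampled = frame_paths[::step][:num_frames]
    return multi_image(sampled, question)


def scene_3d(view_paths: List[str], question: str) -> InterleavedSample:
    # Assumption: a 3D scene is represented by several rendered or captured views.
    return multi_image(view_paths, question)
```

For example, `video([f"f{i}.jpg" for i in range(32)], "What happens?")` keeps every fourth frame and yields a prompt with eight `<image>` placeholders. Because every modality reduces to the same text-plus-images structure, samples from different sources can be mixed in one training set, which is the idea behind the data mixing mentioned above.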
LLaVA-NeXT Visits Over Time
Monthly Visits: 107,241
Bounce Rate: 47.27%
Pages per Visit: 1.4
Visit Duration: 00:00:18