LLaVA-NeXT
A large multimodal model that processes multi-image, video, and 3D data.
Common Product, Image, Multimodal, Image recognition
LLaVA-NeXT is a large multimodal model that handles single-image, multi-image, video, and 3D data through a unified interleaved data format, which lets it be trained jointly across these visual modalities. It achieves leading results on multi-image benchmarks, and with an appropriate data mix it improves on or maintains performance across a range of tasks that were previously handled as stand-alone tasks.
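As a rough illustration of what a unified interleaved format can mean, the sketch below maps every data type onto one representation: an ordered list of text and image segments. Single and multiple images map directly, a video is reduced to evenly sampled frames, and a 3D scene is reduced to several views. This is a hypothetical sketch, not LLaVA-NeXT's actual code. The `<image>` placeholder, the `Image` class, the helper function names, and the four-frame default are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union

# Hypothetical placeholder marking where an image's visual tokens would go.
IMAGE_TOKEN = "<image>"


@dataclass(frozen=True)
class Image:
    """Reference to one image: a photo, a video frame, or a view of a 3D scene."""
    source: str


# An interleaved sample is an ordered mix of text and images.
Segment = Union[str, Image]


def from_single_image(image: str, prompt: str) -> List[Segment]:
    return [Image(image), prompt]


def from_multi_image(images: Sequence[str], prompt: str) -> List[Segment]:
    return [Image(p) for p in images] + [prompt]


def from_video(frames: Sequence[str], prompt: str, num_frames: int = 4) -> List[Segment]:
    # Assumption: a video is treated as a few evenly spaced frames.
    if not frames:
        raise ValueError("video has no frames")
    n = min(num_frames, len(frames))
    step = len(frames) / n
    picked = [frames[int(i * step)] for i in range(n)]
    return [Image(f) for f in picked] + [prompt]


def from_multi_view(views: Sequence[str], prompt: str) -> List[Segment]:
    # Assumption: a 3D scene is represented by several rendered views.
    return from_multi_image(views, prompt)


def flatten(segments: Sequence[Segment]) -> Tuple[str, List[str]]:
    """Turn an interleaved sample into prompt text plus the ordered image list."""
    parts: List[str] = []
    images: List[str] = []
    for seg in segments:
        if isinstance(seg, Image):
            parts.append(IMAGE_TOKEN)
            images.append(seg.source)
        else:
            parts.append(seg)
    return "\n".join(parts), images


if __name__ == "__main__":
    frames = [f"frame_{i:03d}.jpg" for i in range(16)]
    text, imgs = flatten(from_video(frames, "What happens in this clip?"))
    print(imgs)  # ['frame_000.jpg', 'frame_004.jpg', 'frame_008.jpg', 'frame_012.jpg']
```

In this sketch every modality ends up as the same (prompt text, image list) pair, so one training loop could consume a mixed batch of single-image, multi-image, video, and 3D samples, which is the property joint training depends on.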
LLaVA-NeXT Visits Over Time
Monthly Visits: 88,929
Bounce Rate: 52.22%
Pages per Visit: 1.3
Visit Duration: 00:00:17