LLaVA-NeXT

A large multimodal model that processes multi-image, video, and 3D data.

Common Product, Image, Multimodal, Image recognition
LLaVA-NeXT is a large multimodal model that handles multi-image, video, 3D, and single-image data through a unified interleaved data format, which allows it to be trained jointly across these visual modalities. It achieves leading results on multi-image benchmarks and, with appropriate data mixing, improves or maintains performance on tasks that were previously handled separately.

LLaVA-NeXT Visit Over Time

Monthly Visits: 120,028
Bounce Rate: 54.88%
Pages per Visit: 1.5
Visit Duration: 00:00:50

LLaVA-NeXT Alternatives