LLaVA-NeXT

A large multimodal model that processes multi-image, video, and 3D data.

LLaVA-NeXT is a large multimodal model that handles multi-image, video, 3D, and single-image data through a unified interleaved data format, which lets it be trained jointly across these visual modalities. The model achieves leading results on multi-image benchmarks and, with appropriate data mixing, improves or maintains performance on tasks that were previously handled by stand-alone models.

LLaVA-NeXT Visits Over Time

Monthly Visits: 107,241
Bounce Rate: 47.27%
Pages per Visit: 1.4
Visit Duration: 00:00:18
