LongVA

Long-Context Transformer Model from Language to Vision

Tags: Common, Product, Image, Long Context, Visual Model
LongVA is a long-context transformer model that can process over 2000 video frames or 200K visual tokens. Among 7B-parameter models, it achieves leading performance on the Video-MME benchmark. The model has been tested with CUDA 11.8 on A100-SXM-80G GPUs and can be deployed and used quickly through the Hugging Face platform.
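The description states capacity both in frames and in visual tokens. The minimal sketch below shows how such a token budget could translate into a frame count, and how frames might be sampled uniformly from a longer video to fit it. It assumes a fixed number of visual tokens per frame (`tokens_per_frame`, a hypothetical parameter). The value 100 used below simply divides the stated 200K tokens by 2000 frames and is not LongVA's documented encoder setting. Uniform sampling is likewise an illustrative choice, not necessarily LongVA's own preprocessing.

```python
def max_frames(token_budget: int, tokens_per_frame: int) -> int:
    """Number of whole frames that fit in a visual-token budget (assumed fixed cost per frame)."""
    if tokens_per_frame <= 0:
        raise ValueError("tokens_per_frame must be positive")
    return token_budget // tokens_per_frame


def sample_frame_indices(total_frames: int, num_samples: int) -> list[int]:
    """Pick num_samples evenly spaced frame indices from a video of total_frames frames."""
    if num_samples <= 0 or total_frames <= 0:
        return []
    if num_samples >= total_frames:
        # The whole video fits in the budget, so keep every frame.
        return list(range(total_frames))
    step = total_frames / num_samples  # step > 1, so the indices are strictly increasing
    return [int(i * step) for i in range(num_samples)]


def frames_for_budget(total_frames: int, token_budget: int, tokens_per_frame: int) -> list[int]:
    """Hypothetical helper: frame indices to feed a model with the given token budget."""
    return sample_frame_indices(total_frames, max_frames(token_budget, tokens_per_frame))


if __name__ == "__main__":
    # A 100-minute video at 30 fps has 180,000 frames; assume 100 tokens per frame.
    indices = frames_for_budget(180_000, 200_000, 100)
    print(len(indices))  # prints 2000
```

In this sketch, a smaller per-frame token cost lets more frames fit in the same budget, which is why frame and token limits are usually quoted together.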

LongVA Visits Over Time

Monthly Visits: 494,758,773
Bounce Rate: 37.69%
Pages per Visit: 5.7
Visit Duration: 00:06:29


LongVA Alternatives