PPLLaVA

An efficient video large language model for video sequence understanding

Common Product, Video, Video Understanding, Large Language Model
PPLLaVA is an efficient video large language model that combines three components: fine-grained alignment between visual features and the user's prompt, a convolution-style pooling mechanism that compresses visual tokens according to the user's instruction, and CLIP context extension. The model reports new state-of-the-art results on benchmarks such as VideoMME, MVBench, VideoChatGPT Bench, and VideoQA Bench while using only 1024 visual tokens and delivering an 8-fold improvement in throughput.
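To make the idea of instruction-guided token compression more concrete, here is a minimal sketch in plain Python. It is not PPLLaVA's actual implementation. It assumes that visual tokens and the instruction embedding already share one vector space, and it uses a simple 1D window with softmax weights derived from cosine similarity. The names `prompt_guided_pool`, `kernel`, and `temperature` are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def prompt_guided_pool(tokens, prompt, kernel=4, temperature=0.1):
    """Compress `tokens` by roughly a factor of `kernel`.

    tokens: list of D-dimensional visual token vectors (hypothetical input)
    prompt: D-dimensional embedding of the user instruction (hypothetical input)
    Each non-overlapping window of `kernel` tokens is replaced by one token:
    a weighted average where the weights favor prompt-relevant tokens.
    """
    pooled = []
    for start in range(0, len(tokens), kernel):
        window = tokens[start:start + kernel]
        # Relevance of each token in the window to the instruction.
        scores = [cosine(t, prompt) / temperature for t in window]
        # Numerically stable softmax over the window.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted average: one output token per window.
        dim = len(window[0])
        pooled.append(
            [sum(w * t[d] for w, t in zip(weights, window)) for d in range(dim)]
        )
    return pooled


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
    prompt = [1.0, 0.0]  # the instruction "points" along the first axis
    compressed = prompt_guided_pool(tokens, prompt, kernel=2)
    print(len(tokens), "->", len(compressed), "tokens")
```

In this toy setting, four tokens are reduced to two, and each pooled token is dominated by the input token most similar to the instruction. A real model would pool over spatial and temporal dimensions and learn the similarity space, so the sketch only conveys the weighting principle.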
PPLLaVA Visit Over Time

Monthly Visits: 490,881,889
Bounce Rate: 37.92%
Pages per Visit: 5.6
Visit Duration: 00:06:18