PPLLaVA

A GPU-based model for understanding video sequences

Common Product, Video, Video Understanding, Large Language Model
PPLLaVA is an efficient video large language model that combines three components: fine-grained alignment between visual features and the prompt, visual token compression through convolution-style pooling guided by the user's instruction, and CLIP context extension. It reports new state-of-the-art results on benchmarks including VideoMME, MVBench, VideoChatGPT Bench, and VideoQA Bench while using only 1024 visual tokens and delivering an 8-fold improvement in throughput.
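
To make the idea of instruction-guided token compression more concrete, here is a minimal NumPy sketch. It is not the PPLLaVA implementation: the function name `prompt_guided_pool`, the softmax-per-window weighting, the temperature value, and the non-overlapping 2x2x2 kernel are all assumptions chosen for illustration. Each window of visual tokens is averaged with weights derived from how similar each token is to a prompt embedding, so tokens that are more relevant to the instruction contribute more to the pooled result.

```python
import numpy as np


def prompt_guided_pool(video_tokens, prompt_embedding, kernel=(2, 2, 2), temperature=0.07):
    """Pool a (T, H, W, D) token grid with prompt-similarity weights.

    Hypothetical sketch: weights are a softmax over cosine similarities
    between each token and the prompt, computed inside each window.
    Returns an array of shape (T // kt, H // kh, W // kw, D).
    """
    T, H, W, D = video_tokens.shape
    kt, kh, kw = kernel
    if T % kt or H % kh or W % kw:
        raise ValueError("kernel must evenly divide the token grid")

    # Cosine similarity between every visual token and the prompt embedding.
    eps = 1e-8
    v = video_tokens / (np.linalg.norm(video_tokens, axis=-1, keepdims=True) + eps)
    p = prompt_embedding / (np.linalg.norm(prompt_embedding) + eps)
    scores = (v @ p) / temperature  # shape (T, H, W)

    # Split the grid into non-overlapping windows of kt * kh * kw tokens.
    n = kt * kh * kw
    out_shape = (T // kt, H // kh, W // kw)
    tokens_w = (
        video_tokens.reshape(T // kt, kt, H // kh, kh, W // kw, kw, D)
        .transpose(0, 2, 4, 1, 3, 5, 6)
        .reshape(*out_shape, n, D)
    )
    scores_w = (
        scores.reshape(T // kt, kt, H // kh, kh, W // kw, kw)
        .transpose(0, 2, 4, 1, 3, 5)
        .reshape(*out_shape, n)
    )

    # Numerically stable softmax inside each window.
    scores_w = scores_w - scores_w.max(axis=-1, keepdims=True)
    weights = np.exp(scores_w)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    # Weighted sum of the tokens in each window.
    return (weights[..., None] * tokens_w).sum(axis=-2)


# Illustrative sizes only: 16 frames of 8 x 8 tokens with 32-dim features.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 8, 8, 32))
prompt = rng.standard_normal(32)
pooled = prompt_guided_pool(tokens, prompt)
print(tokens.shape[:3], "->", pooled.shape[:3])
```

In this sketch the number of tokens shrinks by the kernel volume (here 2 x 2 x 2). The kernel size and feature dimension are placeholders and are not taken from the model itself.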