VideoPrism
Video Understanding Foundation Model
Tags: Common Product, Video, Video Understanding, Encoder
VideoPrism is a general-purpose video encoder that achieves leading performance across a wide range of video understanding tasks, including classification, localization, retrieval, captioning, and question answering. Its key innovation is a large and diverse pre-training corpus of 36 million high-quality video-text pairs and 582 million video clips paired with noisy text. Pre-training proceeds in two stages to make full use of these different supervisory signals: the first stage uses contrastive learning to align videos with their text, and the second trains the model to predict masked video patches. A single frozen VideoPrism model can be adapted directly to downstream tasks and has set new state-of-the-art results on 30 video understanding benchmarks.
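The two pre-training objectives described above can be illustrated with a minimal, dependency-free sketch. It is not VideoPrism's implementation: the function names, the symmetric InfoNCE form of the contrastive loss, the 0.07 temperature, and the mean-squared-error masked-prediction loss are all illustrative assumptions. The masked-prediction targets here are arbitrary vectors; the sketch makes no claim about what VideoPrism actually predicts for masked patches.

```python
import math
import random


def l2_normalize(v):
    # Scale a vector to unit length (guarding against a zero vector).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def log_softmax_at(row, idx):
    # Numerically stable log-softmax of `row`, evaluated at position `idx`.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return row[idx] - lse


def contrastive_loss(video_embs, text_embs, temperature=0.07):
    """Stage 1 (assumed form): symmetric InfoNCE. The i-th video and the
    i-th text are a positive pair; every other pairing in the batch is a negative."""
    v = [l2_normalize(e) for e in video_embs]
    t = [l2_normalize(e) for e in text_embs]
    n = len(v)
    logits = [[dot(v[i], t[j]) / temperature for j in range(n)] for i in range(n)]
    # Video-to-text: each row should peak at its matching text.
    v2t = -sum(log_softmax_at(logits[i], i) for i in range(n)) / n
    # Text-to-video: each column should peak at its matching video.
    t2v = -sum(log_softmax_at([logits[j][i] for j in range(n)], i) for i in range(n)) / n
    return (v2t + t2v) / 2


def random_mask(num_tokens, mask_ratio, rng):
    # Hypothetical helper: pick a random subset of video tokens to hide.
    num_masked = int(num_tokens * mask_ratio)
    masked = set(rng.sample(range(num_tokens), num_masked))
    return [k in masked for k in range(num_tokens)]


def masked_prediction_loss(predicted, targets, mask):
    """Stage 2 (assumed form): mean squared error computed on masked tokens only."""
    errs = [
        sum((p - q) ** 2 for p, q in zip(predicted[k], targets[k])) / len(targets[k])
        for k in range(len(mask))
        if mask[k]
    ]
    return sum(errs) / len(errs) if errs else 0.0
```

For example, with `videos = [[1.0, 0.0], [0.0, 1.0]]` and `texts = [[0.9, 0.1], [0.1, 0.9]]`, `contrastive_loss(videos, texts)` is lower than the loss with the two texts swapped, because matched pairs score highest on the diagonal. Restricting the masked-prediction loss to hidden tokens forces the model to infer missing content from the visible context rather than copy its input.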
VideoPrism Visits Over Time
Monthly Visits: 1,120,132
Bounce Rate: 53.39%
Pages per Visit: 2.2
Visit Duration: 00:00:41