Goldfish
Advanced model for video understanding
CommonProductVideoVideo understandingLong video processing
Goldfish is a methodological approach designed for understanding videos of arbitrary length. It collects the top k video segments related to the instruction in an efficient retrieval mechanism, and then provides the required response. This design allows Goldfish to handle arbitrary long video sequences effectively, suitable for scenarios such as movies or TV series. To facilitate retrieval, MiniGPT4-Video is developed to generate detailed descriptions for video segments. Goldfish achieves an accuracy of 41.78% on the long video benchmark of TVQA-long, surpassing the previous methods by 14.94%. Moreover, MiniGPT4-Video also performs outstandingly in understanding short videos, surpassing the existing best methods by 3.23%, 2.03%, 16.5%, and 23.59% respectively on the short video benchmarks of MSVD, MSRVTT, TGIF, and TVQA. These results demonstrate that the Goldfish model has significantly improved in both long video and short video understanding.
Goldfish Visit Over Time
Monthly Visits
2397
Bounce Rate
35.21%
Page per Visit
1.6
Visit Duration
00:02:11