MiniGPT4-Video
MiniGPT4-Video is a multimodal AI video model for understanding complex videos and generating poetic captions.
Tags: Common Product, Video, Video Understanding, Video Question Answering
MiniGPT4-Video is a large multimodal model designed for video understanding. It processes temporal visual data together with text, generates captions and slogans, and supports video question answering. Built on MiniGPT-v2, it uses EVA-CLIP as its visual backbone and is trained in multiple stages, including large-scale video-text pre-training and video question-answering fine-tuning. It achieves significant improvements on benchmarks such as MSVD, MSRVTT, TGIF, and TVQA. Pricing is currently unknown.
MiniGPT4-Video Visits Over Time
Monthly Visits: 2397
Bounce Rate: 35.21%
Pages per Visit: 1.6
Visit Duration: 00:02:11