MiniGPT4-Video
MiniGPT4-Video is a multimodal AI video model for understanding complex videos and generating poetic captions.
Common Product, Video, Video Understanding, Video Question Answering
MiniGPT4-Video is a large multimodal model designed for video understanding. It processes both temporal visual data and text, can generate captions and slogans, and is suited to video question answering. It builds on MiniGPT-v2, uses EVA-CLIP as its visual backbone, and is trained in multiple stages, including large-scale video-text pre-training and video question-answering fine-tuning. It achieves significant improvements on benchmarks such as MSVD, MSR-VTT, TGIF, and TVQA. Pricing is currently unknown.
MiniGPT4-Video Visits Over Time
Monthly Visits: 8,453
Bounce Rate: 43.97%
Pages per Visit: 2.8
Visit Duration: 00:09:09