MA-LMM
MA-LMM is a memory-augmented large multimodal model for long-term video understanding.
Common Product · Video · Video Understanding · Multimodal
MA-LMM (Memory-Augmented Large Multimodal Model) is a large multimodal model built on a large language model and designed primarily for long-term video understanding. It processes video frames online and retains past video information in a memory bank, which allows it to analyze long videos without exceeding the language model's context length or GPU memory limits. MA-LMM integrates seamlessly with existing multimodal language models and has achieved state-of-the-art performance on tasks such as long-video understanding, video question answering, and video captioning.
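To make the memory-bank idea concrete, below is a minimal conceptual sketch of online frame processing with a fixed-capacity memory bank that, once full, merges the most similar adjacent entries so the bank stays bounded regardless of video length. This is an illustration based on the description above, not code from the MA-LMM repository; the class name, capacity, and merging rule are assumptions for the example.

```python
import torch
import torch.nn.functional as F


class MemoryBank:
    """Fixed-capacity store of per-frame features for online video processing.

    When capacity is exceeded, the two most similar temporally adjacent
    entries are averaged into one, keeping the bank length bounded
    regardless of how long the video is.
    """

    def __init__(self, capacity: int = 20):
        self.capacity = capacity
        self.features: list[torch.Tensor] = []  # each entry: (num_tokens, dim)

    def add(self, frame_feature: torch.Tensor) -> None:
        # Append the newest frame's features, then compress if over capacity.
        self.features.append(frame_feature)
        if len(self.features) > self.capacity:
            self._compress()

    def _compress(self) -> None:
        # Cosine similarity between each pair of temporally adjacent entries.
        sims = torch.stack([
            F.cosine_similarity(a.flatten(), b.flatten(), dim=0)
            for a, b in zip(self.features[:-1], self.features[1:])
        ])
        i = int(sims.argmax())
        # Merge the most similar adjacent pair by averaging.
        merged = (self.features[i] + self.features[i + 1]) / 2
        self.features[i : i + 2] = [merged]

    def as_tensor(self) -> torch.Tensor:
        # (bank_length, num_tokens, dim) tensor to condition the language model on.
        return torch.stack(self.features)


# Example: stream 100 frames' worth of visual features through the bank.
bank = MemoryBank(capacity=20)
for _ in range(100):
    bank.add(torch.randn(32, 768))  # e.g. 32 visual tokens of dim 768 per frame
print(bank.as_tensor().shape)  # torch.Size([20, 32, 768])
```

Because only a bounded number of entries ever reaches the language model, memory use stays constant no matter how many frames are streamed in.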
MA-LMM Visits Over Time
Monthly Visits: 735
Bounce Rate: 41.23%
Pages per Visit: 1.0
Visit Duration: 00:00:00