2024-10-28 14:42:03, AIbase
Meta Open-Sources Long-Video LLM Project LongVU: Filters Duplicate Frames for Efficient and Accurate Understanding of Long Video Content
Recently, the Meta AI team introduced LongVU, a spatio-temporal adaptive compression mechanism designed to improve language understanding of long videos. Traditional multimodal large language models (MLLMs) are constrained by context length when processing long videos, and LongVU was created to address this limitation. It works mainly by filtering out duplicate frames and compressing tokens across frames, which uses the available context length more efficiently and reduces the volume of video data while preserving visual detail.
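To make the two ideas concrete, here is a minimal sketch in plain Python. It is not LongVU's actual code. The function names, the cosine-similarity measure, and the 0.95 thresholds are all assumptions chosen for illustration, and each frame is assumed to already be represented by a feature vector and a list of per-position token vectors.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_duplicate_frames(frame_feats, threshold=0.95):
    """Return the indices of frames to keep (hypothetical helper).

    A frame is dropped when its feature vector is nearly identical
    (cosine similarity >= threshold) to the most recently kept frame.
    """
    kept = []
    for i, feat in enumerate(frame_feats):
        if not kept or cosine(frame_feats[kept[-1]], feat) < threshold:
            kept.append(i)
    return kept


def compress_tokens(anchor_tokens, frame_tokens, threshold=0.95):
    """Return (position, token) pairs that differ from the anchor frame (hypothetical helper).

    Tokens that closely match the anchor token at the same position are
    treated as redundant and omitted, which shortens the sequence.
    """
    return [
        (pos, tok)
        for pos, (ref, tok) in enumerate(zip(anchor_tokens, frame_tokens))
        if cosine(ref, tok) < threshold
    ]


# Toy data: frames 1 and 3 are near-copies of the frames before them.
frames = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.0, 1.0]]
kept_indices = filter_duplicate_frames(frames)  # [0, 2]

# Toy tokens: only the second token changed relative to the anchor frame.
anchor = [[1.0, 0.0], [0.0, 1.0]]
current = [[1.0, 0.0], [1.0, 0.0]]
changed = compress_tokens(anchor, current)  # [(1, [1.0, 0.0])]
```

In this toy run, half of the frames are discarded before any tokens are produced, and the surviving frame contributes one token instead of two. That is the sense in which such filtering saves context length.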