Zhipu AI has announced the open sourcing of the upgraded CogVLM2-Video model, a significant advance in video understanding. CogVLM2-Video addresses a key limitation of existing video understanding models, which lose temporal information, by feeding multi-frame video images together with their timestamps into the encoder. Using an automated method for constructing temporal-grounding data, the team generated 30,000 time-related video question-answer pairs and trained a model that achieves state-of-the-art performance on public video understanding benchmarks. CogVLM2-Video excels at video caption generation and temporal grounding, providing powerful tools for video generation and summarization tasks.

The CogVLM2-Video model achieves temporal grounding and time-related question answering by extracting frames from the input video and annotating each frame with timestamp information, so that the language model knows exactly which moment of the video every frame corresponds to.
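A minimal sketch of this timestamp-annotated frame sampling is shown below. The function names, the 24-frame budget, and the prompt format are illustrative assumptions, not the exact CogVLM2-Video implementation.

```python
# Uniformly sample frames from a video and pair each frame with its timestamp,
# then interleave the timestamps into the text prompt (illustrative sketch).
import cv2


def sample_frames_with_timestamps(video_path: str, num_frames: int = 24):
    """Uniformly sample frames and record the second offset of each one."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = [int(i * total / num_frames) for i in range(num_frames)]

    frames, timestamps = [], []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)          # raw frame for the vision encoder
        timestamps.append(idx / fps)  # time of this frame, in seconds
    cap.release()
    return frames, timestamps


def build_prompt(timestamps, question: str) -> str:
    """Tag each frame slot with its timestamp so the language model can
    refer to absolute times when answering."""
    time_tags = " ".join(f"[{t:.1f}s]<frame_{i}>" for i, t in enumerate(timestamps))
    return f"{time_tags}\nQuestion: {question}"
```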


To support large-scale training, the team developed an automated video question-answer data generation pipeline that combines an image understanding model with a large language model, reducing annotation cost while improving data quality. The resulting Temporal Grounding Question and Answer (TQA) dataset contains 30,000 records, providing rich temporal-grounding data for model training.
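The sketch below illustrates one plausible shape of such a pipeline: per-frame captions from an image understanding model are merged by a large language model into time-grounded question-answer pairs. The callables `caption_image` and `ask_llm`, and the prompt wording, are assumptions standing in for whichever models and prompts were actually used.

```python
# Hedged sketch of an automated temporal-grounding QA construction step.
from typing import Callable, Dict, List


def build_tqa_record(
    frames: List,                          # sampled frames, e.g. numpy arrays
    timestamps: List[float],               # second offset of each frame
    caption_image: Callable[[object], str],
    ask_llm: Callable[[str], str],
) -> Dict[str, str]:
    # 1. Describe each frame with the image understanding model.
    timed_captions = [
        f"[{t:.1f}s] {caption_image(frame)}"
        for frame, t in zip(frames, timestamps)
    ]

    # 2. Ask the language model to write a question whose answer is a time
    #    span, plus the grounded answer, from the timestamped captions alone.
    prompt = (
        "Frame-by-frame descriptions with timestamps:\n"
        + "\n".join(timed_captions)
        + "\nWrite one question about WHEN something happens in this video, "
          "and answer it with the start and end time in seconds."
    )
    qa_text = ask_llm(prompt)
    return {"captions": "\n".join(timed_captions), "qa": qa_text}
```

Because both stages are fully automated, records like this can be generated at the scale of tens of thousands of videos without manual annotation.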

The CogVLM2-Video model has demonstrated excellent performance on multiple public evaluation benchmarks, with strong quantitative results on VideoChatGPT-Bench, zero-shot QA, and MVBench.

Code: https://github.com/THUDM/CogVLM2

Project Website: https://cogvlm2-video.github.io

Online Trial: http://36.103.203.44:7868/