Website Master (ChinaZ.com) June 17 News: Zhipu, Tsinghua University, and Peking University have jointly launched LVBench, a benchmark project for long-form video understanding. While existing multimodal large language models have made significant strides in short-video comprehension, they still struggle with videos spanning several hours. LVBench was created to fill this gap.


The project provides question-and-answer data built on hours-long videos, organized into 6 main categories and 21 subcategories. The videos cover a range of content types, including TV dramas, sports broadcasts, and everyday surveillance footage, all sourced from the public domain. The data has been meticulously annotated, and large language models were used to select particularly challenging questions. According to the announcement, the LVBench dataset covers tasks such as video summarization, event detection, character recognition, and scene understanding.
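To make the setup concrete, the sketch below shows how a multiple-choice video QA benchmark of this kind could be scored per category. It is only an illustration: the JSON layout and the field names (`video_id`, `question`, `options`, `answer`, `category`) are assumptions, not the actual LVBench format, and the `predict` callable stands in for whatever model is being evaluated. The official data schema and evaluation script are in the GitHub repository linked below.

```python
import json
from collections import defaultdict


def evaluate(annotation_path, predict):
    """Score a model on multiple-choice video QA, per category.

    `predict` is any callable taking (video_id, question, options)
    and returning the index of the chosen option.
    Field names below are hypothetical, not the real LVBench schema.
    """
    with open(annotation_path, encoding="utf-8") as f:
        items = json.load(f)  # assumed: a flat list of QA dicts

    correct = defaultdict(int)
    total = defaultdict(int)
    for item in items:
        choice = predict(item["video_id"], item["question"], item["options"])
        total[item["category"]] += 1
        if choice == item["answer"]:
            correct[item["category"]] += 1

    # Accuracy per question category (e.g. summarization, event detection)
    return {cat: correct[cat] / total[cat] for cat in total}
```

A random-choice baseline (returning an arbitrary option index from `predict`) is a common sanity check for benchmarks like this, since long-video questions are designed so that shortcuts from a few sampled frames should not score well.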


LVBench aims not only to test models' reasoning and operational capabilities in long-form video scenarios, but also to drive breakthroughs and innovation in related technologies, injecting new momentum into applications such as embodied decision-making over long videos, in-depth film reviews, and professional sports commentary.

Many research institutions are already working with the LVBench dataset, developing large models for long-form video tasks and gradually pushing the boundaries of what artificial intelligence can understand in long-duration information streams, bringing new vitality to ongoing exploration in video understanding and multimodal learning.

GitHub: https://github.com/THUDM/LVBench

Project: https://lvbench.github.io

Paper: https://arxiv.org/abs/2406.08035