In the world of AI, getting machines to understand videos is much harder than getting them to understand images. Videos are dynamic, combining sound, motion, and a myriad of complex scenes. In the past, AI read videos the way one might read ancient scrolls, and was often left baffled.

But the introduction of VideoPrism might change everything. It is a video encoder developed by Google's research team that achieves state-of-the-art results on a wide variety of video understanding tasks with a single model. Whether classifying videos, localizing objects, generating captions, or answering questions about videos, VideoPrism handles it all.


How Is VideoPrism Trained?

Training VideoPrism is like teaching a child to observe the world. First, you show it a wide variety of videos, ranging from everyday life to scientific observations and everything in between. Then, you train it on high-quality video-caption pairs as well as noisy parallel text (such as transcripts from automatic speech recognition).

Pre-training Methods

Data: VideoPrism uses 36 million high-quality video-caption pairs and 582 million video clips with noisy parallel text.

Model architecture: a standard Vision Transformer (ViT) with a factorized design in space and time.

Training algorithms: video-text contrastive learning and masked video modeling.
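The first of these algorithms, video-text contrastive learning, pulls the embeddings of matched video-text pairs together while pushing mismatched pairs apart. Below is a minimal NumPy sketch of a symmetric InfoNCE-style loss, a common way to implement this idea; the embedding shapes and the temperature value are illustrative assumptions, not details from the paper.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Row-wise cross-entropy against integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def video_text_contrastive_loss(video_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of (video, text) pairs.

    video_emb, text_emb: (batch, dim) arrays; row i of each is a matched pair.
    """
    # L2-normalize so dot products become cosine similarities.
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = v @ t.T / temperature       # (batch, batch) similarity matrix
    labels = np.arange(logits.shape[0])  # matched pairs sit on the diagonal
    # Average the video->text and text->video directions.
    return 0.5 * (softmax_cross_entropy(logits, labels)
                  + softmax_cross_entropy(logits.T, labels))
```

The loss is low when each video embedding is closest to its own caption's embedding, which is exactly the alignment the encoder needs for retrieval-style tasks.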


Training proceeds in two stages. In the first stage, VideoPrism learns the correspondence between videos and text through contrastive learning. In the second stage, it deepens its understanding of video content through masked video modeling, aided by global-local distillation of the embeddings learned in the first stage.
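The core idea of the second stage, hiding part of the video tokens and scoring the model on what it had to reconstruct, can be illustrated with a toy example. The token shapes, the mask ratio, and the mean-squared-error target below are assumptions for illustration, not the paper's exact recipe.

```python
import numpy as np

def random_token_mask(num_tokens, mask_ratio, rng):
    """Boolean mask over token positions: True = hidden from the encoder."""
    num_masked = int(num_tokens * mask_ratio)
    mask = np.zeros(num_tokens, dtype=bool)
    mask[rng.permutation(num_tokens)[:num_masked]] = True
    return mask

def masked_modeling_loss(target_tokens, predicted_tokens, mask):
    """Mean-squared error computed only on the masked positions, so the
    model is scored on what it reconstructed, not on tokens it could copy."""
    diff = (predicted_tokens - target_tokens)[mask]
    return float((diff ** 2).mean())

rng = np.random.default_rng(0)
tokens = rng.normal(size=(16, 4))  # 16 spatiotemporal tokens, 4-dim each
mask = random_token_mask(len(tokens), mask_ratio=0.75, rng=rng)
```

A high mask ratio forces the model to infer missing content from temporal and spatial context rather than interpolating from nearby tokens, which is what pushes it toward genuine video understanding.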

Researchers evaluated VideoPrism on a broad suite of video understanding tasks, and the results were impressive: it achieved state of the art on 30 of 33 benchmarks. Whether answering questions about web videos or tackling computer vision tasks in scientific domains, VideoPrism demonstrated strong capabilities.

The arrival of VideoPrism opens new possibilities for AI video understanding. It not only helps AI grasp video content better but may also play a significant role in fields such as education, entertainment, and security.

However, VideoPrism also faces some challenges, such as how to handle long videos and how to avoid introducing biases during training. These are issues that future research needs to address.

Paper Address: https://arxiv.org/pdf/2402.13217