
Breakthrough by Renmin University of China and Beijing University of Posts and Telecommunications: Ref-AVS Helps AI Better Understand the Human World

AIbase基地

Published in AI News · 6 min read · Aug 30, 2024

In artificial intelligence, enabling machines to understand the complex physical world as humans do has long been a major challenge. Recently, a research team from institutions including Renmin University of China, Beijing University of Posts and Telecommunications, and Shanghai AI Lab proposed Ref-AVS, a new approach aimed squarely at this problem.

At the core of Ref-AVS is its multimodal fusion approach. It draws on three related tasks, Video Object Segmentation (VOS), Referring Video Object Segmentation (Ref-VOS), and Audio-Visual Segmentation (AVS), and combines the audio, visual, and textual cues they rely on. As a result, the AI system can handle not only objects that are producing sound but also silent objects that are equally important in the scene. This lets it interpret instructions given in natural language more accurately and precisely locate the referred objects in complex audio-visual scenes.
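To make the idea of cue-conditioned segmentation concrete, here is a toy sketch in Python. It is not the Ref-AVS architecture: the 2-D feature vectors, the fixed weighted-sum fusion in `fuse_cues`, and the dot-product-plus-sigmoid scoring in `segment` are all simplifying assumptions chosen for illustration. It shows how a query built from audio and text cues can select a sounding object, and how a text cue alone can still pick out a silent one.

```python
import math
from typing import List

Vector = List[float]

# Toy illustration only; this is NOT the network described in the Ref-AVS paper.
# Each pixel carries a small feature vector, and the audio and text cues are
# assumed to live in the same (hypothetical) 2-D embedding space:
# dimension 0 ~ "guitar", dimension 1 ~ "cup".

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def fuse_cues(audio: Vector, text: Vector, audio_weight: float = 0.5) -> Vector:
    """Blend audio and text embeddings into a single query (fixed weighted sum)."""
    return [audio_weight * a + (1.0 - audio_weight) * t for a, t in zip(audio, text)]

def segment(pixels: List[List[Vector]], query: Vector, threshold: float = 0.5) -> List[List[int]]:
    """Score each pixel against the query and threshold the score into a binary mask."""
    return [[1 if sigmoid(dot(f, query)) > threshold else 0 for f in row] for row in pixels]

# A 2x2 "frame": left column is a guitar, top-right is a cup, bottom-right is background.
pixels = [[[4.0, -4.0], [-4.0, 4.0]],
          [[4.0, -4.0], [-4.0, -4.0]]]

# Sounding object: the guitar is heard and named, so both cues point the same way.
guitar_mask = segment(pixels, fuse_cues(audio=[1.0, 0.0], text=[1.0, 0.0]))
# Silent object: the cup makes no sound (zero audio cue), so only the text cue helps.
cup_mask = segment(pixels, fuse_cues(audio=[0.0, 0.0], text=[0.0, 1.0]))

print(guitar_mask)  # [[1, 0], [1, 0]]
print(cup_mask)     # [[0, 1], [0, 0]]
```

A real system would learn these embeddings from data and would typically fuse modalities with learned attention rather than a fixed weighted sum; the fixed weights here only keep the example easy to trace by hand.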

To support research on and validation of Ref-AVS, the team constructed a large-scale dataset named Ref-AVS Bench. It contains 40,020 video frames covering 6,888 objects and 20,261 referring expressions. Each frame comes with its corresponding audio and pixel-level annotations. This rich and diverse dataset provides a solid foundation for multimodal research and opens new possibilities for future studies in related fields.
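The sample layout below is a hypothetical sketch of what one Ref-AVS Bench entry might hold, based only on the description above (a frame, its audio, a referring expression, and a pixel-level mask). The class name, field names, paths, and example expression are illustrative placeholders, not the dataset's official format. The sketch also derives one figure from the reported counts: roughly 2.94 referring expressions per annotated object on average.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class RefAVSSample:
    """Hypothetical layout of one benchmark entry (names are illustrative, not official)."""
    frame_path: str        # one video frame
    audio_path: str        # the audio aligned with that frame
    expression: str        # natural-language referring expression
    mask: List[List[int]]  # pixel-level binary mask, 1 = referred object

    def object_pixels(self) -> int:
        """Number of pixels annotated as belonging to the referred object."""
        return sum(sum(row) for row in self.mask)

# Made-up example entry; the paths and expression are placeholders.
sample = RefAVSSample(
    frame_path="frames/0001.jpg",
    audio_path="audio/0001.wav",
    expression="the guitar that is being played",
    mask=[[1, 0], [1, 0]],
)

# Counts reported for Ref-AVS Bench in the article.
NUM_FRAMES = 40_020
NUM_OBJECTS = 6_888
NUM_EXPRESSIONS = 20_261

# Average number of referring expressions per annotated object.
expressions_per_object = NUM_EXPRESSIONS / NUM_OBJECTS

print(sample.object_pixels())            # 2
print(round(expressions_per_object, 2))  # 2.94
```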

In a series of quantitative and qualitative experiments, Ref-AVS performed strongly. On the Seen subset, it outperformed existing methods, demonstrating its segmentation capability. Results on the Unseen and Null subsets further indicated good generalization and robustness to null references (expressions that refer to no object in the scene), which is crucial for real-world applications.
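The article does not name the evaluation metrics. As a hedged illustration, the sketch below uses the Jaccard index (intersection over union), a common measure for segmentation masks, for cases where the expression refers to a real object, plus a hypothetical `foreground_ratio` check for null references, where the ideal prediction is an empty mask. Both functions are assumptions for illustration, not the paper's evaluation code.

```python
from typing import List

Mask = List[List[int]]  # binary mask, 1 = foreground

def jaccard(pred: Mask, gt: Mask) -> float:
    """Intersection over union of two equally sized binary masks."""
    inter = union = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            inter += p & g
            union += p | g
    # Convention used here: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union

def foreground_ratio(pred: Mask) -> float:
    """Fraction of pixels predicted as foreground; 0.0 is ideal for a null reference."""
    total = sum(len(row) for row in pred)
    return sum(sum(row) for row in pred) / total

# Referred object present: the overlap covers half of the union.
print(jaccard([[1, 1], [0, 0]], [[1, 0], [0, 0]]))  # 0.5
# Null reference: the model should predict nothing; here one stray pixel is marked.
print(foreground_ratio([[0, 0], [0, 1]]))           # 0.25
```

Separating the two checks reflects the distinction the experiments draw: a high Jaccard score alone says nothing about whether a model correctly stays silent when the expression matches no object.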

Ref-AVS has drawn wide attention in the academic community and paves the way for practical applications. The technology could play a significant role in fields such as video analysis, medical image processing, autonomous driving, and robot navigation. For example, in medicine, Ref-AVS may help doctors interpret complex medical images more accurately; in autonomous driving, it may improve vehicles' perception of their surroundings; and in robotics, it may help robots better understand and carry out spoken instructions.

The research was presented at ECCV 2024, and the paper and project information are publicly available, giving researchers and developers worldwide valuable resources for study and further exploration. This openness reflects the academic spirit of the Chinese research teams involved and should help accelerate progress across the AI field.

Ref-AVS marks an important step forward in multimodal understanding for AI. It showcases the innovative capabilities of Chinese research teams and points toward more intelligent and natural human-computer interaction. As the technology continues to improve and find applications, future AI systems may become better able to understand and adapt to the complex human world, bringing significant changes to many industries.

Paper link: https://arxiv.org/abs/2407.10957

Project homepage: https://gewu-lab.github.io/Ref-AVS/
