A research team from Renmin University of China, Beijing University of Posts and Telecommunications, and Shanghai AI Lab has developed Ref-AVS, a technology aimed at helping artificial intelligence understand the complex physical world. It uses a multimodal fusion method that integrates information from Video Object Segmentation (VOS), Referring Video Object Segmentation (Ref-VOS), and Audio-Visual Segmentation (AVS), enabling AI systems to accurately identify and locate specific objects in audiovisual scenes, whether or not those objects produce sound. To validate the effectiveness of the technology, the research team...