2025-04-23 10:22:50 | AIbase
Revolutionizing Video Creation! Alibaba's VACE Model Unifies Text, Image, and Video Inputs
Scientists at Alibaba Group have introduced VACE, a universal AI model that brings a wide range of video generation and editing tasks into a single framework. VACE is built on an enhanced Diffusion Transformer architecture and introduces a novel input format called the "Video Conditional Unit" (VCU). The VCU combines diverse modalities, such as text prompts, reference images or video sequences, and spatial masks, into one unified representation, and a dedicated mechanism coordinates these inputs so that they do not conflict with one another. A concept-decoupling strategy further gives users fine-grained control over the result.
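To make the idea of a single "unit" of conditions more concrete, here is a minimal illustrative sketch. It is not Alibaba's code. The class `VideoConditionUnit` and the helpers `default_unit` and `decouple` are hypothetical names. Frames are simplified to single-channel 2D lists, and the sketch assumes, purely for illustration, that concept decoupling means using each frame's mask to split it into a region to regenerate (mask = 1) and a region to preserve (mask = 0).

```python
from dataclasses import dataclass
from typing import List

# Hypothetical simplifications: one grayscale H x W frame, one binary H x W mask.
Frame = List[List[float]]
Mask = List[List[int]]  # 1 = region to edit/generate, 0 = region to keep


@dataclass
class VideoConditionUnit:
    """Hypothetical container: one text prompt plus aligned frame and mask sequences."""
    text: str
    frames: List[Frame]
    masks: List[Mask]

    def __post_init__(self) -> None:
        # Every frame needs a matching mask so the inputs stay aligned in time.
        if len(self.frames) != len(self.masks):
            raise ValueError("frames and masks must have the same length")


def default_unit(text: str, num_frames: int, height: int, width: int) -> VideoConditionUnit:
    """Text-only case (assumed): blank frames and all-ones masks, so everything is generated."""
    frames = [[[0.0] * width for _ in range(height)] for _ in range(num_frames)]
    masks = [[[1] * width for _ in range(height)] for _ in range(num_frames)]
    return VideoConditionUnit(text, frames, masks)


def decouple(unit: VideoConditionUnit):
    """Split each frame by its mask into a 'change' part and a 'keep' part (assumed meaning)."""
    reactive, inactive = [], []
    for frame, mask in zip(unit.frames, unit.masks):
        # Pixels where mask == 1 go to the part to regenerate.
        reactive.append([[p * m for p, m in zip(row, mrow)] for row, mrow in zip(frame, mask)])
        # Pixels where mask == 0 go to the part to preserve.
        inactive.append([[p * (1 - m) for p, m in zip(row, mrow)] for row, mrow in zip(frame, mask)])
    return reactive, inactive


if __name__ == "__main__":
    unit = VideoConditionUnit("a cat on a sofa", [[[1.0, 2.0], [3.0, 4.0]]], [[[1, 0], [0, 1]]])
    change, keep = decouple(unit)
    print(change)  # [[[1.0, 0.0], [0.0, 4.0]]]
    print(keep)    # [[[0.0, 2.0], [3.0, 0.0]]]
```

Because a text-only request, an image-guided request, and a masked edit all fit the same three fields, a single model can in principle accept them through one interface; the mask split is one simple way to show how "what to change" can be kept apart from "what to keep".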