Today, Alibaba officially announced the open-sourcing of its video generation model, Wanxiang 2.1, in two versions: 14B and 1.3B. The professional 14B version delivers industry-leading performance and expressiveness, ideal for scenarios demanding top video quality. The faster 1.3B version is optimized for consumer-grade GPUs, requiring only 8.2 GB of VRAM to generate high-quality 480P video, making it well suited to secondary development and academic research.


According to the official introduction, the open-sourced Wanxiang 2.1 demonstrates significant advantages in handling complex movements, reproducing realistic physics, enhancing cinematic quality, and improving instruction following. It caters to the diverse needs of creators, developers, and enterprise users. With Tongyi Wanxiang, users can easily generate high-quality videos, particularly meeting the high creative demands of advertising and short video production.

In the authoritative VBench benchmark, Tongyi Wanxiang achieved a top score of 86.22%, significantly outperforming domestic and international video generation models such as Sora, MiniMax, and Luma. The model builds on the mainstream DiT architecture and a linear-noise-trajectory Flow Matching paradigm, layering several technical innovations on top to strengthen its generation capabilities. Particularly noteworthy is the self-developed, high-efficiency 3D causal VAE module, which achieves 256x lossless compression of the video latent space and supports efficient encoding and decoding of videos of any length.
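Flow Matching with a linear noise trajectory trains the model to predict the constant velocity that carries a noise sample to a data sample along a straight line. The following is a minimal, framework-free sketch of that training target; the shapes, the toy linear "model," and all variable names are illustrative assumptions, not Wanxiang's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "video latents": (batch, latent_dim) stand-ins for the VAE's output.
x1 = rng.normal(size=(4, 8))   # data sample (clean latent)
x0 = rng.normal(size=(4, 8))   # pure noise sample
t = rng.uniform(size=(4, 1))   # random timestep in [0, 1]

# Linear noise trajectory: straight-line interpolation between noise and data.
x_t = (1.0 - t) * x0 + t * x1

# The regression target is the constant velocity along that line.
v_target = x1 - x0

# A real model would be a DiT predicting v from (x_t, t); here, a toy linear map.
W = rng.normal(size=(8, 8)) * 0.1
v_pred = x_t @ W

# Flow Matching loss: mean squared error between predicted and target velocity.
loss = np.mean((v_pred - v_target) ** 2)
```

At sampling time, the learned velocity field is integrated from t = 0 (noise) to t = 1 (data), typically with a simple ODE solver.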


During video generation, Tongyi Wanxiang employs a Full Attention mechanism on top of the mainstream DiT architecture, effectively modeling spatiotemporal dependencies to ensure high-quality, temporally consistent output. Training follows a six-stage stepwise strategy, starting from low-resolution data and gradually introducing high-resolution data so the model performs well across resolutions and conditions. A rigorous data cleaning process further ensures high-quality training data.
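"Full Attention" here means the DiT attends over all spatiotemporal tokens jointly, rather than factorizing attention into separate spatial and temporal passes. A simplified numpy sketch of the idea follows; the dimensions and function names are made up for clarity and omit multi-head projections and batching:

```python
import numpy as np

def full_attention(q, k, v):
    # Scaled dot-product attention over one flattened token sequence.
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
T, H, W, d = 4, 6, 6, 32                   # toy frames, height, width, channels
tokens = rng.normal(size=(T * H * W, d))   # flatten time AND space into one axis

# Every token attends to every other token across frames and positions, which
# is what couples motion (time) with appearance (space) in a single pass.
out = full_attention(tokens, tokens, tokens)
```

The cost is quadratic in the number of spatiotemporal tokens, which is one reason an aggressive VAE compression of the latent space matters for efficiency.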

For training and inference efficiency, Tongyi Wanxiang applies techniques such as distributed training strategies, activation optimization, and memory management to keep training stable and inference efficient. Combined with Alibaba Cloud's intelligent training cluster scheduling, failures during training are automatically detected and jobs restarted quickly, keeping the training process smooth.

Tongyi Wanxiang 2.1 has been open-sourced on platforms like GitHub and Hugging Face, supporting multiple mainstream frameworks and providing a convenient user experience for developers and researchers. Whether for rapid prototyping or efficient production deployment, Tongyi Wanxiang meets the needs of diverse users and injects new vitality into the development of video generation technology.
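For reference, checkpoints published on Hugging Face can be fetched with the `huggingface_hub` client. The repository id below is an assumption based on the Wan-AI organization name and should be verified against the actual model card:

```python
# Hypothetical repository id -- verify on the Wan-AI organization page.
REPO_ID = "Wan-AI/Wan2.1-T2V-1.3B"

def download_checkpoint(local_dir="./wan2.1-1.3b"):
    # Import inside the function so this module loads even without the
    # dependency installed (pip install huggingface_hub).
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)

if __name__ == "__main__":
    print(download_checkpoint())
```

The same checkpoints are mirrored on ModelScope via the community link below.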


ModelScope community page: https://modelscope.cn/organization/Wan-AI

Key Highlights:

🌟 Tongyi Wanxiang 2.1 is open-sourced, supporting diverse video generation needs.

🏆 Achieved a high score of 86.22% in the VBench benchmark, surpassing other models.

🚀 Stepwise training and multiple technical optimizations enhance generation efficiency and quality.