Alibaba has released Wan2.1, a new open-source video generation model, in a late-night launch. Its 14-billion-parameter version quickly rose to the top of the VBench leaderboard, placing it among the leading models in video generation. Following Alibaba's recent QwQ-Max release, Wan2.1 stands out for its handling of complex motion: it can smoothly render synchronized dances involving multiple characters, which is a genuinely impressive result.
In Alibaba's official demos, Wan2.1 performs well on still-image generation and also breaks new ground in rendering text within video. Because the 14B model is difficult for most users to run on consumer-grade GPUs, Alibaba also released a smaller 1.3B-parameter version that generates 480P video and runs smoothly on an RTX 4070 with 12GB of VRAM.
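As a rough illustration of why the 14B model strains consumer GPUs, the arithmetic below estimates the memory needed only to store each model's weights. It assumes 16-bit weights (2 bytes per parameter) and ignores activations, the text encoder, and the VAE, so real usage is higher. `weight_memory_gb` and `weights_fit` are hypothetical helpers for this sketch, not part of Wan2.1's code.

```python
# Weights-only memory estimate (hypothetical helpers, not Wan2.1 tooling).
# Assumption: parameters stored in a 16-bit format (2 bytes each).
# Activations, the text encoder, and the VAE are ignored, so peak usage is higher.

BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2}


def weight_memory_gb(num_params: float, dtype: str = "bf16") -> float:
    """Memory needed just to hold the weights, in decimal gigabytes (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def weights_fit(num_params: float, vram_gb: float, dtype: str = "bf16") -> bool:
    """True if the weights alone fit in the given amount of VRAM."""
    return weight_memory_gb(num_params, dtype) < vram_gb


for name, params in [("14B", 14e9), ("1.3B", 1.3e9)]:
    gb = weight_memory_gb(params)
    print(f"{name}: ~{gb:.1f} GB of weights, fits in 12 GB: {weights_fit(params, 12)}")
```

Under these assumptions the 14B weights alone need about 28 GB, well beyond a 12GB card, while the 1.3B weights need about 2.6 GB, leaving headroom for the rest of the pipeline.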
Besides the 14B and 1.3B versions, Alibaba released two additional models for image-to-video generation. All of them are licensed under Apache 2.0, which permits free commercial use. Users can try the model on Alibaba's own platform for quick video generation, though heavy demand may mean waiting in a queue. Users with technical expertise can also download, install, and debug the models through Hugging Face, the ModelScope community, and other channels.
Wan2.1's biggest highlight is its technical design. It combines a Diffusion Transformer architecture with a 3D variational autoencoder (VAE) built specifically for video. Through a range of compression and parallelization strategies, the model improves generation efficiency without sacrificing quality. According to Alibaba's reported results, Wan2.1's VAE reconstructs video 2.5 times faster than comparable existing approaches, substantially reducing compute requirements.
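To make the role of compression more concrete, the sketch below works through how a 3D (spatio-temporal) VAE shrinks the input that a Diffusion Transformer has to attend over. All of the numbers are assumptions chosen for illustration rather than figures from this article: 4x temporal and 8x8 spatial compression, a causal VAE that encodes the first frame on its own, 2x2 spatial patches in the DiT, and an 81-frame 480x832 clip. The `Compression`, `latent_shape`, and `dit_tokens` names are hypothetical.

```python
# Illustrative sketch: how a 3D (spatio-temporal) VAE reduces the work a
# Diffusion Transformer must do. All factors are assumptions for illustration.
from dataclasses import dataclass


@dataclass(frozen=True)
class Compression:
    temporal: int = 4  # input frames merged into one latent frame (assumed)
    spatial: int = 8   # pixels per latent cell along height and width (assumed)
    patch: int = 2     # DiT patch size along height and width (assumed)


def latent_shape(frames: int, height: int, width: int,
                 c: Compression = Compression()) -> tuple:
    """Latent (frames, height, width) after the 3D VAE encoder."""
    if (frames - 1) % c.temporal or height % c.spatial or width % c.spatial:
        raise ValueError("video size is not divisible by the compression factors")
    # Causal VAE (assumed): first frame encoded alone, the rest grouped in time.
    return (1 + (frames - 1) // c.temporal, height // c.spatial, width // c.spatial)


def dit_tokens(frames: int, height: int, width: int,
               c: Compression = Compression()) -> int:
    """Number of tokens the DiT processes after patchifying the latent."""
    t, h, w = latent_shape(frames, height, width, c)
    if h % c.patch or w % c.patch:
        raise ValueError("latent size is not divisible by the patch size")
    return t * (h // c.patch) * (w // c.patch)


# Example: an 81-frame, 480x832 clip (an assumed 480P setting).
frames, height, width = 81, 480, 832
print("latent shape:", latent_shape(frames, height, width))  # (21, 60, 104)
print("DiT tokens:", dit_tokens(frames, height, width))      # 32760
print("pixel positions:", frames * height * width)           # 32348160
```

Under these assumed factors, about 32 million pixel positions become roughly 33 thousand tokens, the kind of reduction that makes attention over a whole clip tractable. The sketch covers compression only; it does not reproduce the reported 2.5x reconstruction-speed figure.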
Early user feedback on Wan2.1 has been positive, with particular praise for its detailed rendering of dynamic scenes and its natural-looking physics. Users can produce high-quality videos and easily add animated text, which opens up new creative possibilities.
Alibaba's Wan2.1 model is not only technologically advanced but also empowers creators with greater freedom, marking a significant breakthrough in video generation technology.