Open-Sora Plan has been upgraded once again! The latest release, Open-Sora Plan v1.2.0, introduces a new 3D full attention architecture that significantly improves the model's grasp of the physical world.

Key Highlights of This Update:

New 3D Full Attention Architecture: This new architecture has led to a qualitative leap in the model's understanding of the physical world. It's no longer just a "QR code" that can only think in two dimensions; it now attends across height, width, and time together!

Improved Text-to-Video Generation: Type out a piece of text, and the AI will create a vivid video for you.

Improved Clarity and Consistency: Thanks to the new architecture and the optimized VAE structure, Open-Sora Plan's video output is clearer and more coherent. Say goodbye to blurry images!

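Perfect Integration of Space and Time: The new 3D full attention architecture tackles a major limitation of previous versions: handling the spatial and temporal dimensions at the same time. The earlier 2+1D design processed space and time in separate steps, whereas 3D full attention lets every part of the video attend to every other part, across both space and time, in one step. What does this mean? The generated videos should see significant improvements in spatial representation and temporal smoothness!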

Dramatically Improved Inference Speed: The optimized CausalVideoVAE structure not only improves the model's output quality but also makes inference noticeably faster. Efficiency enthusiasts, rejoice!
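To make the space-and-time point above more concrete, here is a minimal sketch of which positions a single video token can attend to directly under a factorized 2+1D scheme versus 3D full attention. This is an illustration only, not Open-Sora Plan's actual code: the function names and the tiny 4x3x3 grid are hypothetical, and real models operate on learned latent tokens rather than raw coordinates.

```python
from itertools import product


def token_positions(frames, height, width):
    """All (t, h, w) positions in a (hypothetical) video latent grid."""
    return list(product(range(frames), range(height), range(width)))


def spatial_neighbors(pos, frames, height, width):
    """2+1D spatial layer: a token attends only to tokens in its own frame."""
    t, _, _ = pos
    return {(t, h, w) for h in range(height) for w in range(width)}


def temporal_neighbors(pos, frames, height, width):
    """2+1D temporal layer: a token attends only to the same pixel location in other frames."""
    _, h, w = pos
    return {(t, h, w) for t in range(frames)}


def full_3d_neighbors(pos, frames, height, width):
    """3D full attention: a token attends to every token in the whole clip."""
    return set(token_positions(frames, height, width))


if __name__ == "__main__":
    T, H, W = 4, 3, 3  # arbitrary small grid, chosen only for illustration
    query = (0, 0, 0)

    factorized = spatial_neighbors(query, T, H, W) | temporal_neighbors(query, T, H, W)
    full = full_3d_neighbors(query, T, H, W)

    print(len(factorized), "positions attended directly by one spatial + one temporal layer")
    print(len(full), "positions attended directly by one 3D full attention layer")
    # A token in another frame at another location is only reached directly in the 3D case.
    print((1, 1, 1) in factorized, (1, 1, 1) in full)
```

In the factorized case, a token at a different location in a different frame is reached only indirectly, by stacking layers. With 3D full attention it is reached in a single step, at the cost of attending over many more positions.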


Looking back at the development of Open-Sora Plan, its progress is astonishing. In May 2024, v1.1.0 was still using a 2+1D model architecture, primarily for exploratory training. Now, just a few months later, it has evolved into a "creator" that models space and time together! This speed might even make Darwin exclaim, "The theory of evolution needs to be rewritten!"

Best of all, the Open-Sora Plan team is generous with sharing! Code, data, and models are all open source, as if they've put the "how to create a world" manual right in front of you. Their goal is simple: to enable everyone to become a "god" of video creation! This open and shared attitude should accelerate the progress of AI video generation technology.

The release of Open-Sora Plan v1.2.0 marks a new era for video generation models. It not only significantly improves visual representation compression and inference efficiency but also points the way for future development.

Project Address (v1.2.0 report): https://github.com/PKU-YuanGroup/Open-Sora-Plan/blob/main/docs/Report-v1.2.0.md