The LuChen Open-Sora team recently made major progress on two fronts: 720p high-definition video quality and generation duration. They have also open-sourced the project, which has caused a stir in the community!

Their open-source project makes video generation about as simple as ordering takeout. Since its debut in March, it has earned 17.5K stars on GitHub.

Open-source address: https://github.com/hpcaitech/Open-Sora

Open-Sora can generate 16 seconds of 720p high-definition video with a single click. Whether the subject is an exquisite portrait, a sci-fi blockbuster, or a lively animation, it handles them all, including smooth zoom effects. Lambda Labs, an NVIDIA-backed AI company, has even built a digital LEGO universe on top of the Open-Sora model weights, giving LEGO fans a new creative playground.

The LuChen team has open-sourced not only the model weights but also their technical roadmap on GitHub, so that anyone can become the master of their own video model. The technical report analyzes the key points of model training, from the video compression network to the diffusion algorithm and controllability, and explains how their 1.1B diffusion model addresses the pain points of video model training.

Report address: https://github.com/hpcaitech/Open-Sora/blob/main/docs/report_03.md

The report introduces a video compression network, the same approach used by OpenAI's Sora. It compresses the time dimension by 4x without frame skipping, so video can be generated at the original FPS. The team proposes a simple video compression network (VAE) that first compresses by 8x8 in the spatial dimensions and then by 4x in the time dimension.
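To make the compression factors concrete, here is a minimal sketch of the shape arithmetic only. The function name `latent_shape` is hypothetical, the 24 FPS frame rate is an assumption (the article does not state one), and the sketch assumes every dimension divides evenly by its factor; it does not model latent channels or the actual VAE code in the repository.

```python
def latent_shape(frames, height, width, spatial=8, temporal=4):
    """Return (latent_frames, latent_height, latent_width) after compression.

    Assumes 8x8 spatial compression followed by 4x temporal compression,
    and that each dimension is an exact multiple of its factor.
    """
    if frames % temporal or height % spatial or width % spatial:
        raise ValueError("dimensions must be divisible by the compression factors")
    return frames // temporal, height // spatial, width // spatial


# 16 seconds of 720p video at an assumed 24 FPS
fps = 24                      # assumption, not stated in the article
frames = 16 * fps             # 384 frames
shape = latent_shape(frames, 720, 1280)
print(shape)                  # (96, 90, 160)

# Number of spatio-temporal positions shrinks by 8 * 8 * 4 = 256
reduction = (frames * 720 * 1280) // (shape[0] * shape[1] * shape[2])
print(reduction)              # 256
```

Because the time axis is compressed rather than subsampled, no input frames are dropped, which is what allows generation at the original FPS.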

Stable Diffusion 3, the latest Stable Diffusion model, uses rectified flow to improve generation quality. The LuChen team provides the corresponding training techniques, including rectified flow training and logit-normal timestep sampling, which accelerate model training and reduce inference waiting time.
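The two techniques named above can be sketched in a few lines. This is an illustration under stated assumptions, not the Open-Sora implementation: the function names are hypothetical, the mean and standard deviation of the logit-normal distribution are placeholder values, and the interpolation convention (t = 0 is clean data, t = 1 is pure noise, with velocity target `noise - x0`) is one common formulation of rectified flow.

```python
import math
import random


def sample_logit_normal_timestep(rng, mean=0.0, std=1.0):
    """Draw t in (0, 1) by passing a Gaussian sample through a sigmoid.

    Compared with uniform sampling, this concentrates training on
    intermediate timesteps. mean and std are placeholder values.
    """
    u = rng.gauss(mean, std)
    return 1.0 / (1.0 + math.exp(-u))


def rectified_flow_pair(x0, noise, t):
    """Build one rectified flow training example.

    x_t lies on the straight line between data x0 and noise:
        x_t = (1 - t) * x0 + t * noise
    The regression target is the constant velocity along that line:
        v = noise - x0
    """
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, noise)]
    v = [b - a for a, b in zip(x0, noise)]
    return x_t, v


rng = random.Random(0)
t = sample_logit_normal_timestep(rng)
x0 = [1.0, 2.0]                               # toy "latent"
noise = [rng.gauss(0.0, 1.0) for _ in x0]     # Gaussian noise sample
x_t, v = rectified_flow_pair(x0, noise, t)
# A model would be trained to predict v from (x_t, t).
```

Because the path from noise to data is a straight line, sampling can in principle use fewer integration steps, which is consistent with the reduced inference waiting time mentioned above.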

The report also reveals core details of model training, including data cleaning, model tuning techniques, and the construction of the model evaluation system. The team also provides a Gradio application that can be deployed with one click and supports adjusting multiple parameters.

The open-sourcing of LuChen Open-Sora has broken the closed loop and injected new vitality into the innovation and development of text-to-video. Users have shifted from content consumers to creators, and enterprise users have gained the ability to develop video models in-house.