Colossal-AI has open-sourced Open-Sora, a complete architecture for replicating Sora, claiming a 46% reduction in replication cost and support for training input sequences of up to 819K patches. According to Sora's technical report, Sora uses a video compression network to compress videos of various sizes into a sequence of spatiotemporal patches in a latent space, denoises them with a Diffusion Transformer, and finally decodes the result to generate video. Open-Sora encapsulates the training pipeline that Sora may use, covering the entire process from data processing to training and inference. It currently supports dynamic resolution, multiple model structures, various video compression methods, and multiple parallel training optimizations. In a performance test of the DiT-XL/2 model on a single server with 8 H800 SXM 80GB GPUs, at a sequence length of 600K, Open-Sora shows over 40% performance improvement and cost reduction compared to the baseline. Open-Sora is available at https://github.com/hpcaitech/Open-Sora.
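To illustrate how a latent video of any size can become a token sequence for a Diffusion Transformer, here is a minimal NumPy sketch. It is not taken from the Open-Sora code base: the function names `patchify` and `unpatchify`, the `(T, H, W, C)` layout, and the patch sizes are all assumptions made for illustration, and the compression network and the denoiser are omitted.

```python
import numpy as np

def patchify(latent, pt, ph, pw):
    """Split a latent video (T, H, W, C) into a sequence of flattened
    spatiotemporal patches of size pt x ph x pw (hypothetical helper)."""
    T, H, W, C = latent.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0, "sizes must divide evenly"
    # Expose the patch grid: (nt, pt, nh, ph, nw, pw, C)
    x = latent.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    # Group grid axes first, then the axes inside each patch
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    # One row per patch: (num_patches, pt * ph * pw * C)
    return x.reshape(-1, pt * ph * pw * C)

def unpatchify(tokens, shape, pt, ph, pw):
    """Inverse of patchify: rebuild the (T, H, W, C) latent from tokens."""
    T, H, W, C = shape
    x = tokens.reshape(T // pt, H // ph, W // pw, pt, ph, pw, C)
    x = x.transpose(0, 3, 1, 4, 2, 5, 6)
    return x.reshape(T, H, W, C)

# Two latents of different sizes yield sequences of different lengths,
# which is why variable-size input maps naturally onto a transformer.
long_latent = np.arange(8 * 16 * 16 * 4, dtype=np.float32).reshape(8, 16, 16, 4)
short_latent = np.zeros((4, 8, 8, 4), dtype=np.float32)

long_tokens = patchify(long_latent, 2, 2, 2)
short_tokens = patchify(short_latent, 2, 2, 2)
restored = unpatchify(long_tokens, long_latent.shape, 2, 2, 2)
```

In this sketch the sequence length grows with the product of the latent's duration, height, and width, so long or high-resolution videos quickly produce very long sequences. That is the setting in which the sequence-length and parallel-training optimizations described above become relevant.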
Open-source Sora Reproduction Scheme, Cost Reduced by 46%, Sequence Extended to 819K Patches
OSChina (Open Source China)
© Copyright AIbase Base 2024, Click to View Source - https://www.aibase.com/news/6392