Open-source Sora Reproduction Scheme, Cost Reduced by 46%, Sequence Extended to 819K Patches

开源中国

Published in AI News · 2 min read · Mar 6, 2024


Colossal-AI has open-sourced Open-Sora, a complete architecture for replicating Sora, claiming a 46% reduction in replication cost and support for training input sequences of up to 819K patches.

According to Sora's technical report, Sora uses a video compression network to compress videos of various sizes into a sequence of spatiotemporal patches in a latent space, denoises them with a Diffusion Transformer, and finally decodes the result to generate video. Open-Sora packages the training pipeline that Sora may use, covering the entire process from data processing to training and inference. It currently supports dynamic resolution, multiple model structures, multiple video compression methods, and multiple parallel training optimizations.

In a performance test of the DiT-XL/2 model on a single server with eight H800 SXM 80GB GPUs, at a sequence length of 600K, Open-Sora shows over a 40% performance improvement and cost reduction compared to the baseline solution.

Open-Sora's open-source address: https://github.com/hpcaitech/Open-Sora.
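To give a feel for why sequence lengths in the hundreds of thousands matter, here is a minimal sketch of how the number of spatiotemporal patches grows with video size. It is not Open-Sora's code: the `PatchConfig` class, the `sequence_length` function, and the compression factors are hypothetical values chosen for illustration. The only figure taken from the model name is the spatial patch size of 2, which is what the "/2" in DiT-XL/2 denotes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatchConfig:
    # Illustrative assumptions, not Open-Sora's actual settings.
    spatial_downsample: int = 8   # hypothetical compression factor per spatial axis
    temporal_downsample: int = 1  # hypothetical compression factor along time
    patch_t: int = 1              # hypothetical temporal patch size
    patch_h: int = 2              # spatial patch size, as in DiT-XL/2
    patch_w: int = 2


def sequence_length(frames: int, height: int, width: int,
                    cfg: PatchConfig = PatchConfig()) -> int:
    """Count the patches a Diffusion Transformer would see for one video."""
    # Step 1: the video compression network shrinks the video into a latent.
    latent_t = frames // cfg.temporal_downsample
    latent_h = height // cfg.spatial_downsample
    latent_w = width // cfg.spatial_downsample
    # Step 2: the latent is cut into spatiotemporal patches, one token each.
    return ((latent_t // cfg.patch_t)
            * (latent_h // cfg.patch_h)
            * (latent_w // cfg.patch_w))


if __name__ == "__main__":
    print(sequence_length(frames=16, height=256, width=256))  # 4096
    print(sequence_length(frames=64, height=512, width=512))  # 65536
```

Under these assumed factors, the patch count grows linearly with the number of frames and quadratically with resolution, so longer or higher-resolution videos quickly push the sequence length far beyond what text models typically handle.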


This article is from AIbase Daily

