AI-powered video generation technology is advancing rapidly, and an open-source video model called Pusa has recently drawn significant industry attention. Fine-tuned from Mochi, a leading open-source video generation system, the model not only delivers solid results but, more importantly, fully open-sources the entire fine-tuning process, including the training tools and the dataset, at a training cost of roughly $100, opening up new possibilities for research and application in video generation.


Mochi-Based Fine-tuning, Showcasing Initial Video Generation Capabilities

Pusa-V0.5 is a preliminary version of the Pusa model, built on Mochi1-Preview, a leading open-source video generation system on the Artificial Analysis Leaderboard, as its base. Thanks to this fine-tuning, Pusa supports a range of video generation tasks, including text-to-video, image-to-video, frame interpolation, video transitions, seamless looping, and extended video generation. Although the generated videos are currently limited to a relatively low resolution (480p), the model shows promise in motion fidelity and prompt adherence.

Completely Open-Sourced Fine-tuning Workflow, Driving Community Collaboration

One of the most remarkable features of the Pusa project is that it is fully open-sourced. Developers gain access to the full code repository and detailed architecture specifications, as well as the complete training methodology. Researchers and developers can therefore study Pusa's fine-tuning process in depth, reproduce the experiments, and build their own innovations and improvements on top of it. This openness should substantially boost community collaboration and development.

Surprisingly Low Training Cost

Compared with large video models whose training often costs tens or even hundreds of thousands of dollars, Pusa's training cost stands out. Reportedly, Pusa was trained on just 16 H800 GPUs for approximately 500 iterations, consuming only about 100 H800 GPU hours at a total cost of approximately $100. Such a low cost opens the door for more research institutions and individual developers to participate in video model research and development. The project team also notes that efficiency could be improved further through single-node training and more advanced parallelization techniques.
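For reference, the reported figures are internally consistent under a simple pricing assumption. The back-of-the-envelope check below is illustrative only; the $1 per H800 GPU-hour rental rate is an assumption chosen to match the reported total, not a figure published by the project:

```python
# Back-of-the-envelope check of the reported Pusa training cost.
gpus = 16                       # reported H800 GPU count
gpu_hours_total = 100           # "0.1k H800 GPU hours" from the report
wall_clock_hours = gpu_hours_total / gpus   # ~6.25 hours of wall-clock time
cost_per_gpu_hour = 1.0         # assumed USD rental rate (not from the project)
total_cost = gpu_hours_total * cost_per_gpu_hour   # ~$100

print(f"wall-clock: {wall_clock_hours:.2f} h, estimated cost: ${total_cost:.0f}")
```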

Pusa adopts a novel diffusion paradigm based on frame-level noise control with vectorized timesteps, an approach first proposed in the FVDM paper, which brings considerable flexibility and scalability to video diffusion modeling. The adjustments made to the base model are also non-destructive: Pusa retains the original Mochi's text-to-video generation capability and requires only light fine-tuning.
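To make the idea concrete, the sketch below shows the general shape of frame-level noise control with vectorized timesteps: each frame of a clip receives its own noise level, and the denoiser is conditioned on a vector of timesteps rather than a single scalar. All names, shapes, and the noise schedule here are illustrative assumptions and are not taken from the Pusa or FVDM codebases:

```python
import torch

def add_frame_level_noise(video: torch.Tensor, num_timesteps: int = 1000):
    """Noise a video clip with an independent timestep per frame.

    video: tensor of shape (batch, frames, channels, height, width)
    Returns the noised video, the sampled noise, and the timestep vector.
    """
    b, f = video.shape[0], video.shape[1]
    # One independent timestep per frame -> shape (batch, frames),
    # instead of the usual single scalar timestep per sample.
    t = torch.randint(0, num_timesteps, (b, f), device=video.device)
    # Simple linear schedule, purely for illustration.
    alpha = 1.0 - t.float() / num_timesteps
    alpha = alpha.view(b, f, 1, 1, 1)  # broadcast over channels and spatial dims
    noise = torch.randn_like(video)
    noisy_video = alpha.sqrt() * video + (1.0 - alpha).sqrt() * noise
    return noisy_video, noise, t
```

Conditioning the denoiser on the whole timestep vector is what lets a single model cover tasks like image-to-video or frame interpolation, since some frames can be kept nearly clean while others are fully noised.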

Project: https://top.aibase.com/tool/pusa