Generating high-quality, temporally coherent videos requires substantial computational resources, particularly over longer time spans. The latest Diffusion Transformer (DiT) models have made significant strides in video generation, but their reliance on larger models and more complex attention mechanisms leads to slower inference, exacerbating this challenge. To address this, researchers at Meta AI have proposed AdaCache, a training-free method for accelerating video DiTs.
The core idea of AdaCache is based on the premise that "not all videos are the same," meaning that some videos require fewer denoising steps to achieve reasonable quality. Accordingly, this method not only caches computational results during the diffusion process but also designs customized caching strategies for each video, thereby optimizing the trade-off between quality and latency.
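To make the idea concrete, here is a minimal sketch of step-level residual caching with an adaptive reuse schedule. The `toy_block` stand-in, the distance metric, and the `schedule_from_rate` thresholds are all illustrative assumptions, not the actual AdaCache implementation; the point is only that cheap-to-change videos trigger longer cache reuse, so fewer expensive block evaluations are needed.

```python
import numpy as np

def toy_block(x):
    # Stand-in for an expensive DiT transformer block (illustrative only).
    return np.tanh(x) * 0.1

def schedule_from_rate(rate):
    # Hypothetical mapping from residual change-rate to number of steps
    # the cached result is reused; real AdaCache defines its own codebook.
    if rate < 0.01:
        return 4
    if rate < 0.05:
        return 2
    return 1

def denoise_with_cache(x, num_steps=20):
    cached_residual = None
    reuse_left = 0
    compute_calls = 0
    for _ in range(num_steps):
        if reuse_left > 0 and cached_residual is not None:
            residual = cached_residual  # reuse cached block output
            reuse_left -= 1
        else:
            residual = toy_block(x)  # recompute the expensive block
            compute_calls += 1
            if cached_residual is not None:
                # Relative change between consecutive residuals decides
                # how long the new cache entry may be reused.
                rate = np.linalg.norm(residual - cached_residual) / (
                    np.linalg.norm(cached_residual) + 1e-8)
                reuse_left = schedule_from_rate(rate) - 1
            cached_residual = residual
        x = x + residual
    return x, compute_calls
```

Running the loop on a slowly changing input makes the caching visible: `compute_calls` comes out well below `num_steps`, which is the source of the speedup.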
The researchers further introduced a Motion Regularization (MoReg) scheme, which leverages motion information within AdaCache to control how computation is allocated across videos. Since sequences with high-frequency textures and substantial motion require more diffusion steps to reach reasonable quality, MoReg lets AdaCache allocate computational resources accordingly.
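The intuition behind MoReg can be sketched as follows. The motion metric (mean frame difference) and the `beta` scaling are hypothetical stand-ins chosen for illustration, not the paper's exact formulation; the idea is simply that higher-motion clips shrink the cache-reuse window, forcing more recomputation where quality is most sensitive.

```python
import numpy as np

def motion_score(frames):
    # Mean absolute difference between consecutive frames: a simple
    # proxy for motion content (illustrative metric, not the paper's).
    diffs = np.abs(np.diff(frames, axis=0))
    return float(diffs.mean())

def regularized_skip(base_skip, frames, beta=4.0):
    # Hypothetical motion regularization: shrink the cache-reuse window
    # as motion grows, so dynamic clips receive more compute.
    m = motion_score(frames)
    return max(1, int(round(base_skip / (1.0 + beta * m))))
```

A static clip keeps the full reuse window, while a fast-moving one collapses toward recomputing every step.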
Experimental results show that AdaCache can significantly accelerate inference (e.g., up to 4.7× faster on Open-Sora 720p, 2-second video generation) without compromising the quality of the generated videos. AdaCache also generalizes well, applying to different video DiT models such as Open-Sora, Open-Sora-Plan, and Latte. Compared to other training-free acceleration methods (e.g., ∆-DiT, T-GATE, and PAB), AdaCache offers significant advantages in both speed and quality.
User studies indicate that AdaCache-generated videos are preferred by users compared to other methods, and the perceived quality is on par with benchmark models. This research confirms the effectiveness of AdaCache and makes a significant contribution to the field of efficient video generation. Meta AI believes that AdaCache can be widely adopted and drive the democratization of high-fidelity long video generation.
Paper: https://arxiv.org/abs/2411.02397
Project Page: https://adacache-dit.github.io/
GitHub: https://github.com/AdaCache-DiT/AdaCache