Kunlun Wanwei's SkyReels team has officially released and open-sourced SkyReels-V2, the world's first unlimited-length movie generation model using a Diffusion-forcing framework. This model achieves synergistic optimization by combining a Multimodal Large Language Model (MLLM), multi-stage pre-training, reinforcement learning, and the Diffusion-forcing framework, marking a new stage in video generation technology.
SkyReels-V2 aims to address long-standing trade-offs in video generation, where prompt adherence, visual quality, motion dynamics, and video duration have been difficult to optimize simultaneously. Beyond the technical breakthrough, the model supports diverse applications, including story generation, image-to-video synthesis, cinematography-director control, and multi-subject consistent video generation (SkyReels-A2). SkyReels-V2 currently supports generating 30-second and 40-second videos with high motion quality, strong consistency, and high fidelity.
Core technical innovations of SkyReels-V2 include:
Comprehensive cinematic-level video understanding model SkyCaptioner-V1: By using a structured video representation method, combined with general descriptions from the multimodal LLM and detailed shot language from sub-expert models, it significantly improves the understanding of shot language. This model can efficiently understand video data and generate diverse descriptions that conform to the original structural information.
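As an illustration of this structured representation, a caption record might pair a holistic description from the MLLM with shot-language fields from the sub-expert models. The field names and values below are invented for illustration; they are not SkyCaptioner-V1's actual schema.

```python
# A hypothetical structured caption: one free-form description (MLLM) plus
# shot-language fields (sub-expert models). Field names are illustrative
# assumptions, not SkyCaptioner-V1's real schema.
def make_caption(general, shot_type, camera_movement, lighting):
    return {
        "general": general,            # holistic scene description
        "shot": {                      # professional shot language
            "shot_type": shot_type,
            "camera_movement": camera_movement,
            "lighting": lighting,
        },
    }

caption = make_caption(
    general="A woman walks along a rainy street at night.",
    shot_type="medium shot",
    camera_movement="slow tracking",
    lighting="low-key neon",
)
```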
Motion preference optimization: Reinforcement-learning training on manually annotated and synthetically distorted preference data addresses motion artifacts such as distortion and physically implausible movement. As a result, SkyReels-V2 excels in motion dynamics, generating smooth and realistic video content.
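The kind of pairwise preference objective commonly used for such reward training can be sketched as a Bradley-Terry log-likelihood over reward scores. This is a generic sketch, not the specific objective SkyReels-V2 uses.

```python
import math

def preference_loss(reward_preferred, reward_rejected):
    """Negative log-likelihood that the preferred clip ranks above the
    rejected one under a Bradley-Terry model: -log sigmoid(r_w - r_l).
    Generic sketch; the exact SkyReels-V2 objective is an assumption."""
    margin = reward_preferred - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss shrinks as the reward model separates smooth motion (preferred)
# from distorted motion (rejected) by a wider margin.
loose = preference_loss(0.1, 0.0)  # barely separated -> larger loss
tight = preference_loss(3.0, 0.0)  # well separated   -> smaller loss
```

Training a reward model on such pairs, then optimizing the generator against it, is what pushes the sampler away from dynamic distortion.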
Efficient Diffusion-forcing framework: By fine-tuning a pre-trained diffusion model, it's transformed into a Diffusion-forcing model, significantly improving generation efficiency. This method not only reduces training costs but also enables efficient long-video generation.
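The key mechanic of Diffusion forcing is that each frame carries its own noise level, so early frames can finish denoising first and then condition the frames after them, which is what enables autoregressive long-video extension. A toy staggered schedule illustrates the idea (illustrative only, not the model's actual scheduler):

```python
def staggered_schedule(num_frames, num_steps, stride=1):
    """Toy Diffusion-Forcing-style schedule: frame f begins denoising
    `stride * f` global steps after frame f-1, so noise levels decrease
    left to right and early frames become clean first (then serve as
    conditioning for later frames). Illustrative assumption only."""
    total = num_steps + stride * (num_frames - 1)
    schedule = []
    for t in range(total + 1):
        # Noise level of frame f at global step t, clamped to [0, num_steps].
        row = [min(num_steps, max(0, num_steps - (t - stride * f)))
               for f in range(num_frames)]
        schedule.append(row)
    return schedule

sched = staggered_schedule(num_frames=4, num_steps=6, stride=2)
```

Because a pre-trained full-sequence diffusion model already predicts noise per frame, fine-tuning it to accept independent per-frame noise levels is much cheaper than training such a model from scratch.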
Progressive resolution pre-training and multi-stage post-training optimization: Integrating billions of data points from general datasets, self-collected media, and art resource libraries, a multi-stage optimization method ensures that SkyReels-V2 gradually improves performance in various aspects, achieving cinematic-level video generation even with limited resources.
In performance evaluations, SkyReels-V2 scored strongly on both SkyReels-Bench and V-Bench. SkyReels-Bench contains 1,020 text prompts and systematically evaluates four key dimensions: instruction following, motion quality, consistency, and visual quality. On SkyReels-Bench, SkyReels-V2 made significant gains in instruction following without sacrificing motion quality or video consistency. In the automated V-Bench 1.0 evaluation, SkyReels-V2 outperformed all compared models, including HunyuanVideo-13B and Wan2.1-14B, on both overall score (83.9%) and quality score (84.7%).
SkyReels-V2 has a wide range of applications, including:
Story generation: Using a sliding window method, the model refers to previously generated frames and text prompts when generating new frames, supporting temporal extension to generate long-shot videos with coherent narratives.
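The sliding-window procedure can be sketched as follows; `gen_chunk` is a stand-in for the real denoising sampler, and the chunk and overlap sizes are made-up values:

```python
def extend_video(gen_chunk, prompt, total_frames, chunk_size=16, overlap=4):
    """Sliding-window long-video generation (sketch): each new chunk is
    conditioned on the last `overlap` frames generated so far plus the
    text prompt, keeping content coherent across window boundaries.
    `gen_chunk(context, prompt, n)` stands in for the real sampler."""
    video = list(gen_chunk(context=[], prompt=prompt, n=chunk_size))
    while len(video) < total_frames:
        context = video[-overlap:]  # the frames the model "refers to"
        n = min(chunk_size - overlap, total_frames - len(video))
        video.extend(gen_chunk(context=context, prompt=prompt, n=n))
    return video

def make_toy_sampler():
    """Toy sampler: a 'frame' is just a label recording its global index."""
    counter = {"i": 0}
    def sampler(context, prompt, n):
        frames = [f"frame{counter['i'] + k}" for k in range(n)]
        counter["i"] += n
        return frames
    return sampler

video = extend_video(make_toy_sampler(), "a chase scene", total_frames=30)
```

Because each window sees only a bounded context rather than the whole history, generation cost stays flat as the video grows, which is what makes unbounded temporal extension practical.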
Image-to-video synthesis: Provides two image-to-video (I2V) generation methods, including fine-tuning the full-sequence text-to-video (T2V) diffusion model and combining the Diffusion-forcing model with frame conditions.
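Under the Diffusion-forcing route, frame conditioning can be pictured as pinning the conditioning image at noise level zero while all other frames start fully noised. A minimal sketch, with the conditioning scheme assumed rather than taken from the paper:

```python
def i2v_init_noise(num_frames, num_steps, num_cond=1):
    """Frame-conditioned I2V (sketch): the first `num_cond` frames hold the
    conditioning image and enter the sampler already clean (noise level 0);
    the remaining frames start fully noised and are denoised as usual.
    An illustrative assumption, not SkyReels-V2's exact conditioning scheme."""
    return [0 if f < num_cond else num_steps for f in range(num_frames)]

levels = i2v_init_noise(num_frames=8, num_steps=6)
```

The clean frames then play the same role as the already-denoised early frames in a Diffusion-forcing schedule: fixed context that the noisy frames attend to while being denoised.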
Cinematography director functionality: Training samples are specially curated to ensure balanced coverage of basic camera movements and their common combinations, markedly improving the cinematic quality of generated shots.
Element-to-video generation: Based on the SkyReels-V2 base model, the SkyReels-A2 solution has been developed, capable of combining arbitrary visual elements into coherent videos guided by text prompts.
The Kunlun Wanwei SkyReels team stated that they will continue to drive the development of video generation technology and will fully open-source the SkyCaptioner-V1 and SkyReels-V2 series models to promote further research and application in academia and industry. The team will also continue to optimize the performance of SkyReels-V2, explore more application scenarios, and further reduce computational costs, allowing it to be more widely applied in creative content production and virtual simulation fields.
GitHub address:
https://github.com/SkyworkAI/SkyReels-V2
Paper address:
https://arxiv.org/abs/2504.13074
SkyReels official website address:
https://www.skyreels.ai/home