Recently, a research team from Nanjing University, working with ByteDance and Southwest University, released STAR (Spatial-Temporal Augmentation with Text-to-Video Models), a method that uses text-to-video models to perform super-resolution on real-world videos. By adding spatiotemporal enhancement, it improves the quality of low-resolution footage, which makes it especially well suited to low-definition clips downloaded from video-sharing platforms.
To make the work easier for researchers and developers to use, the team has published pre-trained STAR models on GitHub in two versions, one built on I2VGen-XL and one built on CogVideoX-5B, together with the corresponding inference code. The release makes the method directly usable for video-processing work.
Using the model is relatively straightforward. First, download the pre-trained STAR weights from HuggingFace and place them in the specified directory. Next, prepare the videos to be processed and choose a text-prompt option: no prompt, an automatically generated prompt, or a manually entered one. Finally, adjust the path settings in the inference script and run it to perform video super-resolution.
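The three prompt options above can be pictured as a small dispatch step before inference. The sketch below is hypothetical: the function `resolve_prompt`, the mode names `"none"`, `"auto"` and `"manual"`, and the `captioner` callable are illustrative assumptions, not code from the STAR repository.

```python
from typing import Callable, Optional


def resolve_prompt(mode: str,
                   manual_text: Optional[str] = None,
                   captioner: Optional[Callable[[str], str]] = None,
                   video_path: str = "") -> str:
    """Return the text prompt to pair with a low-resolution input video.

    Hypothetical helper: the mode names mirror the three options described
    in the article and are not taken from the STAR scripts.
    """
    if mode == "none":
        # No prompt: condition the model on an empty string.
        return ""
    if mode == "auto":
        # Auto-generated prompt: delegate to a captioning function
        # (assumed here, e.g. a video-captioning model wrapped as a callable).
        if captioner is None:
            raise ValueError("auto mode needs a captioner callable")
        return captioner(video_path)
    if mode == "manual":
        # Manually entered prompt: use the user's text, trimmed of whitespace.
        if not manual_text or not manual_text.strip():
            raise ValueError("manual mode needs non-empty manual_text")
        return manual_text.strip()
    raise ValueError(f"unknown prompt mode: {mode!r}")
```

For example, `resolve_prompt("manual", manual_text="  a dog running on grass  ")` returns `"a dog running on grass"`, while `"auto"` mode would call whatever captioner the user plugs in.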
For the I2VGen-XL backbone, the project provides two models, each designed for a different level of video degradation, so users can match the model to the condition of their footage. The CogVideoX-5B model, in contrast, supports only 720x480 input, which makes it an option for videos at that resolution.
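Choosing between the released models comes down to the backbone, the degradation level, and, for CogVideoX-5B, the input size. The sketch below is hypothetical: the degradation labels `"light"` and `"heavy"`, the checkpoint file names, and the function `select_checkpoint` are illustrative assumptions; only the 720x480 constraint comes from the article.

```python
from typing import Dict, Tuple

# Hypothetical checkpoint names; the real release may name its files differently.
I2VGEN_CHECKPOINTS: Dict[str, str] = {
    "light": "star_i2vgen_light.pt",  # assumed: for mildly degraded video
    "heavy": "star_i2vgen_heavy.pt",  # assumed: for heavily degraded video
}
COGVIDEOX_CHECKPOINT = "star_cogvideox_5b"  # assumed name

# Per the article, the CogVideoX-5B variant accepts 720x480 input (width, height).
COGVIDEOX_INPUT_SIZE: Tuple[int, int] = (720, 480)


def select_checkpoint(backbone: str, degradation: str, width: int, height: int) -> str:
    """Pick a checkpoint for the given backbone and input video.

    `degradation` is only used for the I2VGen-XL backbone, which (per the
    article) has separate models for different degradation levels.
    """
    backbone = backbone.lower()
    if backbone == "cogvideox-5b":
        # Reject inputs that do not match the single supported resolution.
        if (width, height) != COGVIDEOX_INPUT_SIZE:
            raise ValueError(
                f"CogVideoX-5B expects {COGVIDEOX_INPUT_SIZE[0]}x{COGVIDEOX_INPUT_SIZE[1]} "
                f"input, got {width}x{height}"
            )
        return COGVIDEOX_CHECKPOINT
    if backbone == "i2vgen-xl":
        if degradation not in I2VGEN_CHECKPOINTS:
            raise ValueError(f"unknown degradation level: {degradation!r}")
        return I2VGEN_CHECKPOINTS[degradation]
    raise ValueError(f"unknown backbone: {backbone!r}")
```

Failing fast on a mismatched resolution keeps a 640x360 clip from being silently passed to the CogVideoX-5B model; such a clip would need resizing first or would go to one of the I2VGen-XL models instead.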
This research offers new ideas for video super-resolution and opens new directions for researchers in related fields. The team acknowledges I2VGen-XL, VEnhancer, CogVideoX, and the OpenVid-1M dataset, crediting these projects with laying the foundation for their work.
Project link: https://github.com/NJU-PCALab/STAR
Key points:
🌟 STAR uses text-to-video models to perform video super-resolution and improve video quality.
🛠️ The research team has released pre-trained models and inference code, making the usage process simple and clear.
📩 The team provides contact information and invites users to reach out with questions and discussion.