Alibaba announced the open-source I2VGen-XL image-to-video model in a paper published in November, and the code and model weights have now been released. The model generates video in two stages: a base stage that ensures semantic coherence, followed by a refinement stage that enhances video detail and resolution by incorporating a short text description. The research team optimized I2VGen-XL by collecting extensive training data, which improved the semantic accuracy, continuity of detail, and clarity of the generated videos. The full code is available on GitHub.