Tencent announces the open-sourcing of its newly developed image-to-video generation framework, HunyuanVideo-I2V. This release follows the successful open-sourcing of HunyuanVideo and aims to further encourage exploration within the open-source community.


HunyuanVideo-I2V incorporates advanced video generation technology, enabling the transformation of static images into dynamic video content, offering creators expanded possibilities.

HunyuanVideo-I2V utilizes a pre-trained multi-modal large language model as its text encoder, significantly enhancing the model's understanding of the semantic content of the input image. In practice, the model derives semantic image tokens from the input image and combines them with the latent video tokens, so that attention is computed jointly over the full token sequence. This maximizes the synergy between the image and text modalities, making the video generated from a static image more coherent and realistic.
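The joint computation described above can be sketched in a few lines. This is a conceptual illustration only, not Tencent's implementation: the token counts, dimensions, and the single-head attention function are all assumptions made for the sketch.

```python
import numpy as np

def full_attention(q, k, v):
    # Scaled dot-product attention over the full (concatenated) token sequence.
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
dim = 64  # hypothetical token dimension
image_tokens = rng.normal(size=(16, dim))   # semantic tokens derived from the input image
video_tokens = rng.normal(size=(128, dim))  # latent video tokens being generated

# Concatenating the two sequences lets every video token attend to every
# image token (and vice versa) in one attention pass.
tokens = np.concatenate([image_tokens, video_tokens], axis=0)
out = full_attention(tokens, tokens, tokens)
print(out.shape)  # (144, 64): one output per image and video token
```

The key point is that image and video tokens share one attention pass rather than interacting through a separate cross-attention stage.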

For users wishing to generate videos using HunyuanVideo-I2V, Tencent provides detailed installation guides and usage instructions. Users need to meet certain hardware requirements; an NVIDIA GPU with at least 80GB of VRAM is recommended for optimal video generation quality. The system supports video generation up to 720P resolution and 129 frames (approximately 5 seconds) in length.
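A small pre-flight check against the documented limits can save a failed run. The helper below is a hypothetical sketch, not part of the official repository; it assumes 720P means 1280x720 and uses the documented 129-frame cap.

```python
MAX_FRAMES = 129                # documented maximum (~5 seconds of video)
MAX_PIXELS = 1280 * 720         # assumed interpretation of "720P"

def check_settings(width, height, frames):
    """Return a list of problems with the requested generation settings."""
    problems = []
    if frames > MAX_FRAMES:
        problems.append(f"frames {frames} exceeds supported maximum {MAX_FRAMES}")
    if width * height > MAX_PIXELS:
        problems.append(f"{width}x{height} exceeds the supported 720P resolution")
    return problems

print(check_settings(1280, 720, 129))   # [] -- within limits
print(check_settings(1920, 1080, 200))  # two problems reported
```

Checks like this belong before model loading, since loading the weights alone can take minutes on large checkpoints.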

To help users better utilize the model, Tencent also shares some tips, such as keeping prompts concise and ensuring they cover key elements, including the video's main subject, actions, and background.
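One way to keep prompts consistently structured is to assemble them from those three elements. The helper below is a hypothetical convenience, not part of the official tooling; the field names follow Tencent's advice on covering subject, action, and background.

```python
def build_prompt(subject, action, background):
    # Concise prompt covering the three recommended elements:
    # main subject, its action, and the background.
    return f"{subject} {action}, {background}."

prompt = build_prompt("A white cat", "runs across the grass", "in a sunny garden")
print(prompt)  # A white cat runs across the grass, in a sunny garden.
```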

Project: https://github.com/Tencent/HunyuanVideo-I2V