Researchers from Fudan University and Baidu have jointly developed Hallo2, an AI model that can generate hours-long character animations at 4K resolution, driven by audio and steered with text prompts.

Producing high-quality character animation has traditionally demanded considerable time and labor. Hallo2 could change that, with potential applications in film production, virtual assistants, and game development.


Hallo2 builds on latent diffusion models and introduces several techniques, including:

Patch-drop data augmentation: Randomly masking patches of the motion frames keeps the model from over-relying on the preceding frames, so the character's appearance stays stable across long sequences.

Gaussian noise augmentation: Adding Gaussian noise to the motion frames makes the model more robust to visual noise and motion distortion, further improving the quality and coherence of the animation.

VQGAN discrete codebook prediction: The VQGAN codebook prediction is extended to the temporal dimension and combined with temporal alignment, producing high-resolution video whose details stay consistent over time.

Text prompt control: An adaptive layer normalization mechanism lets text prompts steer the character's expressions and movements, making the animation more expressive and controllable.
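To make three of these ideas more concrete, here is a minimal NumPy sketch of patch-drop masking, Gaussian noise augmentation, and adaptive layer normalization. It is an illustration under stated assumptions, not the Hallo2 implementation: the function names, array shapes, patch size, drop probability, noise scale, and projection matrices are all hypothetical. The real model may apply these steps to latent representations rather than to raw frames.

```python
import numpy as np


def patch_drop(frames, patch_size=8, drop_prob=0.25, rng=None):
    """Zero out random square patches in each motion frame.

    frames: array of shape (T, H, W, C); H and W must be divisible
    by patch_size. All parameter values here are illustrative.
    """
    rng = np.random.default_rng() if rng is None else rng
    t, h, w, _ = frames.shape
    if h % patch_size or w % patch_size:
        raise ValueError("H and W must be divisible by patch_size")
    # One keep/drop decision per patch, per frame.
    keep = rng.random((t, h // patch_size, w // patch_size)) >= drop_prob
    # Expand the patch-level mask to pixel resolution.
    mask = np.repeat(np.repeat(keep, patch_size, axis=1), patch_size, axis=2)
    return frames * mask[..., None]


def add_gaussian_noise(frames, sigma=0.1, rng=None):
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    rng = np.random.default_rng() if rng is None else rng
    return frames + sigma * rng.standard_normal(frames.shape)


def adaptive_layer_norm(x, cond, w_scale, w_shift, eps=1e-5):
    """Layer norm whose scale and shift are predicted from a condition.

    x: (N, D) features; cond: (E,) conditioning vector, for example a
    text embedding; w_scale, w_shift: (E, D) hypothetical projections.
    """
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    scale = cond @ w_scale  # (D,)
    shift = cond @ w_shift  # (D,)
    return x_hat * (1.0 + scale) + shift
```

During training, the two augmentations would be applied only to the frames the model conditions on, for example `add_gaussian_noise(patch_drop(motion_frames))`. The aim is for the network to take motion cues from those frames while relying on a separate reference image for appearance. With a zero conditioning vector, `adaptive_layer_norm` reduces to plain layer normalization, so the text prompt acts purely as a modulation on top of it.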

Hallo2 was evaluated on the public HDTF and CelebV datasets, as well as on a "Wild" dataset that the researchers created themselves. According to the reported experimental results, Hallo2 outperforms existing methods at generating high-quality, long-sequence character animations.

The release of Hallo2 marks a step forward for AI-generated character animation. The researchers plan to further improve the model's efficiency and controllability and to explore its applications in more fields.

Project link: https://fudan-generative-vision.github.io/hallo2/#/