Stable Video Portraits
Generates realistic dynamic face videos.
Stable Video Portraits is a hybrid 2D/3D generation method that combines a pre-trained text-to-image model (2D) with a 3D shape model (3D) to generate realistic dynamic face videos. It lifts a generic 2D Stable Diffusion model to a video model through person-specific fine-tuning, using a time series of 3D shape model outputs as the conditioning signal. It also introduces a temporal denoising process so that the generated facial images are smooth over time. The generated faces can be edited and morphed into text-defined celebrity likenesses without additional fine-tuning at test time. The method is reported to outperform existing monocular head avatar methods in both quantitative and qualitative analyses.
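To give a rough intuition for why a temporal step reduces flicker, here is a minimal sketch that averages each frame's latent vector with its neighboring frames. This is not the Stable Video Portraits algorithm: the function names `temporal_smooth` and `max_frame_jump`, the `radius` parameter, and the use of a plain moving average are all assumptions made for illustration, and a real system would operate on diffusion latents during denoising rather than on toy lists.

```python
from typing import List

Latent = List[float]


def temporal_smooth(latents: List[Latent], radius: int = 1) -> List[Latent]:
    """Average each frame's latent with neighbors up to `radius` frames away.

    Hypothetical stand-in for a temporal denoising step; a plain moving
    average, not the method described above.
    """
    if radius < 0:
        raise ValueError("radius must be non-negative")
    n = len(latents)
    smoothed: List[Latent] = []
    for t in range(n):
        # Clamp the window at the start and end of the sequence.
        window = latents[max(0, t - radius):min(n, t + radius + 1)]
        dim = len(latents[t])
        smoothed.append(
            [sum(frame[i] for frame in window) / len(window) for i in range(dim)]
        )
    return smoothed


def max_frame_jump(latents: List[Latent]) -> float:
    """Largest per-component change between consecutive frames (flicker proxy)."""
    return max(
        (abs(a - b)
         for prev, cur in zip(latents, latents[1:])
         for a, b in zip(prev, cur)),
        default=0.0,
    )


# Toy sequence that flickers between two values on every frame.
flickering = [[0.0], [1.0], [0.0], [1.0]]
smoothed = temporal_smooth(flickering, radius=1)
print(max_frame_jump(flickering), max_frame_jump(smoothed))
```

In this toy case the largest frame-to-frame change shrinks after smoothing, which is the behavior a temporal denoising step aims for. A larger `radius` smooths more aggressively at the cost of blurring fast motion.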