DiffPortrait3D
DiffPortrait3D synthesizes realistic, 3D-consistent novel views from a single portrait photo taken in the wild.
Common Product, Image, Portrait Synthesis, New Perspectives
DiffPortrait3D is a conditional diffusion model capable of synthesizing realistic, 3D-consistent novel views from as few as a single portrait photo taken in the wild. Specifically, given a single RGB input image, our goal is to synthesize plausible facial details rendered from a novel camera view while preserving identity and facial expression. Our zero-shot approach generalizes well to a wide range of facial portraits with unposed camera views, extreme facial expressions, and diverse artistic renderings. At its core, we use the generative prior of a 2D diffusion model pre-trained on large-scale image datasets as our rendering backbone, while guiding the denoising through disentangled attention control over appearance and camera pose. To this end, we first inject appearance context from the reference image into the frozen self-attention layers of the UNet. Then, we manipulate the rendering view through a novel conditional control module that interprets the camera pose by watching a condition image captured from the same view. Additionally, we insert a trainable cross-view attention module to enhance view consistency, which is further strengthened by a novel 3D-aware noise generation process during inference. We demonstrate state-of-the-art results, both qualitatively and quantitatively, on challenging in-the-wild and multi-view benchmarks.
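The appearance-injection step can be illustrated with a small sketch. This is not the DiffPortrait3D implementation: it assumes one common way of injecting reference context into self-attention, namely letting the target view's tokens attend over both their own tokens and the reference image's tokens. The function names (`attention`, `reference_self_attention`) are hypothetical, and the learned query/key/value projections of a real UNet layer are omitted for brevity.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length float vectors."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    dim = len(values[0])
    out = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(dim)])
    return out

def reference_self_attention(target_tokens, reference_tokens):
    """Self-attention for the target view with reference tokens appended.

    Assumption: appearance is injected by concatenating reference tokens
    to the keys and values, so every target token can read appearance
    information from the reference image. Projections are identity here.
    """
    keys_values = target_tokens + reference_tokens
    return attention(target_tokens, keys_values, keys_values)

# Toy example: two target tokens, one reference token.
target = [[1.0, 0.0], [0.0, 1.0]]
reference = [[5.0, 5.0]]
mixed = reference_self_attention(target, reference)
```

With an empty reference list, the function reduces to ordinary self-attention over the target tokens, which makes the effect of the injected reference easy to isolate.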
DiffPortrait3D Visits Over Time
Monthly Visits
474564576
Bounce Rate
36.20%
Pages per Visit
6.1
Visit Duration
00:06:34