GAIA

Voice-Driven Conversational Avatar Generation

Common Product, Image, Avatar Generation, Voice-Driven
GAIA (Generative Avatar AI) synthesizes natural conversational videos from voice input and a single portrait image, and it eliminates domain priors in conversational avatar generation. GAIA works in two stages: 1) decomposing each frame into motion and appearance representations; 2) generating a motion sequence conditioned on voice and a reference portrait image. The methods include variational autoencoders (VAEs) and diffusion models; the diffusion models are optimized to generate motion sequences conditioned on a voice sequence and random frames from a video clip. The authors collected a large-scale, high-quality conversational avatar dataset and trained the model at different scales, and they report experimental results that validate GAIA's superiority, scalability, and flexibility. GAIA can be used for applications such as controllable conversational avatar generation and text-guided avatar generation.
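The two stages described above can be pictured as a simple data flow. The sketch below is illustrative only and is not GAIA's implementation: every function name, the latent sizes, and the refinement rule are assumptions made up for this example. Slicing stands in for a trained VAE encoder, and a deterministic blending loop stands in for a learned diffusion denoiser; it only mimics the shape of iterative sampling.

```python
import random

MOTION_DIM = 4      # hypothetical size of the motion latent
APPEARANCE_DIM = 6  # hypothetical size of the appearance latent


def encode_frame(frame):
    """Stage 1 stand-in: split a frame vector into (motion, appearance).

    A trained VAE encoder would go here; slicing is only a placeholder.
    """
    motion = frame[:MOTION_DIM]
    appearance = frame[MOTION_DIM:MOTION_DIM + APPEARANCE_DIM]
    return motion, appearance


def denoise_step(noisy_motion, voice_feat, ref_motion, t, steps):
    """Placeholder for a learned denoiser (assumption, not GAIA's model).

    Blends the sample toward a target built from the voice feature and
    the reference motion; the blend weight reaches 1.0 on the last step.
    """
    target = [v + r for v, r in zip(voice_feat, ref_motion)]
    w = 1.0 / (steps - t)
    return [(1 - w) * x + w * y for x, y in zip(noisy_motion, target)]


def generate_motion(voice_seq, ref_motion, steps=10, seed=0):
    """Stage 2 stand-in: refine one motion latent per voice frame,
    starting from Gaussian noise."""
    rng = random.Random(seed)
    motions = []
    for voice_feat in voice_seq:
        x = [rng.gauss(0.0, 1.0) for _ in range(MOTION_DIM)]
        for t in range(steps):
            x = denoise_step(x, voice_feat, ref_motion, t, steps)
        motions.append(x)
    return motions


def decode_frame(motion, appearance):
    """Placeholder decoder: recombine motion and appearance latents."""
    return list(motion) + list(appearance)


def animate(portrait, voice_seq):
    """Full flow: encode the portrait once, generate motion from voice,
    then decode every frame with the fixed appearance latent."""
    ref_motion, appearance = encode_frame(portrait)
    motions = generate_motion(voice_seq, ref_motion)
    return [decode_frame(m, appearance) for m in motions]
```

The point of the sketch is the separation of concerns: appearance is extracted once from the portrait and reused for every output frame, while only the motion latent changes with the voice input.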

GAIA Visit Over Time

Monthly Visits: 834,766

Bounce Rate: 51.98%

Pages per Visit: 2.6

Visit Duration: 00:02:16
