VividTalk

Generate realistic, lip-synced talking head videos

Common Product, Image, Audio-driven, Avatar generation
VividTalk is a one-shot, audio-driven talking head generation technique based on a 3D hybrid prior. It generates realistic talking head videos with rich expressions, natural head poses, and accurate lip synchronization, using a general two-stage framework. In the first stage, audio is mapped to a mesh by learning two types of motion: non-rigid facial (expression) motion and rigid head motion. For facial motion, both blendshape and vertex representations are used as the intermediate representation to maximize the model's representational capability. For natural head motion, a novel learnable head pose codebook is proposed and trained with a two-phase mechanism. In the second stage, a dual-branch motion VAE and a generator convert the mesh into dense motion and synthesize high-quality video frame by frame. Extensive experiments show that VividTalk improves lip synchronization and realism and outperforms previous state-of-the-art methods in both objective and subjective comparisons. The authors state that the code will be publicly released after publication.
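The first-stage intermediate representation described above can be illustrated with a small sketch. This is not the VividTalk implementation (its code is not yet released, per the description); it only shows, under stated assumptions, how a mesh might be assembled from blendshape coefficients plus per-vertex offsets, and how a pose could be snapped to the nearest entry of a codebook. The names `blend_mesh` and `nearest_code`, the data layout, and the toy numbers are all hypothetical; in the actual method these quantities would be predicted by learned networks from audio.

```python
from typing import List, Sequence

Vec3 = Sequence[float]


def blend_mesh(
    neutral: Sequence[Vec3],
    basis: Sequence[Sequence[Vec3]],
    coeffs: Sequence[float],
    offsets: Sequence[Vec3],
) -> List[List[float]]:
    """Assumed formulation: neutral mesh + weighted blendshape deltas
    + free per-vertex offsets (the "vertex" part of the representation)."""
    if len(basis) != len(coeffs):
        raise ValueError("need exactly one coefficient per blendshape")
    mesh = []
    for v, (base, off) in enumerate(zip(neutral, offsets)):
        point = []
        for axis in range(3):
            # Coarse expression motion from the blendshape basis.
            delta = sum(c * shape[v][axis] for c, shape in zip(coeffs, basis))
            # Fine detail from the per-vertex offset.
            point.append(base[axis] + delta + off[axis])
        mesh.append(point)
    return mesh


def nearest_code(query: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Index of the codebook entry closest to `query` (squared Euclidean)."""
    def sqdist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda i: sqdist(query, codebook[i]))


# Toy data (hypothetical): two vertices, two blendshapes.
neutral = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
basis = [
    [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0]],  # blendshape 0 moves vertex 0 along y
    [[0.0, 0.0, 0.0], [0.0, 0.0, 2.0]],  # blendshape 1 moves vertex 1 along z
]
coeffs = [0.5, 0.25]
offsets = [[0.1, 0.0, 0.0], [0.0, 0.0, 0.0]]
mesh = blend_mesh(neutral, basis, coeffs, offsets)

# Toy pose codebook (hypothetical yaw, pitch, roll values in degrees).
pose_codebook = [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.0, 15.0, 0.0]]
pose_index = nearest_code([8.0, 1.0, 0.0], pose_codebook)
```

Combining the two representations in this way lets the blendshape term carry coarse expression while the per-vertex term adds fine detail, which is one plausible reading of why the description says the combination maximizes representational capability.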

VividTalk Visit Over Time

Monthly Visits: 48,721
Bounce Rate: 49.84%
Pages per Visit: 1.2
Visit Duration: 00:00:16

VividTalk Alternatives