VideoReTalking

Audio-driven video editing for high-quality lip synchronization.

Common Product, Video, Audio-driven, Lip-sync
VideoReTalking is a system that edits real-world talking-head videos so that the lips match an input audio track, producing high-quality lip-synced output even with varying emotions. It breaks the task into three sequential steps: (1) generating a face video with a normalized expression, using an expression editing network; (2) audio-driven lip-sync; (3) face enhancement to improve photorealism. Given a talking-head video, we first use the expression editing network to modify the expression in every frame according to a single standardized expression template, yielding a video with a normalized expression. This video is then fed, together with the given audio, into the lip-sync network to generate a lip-synced video. Finally, an identity-aware face enhancement network and post-processing improve the photorealism of the synthesized faces. All three steps use learning-based methods, and all modules run sequentially in one pipeline without any user intervention.
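The three-step flow described above can be pictured as a chain of functions, each consuming the previous step's output. The sketch below is only an illustration of that ordering: the `Frame` type and the functions `normalize_expression`, `lip_sync`, `enhance_faces`, and `retalk` are hypothetical names invented here, and the bodies are trivial placeholders standing in for the learned networks, not the project's actual code or API.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Frame:
    """Hypothetical stand-in for one video frame (not a real image)."""
    index: int
    expression: str         # e.g. "happy", "sad", "neutral"
    mouth: str              # placeholder for the lip shape
    enhanced: bool = False  # set by the final enhancement step


def normalize_expression(frames: List[Frame], template: str = "neutral") -> List[Frame]:
    # Step 1 (placeholder): give every frame the same template expression.
    return [replace(f, expression=template) for f in frames]


def lip_sync(frames: List[Frame], audio: List[str]) -> List[Frame]:
    # Step 2 (placeholder): pair each frame with one audio unit.
    # zip() stops at the shorter input, so extra frames or audio are dropped.
    return [replace(f, mouth=unit) for f, unit in zip(frames, audio)]


def enhance_faces(frames: List[Frame]) -> List[Frame]:
    # Step 3 (placeholder): mark each frame as enhanced.
    return [replace(f, enhanced=True) for f in frames]


def retalk(frames: List[Frame], audio: List[str]) -> List[Frame]:
    # Run the three steps in order, with no user input in between.
    normalized = normalize_expression(frames)
    synced = lip_sync(normalized, audio)
    return enhance_faces(synced)


if __name__ == "__main__":
    video = [Frame(0, "happy", "closed"), Frame(1, "sad", "closed")]
    for frame in retalk(video, ["AA", "OO"]):
        print(frame)
```

The expression is normalized before lip-sync so that the second step always sees the same expression regardless of the emotion in the source footage, which is the ordering the description gives.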

VideoReTalking Visit Over Time

Monthly Visits: 1029
Bounce Rate: 83.20%
Pages per Visit: 1.0
Visit Duration: 00:00:00
