Recently, ByteDance developed an AI model named PersonaTalk that can dub videos with accurate lip synchronization while closely preserving the speaker's personal speaking style.
PersonaTalk is an attention-based two-stage framework consisting of geometry construction and face rendering. In the first stage, it extracts the speaker's facial geometry coefficients from the reference video with a hybrid geometry estimation method. It then encodes features from the target audio and learns a personalized speaking style from the statistics of the reference geometry, injecting that style into the audio features. Finally, based on the reference geometry coefficients and the style-aware audio features, it generates target geometry that is lip-synced to the target audio while retaining the personalized speaking style.
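To make the first stage more concrete, here is a minimal PyTorch sketch of the idea described above: audio features attend to statistics of the reference geometry (the "speaking style") via cross-attention, and the resulting style-aware audio then queries the reference geometry to predict per-frame target geometry. All module names, dimensions, and the exact fusion scheme are illustrative assumptions, not PersonaTalk's actual implementation.

```python
# Sketch of stage 1: style injection into audio features via cross-attention.
# Names, dimensions, and fusion details are assumptions for illustration only.
import torch
import torch.nn as nn


class StyleAwareAudioEncoder(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        # Cross-attention: audio queries, style tokens (geometry statistics) as keys/values.
        self.style_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Cross-attention: style-aware audio queries the reference geometry tokens.
        self.geom_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.to_coeff = nn.Linear(dim, 64)  # 64 = assumed geometry-coefficient size

    def forward(self, audio_feat, style_tokens, ref_geom_tokens):
        # audio_feat:      (B, T, dim)  encoded target-audio features
        # style_tokens:    (B, S, dim)  statistics of the reference geometry (speaking style)
        # ref_geom_tokens: (B, R, dim)  encoded reference geometry coefficients
        styled_audio, _ = self.style_attn(audio_feat, style_tokens, style_tokens)
        fused, _ = self.geom_attn(styled_audio, ref_geom_tokens, ref_geom_tokens)
        return self.to_coeff(fused)  # (B, T, 64) lip-synced target geometry coefficients


# Toy usage with random tensors, just to show the shapes flowing through.
enc = StyleAwareAudioEncoder()
out = enc(torch.randn(2, 100, 256), torch.randn(2, 8, 256), torch.randn(2, 50, 256))
print(out.shape)  # torch.Size([2, 100, 64])
```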
In the second stage, a dual-attention face renderer synthesizes the target speaker's face under the guidance of the target geometry, using a carefully designed reference selection strategy to produce a lip-synced result.
The model achieves highly personalized dubbing by learning the speaker's speaking style from the reference video and applying it when dubbing the target audio. In addition, the dual-attention face renderer samples textures for the lips and the rest of the face separately, which better preserves facial details and eliminates common artifacts such as flickering or stuck teeth.
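The dual-attention rendering idea can be sketched as two parallel attention branches, one sampling lip texture from mouth-region references and one sampling texture for the rest of the face, merged with a mouth mask. The following PyTorch snippet is a simplified illustration under those assumptions; the module names, feature sizes, and merging details are hypothetical.

```python
# Sketch of the dual-attention face renderer: separate texture sampling for the
# lip region and the rest of the face. All details are illustrative assumptions.
import torch
import torch.nn as nn


class DualAttentionRenderer(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.lip_attn = nn.MultiheadAttention(dim, heads, batch_first=True)   # lip texture branch
        self.face_attn = nn.MultiheadAttention(dim, heads, batch_first=True)  # face texture branch
        self.to_rgb = nn.Linear(dim, 3)  # simplified "decoder" mapping tokens to RGB

    def forward(self, target_geom, lip_refs, face_refs, mouth_mask):
        # target_geom: (B, N, dim) tokens derived from the target geometry
        # lip_refs:    (B, L, dim) reference tokens selected for the mouth region
        # face_refs:   (B, F, dim) reference tokens selected for the rest of the face
        # mouth_mask:  (B, N, 1)   1 inside the mouth region, 0 elsewhere
        lip_tex, _ = self.lip_attn(target_geom, lip_refs, lip_refs)
        face_tex, _ = self.face_attn(target_geom, face_refs, face_refs)
        # Merge: the mouth region takes the lip branch, everything else the face branch.
        merged = mouth_mask * lip_tex + (1.0 - mouth_mask) * face_tex
        return self.to_rgb(merged)  # (B, N, 3)


renderer = DualAttentionRenderer()
img_tokens = renderer(
    torch.randn(1, 1024, 256),        # target geometry tokens
    torch.randn(1, 512, 256),         # lip-region reference tokens
    torch.randn(1, 512, 256),         # face-region reference tokens
    torch.rand(1, 1024, 1).round(),   # binary mouth mask
)
print(img_tokens.shape)  # torch.Size([1, 1024, 3])
```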
Experimental results show that, compared with other state-of-the-art models, PersonaTalk has clear advantages in visual quality, lip-sync accuracy, and persona preservation. Moreover, as a general framework, PersonaTalk achieves performance comparable to speaker-specific models without any fine-tuning.
Although PersonaTalk performs well on dubbing human face videos, limitations of its training data mean its performance may degrade when driving non-human avatars (such as cartoon characters), and it may produce artifacts under large head poses.
To prevent the misuse of this technology, ByteDance plans to restrict access to the core model to research institutions.
Project link: https://grisoon.github.io/PersonaTalk/