Remember Loopy, the ByteDance project that left everyone in awe when it was first unveiled earlier this month? This lip-syncing project, which matches a digital human's voice to its visuals, expressions, and emotions, has now officially launched on Jimeng.
AIbase tested it, and the results are quite impressive: it is currently the best lip-syncing service available for Chinese.
In the past, lip-sync videos shared a common flaw: the mouth movements seemed to match the audio, but the voice never quite felt like it belonged to the person on screen, which left viewers with a sense of disconnection.
ByteDance, in collaboration with a research team from Zhejiang University, has developed Loopy, an audio-driven video diffusion model that addresses exactly this issue.
Unlike traditional lip-syncing, where characters merely move their mouths, Loopy lets characters express tones, emotions, and facial expressions that suit what they are saying or singing. It can precisely "direct" every subtle movement of the virtual character, such as sighs, emotive movements of the eyebrows and eyes, and natural head motion.
This feature has now been integrated into Jimeng's video generation module.
AIbase uploaded a photo of a girl to test it out.
Jimeng currently offers two lip-syncing options:
1. Text-to-Speech
Using it on Jimeng is straightforward: simply upload an image or video of the character you want to lip-sync, enter the text, and choose a voice. Here, AIbase selected a cool, mature "yujie" (confident older-sister) style voice, and the results are as follows:
As you can see, the character exhibits subtle facial expressions while speaking, and the dynamic details like nasolabial folds appear quite realistic.
2. Upload Local Audio
Beyond making her speak, you can also upload a singing recording to make her sing:
Here, AIbase chose a popular excerpt from a recent Douyin video to see the results:
The results are truly impressive: not only are the lip movements accurate, but the voice doesn't feel out of place either; it sounds as if it were the girl's own voice.
There was one small issue, however: the girl in the photo AIbase chose wasn't looking at the viewer, which weakens the sense of immersion. Let's try a more head-on angle:
Isn't that much better? And while the character is singing, she also exhibits very realistic actions like closing her eyes and shaking her head.
AIbase also tested a male version, and the results are as follows:
Isn't the effect stunning? What surprised AIbase most is that it even captures very subtle details such as the Adam's apple and the eyebrows, making the overall video more realistic.
Feel free to try it yourself!
Jimeng Product Entry: https://top.aibase.com/tool/jimeng