Sync Labs Releases Lipsync-2: The World's First Zero-Shot Lip-Sync Model

AIbase

Published in AI News · 4 min read · Apr 8, 2025

AI technology company Sync Labs recently announced the launch of its latest product, Lipsync-2, via Twitter. This model is hailed as the "world's first zero-shot lip-sync model," capable of preserving the speaker's unique style without requiring additional training or fine-tuning. This breakthrough technology significantly improves realism, expressiveness, control, quality, and speed, making it suitable for real-person videos, animation, and AI-generated content.

Innovative Features of Lipsync-2

According to Sync Labs' April 1st Twitter post, the core highlight of Lipsync-2 is its "zero-shot" capability. This means the model can instantly learn and generate lip-sync effects that match a speaker's unique style without pre-training on that specific speaker. This feature revolutionizes traditional lip-sync technology, which typically requires massive training datasets, allowing content creators to use the technology more efficiently.

Furthermore, Sync Labs revealed that Lipsync-2 represents a technological leap across multiple dimensions. Whether it's real-person videos, animated characters, or AI-generated figures, Lipsync-2 delivers enhanced realism and expressiveness.

New Control Feature: Temperature Parameter

In addition to its zero-shot capability, Lipsync-2 introduces a control feature called "temperature." This parameter allows users to adjust the intensity of the lip-sync effect, ranging from a natural and subtle synchronization to a more exaggerated and expressive result, catering to various needs. Currently, this feature is in private testing and is gradually being rolled out to paying users.
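
To make the idea of a single intensity control concrete, here is a minimal sketch of how a client might validate and package such a setting. Everything in it is an assumption for illustration: the field names (video_url, audio_url, temperature), the 0.0 to 1.0 scale, and the default value are hypothetical and are not taken from Sync Labs' documentation.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class LipsyncRequest:
    """Hypothetical request shape; field names are illustrative, not Sync Labs' API."""
    video_url: str
    audio_url: str
    temperature: float = 0.5  # assumed scale: 0.0 = subtle, 1.0 = exaggerated


def build_request(video_url: str, audio_url: str, temperature: float = 0.5) -> dict:
    """Check the assumed temperature range and return a plain dict payload."""
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be within the assumed range [0.0, 1.0]")
    return asdict(LipsyncRequest(video_url, audio_url, temperature))


# Two payloads for the same clip: one restrained, one more expressive.
subtle = build_request("https://example.com/talk.mp4", "https://example.com/dub.wav", 0.2)
expressive = build_request("https://example.com/talk.mp4", "https://example.com/dub.wav", 0.9)
```

Rejecting out-of-range values on the client side is a design choice in this sketch; a real service might clamp the value or define a different range entirely.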

Application Prospects: Multilingual Education and Content Creation

Sync Labs' April 3rd Twitter post further showcased Lipsync-2's potential applications, highlighting its "outstanding accuracy, style, and expressiveness" and envisioning a future where "every lecture can be presented in every language." According to the company, the technology can be used not only for video translation and sub-character editing but also for character re-animation and realistic AI-generated user-generated content (UGC), with possible uses across education, entertainment, and marketing.

Industry Response and Future Expectations

The release of Lipsync-2 has quickly garnered industry attention. Sync Labs stated that the model is available for testing on the fal platform, accessible through fal's model library. Since its April 1st announcement, discussions about Lipsync-2 on Twitter have steadily increased, with many users expressing anticipation for its cross-domain application potential.

As a pioneer in AI video technology, Sync Labs has once again demonstrated its leadership in innovation with Lipsync-2. With the gradual rollout of this technology, the barrier to content creation may be further lowered, while audiences will enjoy a more natural and immersive audio-visual experience.


This article is from AIbase Daily


AI News Recommendations

Alibaba Unveils OmniTalker: A Breakthrough in AI Video Generation, Achieving Stylized Speech and Expression Synchronization with a Single Reference Video

Recently, a research team from Alibaba Group released OmniTalker, a new AI technology project that has quickly garnered industry attention for its impressive video generation capabilities. OmniTalker can accurately capture the speech style and facial expressions of a person from a single reference video and generate a dynamic video with synchronized lip movements and natural expressions. This technology showcases Alibaba's strength in generative AI and offers revolutionary possibilities for video content creation.

Apr 7, 2025

780

LanPaint: Zero-Shot Image Inpainting with Diffusion Models

Recently, developer scraed released LanPaint on GitHub, a zero-shot image inpainting tool. This tool aims to help users achieve high-quality image inpainting results on any Stable Diffusion (SD) model, including custom-trained models. LanPaint achieves this by iteratively prompting the model to 'think' before denoising, resulting in more seamless and accurate inpainting. A key feature of LanPaint is its zero-shot capability; users can start using it immediately without any training.

Mar 10, 2025

390

Spark-TTS: A Text-to-Speech System Supporting Zero-Shot Voice Cloning and Fine-grained Control

Mar 6, 2025

1.1k

ByteDance's Open-Source Lip-Sync Model LatentSync Achieves Ultra-Realistic Lip Synchronization

Jan 6, 2025

5.6k

Zero-Shot Learning Disrupts 'Segment Anything'! SAMURAI Breaks Through Video Tracking Bottlenecks, Locking Targets in Real Time Effortlessly!

Meta's 'Segment Anything' model (SAM) has been a force to be reckoned with in image segmentation, but it struggles with video object tracking, especially in crowded scenes, with fast-moving objects, or when targets are occluded. This is due to SAM's memory mechanism, which acts like a 'fixed window': it focuses only on the most recent frames and ignores the quality of the memory content, so errors propagate through the video and tracking performance drops significantly. To address this issue, the University of Washington's...

Nov 25, 2024

5.1k

OuteTTS-0.1-350M: A Novel Text-to-Speech Synthesis Method with Zero-Shot Voice Cloning Capability

Recently, Oute AI released a novel text-to-speech synthesis method called OuteTTS-0.1-350M. This method utilizes pure language modeling without the need for external adapters or complex architectures, offering a simplified TTS approach. OuteTTS-0.1-350M is based on the LLaMa architecture, using WavTokenizer to directly generate audio tokens, making the process more efficient. The model features zero-shot voice cloning capability, requiring only a few seconds of reference audio.

Nov 6, 2024

3.1k

Google's New Voice Cloning Technology: Voice Cloning with Just a Few Seconds of Audio Sample

Speech synthesis is progressing rapidly, especially in the field of restoring lost voices. Recently, Google researchers introduced a new technology called 'Zero-shot Voice Transfer', which can be integrated directly with state-of-the-art Text-to-Speech (TTS) systems to help those who have lost their voices due to illness or accidents regain their 'voice memory'. The core of this technology is its 'zero-shot' capability, meaning that it does not need a large number of samples to achieve this.

Sep 25, 2024

4.6k

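Xiaoice AI Digital Staff Upgrade: Unveiling Zero-Shot Technology and a Foundation Model with Over 100 Billion Parameters

Xiaoice's latest release marks a major upgrade to its AI digital employee products. It adds 'zero-shot' digital human technology (Zero-shot Xiaoice Neural Rendering, Zero-XNR), a foundation model with more than 100 billion parameters, and a high-performance multimedia transmission system, which together significantly improve the quality of real-time interaction. Zero-XNR combines a TTS speech model with an efficient clustering framework to replicate a voice and appearance at high quality within seconds. The strengthened foundation model and agent-building framework bring together professional interaction capabilities for precise business interactions. The newly introduced Touying (透影) audio-video transmission system keeps ultra-high-definition video transmission smooth and resistant to interference, improving the user experience. Xiaoice aims to drive the adoption and industrial application of digital human technology. Refined through its Microsoft background, it is now a Chinese AI company that owns its core technology outright. Its technical framework covers natural language processing, speech recognition, visual interaction, and AI content generation, and it has built a global AI digital human product line serving industries from finance to education, smart vehicles, and smart real estate, helping digital workers deliver stable, reliable, and productive interactions.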

Jul 17, 2024

1.4k

Microsoft Upgrades Azure AI Speech Services, Introducing 9 More Realistic AI Voices

Microsoft has launched 9 more realistic AI voices, providing users with a more natural and immersive conversation experience. The upgrade introduces zero-shot learning, enhancing the naturalness of synthesized speech and improving feature imitation accuracy. The personalized voice feature makes creating custom voices quick and simple, significantly enhancing voice realism. The service supports 400 neural voices covering over 140 languages, with fast and seamless conversion. Responsible AI use is emphasized with the release of the 9 AI voices optimized for dialogue, increasing options and diversity.

Apr 2, 2024

3.2k

Sync Labs Releases Lip Sync Model Sync-1.6.0 to Reduce Flickering

Sync Labs has released the latest version of its lip sync model, Sync-1.6.0, achieving smooth and accurate lip shape generation. The new model reduces flickering between video frames, providing a more natural audio-visual experience. Users can try Sync-1.6.0 through a browser interface or an API, simplifying the audio-visual production process. Sync-1.6.0 optimizes lip sync accuracy and video quality, delivering a more realistic viewing experience. The release of this model will benefit the digital media and entertainment industry.

Mar 25, 2024

3.8k
