The Alibaba team has released EMO, a portrait video generation framework capable of producing audio-driven portrait videos with rich facial expressions and varied head poses. EMO uses a reference network to extract features from reference images and motion frames, encodes the input audio with a pre-trained audio encoder, and combines multi-frame noise with facial-region masks to generate the video. Experimental results show that EMO outperforms existing methods in expressiveness and realism. The model could advance digital media and virtual content generation, but it also carries a risk of misuse for criminal purposes such as fraud or impersonation.
Alibaba Releases EMO, a Portrait Video Generation Framework
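The pipeline summarized above (reference-network features, audio embeddings, multi-frame noise, and facial-region masks) can be sketched at a high level. Everything below, including names and tensor shapes, is a hypothetical illustration of how such conditioning signals might be combined, not EMO's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: F frames, D feature dims, an H x W spatial grid.
F, D, H, W = 8, 64, 16, 16

# Synthetic stand-ins for the components the article names:
ref_features = rng.normal(size=(D,))         # from the reference network
audio_embedding = rng.normal(size=(F, D))    # from the pre-trained audio encoder
multi_frame_noise = rng.normal(size=(F, D, H, W))

# Facial-region mask: 1 inside an assumed face box, 0 elsewhere.
face_mask = np.zeros((H, W))
face_mask[4:12, 4:12] = 1.0

# Broadcast the conditioning signals over frames and space, then apply them
# only inside the masked facial region -- a toy analogue of conditioning a
# diffusion model, not the real method.
cond = ref_features[None, :, None, None] + audio_embedding[:, :, None, None]
conditioned = multi_frame_noise + face_mask[None, None, :, :] * cond

print(conditioned.shape)  # (8, 64, 16, 16)
```

Outside the masked region the tensor is unchanged noise; a real system would instead feed these signals into the denoising network's attention layers.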

OSChina (开源中国)
This article is from AIbase Daily