Recently, a research team from the University of Science and Technology of China introduced a video editing tool called PortraitGen. Given a single portrait video, users can apply multi-modal edits such as text-based character effects, reference image-based character effects, clothing changes, and lighting adjustments.

What's even more exciting is that the whole editing process takes just 30 minutes, and the edited portrait video renders smoothly at 100 frames per second!

The technique starts by tracking SMPL-X coefficients across the monocular input video. On top of that tracking, the research team constructs a 3D Gaussian feature field through a mechanism known as neural Gaussian textures.

Edits are applied by iteratively updating the dataset, which allows a wide range of portrait edits. Notably, the team has also proposed a "facial-aware editing" module that improves expression quality while preserving each person's facial structure, producing natural, finely detailed results.
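
The iterative dataset update can be illustrated with a toy sketch. It is not the team's implementation: each frame is reduced to a single number, `edit_frame` is a hypothetical stand-in for a 2D editing model, and the "representation" is just one value per frame fitted by gradient descent. The point is only the loop shape: render, edit, replace the dataset entry, refit.

```python
def edit_frame(value, target=1.0, strength=0.5):
    """Hypothetical stand-in for a 2D editing model: pull a frame toward the edit target."""
    return value + strength * (target - value)


def iterative_dataset_update(frames, rounds=200, lr=0.2):
    """Toy loop: render a frame, edit it, replace the dataset entry, refit."""
    dataset = list(frames)  # training targets, gradually replaced by edited renders
    field = list(frames)    # toy representation: one scalar per frame (an assumption)
    for step in range(rounds):
        i = step % len(dataset)            # visit frames in round-robin order
        rendered = field[i]                # "render" frame i from the current representation
        dataset[i] = edit_frame(rendered)  # edit the render and overwrite the dataset entry
        # One gradient step on the squared error between representation and dataset.
        for j in range(len(field)):
            field[j] -= lr * 2.0 * (field[j] - dataset[j])
    return field


edited = iterative_dataset_update([0.0, 0.2, 0.4])
```

Because every edit starts from the current render rather than from the original frame, the representation and the dataset move toward the edited result together, which is what keeps the frames consistent with each other in this toy setting.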

Text-based Character Effects

PortraitGen supports both text-driven and image-driven editing.

For instance, text-driven editing relies on a 2D editing model called InstructPix2Pix. The model takes an RGB image, a text instruction, and a noisy latent image as input, and makes fine adjustments based on this information.
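
As a rough illustration of how those inputs interact, InstructPix2Pix as published combines three noise predictions for the same noisy latent using two guidance scales, one for the image and one for the text. The sketch below shows only that combination rule on plain lists of numbers. The function name and the default scale values are illustrative assumptions, and the article does not say how PortraitGen configures them.

```python
def guided_noise(eps_uncond, eps_image, eps_full, image_scale=1.5, text_scale=7.5):
    """Combine three noise predictions element by element.

    eps_uncond: prediction with neither image nor text conditioning
    eps_image:  prediction conditioned on the input image only
    eps_full:   prediction conditioned on both the image and the text instruction
    """
    return [
        u + image_scale * (i - u) + text_scale * (f - i)
        for u, i, f in zip(eps_uncond, eps_image, eps_full)
    ]


# With both scales set to 1.0 the result is simply the fully conditioned prediction.
combined = guided_noise([0.0, 0.0], [1.0, 0.5], [2.0, 1.0], image_scale=1.0, text_scale=1.0)
```

A higher `image_scale` keeps the output closer to the input frame, while a higher `text_scale` pushes it harder toward the instruction.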

Stylized Editing

In image-driven editing, the team has adopted techniques such as style transfer and virtual try-on to meet different needs, allowing users to easily transfer styles to video frames and even achieve clothing change effects.

Lighting Adjustment

More interestingly, PortraitGen can also adjust the lighting of video frames according to user-provided lighting descriptions, making the entire video more harmonious and aesthetically pleasing.

Compared to other top video editing tools, PortraitGen excels in prompt retention, identity preservation, and temporal consistency.

Technically, PortraitGen's neural Gaussian textures differ from the traditional approach of storing spherical harmonic coefficients: each Gaussian point instead stores learnable features, which enriches the editing effects and supports more complex styles.
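
A minimal sketch of that idea, under stated assumptions: each Gaussian carries a feature vector instead of spherical harmonic colour coefficients, features along a ray are alpha-composited front to back, and a small decoder turns the blended feature into RGB. The names `FeatureGaussian`, `blend_features`, and `decode_rgb`, the feature size, and the single-layer decoder are all hypothetical, and the opacity-only blending omits the per-pixel Gaussian falloff of real splatting.

```python
import math
from dataclasses import dataclass

FEATURE_DIM = 4  # assumed size of the per-Gaussian feature vector


@dataclass
class FeatureGaussian:
    position: tuple  # 3D centre of the Gaussian
    opacity: float   # value in [0, 1]
    feature: list    # learnable features, stored in place of SH colour coefficients


def blend_features(gaussians_front_to_back):
    """Alpha-composite the feature vectors along one ray, front to back."""
    blended = [0.0] * FEATURE_DIM
    transmittance = 1.0  # fraction of the ray not yet absorbed
    for g in gaussians_front_to_back:
        weight = transmittance * g.opacity
        blended = [b + weight * f for b, f in zip(blended, g.feature)]
        transmittance *= 1.0 - g.opacity
    return blended


def decode_rgb(feature, weights, bias):
    """Hypothetical decoder: one linear layer plus a sigmoid, giving RGB in (0, 1)."""
    rgb = []
    for row, b in zip(weights, bias):
        z = sum(w * f for w, f in zip(row, feature)) + b
        rgb.append(1.0 / (1.0 + math.exp(-z)))
    return rgb


ray = [
    FeatureGaussian((0.0, 0.0, 1.0), 0.5, [1.0] * FEATURE_DIM),
    FeatureGaussian((0.0, 0.0, 2.0), 1.0, [3.0] * FEATURE_DIM),
]
pixel_feature = blend_features(ray)
pixel_rgb = decode_rgb(pixel_feature, [[0.0] * FEATURE_DIM] * 3, [0.0, 0.0, 0.0])
```

Because the decoder, rather than a fixed colour formula, decides how a feature becomes a colour, the same set of Gaussians can in principle represent styles that a plain colour basis would struggle with.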

Additionally, with its facial-aware editing module and improved expression consistency, PortraitGen shows strong potential for detailed portrait editing.

Project Entry: https://top.aibase.com/tool/portraitgen

Key Points:

✨ PortraitGen can edit 2D portrait videos into 4D Gaussian fields in just 30 minutes, supporting smooth playback at 100 frames per second.

🎨 Offers multiple editing methods, including text-driven and image-driven, making video style transformation more flexible and diverse.

💡 Through the facial-aware editing module, it enhances expression quality while preserving users' personalized facial features.