ByteDance, in collaboration with research teams from universities in China and Singapore, has unveiled PhotoDoodle, an AI image editing system built on the Flux.1 model. It learns an artistic style from a small number of samples and then carries out specific editing instructions in that style, opening up new possibilities for creative expression.

Based on Flux.1

At the core of PhotoDoodle is OmniEditor, a system the research team developed first. It uses LoRA (Low-Rank Adaptation) to adapt the Flux.1 image generation model from German startup Black Forest Labs. Rather than retraining all of the original model's weights, LoRA adds small specialized matrices alongside them, which allows adjustments ranging from subtle conceptual tweaks to complete style transformations.

The researchers then trained OmniEditor with a variant called EditLoRA to replicate individual artistic styles. Each style was learned from a curated set of image pairs created in collaboration with artists.
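The LoRA idea described above can be illustrated with a few lines of plain Python. This is a minimal sketch of generic low-rank adaptation, not the PhotoDoodle or EditLoRA code: the matrix sizes, the rank, and the helper names `matmul` and `lora_weight` are assumptions chosen for illustration.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_weight(w, a, b, scale=1.0):
    """Return W + scale * (B @ A) without modifying the frozen matrix W."""
    delta = matmul(b, a)
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

random.seed(0)
d_out, d_in, rank = 6, 8, 2  # toy sizes (assumed); real layers are far larger

# Frozen base weight, standing in for one layer of the pretrained model.
w = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
# Trainable low-rank factors: A is rank x d_in, B is d_out x rank.
a = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
b = [[0.0] * rank for _ in range(d_out)]  # zero init, so training starts at W

unchanged = lora_weight(w, a, b)  # B is zero, so this equals W

# Pretend one training step nudged a single entry of B.
b[0][0] = 0.5
adapted = lora_weight(w, a, b)

full_params = d_out * d_in           # weights a full fine-tune would update
lora_params = rank * (d_in + d_out)  # weights LoRA actually trains
print(f"full fine-tune: {full_params} weights, LoRA: {lora_params} weights")
```

The base matrix `w` is never overwritten, so a style learned this way can be stored and swapped as a small pair of factors rather than a full copy of the model.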


PhotoDoodle adds fun elements like monsters, magical effects, and decorative illustrations while preserving the original image composition. | Image: Huang et al.

"Positional Encoding Cloning": Maintaining Visual Harmony

PhotoDoodle's most striking innovation is its "Positional Encoding Cloning" technique. It lets the model retain the precise position of each pixel in the original image, so the composition stays intact when new elements are added and those elements blend into the background.

This addresses a key pain point of traditional AI image editing: existing systems tend either to restyle the entire image or to edit only localized areas, which makes it hard to add new decorative elements while preserving the original perspective and background. PhotoDoodle achieves this without training additional parameters, which improves processing efficiency.
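One way to picture the cloning step is that the tokens of the original image and the tokens of the edited image are given identical position information, so matching locations line up without any extra learned parameters. The sketch below is an assumption-laden illustration of that idea, not the authors' implementation: the 4x4 grid, the toy sinusoidal encoding, and the function names `grid_position_ids` and `sinusoidal_encoding` are all hypothetical stand-ins for whatever scheme the underlying model uses.

```python
import math

def grid_position_ids(height, width):
    """Return (row, col) ids for every token in a height x width grid."""
    return [(r, c) for r in range(height) for c in range(width)]

def sinusoidal_encoding(pos, dim=8):
    """Toy 2D sinusoidal encoding; a stand-in for the model's real scheme."""
    half = dim // 2
    enc = []
    for value in pos:  # first the row index, then the column index
        for i in range(half // 2):
            freq = 1.0 / (10000 ** (2 * i / half))
            enc.extend([math.sin(value * freq), math.cos(value * freq)])
    return enc

# Position ids for the tokens of the image being generated.
target_ids = grid_position_ids(4, 4)
# "Cloning": the source-image tokens reuse the same ids instead of
# receiving fresh positions after the end of the target sequence.
source_ids = list(target_ids)

sequence_ids = target_ids + source_ids
encodings = [sinusoidal_encoding(p) for p in sequence_ids]

n = len(target_ids)
same = all(encodings[i] == encodings[n + i] for i in range(n))
print("matching locations share an encoding:", same)
```

Because the encoding is computed rather than learned, sharing it adds no trainable weights, which is consistent with the article's point that the technique needs no additional parameter training.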


PhotoDoodle transforms everyday photos using various artistic styles – from cute cartoon monsters to hand-drawn lines and color effects. | Image: Huang et al.

Towards Single-Image Training

In practical tests, PhotoDoodle handles instructions ranging from "make the cat whiter" to "add a pink monster climbing the building." In benchmark tests, including measures of similarity between the image and its text description, it outperforms existing systems on both targeted edits and global image changes.


A comparison of PhotoDoodle with existing AI image editing systems clearly shows the difference in the quality of specific prompt execution. | Image: Huang et al.

Currently, PhotoDoodle requires dozens of image pairs and thousands of training steps to master a new style. The research team is now working on more efficient training methods that need only a single image. It has released a dataset covering six artistic styles with over 300 image pairs, and the code has been open-sourced on GitHub as a foundation for future research.

Address: https://github.com/showlab/PhotoDoodle