Image generation technology is advancing quickly. Recently, a research team from the National University of Singapore proposed a new framework, OminiControl, designed to make image generation more flexible and efficient. The framework takes image conditions as input and reuses an already trained Diffusion Transformer (DiT) model, giving users finer control over the generated result.

In simple terms, given a source image, OminiControl can integrate the subject of that image into the generated picture. For instance, the editor uploaded the source image on the left and entered the prompt "a chip person placed next to a doctor's office desk with a stethoscope on it." The output was fairly ordinary, as shown below:

[Image: source image (left) and generated output]

The core of OminiControl is its "parameter reuse mechanism." This mechanism lets the DiT model handle image conditions with very few additional parameters: compared to existing methods, OminiControl adds only about 0.1% more parameters to achieve this functionality. It also handles a range of image-conditioned tasks in a unified way, including subject-driven generation and spatially aligned conditions such as edges and depth maps.
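To put the "about 0.1%" figure in perspective, the arithmetic can be sketched as below. This is only an illustration: the article does not say how the extra parameters are added, so the low-rank adapter formula, the model size, the hidden width, the rank, and the layer count are all hypothetical values chosen to land near that order of magnitude.

```python
def low_rank_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in one low-rank adapter pair: A (d_in x rank) plus B (rank x d_out)."""
    return rank * (d_in + d_out)


def added_fraction(base_params: int, added_params: int) -> float:
    """Added parameters as a fraction of the frozen base model's parameters."""
    if base_params <= 0:
        raise ValueError("base_params must be positive")
    return added_params / base_params


# Hypothetical numbers for illustration only (not taken from the paper).
BASE_PARAMS = 10_000_000_000  # assumed 10B-parameter base model
HIDDEN = 3072                 # assumed hidden width of each projection
RANK = 4                      # assumed adapter rank
ADAPTED_LAYERS = 400          # assumed number of adapted square projections

added = ADAPTED_LAYERS * low_rank_param_count(HIDDEN, HIDDEN, RANK)
fraction = added_fraction(BASE_PARAMS, added)

print(f"added parameters: {added:,}")
print(f"overhead: {fraction:.3%}")
```

Under these assumed values the overhead comes out a little under 0.1%, which shows how a small set of added weights can stay negligible next to a multi-billion-parameter base model.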

The research team also emphasized that OminiControl acquires these capabilities by training on generated images, which is especially helpful for subject-driven generation. In extensive evaluations, OminiControl outperformed existing UNet-based models and DiT-adapted models on both subject-driven generation and spatially aligned conditional generation. The result opens up new possibilities for creative work.

To support broader research, the team also released a training dataset named Subjects200K, which contains over 200,000 identity-consistent images and provides an efficient data synthesis pipeline. This dataset will be a valuable resource for researchers, helping them further explore subject-consistent generation tasks.

The launch of OminiControl not only improves the efficiency and quality of image generation but also opens up more possibilities for artistic creation. As the technology continues to advance, image generation is likely to become more intelligent and personalized.

Online demo: https://huggingface.co/spaces/Yuanshi/OminiControl

GitHub: https://github.com/Yuanshi9815/OminiControl

Paper: https://arxiv.org/html/2411.15098v2

Key Points:

🌟 OminiControl enhances the control capabilities and efficiency of image generation through its parameter reuse mechanism.

🎨 The framework handles various image-conditioned tasks in a unified way, including subject-driven generation and conditions such as edges and depth maps, adapting to different creative needs.

📸 The team released a dataset of over 200,000 images, Subjects200K, to support further research and exploration.