New AI Image Generation Framework OminiControl: Integrating Subject Material into Generated Images

AIbase基地 (AIbase Base)

Published in AI News · Nov 26, 2024


In today's digital age, image generation technology is advancing at an astonishing pace. Recently, a research team from the National University of Singapore proposed a new framework, OminiControl, designed to make image generation more flexible and efficient. The framework adds image conditions to an already trained Diffusion Transformer (DiT) model, giving users much finer control over what the model generates.

In simple terms, given a source image, OminiControl can place the subject of that image into newly generated pictures. For instance, the editor uploaded the source image on the left and entered the prompt "a chip person placed next to a doctor's office desk with a stethoscope on it." The result, shown below, was fairly ordinary.

The core of OminiControl is its "parameter reuse mechanism," which lets the DiT model process image conditions with its own existing weights, so very few additional parameters are needed. Compared with existing methods, OminiControl adds only about 0.1% extra parameters while still delivering strong results. It can also handle a range of image-conditioning tasks within a single framework, from subject-driven generation to spatially aligned conditions such as edges and depth maps. This flexibility makes it particularly well suited to subject-driven generation.
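To make "parameter reuse" more concrete, here is a minimal NumPy sketch, assuming a single-head attention layer and standard LoRA adapters. It is an illustration of the general idea, not OminiControl's actual code: the sizes `D` and `RANK`, the helper names `project`, `joint_attention` and `lora_fraction`, and the choice to adapt only the Q/K/V projections are all hypothetical. The condition image's tokens are simply concatenated with the text and noisy-image tokens and pass through the same frozen projections, so the only new weights are the small low-rank factors.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 64      # hypothetical hidden size; production DiTs are much wider
RANK = 4    # hypothetical LoRA rank

# Frozen "pretrained" Q/K/V projections of one attention layer (random stand-ins).
W_q, W_k, W_v = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))

# Trainable LoRA factors per projection: the only new weights in this sketch.
# B starts at zero, so before training the adapted layer equals the frozen one.
LORA = {name: (rng.standard_normal((D, RANK)) * 0.01, np.zeros((RANK, D)))
        for name in ("q", "k", "v")}


def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def project(x, W, name):
    """Frozen projection plus its low-rank LoRA update."""
    A, B = LORA[name]
    return x @ W + x @ A @ B


def joint_attention(text, noisy, cond):
    """One attention pass over concatenated [text | noisy image | condition] tokens.

    Condition tokens reuse the same frozen projections as every other token,
    so no separate condition encoder is created. Returns the updated
    noisy-image tokens.
    """
    x = np.concatenate([text, noisy, cond], axis=0)
    q, k, v = project(x, W_q, "q"), project(x, W_k, "k"), project(x, W_v, "v")
    out = softmax(q @ k.T / np.sqrt(D)) @ v
    start = len(text)
    return out[start:start + len(noisy)]


def lora_fraction(d, r, n_mats=3):
    """Extra LoRA parameters relative to n_mats frozen d x d matrices (= 2r/d)."""
    return (n_mats * 2 * d * r) / (n_mats * d * d)


text = rng.standard_normal((8, D))     # 8 prompt tokens
noisy = rng.standard_normal((16, D))   # 16 latent image patches
cond = rng.standard_normal((16, D))    # 16 condition-image patches (edge map, subject, ...)
print(joint_attention(text, noisy, cond).shape)  # (16, 64)
print(lora_fraction(D, RANK))                    # 0.125
```

Because the extra cost scales roughly as 2r/d, it shrinks as the model gets wider, which is why adapters on a large DiT can stay at a tiny fraction of the total. The exact "about 0.1%" figure depends on the real model width, the rank, and which layers are adapted, none of which this toy reproduces. Note also that the same `joint_attention` call serves any condition type, since an edge map or a subject photo both arrive as ordinary image tokens.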

The research team also emphasized that OminiControl gains these capabilities by training on images generated by the DiT itself, which is especially beneficial for subject-driven generation. In extensive evaluations, OminiControl significantly outperformed existing UNet-based and DiT-adapted models on both subject-driven generation and spatially aligned conditional generation. The result opens up new possibilities for creative work.

To support further research, the team also released Subjects200K, a training dataset of more than 200,000 identity-consistent images, together with an efficient data synthesis pipeline. The dataset should be a valuable resource for researchers exploring subject-consistent generation.
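One way to picture a synthesis-and-filter pipeline for identity-consistent pairs is sketched below. This is a hypothetical illustration under stated assumptions, not the team's released pipeline: it assumes each generated sample is a single side-by-side render showing the same subject twice, and it replaces whatever quality judge a real pipeline would use with `toy_consistency`, a deliberately crude color-based score. The names `split_pair`, `toy_consistency`, `build_pairs` and the `threshold` value are all made up for this example.

```python
import numpy as np


def split_pair(image):
    """Split an H x W x C side-by-side render into its left and right halves."""
    w = image.shape[1] // 2
    return image[:, :w], image[:, w:2 * w]


def toy_consistency(a, b):
    """Hypothetical stand-in for a real identity-consistency judge.

    Cosine similarity of the per-channel mean colors of the two halves.
    A real pipeline would need a far stronger model than this.
    """
    fa = a.reshape(-1, a.shape[-1]).mean(axis=0)
    fb = b.reshape(-1, b.shape[-1]).mean(axis=0)
    return float(fa @ fb / (np.linalg.norm(fa) * np.linalg.norm(fb) + 1e-8))


def build_pairs(renders, threshold=0.95):
    """Keep (left, right) pairs whose consistency score reaches the threshold."""
    kept = []
    for img in renders:
        left, right = split_pair(img)
        if toy_consistency(left, right) >= threshold:
            kept.append((left, right))
    return kept


# Two synthetic "renders": one with matching halves, one red-vs-blue.
same = np.full((32, 64, 3), 0.5)
diff = np.concatenate([np.tile([1.0, 0.0, 0.0], (32, 32, 1)),
                       np.tile([0.0, 0.0, 1.0], (32, 32, 1))], axis=1)
pairs = build_pairs([same, diff])
print(len(pairs))  # 1
```

The design point is the filter step: generating many candidate pairs is cheap, so a dataset of this kind can afford to discard every pair whose two halves do not convincingly show the same subject.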

The launch of OminiControl not only improves the efficiency and quality of image generation but also opens up more possibilities for artistic creation. As the technology continues to advance, image generation will become more intelligent and personalized.

Online demo: https://huggingface.co/spaces/Yuanshi/OminiControl

GitHub: https://github.com/Yuanshi9815/OminiControl

Paper: https://arxiv.org/html/2411.15098v2

Key Points:

🌟 OminiControl enhances the control capabilities and efficiency of image generation through its parameter reuse mechanism.

🎨 This framework can simultaneously handle various image condition tasks, such as edges and depth maps, adapting to different creative needs.

📸 The team released a dataset of over 200,000 images, Subjects200K, to support further research and exploration.


This article is from AIbase Daily
