Is ControlNet Outperformed? The Versatile Image Generation Model OmniGen Has Arrived, Achieving Image Generation and Fine Editing with Simple Prompts

AIbase基地

Published in AI News · 4 min read · Oct 24, 2024

A research team at the Beijing Academy of Artificial Intelligence (BAAI) recently introduced a new image generation model named OmniGen.

Versatile Image Generation and Editing Model

Unlike earlier image generation tools such as Stable Diffusion, OmniGen is built for versatility and handles multiple tasks within a single framework:

It can process various image generation tasks, including text-to-image generation and image editing, making it a true all-rounder.

This means users can control image generation and fine-tune edits with simple prompts, eliminating the need for additional plugins like ControlNet or IP-Adapter for detailed adjustments!

OmniGen's architecture is highly streamlined. Unlike traditional image generation models, it does not require extra text encoders or complex workflows: users supply the conditions (text, images, or both) and OmniGen generates the image directly. It combines a variational autoencoder (VAE) with a pretrained Transformer model, handling both image and text inputs within one model and reducing unnecessary complexity.
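To make the "one model for text and image inputs" idea concrete, here is a minimal sketch of how image latents and text embeddings could be merged into a single token sequence for a Transformer. It is an illustration under stated assumptions, not OmniGen's actual code: the `patchify` helper, the patch size of 2, the hidden width of 16, and the random projection matrix are all hypothetical stand-ins.

```python
import numpy as np

def patchify(latent, patch=2):
    """Split a (C, H, W) latent into (num_patches, C * patch * patch) tokens."""
    c, h, w = latent.shape
    assert h % patch == 0 and w % patch == 0, "H and W must be divisible by patch"
    grid = latent.reshape(c, h // patch, patch, w // patch, patch)
    # Reorder to (rows, cols, C, patch, patch), then flatten each patch into a row.
    grid = grid.transpose(1, 3, 0, 2, 4)
    return grid.reshape((h // patch) * (w // patch), c * patch * patch)

rng = np.random.default_rng(0)
hidden = 16                                      # hypothetical Transformer width
text_tokens = rng.normal(size=(5, hidden))       # stand-in for embedded prompt tokens
latent = rng.normal(size=(4, 8, 8))              # stand-in for a VAE latent of an image

patches = patchify(latent)                       # (16, 16): 16 patches, 4*2*2 values each
projection = rng.normal(size=(patches.shape[1], hidden))  # hypothetical linear layer
image_tokens = patches @ projection

# One interleaved sequence that a single Transformer could attend over.
sequence = np.concatenate([text_tokens, image_tokens], axis=0)
print(sequence.shape)
```

The point of the sketch is only that, once images are turned into tokens, text and image conditions can share one sequence, so no separate conditioning network is needed.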

To improve image generation quality, OmniGen is trained with rectified flow, which directly regresses the target velocity and makes generation easier to control precisely. In addition, a progressive training strategy moves from low to high resolution, so the model masters generation at gradually increasing image sizes.
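The velocity-regression objective can be sketched in a few lines. This is a minimal illustration that assumes the standard rectified flow formulation (linear interpolation between noise and data, with the difference between them as the target velocity); the function names are hypothetical and the article does not specify OmniGen's exact loss.

```python
import numpy as np

def rectified_flow_pair(x, noise, t):
    """Return the interpolated sample x_t and the target velocity.

    Assumes x_t = t * x + (1 - t) * noise, so the velocity is x - noise.
    """
    x_t = t * x + (1.0 - t) * noise
    velocity = x - noise
    return x_t, velocity

def velocity_loss(predicted, target):
    """Mean squared error between predicted and target velocity."""
    return float(np.mean((predicted - target) ** 2))

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))        # stand-in for clean image latents
noise = rng.normal(size=(4, 8))    # Gaussian noise sample
t = 0.3                            # time step in [0, 1]

x_t, v = rectified_flow_pair(x, noise, t)

# With the exact velocity, one straight step from x_t recovers the clean data.
x_rec = x_t + (1.0 - t) * v
```

In training, a network would predict the velocity from `x_t`, `t`, and the conditions, and `velocity_loss` would be minimized against `v`.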

OmniGen Rivals Advanced Models in Image Generation

OmniGen's training dataset is also extensive and diverse, covering various image generation tasks. To ensure robust multitasking capabilities, researchers constructed a large-scale dataset called X2I, including data for text-to-image and image editing tasks. This allows OmniGen to effectively learn and transfer knowledge from different tasks, demonstrating new generation capabilities.

OmniGen performs well across multiple benchmarks. In text-to-image generation, it is competitive with the most advanced models available. On the GenEval benchmark, it achieves comparable results despite being trained on only about 0.1 billion images, compared with more than 1 billion for SD3.

Its image editing capabilities are equally strong: edits stay faithful to the source image while following the editing instruction. For instance, on the EMU-Edit test set, it outperformed models like InstructPix2Pix and performed comparably to the current state-of-the-art EMU-Edit model.

In subject-driven generation tasks, OmniGen shows strong personalization capabilities, making it suitable for fields such as art creation and advertising design.

Try it out at: https://huggingface.co/spaces/Shitao/OmniGen

Paper: https://arxiv.org/html/2409.11340v1
