UNIMO-G: Unified Image Generation

UNIMO-G is a simple multimodal conditional diffusion framework that operates on prompts with interleaved text and visual inputs. It comprises two core components: a multimodal large language model (MLLM) that encodes multimodal prompts, and a conditional denoising diffusion network that generates images from the encoded input. The framework is trained in two stages: first, pre-training on large-scale text-image pairs to develop conditional image generation capabilities; then instruction tuning with multimodal prompts to achieve unified image generation. A data processing pipeline involving language grounding and image segmentation is used to construct the multimodal prompts. UNIMO-G performs well in both text-to-image generation and zero-shot subject-driven synthesis, and is notably effective at generating high-fidelity images from complex multimodal prompts involving multiple image entities.
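The two-component design described above can be illustrated with a toy sketch. This is not UNIMO-G's code or API: every name here (`TextSegment`, `ImageSegment`, `encode_prompt`, `denoise_step`, `generate`) is hypothetical. The "encoder" is a seeded random projection standing in for the MLLM, and the "denoiser" only pulls a noisy vector toward the prompt encoding instead of running a trained diffusion network. It shows only the data flow: interleaved prompt, then conditioning vectors, then iterative conditional denoising.

```python
import random
from dataclasses import dataclass
from typing import List, Union

DIM = 4  # toy latent/conditioning size (assumption for illustration)


@dataclass
class TextSegment:
    text: str


@dataclass
class ImageSegment:
    pixels: List[float]  # stand-in for a segmented entity image


# An interleaved multimodal prompt is an ordered mix of text and image parts.
Prompt = List[Union[TextSegment, ImageSegment]]


def encode_prompt(prompt: Prompt) -> List[List[float]]:
    """Toy stand-in for the MLLM: one DIM-sized vector per prompt segment."""
    vectors = []
    for seg in prompt:
        if isinstance(seg, TextSegment):
            seed = sum(ord(c) for c in seg.text)
        else:
            seed = int(sum(seg.pixels) * 1000)
        rng = random.Random(seed)  # deterministic per segment content
        vectors.append([rng.uniform(-1.0, 1.0) for _ in range(DIM)])
    return vectors


def denoise_step(latent: List[float], cond: List[List[float]],
                 t: int, steps: int) -> List[float]:
    """Toy conditional denoiser: move the latent toward the mean condition."""
    target = [sum(v[i] for v in cond) / len(cond) for i in range(DIM)]
    alpha = 1.0 / (steps - t)  # reaches 1.0 on the final step
    return [x + alpha * (m - x) for x, m in zip(latent, target)]


def generate(prompt: Prompt, steps: int = 10, seed: int = 0) -> List[float]:
    """Start from Gaussian noise and denoise under the prompt encoding."""
    cond = encode_prompt(prompt)
    rng = random.Random(seed)
    latent = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
    for t in range(steps):
        latent = denoise_step(latent, cond, t, steps)
    return latent


prompt = [
    TextSegment("a photo of"),
    ImageSegment([0.1, 0.5, 0.9]),
    TextSegment("on a beach"),
]
result = generate(prompt)
```

In a real system the encoder and denoiser would be learned networks trained with the two-stage recipe described above; the sketch only fixes the interfaces between them.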

UNIMO-G Alternatives

CreatiLayout — Creative layout-to-image generation based on Siamese Multimodal Diffusion Transformers.

ImagenHub — Inference and Evaluation of Standardized Conditional Image Generation Models

DiffSensei — Customized comic generation model, connecting multimodal LLMs and diffusion models.

Stable Diffusion WebUI Forge — Stable Diffusion WebUI Forge is an image generation platform built on top of Stable Diffusion WebUI.

Instruct-Imagen — Multimodal Image Generation Model

Stable Diffusion 3.5 Medium — A multimodal diffusion transformer model for generating images based on text.

Tencent EMMA — Multimodal Text-to-Image Generation Model

Diffusers Image Outpaint — Image extension using diffusion models

CHOIS — Human-Object Interaction Synthesis technology based on Conditional Diffusion Models

Stable Diffusion API — Generate and optimize Dreambooth Stable Diffusion models through the API, saving time and money, with a claimed 50x faster image generation.

CogView3 — A text-to-image generation system based on cascaded diffusion

Stable Diffusion 3 API — Advanced text-to-image generation system

Lumina-mGPT — A multimodal autoregressive model excelling in text-to-image generation.

IPAdapter-Instruct — A model for image generation.

diffusion-client — A powerful Android Stable Diffusion client

Diffusion-RWKV — An extensible diffusion model based on the RWKV architecture.

Diffusion Self-Distillation — A diffusion self-distillation technique for zero-shot custom image generation.

Neural Network Diffusion — Implementation of Neural Network Diffusion Model

ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer — A versatile creator and editor that follows instructions via diffusion transformers

Stable Diffusion Web UI — Web interface for image generation based on Stable Diffusion

pseudo-flex-base — A flexible text-to-image generation model based on diffusion.

CreativeSynth — Multimodal diffusion generation of artistic images

Stable Diffusion — Free Stable Diffusion AI Image Generator

Diffusion Bee — The easiest way to run a Stable Diffusion model locally.

Mann-E Art — An image generation model based on Stable Diffusion XL

OmniGen2 — A powerful unified multimodal model that supports text-to-image generation and image editing.

Stable Diffusion 3 Free Online — Advanced Text-to-Image Generation Model