Tencent EMMA

Multimodal Text-to-Image Generation Model

EMMA is an image generation model built on the state-of-the-art text-to-image (T2I) diffusion model ELLA. It accepts multimodal prompts and, through its multimodal feature connector, integrates text with supplementary modal information. All parameters of the original T2I diffusion model are frozen and only some additional layers are adjusted, which reveals an interesting property: a pre-trained T2I diffusion model can already accept multimodal prompts. EMMA adapts easily to different existing frameworks, making it a flexible and effective tool for generating personalized, context-aware images and even videos.
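The "freeze the base model, adjust only some additional layers" idea can be illustrated with a toy sketch in plain Python. Everything below is a hypothetical stand-in and not EMMA's actual architecture or code: `frozen_t2i` is a fixed linear map playing the role of the pre-trained T2I model, and `Connector` is a simple gated addition (assumed here purely for illustration) that mixes extra-modality features into the text features.

```python
# Toy illustration only: names, shapes, and the gated-addition connector are
# assumptions for this sketch, not EMMA's real design.

DIM = 4

# Stand-in for the frozen pre-trained T2I model: a fixed linear map from a
# conditioning vector to an output vector. These weights are never updated.
FROZEN_W = [
    [0.9, -0.2, 0.1, 0.0],
    [0.3, 0.8, -0.1, 0.2],
    [-0.1, 0.4, 0.7, 0.1],
    [0.2, 0.0, -0.3, 0.6],
]


def frozen_t2i(cond):
    """Apply the frozen linear map to a conditioning vector."""
    return [sum(w * c for w, c in zip(row, cond)) for row in FROZEN_W]


class Connector:
    """Hypothetical trainable layer: text + gate * extra, element-wise."""

    def __init__(self):
        # Zero init: the connector starts as a no-op on the text condition.
        self.gate = [0.0] * DIM

    def __call__(self, text_feat, extra_feat):
        return [t + g * e for t, g, e in zip(text_feat, self.gate, extra_feat)]


def loss_and_grad(connector, text_feat, extra_feat, target):
    """Squared error of the frozen model's output, and its gradient w.r.t. the gate."""
    out = frozen_t2i(connector(text_feat, extra_feat))
    err = [o - y for o, y in zip(out, target)]
    loss = sum(e * e for e in err)
    # d loss / d gate[j] = 2 * extra[j] * sum_i err[i] * W[i][j]
    grad = [
        2 * extra_feat[j] * sum(err[i] * FROZEN_W[i][j] for i in range(DIM))
        for j in range(DIM)
    ]
    return loss, grad


def train(steps=200, lr=0.1):
    """Gradient descent on the connector gate only; the base weights stay fixed."""
    connector = Connector()
    text_feat = [0.5, -0.2, 0.1, 0.3]
    extra_feat = [1.0, 0.5, -0.5, 0.8]
    # Target generated from a known gate so the toy problem is solvable.
    true_gate = [0.3, -0.4, 0.2, 0.1]
    target = frozen_t2i([t + g * e for t, g, e in zip(text_feat, true_gate, extra_feat)])

    frozen_before = [row[:] for row in FROZEN_W]
    first_loss, _ = loss_and_grad(connector, text_feat, extra_feat, target)
    for _ in range(steps):
        _, grad = loss_and_grad(connector, text_feat, extra_feat, target)
        connector.gate = [g - lr * d for g, d in zip(connector.gate, grad)]
    last_loss, _ = loss_and_grad(connector, text_feat, extra_feat, target)
    return first_loss, last_loss, frozen_before == FROZEN_W


if __name__ == "__main__":
    first, last, base_unchanged = train()
    print("loss before:", first, "loss after:", last, "base unchanged:", base_unchanged)
```

Because the gate starts at zero, the frozen model initially sees exactly the text condition it was trained on; training then only moves the connector, which is the general pattern the description points to.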
