Instruct-Imagen
Multimodal Image Generation Model
Instruct-Imagen is a multimodal image generation model that uses multimodal instructions to handle heterogeneous image generation tasks and to generalize to unseen tasks. It uses natural language to integrate diverse modalities (e.g., text, edges, style, and subject), expressing a rich set of generative intents in a single, standardized format. The model is built by fine-tuning a pre-trained text-to-image diffusion model in two stages: first with retrieval-augmented training, then with fine-tuning on a diverse set of image generation tasks. The resulting model achieves state-of-the-art performance on various image generation datasets, matching or exceeding previous task-specific models in human evaluation, and shows promising generalization to unseen and more complex tasks.
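To make the idea of a multimodal instruction more concrete, here is a minimal Python sketch. It assumes a hypothetical format in which the instruction text refers to context items (subject image, style image, edge map) through placeholders such as `[ref#1]`. The class `MultimodalInstruction` and its methods are illustrative names, not part of any released Instruct-Imagen API. The sketch only shows how instruction text and multimodal context can be tied together and checked for consistency. It does not show how the diffusion model consumes them.

```python
import re
from dataclasses import dataclass, field

# Hypothetical placeholder syntax assumed for this sketch: [ref#<n>]
PLACEHOLDER = re.compile(r"\[ref#(\d+)\]")


@dataclass
class MultimodalInstruction:
    """Hypothetical container: a natural-language instruction whose
    placeholders point at entries in a multimodal context."""
    text: str
    # Maps placeholder id -> (modality, payload). Payloads are plain
    # strings (e.g. file paths) to keep the sketch dependency-free.
    context: dict = field(default_factory=dict)

    def referenced_ids(self):
        """Return the placeholder ids mentioned in the text, in order."""
        return [int(i) for i in PLACEHOLDER.findall(self.text)]

    def validate(self):
        """Raise if any placeholder in the text has no context entry."""
        missing = [i for i in self.referenced_ids() if i not in self.context]
        if missing:
            raise ValueError(f"no context for placeholders: {missing}")
        return True

    def describe(self):
        """Replace each placeholder with its modality name, for display."""
        return PLACEHOLDER.sub(
            lambda m: f"<{self.context[int(m.group(1))][0]}>", self.text
        )


# One instruction that combines subject, style, and edge conditioning.
instr = MultimodalInstruction(
    text="Render [ref#1] in the style of [ref#2], following the edge map [ref#3].",
    context={
        1: ("subject", "dog.png"),
        2: ("style", "watercolor.png"),
        3: ("edge", "edges.png"),
    },
)
instr.validate()
print(instr.describe())
# Render <subject> in the style of <style>, following the edge map <edge>.
```

Keeping the intent in plain text and the conditioning signals in a separate context map is one simple way to let a single instruction format cover many tasks: adding a new modality only means adding a new context entry, not a new model input type.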
Instruct-Imagen Visits Over Time
Monthly Visits: 19,075,321
Bounce Rate: 45.07%
Pages per Visit: 5.5
Visit Duration: 00:05:32