DynamicControl

Adaptive condition selection enhances control in text-to-image generation.

Common Product · Image · Text-to-image · Diffusion model
DynamicControl is a framework designed to improve the controllability of text-to-image diffusion models. It dynamically combines multiple control signals and adaptively selects different numbers and types of conditions so that images are synthesized more reliably and in greater detail. The framework first uses a dual-loop controller, in which pre-trained conditional generation and discriminative models produce an initial score ranking for all input conditions. A multimodal large language model (MLLM) then serves as an efficient condition evaluator that refines this condition ordering. DynamicControl jointly optimizes the MLLM and the diffusion model, leveraging the MLLM's reasoning capabilities for multi-condition text-to-image tasks. The final ordered conditions are fed into a parallel multi-control adapter, which learns dynamic visual condition feature maps and integrates them to modulate ControlNet, enhancing control over the generated images.
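The sketch below illustrates the general shape of such a pipeline in PyTorch: candidate conditions are scored, an adaptive subset is selected and ordered, and the selected conditions are fused by a parallel adapter into a single control feature map. All class and function names here (ConditionEvaluator, MultiControlAdapter, select_conditions) are hypothetical illustrations under stated assumptions, not the authors' actual API or the MLLM-based evaluator itself.

```python
# Minimal sketch of a DynamicControl-style flow (hypothetical names, not the official code).
import torch
import torch.nn as nn


class ConditionEvaluator(nn.Module):
    """Stand-in for the MLLM-based condition evaluator: scores each
    candidate condition's feature map for how well it supports the prompt."""
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(feat_dim, 1)
        )

    def forward(self, cond_feats: torch.Tensor) -> torch.Tensor:
        # cond_feats: (num_conditions, feat_dim, H, W) -> (num_conditions,)
        return self.scorer(cond_feats).squeeze(-1)


class MultiControlAdapter(nn.Module):
    """Parallel adapter: encodes each selected condition separately and
    fuses the feature maps into one control signal for ControlNet."""
    def __init__(self, in_ch: int = 3, feat_dim: int = 64, max_conds: int = 4):
        super().__init__()
        self.encoders = nn.ModuleList(
            nn.Conv2d(in_ch, feat_dim, 3, padding=1) for _ in range(max_conds)
        )
        self.fuse = nn.Conv2d(feat_dim, feat_dim, 1)

    def forward(self, conditions: list) -> torch.Tensor:
        feats = [enc(c) for enc, c in zip(self.encoders, conditions)]
        return self.fuse(torch.stack(feats, dim=0).sum(dim=0))


def select_conditions(conditions, scores, threshold=0.0):
    """Adaptive selection: keep conditions whose score clears a threshold,
    ordered from most to least reliable; always keep at least one."""
    order = torch.argsort(scores, descending=True).tolist()
    kept = [conditions[i] for i in order if scores[i] > threshold]
    return kept or [conditions[order[0]]]


if __name__ == "__main__":
    # Four candidate control signals (e.g. depth, edges, segmentation, pose),
    # each a 3-channel map for a 64x64 image.
    conditions = [torch.rand(1, 3, 64, 64) for _ in range(4)]

    adapter = MultiControlAdapter()
    evaluator = ConditionEvaluator()

    # Score per-condition feature maps and adaptively pick an ordered subset.
    with torch.no_grad():
        feats = torch.cat(
            [enc(c) for enc, c in zip(adapter.encoders, conditions)]
        )
        scores = evaluator(feats)
    selected = select_conditions(conditions, scores)

    # Fuse the selected conditions into one control feature map that would
    # be handed to ControlNet alongside the text prompt.
    control = adapter(selected)
    print(control.shape)  # torch.Size([1, 64, 64, 64])
```

In the real framework the evaluator is a fine-tuned MLLM rather than a small CNN scorer, but the control flow is the same: rank, adaptively select, then fuse in parallel before conditioning ControlNet.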
