ControlNet++ is a text-to-image diffusion approach that significantly improves controllability under various conditional controls by explicitly optimizing pixel-level cycle consistency between the generated image and the input control. It uses a pre-trained discriminative reward model to extract the corresponding condition from the generated image and minimizes a consistency loss between the input control and the extracted condition. To avoid the substantial time and memory cost of full image sampling, ControlNet++ further introduces an efficient reward strategy: noise is added to the input image, and a single-step denoised image is used for reward fine-tuning.
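The mechanics of the efficient reward strategy can be sketched with a toy NumPy example. All names here (`reward_model`, `single_step_x0`, the schedule value `alpha_bar`) are hypothetical stand-ins, not the actual ControlNet++ implementation: the point is that a single-step estimate of the clean image from a noised input is enough to compute a cycle-consistency loss, with no iterative sampling.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical cumulative noise-schedule term at some timestep t.
alpha_bar = 0.7

def add_noise(x0, eps, alpha_bar):
    """Forward diffusion: x_t = sqrt(a)*x0 + sqrt(1-a)*eps."""
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

def single_step_x0(x_t, eps_pred, alpha_bar):
    """One-step denoised estimate of the clean image from x_t and a
    noise prediction, instead of running the full sampling chain."""
    return (x_t - np.sqrt(1.0 - alpha_bar) * eps_pred) / np.sqrt(alpha_bar)

def reward_model(img):
    """Stand-in for the discriminative reward model that extracts the
    condition from an image (a real one would be e.g. a segmenter or
    edge detector); here just a channel mean for illustration."""
    return img.mean(axis=-1)

def consistency_loss(control, img):
    """Pixel-level consistency between the input control and the
    condition extracted from the (single-step denoised) image."""
    extracted = reward_model(img)
    return float(np.mean((extracted - control) ** 2))

# With a perfect noise prediction (eps_pred == eps), the one-step
# estimate recovers x0 exactly, so the loss can be evaluated cheaply.
x0 = rng.normal(size=(8, 8, 3))
control = reward_model(x0)                    # condition of the clean image
eps = rng.normal(size=x0.shape)
x_t = add_noise(x0, eps, alpha_bar)
x0_hat = single_step_x0(x_t, eps, alpha_bar)  # pretend the model predicted eps
loss = consistency_loss(control, x0_hat)
```

In training, gradients of this loss would flow back through the noise prediction into the diffusion model; the stub above only illustrates the forward computation.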