FineControlNet is an official PyTorch implementation for generating images controlled by spatial inputs, such as 2D human poses, together with spatially aligned, instance-specific text descriptions. It handles a wide range of spatial inputs, from simple line drawings to complex human poses. FineControlNet maintains natural interaction and visual coherence between instances and the environment, and it retains the image quality and generalization capabilities of Stable Diffusion while adding instance-level control.