Leffa is a unified framework for controllable person image generation that gives precise control over both appearance (e.g., virtual try-on) and pose (e.g., pose transfer). It reduces distortion of fine details while maintaining high image quality by guiding each target query to attend to the relevant regions of the reference image during training. Leffa is also model-agnostic: the same approach can be applied to other diffusion models to improve their performance.
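
To make the idea of guiding target queries toward reference regions more concrete, here is a minimal pure-Python sketch. It computes attention weights between target queries and reference keys, uses them to find where each query "looks" in the reference and to warp reference values, then penalizes the mismatch against the target. All function names (`attention_weights`, `attended_positions`, `guidance_loss`), the toy data, and the squared-error loss are assumptions made for illustration; they are not taken from the Leffa codebase and may differ from its actual training objective.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys):
    """Row i holds how strongly target query i attends to each reference key."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    return [
        softmax([scale * sum(q * k for q, k in zip(query, key)) for key in keys])
        for query in queries
    ]


def attended_positions(weights, ref_positions):
    """Attention-weighted average reference (x, y) coordinate per target query."""
    return [
        tuple(sum(w * pos[d] for w, pos in zip(row, ref_positions)) for d in range(2))
        for row in weights
    ]


def guidance_loss(weights, ref_values, target_values):
    """Mean squared error between attention-warped reference values and the target.

    Hypothetical loss: it is small only when each target query attends to the
    reference location that carries the matching content.
    """
    warped = [sum(w * v for w, v in zip(row, ref_values)) for row in weights]
    return sum((a - b) ** 2 for a, b in zip(warped, target_values)) / len(target_values)


# Toy data (illustrative only): two reference locations, two target queries.
keys = [[10.0, 0.0], [0.0, 10.0]]
queries = [[10.0, 0.0], [0.0, 10.0]]
ref_positions = [(0.0, 0.0), (1.0, 1.0)]
ref_values = [0.2, 0.8]

weights = attention_weights(queries, keys)
positions = attended_positions(weights, ref_positions)

# Target content that matches where the queries attend vs. content that does not.
loss_aligned = guidance_loss(weights, ref_values, [0.2, 0.8])
loss_misaligned = guidance_loss(weights, ref_values, [0.8, 0.2])
```

In this toy setup, the loss is near zero when each query attends to the reference location holding its matching content, and larger otherwise. Minimizing such a loss during training would push attention toward the correct reference regions. Because the sketch only reads attention weights, the same kind of term could in principle be attached to any attention-based diffusion model, which is consistent with the model-agnostic claim above.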