Solution to improve spatial consistency in text-to-image models

SPRIGHT is a large-scale vision-language dataset, released together with models, that focuses on spatial relationships. The SPRIGHT dataset was built by recaptioning 6 million images so that their descriptions contain far more spatial phrases (such as "to the left of" or "behind"). A model was then fine-tuned on just 444 images, each containing many objects, to improve the generation of images with correct spatial relationships. Models trained this way achieve state-of-the-art spatial consistency on several benchmarks while also improving image-quality scores.
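The description does not say how spatial phrases were counted or how the 444 fine-tuning images were selected. The Python code below is a minimal sketch of both ideas under stated assumptions. It assumes a hand-written list of spatial phrases and a per-image object count (for example, from an object detector). `SPATIAL_PHRASES`, `Sample`, `count_spatial_phrases`, `spatial_phrase_rate` and `select_finetune_subset` are hypothetical names for illustration and are not part of SPRIGHT.

```python
import re
from dataclasses import dataclass

# Hypothetical vocabulary of spatial phrases (an illustrative assumption,
# not the list used to build SPRIGHT).
SPATIAL_PHRASES = [
    "left of", "right of", "above", "below", "on top of", "under",
    "in front of", "behind", "next to", "beside", "between", "inside",
]

# Longest phrases first so multi-word phrases win over shorter ones;
# \b keeps "under" from matching inside "underneath".
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(p) for p in sorted(SPATIAL_PHRASES, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def count_spatial_phrases(caption: str) -> int:
    """Count non-overlapping spatial-phrase matches in one caption."""
    return len(_PATTERN.findall(caption))


def spatial_phrase_rate(captions: list[str]) -> float:
    """Average number of spatial phrases per caption (0.0 for an empty list)."""
    return sum(count_spatial_phrases(c) for c in captions) / max(len(captions), 1)


@dataclass
class Sample:
    image_id: str
    caption: str
    num_objects: int  # hypothetical field, e.g. the number of detector boxes


def select_finetune_subset(samples: list[Sample], min_objects: int, limit: int = 444) -> list[Sample]:
    """Keep object-dense images, most objects first, capped at `limit`.

    The threshold and ranking are hypothetical selection criteria;
    only the default size of 444 comes from the description.
    """
    dense = [s for s in samples if s.num_objects >= min_objects]
    dense.sort(key=lambda s: s.num_objects, reverse=True)
    return dense[:limit]


if __name__ == "__main__":
    original = ["A dog and a cat.", "A kitchen with a table."]
    recaptioned = [
        "A dog sits to the left of a cat, which is next to a sofa.",
        "A table stands in front of the window, with chairs beside it and a lamp above it.",
    ]
    # Compare spatial-phrase density before and after recaptioning.
    print(spatial_phrase_rate(original), spatial_phrase_rate(recaptioned))
```

Comparing `spatial_phrase_rate` on original and recaptioned captions is one simple way to quantify "more spatial phrases". A keyword list misses paraphrases, so a real pipeline would likely need a richer phrase inventory or a parser.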
