SPRIGHT
Solution to improve spatial consistency in text-to-image models
Common Product, Image, Text-to-image, Spatial consistency
SPRIGHT is a large-scale vision-language dataset, with an accompanying fine-tuned model, focused on spatial relationships. The dataset is built by re-captioning 6 million images so that the descriptions contain significantly more spatial phrases. The model is fine-tuned on 444 images that each contain many objects, which improves the generation of images with the requested spatial relationships. SPRIGHT achieves state-of-the-art spatial consistency on multiple benchmarks while also improving image quality scores.
SPRIGHT Visit Over Time
Monthly Visits: 939
Bounce Rate: 56.39%
Pages per Visit: 1.0
Visit Duration: 00:00:00