ConsiStory
Training-Free Consistent Text-to-Image Generation
ConsiStory is a training-free method for generating consistent subjects across images with a pre-trained text-to-image model. Because it needs no fine-tuning or per-subject personalization, it is 20 times faster than previous state-of-the-art methods. It extends the model with a subject-driven shared attention module and a correspondence-based feature injection step, both of which promote subject consistency between images, and it adds strategies that encourage layout diversity while keeping the subject consistent. ConsiStory extends naturally to multi-subject scenarios and can even achieve zero-shot personalization for common objects.
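To make the idea of subject-driven shared attention more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function name `subject_shared_attention`, the tensor shapes, and the assumption that a boolean subject mask is already available for each image are all illustrative. The sketch only shows the general pattern, in which each image attends to all of its own tokens plus the subject tokens of the other images in the batch.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; masked entries (-inf) end up with weight 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def subject_shared_attention(q, k, v, subject_mask):
    """Hypothetical sketch of attention shared across a batch of images.

    q, k, v:       arrays of shape (B, N, D), one token sequence per image.
    subject_mask:  boolean array of shape (B, N); True marks subject tokens.
                   How the mask is obtained is outside this sketch.
    Returns the attended features (B, N, D) and the weights (B, N, B*N).
    """
    b, n, d = q.shape
    # Pool the keys and values of every image into one long sequence.
    k_all = k.reshape(b * n, d)
    v_all = v.reshape(b * n, d)

    # allowed[i, j*n + t] is True if image i may attend to token t of image j.
    # Start from "subject tokens only", then open up each image's own tokens.
    allowed = np.tile(subject_mask.reshape(1, b * n), (b, 1))
    for i in range(b):
        allowed[i, i * n:(i + 1) * n] = True

    scores = q @ k_all.T / np.sqrt(d)                    # (B, N, B*N)
    scores = np.where(allowed[:, None, :], scores, -np.inf)
    weights = softmax(scores, axis=-1)
    return weights @ v_all, weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, k, v = (rng.normal(size=(3, 4, 8)) for _ in range(3))
    mask = np.zeros((3, 4), dtype=bool)
    mask[:, :2] = True  # assume the first two tokens of each image are the subject
    out, weights = subject_shared_attention(q, k, v, mask)
    print(out.shape, weights.shape)
```

With an all-False mask the function reduces to ordinary per-image self-attention, which makes it easy to see what the sharing adds: only subject regions leak across images, so backgrounds and layouts remain free to differ.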