Tencent released the ELLA project yesterday. ELLA (Efficient Large Language Model Adapter) improves how well existing Stable Diffusion (SD) models understand prompts without retraining the underlying U-Net or the language model. It connects a large language model to a text-to-image diffusion model, significantly improving text-image alignment. The team designed a timestep-aware semantic connector that helps the diffusion model interpret the prompt at different stages of denoising. ELLA can be easily integrated into community models and tools, improving their ability to follow complex prompts. Experiments show that ELLA performs well on complex prompts involving multiple objects and different attributes, opening new possibilities for text-to-image models.
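
To make the idea of a connector whose output depends on the denoising timestep more concrete, here is a toy sketch in plain Python. It is not ELLA's code: the announcement does not describe the connector's internals, so the class name `ToyTimestepAwareConnector`, the use of a fixed set of query vectors, and the choice to simply add a sinusoidal timestep embedding to those queries are all assumptions made for illustration. The sketch only shows the general pattern: the same language-model token features produce different conditioning vectors at different timesteps.

```python
import math
import random


def timestep_embedding(t, dim):
    """Standard sinusoidal embedding of a diffusion timestep (dim must be even)."""
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class ToyTimestepAwareConnector:
    """Hypothetical stand-in for a timestep-aware connector (illustration only)."""

    def __init__(self, dim, num_queries, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        # Randomly initialised queries; in a real system these would be trained.
        self.queries = [
            [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_queries)
        ]

    def __call__(self, llm_tokens, t):
        """Map variable-length LLM token features to a fixed number of
        conditioning vectors, conditioned on timestep t."""
        t_emb = timestep_embedding(t, self.dim)
        outputs = []
        for q in self.queries:
            # Assumption: timestep conditioning is injected by shifting the query.
            q_t = [qi + ti for qi, ti in zip(q, t_emb)]
            # Scaled dot-product attention of the query over the token features.
            scores = [dot(q_t, tok) / math.sqrt(self.dim) for tok in llm_tokens]
            weights = softmax(scores)
            outputs.append(
                [
                    sum(w * tok[d] for w, tok in zip(weights, llm_tokens))
                    for d in range(self.dim)
                ]
            )
        return outputs


# Usage: five fake "LLM token features" of width 8, queried at two timesteps.
rng = random.Random(1)
llm_tokens = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5)]
connector = ToyTimestepAwareConnector(dim=8, num_queries=4)
early = connector(llm_tokens, t=900)  # noisy, early denoising stage
late = connector(llm_tokens, t=50)    # nearly clean, late denoising stage
```

In this sketch the frozen pieces (the language model that would produce `llm_tokens` and the diffusion model that would consume `early` or `late`) are left out entirely; only the small connector in between would carry trainable parameters, which is the general shape of an adapter-style design.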