ELLA

An LLM-enhanced semantic alignment adapter for diffusion models

Common Product, Image, Text-to-Image, Semantic Alignment
ELLA (Efficient Large Language Model Adapter) is a lightweight method that equips existing CLIP-based diffusion models with powerful LLMs. It improves the model's prompt-following ability and lets text-to-image models understand long, dense prompts. Its Time-Sensitive Semantic Connector (TSC) extracts conditioning from a pre-trained LLM that depends on the time step of the denoising process. The TSC dynamically adapts semantic features across sampling time steps, which conditions the frozen U-Net at different semantic levels. ELLA is reported to outperform prior methods on benchmarks such as DPG-Bench, particularly for dense prompts involving multiple objects, diverse attributes, and relationships.
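The idea of time-step-dependent conditioning can be illustrated with a small, dependency-free sketch. This is not ELLA's actual implementation: it only shows one common way to make text features vary with the sampling time step, by normalizing each token feature and applying a scale and shift computed from a sinusoidal time-step embedding. The function names and the `w_scale` and `w_shift` matrices are hypothetical stand-ins for trainable adapter weights.

```python
import math


def timestep_embedding(t, dim):
    """Sinusoidal embedding of a scalar time step (dim is assumed to be even)."""
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]


def layer_norm(x, eps=1e-5):
    """Normalize one feature vector to zero mean and unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def time_conditioned_features(llm_features, t, w_scale, w_shift):
    """Modulate frozen LLM token features with a time-step-dependent scale and shift.

    llm_features: list of token vectors, standing in for frozen LLM outputs.
    w_scale, w_shift: hypothetical dim x dim weight matrices (illustration only).
    """
    dim = len(llm_features[0])
    emb = timestep_embedding(t, dim)
    # Project the time-step embedding to a per-channel scale and shift.
    scale = [sum(w * e for w, e in zip(row, emb)) for row in w_scale]
    shift = [sum(w * e for w, e in zip(row, emb)) for row in w_shift]
    out = []
    for token in llm_features:
        normed = layer_norm(token)
        out.append([n * (1.0 + s) + b for n, s, b in zip(normed, scale, shift)])
    return out


# Toy inputs: two 4-dimensional token features and identity weights.
features = [[0.1, 0.4, -0.2, 0.7], [1.0, -1.0, 0.5, 0.0]]
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
early = time_conditioned_features(features, 900, identity, identity)
late = time_conditioned_features(features, 50, identity, identity)
```

With the same text features, `early` and `late` differ because the modulation depends on the time step, so a frozen denoiser would receive different conditioning at different stages of sampling.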
ELLA Visit Over Time

Monthly Visits: 810
Bounce Rate: 42.65%
Pages per Visit: 1.0
Visit Duration: 00:00:00

ELLA Alternatives