Sana

High-efficiency high-resolution image synthesis framework

Common Product, Image, Image Synthesis, Text to Image
Sana is a text-to-image framework that efficiently generates images at resolutions up to 4096×4096. It synthesizes high-resolution, high-quality images quickly while maintaining strong text-image alignment, and it can be deployed on laptop GPUs. Sana's core design comprises a deep compression autoencoder, a linear diffusion transformer (DiT), a small decoder-only language model used as the text encoder, and efficient training and sampling strategies. Compared with modern large diffusion models, Sana-0.6B is 20 times smaller and over 100 times faster in measured throughput. Sana-0.6B can also be deployed on a 16GB laptop GPU, where it generates a 1024×1024 image in less than 1 second. This makes low-cost content creation feasible.
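
As a rough illustration of why a deep compression autoencoder helps at high resolution, the sketch below counts the latent tokens a diffusion transformer would have to process for a square image. The downsampling factors (8x with 2×2 patches for a conventional autoencoder, 32x with no patching for a deep compression one) and the helper name `latent_tokens` are assumptions chosen for illustration; the description above does not state them.

```python
def latent_tokens(image_size: int, downsample: int, patch_size: int = 1) -> int:
    """Count the tokens a diffusion transformer sees for a square image.

    The autoencoder shrinks each side by `downsample`, and the transformer
    then groups the latent into `patch_size` x `patch_size` patches.
    """
    side = image_size // (downsample * patch_size)
    return side * side


# Assumed settings, for illustration only (not taken from the text above):
# conventional autoencoder: 8x downsampling, 2x2 patches
# deep compression autoencoder: 32x downsampling, no patching
for size in (1024, 4096):
    conventional = latent_tokens(size, downsample=8, patch_size=2)
    deep = latent_tokens(size, downsample=32, patch_size=1)
    print(f"{size}x{size}: {conventional} tokens vs {deep} tokens")
```

Under these assumed settings the token count drops by a factor of four at every resolution. Fewer tokens matter most when attention cost grows quadratically with token count, which is the cost a linear attention design is meant to avoid.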
Sana Visit Over Time

Monthly Visits: 118,137
Bounce Rate: 59.22%
Pages per Visit: 1.6
Visit Duration: 00:00:52
