Sana
High-efficiency high-resolution image synthesis framework
CommonProductImageImage SynthesisText to Image
Sana is a text-to-image framework capable of efficiently generating images with resolutions up to 4096×4096. It synthesizes high-resolution, high-quality images at an incredibly fast speed while maintaining strong text-image alignment and can be deployed on laptop GPUs. The core design of Sana includes a deep compressed autoencoder, a linear diffusion transformer (DiT), a small language model as a decoder-only text encoder, and efficient training and sampling strategies. Compared to modern large diffusion models, Sana-0.6B is 20 times smaller and measures throughput over 100 times faster. Additionally, Sana-0.6B can be deployed on a 16GB laptop GPU, generating images at 1024×1024 resolution in less than 1 second. Sana makes low-cost content creation feasible.
Sana Visit Over Time
Monthly Visits
118137
Bounce Rate
59.22%
Page per Visit
1.6
Visit Duration
00:00:52