Sana is a text-to-image generation framework developed by NVIDIA that efficiently produces high-definition images with resolutions of up to 4096×4096. It maintains high text-image consistency and operates at high speed, making it deployable on laptop GPUs. The Sana model is based on linear diffusion transformers and uses pre-trained text encoders along with spatially compressed latent feature encoders. This technology is significant for its ability to rapidly generate high-quality images, having a revolutionary impact on artistic creation, design, and other creative fields. The Sana model is licensed under CC BY-NC-SA 4.0, and its source code is available on GitHub.