PIXART-Σ is a diffusion transformer model that directly generates 4K resolution images. Compared to its predecessor PixArt-α, it offers higher image fidelity and better alignment with text prompts. The key features of PIXART-Σ include an efficient training process, where it evolves from a 'weaker' baseline model to a 'stronger' model by leveraging higher-quality data in a process called 'weak-to-strong training'. Improvements in PIXART-Σ include the use of higher-quality training data and efficient label compression.