PIXART LCM
Fast and controllable image generation with a latent consistency model
Common Product, Image, Image generation, Latent consistency model
PIXART LCM (PIXART-δ) is a text-to-image synthesis framework that integrates the Latent Consistency Model (LCM) and ControlNet into the PIXART-α model. PIXART-α is known for generating high-quality 1024px images through an efficient training process. Integrating LCM in PIXART-δ greatly accelerates inference, producing high-quality images in just 2-4 steps. PIXART-δ generates a 1024x1024 pixel image in 0.5 seconds, a 7-fold improvement over PIXART-α. It is also designed to train efficiently on a 32GB V100 GPU within a single day. With 8-bit inference, PIXART-δ can synthesize 1024px images within an 8GB GPU memory budget, which makes it considerably more accessible. In addition, a ControlNet-like module enables fine-grained control over text-to-image diffusion models: PIXART-δ introduces a ControlNet-Transformer architecture, tailored specifically for Transformers, that provides explicit controllability alongside high-quality image generation. As an open-source image generation model, PIXART-δ offers a promising alternative to the Stable Diffusion family of models and contributes to the field of text-to-image synthesis.
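The 2-4 step inference described above might look roughly like the following when run through the Hugging Face diffusers library. This is a minimal sketch, not the project's official usage: the model ID, the disabled guidance scale, the half-precision setting, and the helper names `build_generation_kwargs` and `generate` are all assumptions that do not appear in the description.

```python
"""Sketch: few-step sampling with an LCM-distilled PixArt checkpoint."""

# Assumption: this Hugging Face model ID is not named in the description above.
MODEL_ID = "PixArt-alpha/PixArt-LCM-XL-2-1024-MS"


def build_generation_kwargs(prompt, steps=4):
    """Return pipeline call arguments for few-step LCM sampling (hypothetical helper)."""
    # The description quotes 2-4 steps; this sketch rejects anything outside that range.
    if not 2 <= steps <= 4:
        raise ValueError("this sketch only allows the 2-4 step range quoted above")
    return {
        "prompt": prompt,
        "num_inference_steps": steps,
        # Assumption: classifier-free guidance is disabled for the distilled model.
        "guidance_scale": 0.0,
    }


def generate(prompt, steps=4, device="cuda"):
    """Load the pipeline and sample one image (hypothetical helper; needs a GPU)."""
    # Imported lazily so the helper above can be used without torch or diffusers installed.
    import torch
    from diffusers import PixArtAlphaPipeline

    pipe = PixArtAlphaPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.float16)
    pipe = pipe.to(device)
    return pipe(**build_generation_kwargs(prompt, steps)).images[0]
```

Calling `generate("a small cactus wearing a straw hat")` returns a PIL image that can be written to disk with `.save("out.png")`. Actual latency and memory use depend on the hardware and library versions, so the 0.5 second and 8GB figures above should not be expected from this sketch as written.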
PIXART LCM Visits Over Time
Monthly Visits: 17,788,201
Bounce Rate: 44.87%
Pages per Visit: 5.4
Visit Duration: 00:05:32