CogView is a pre-trained Transformer for general-domain text-to-image generation. With 4.1 billion parameters, the model can generate high-quality and diverse images from text. Its training follows a general-to-specific path: the model is first pre-trained on large-scale text-image pairs to acquire general knowledge, then fine-tuned on specific downstream domains, which markedly improves the quality of the generated images. Notably, the paper also introduces two techniques for stabilizing the training of large Transformers: PB-relax (precision-bottleneck relaxation) and Sandwich-LN (sandwich LayerNorm).
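
To make the two stabilization ideas concrete, below is a minimal PyTorch sketch of a single Transformer block that combines them: Sandwich-LN wraps each sub-layer with a LayerNorm both before and after it, and the attention scores are computed in the rescaled order described by PB-relax so they stay small under fp16. The class name `SandwichBlock`, the scaling constant `alpha`, and all dimensions are illustrative assumptions for this sketch, not the CogView codebase's API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SandwichBlock(nn.Module):
    """One Transformer block with Sandwich-LN and PB-relax attention (illustrative sketch)."""

    def __init__(self, dim: int, n_heads: int, alpha: float = 32.0):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = dim // n_heads
        self.alpha = alpha  # PB-relax scaling constant (illustrative value)

        self.ln_in_attn = nn.LayerNorm(dim)
        self.ln_out_attn = nn.LayerNorm(dim)   # the extra "sandwich" LayerNorm
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)

        self.ln_in_ffn = nn.LayerNorm(dim)
        self.ln_out_ffn = nn.LayerNorm(dim)    # the extra "sandwich" LayerNorm
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def pb_relax_attention(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k = k.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        v = v.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)

        # PB-relax: divide Q by alpha before the matmul so the raw scores stay
        # small in fp16, then subtract the per-row max and rescale. Because
        # softmax is shift-invariant, the result equals softmax(QK^T / sqrt(d)).
        scores = (q / self.alpha) @ k.transpose(-2, -1) / (self.head_dim ** 0.5)
        scores = (scores - scores.amax(dim=-1, keepdim=True)) * self.alpha
        attn = F.softmax(scores, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, d)
        return self.proj(out)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Sandwich-LN residual branches: x + LN(sublayer(LN(x))), so the
        # residual stream never receives un-normalized sub-layer outputs.
        x = x + self.ln_out_attn(self.pb_relax_attention(self.ln_in_attn(x)))
        x = x + self.ln_out_ffn(self.ffn(self.ln_in_ffn(x)))
        return x
```

In a setup like CogView's, such blocks run in fp16 mixed precision, which is exactly where overflowing attention scores and LayerNorm statistics destabilize training; the rescaled score computation and the extra output-side LayerNorms are what keep the values in range.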