The Zhipu Technology team announced today that its latest text-to-image model, CogView3, and the upgraded CogView3-Plus-3B are now open source and available in the "Zhipu Qingyan" app. The team presents the two models as a new phase in AI-assisted artistic creation.

CogView3 is a text-to-image model based on cascaded diffusion, and it generates images in stages. It first creates a low-resolution 512x512 image, then enhances it to 1024x1024 through a relay diffusion process, and finally iterates again to produce a high-definition 2048x2048 image. This step-by-step method resembles a digital artist gradually refining a canvas.
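
The staged resolutions described above can be sketched in a few lines of Python. This is only an illustration of the hand-off between stages, not the model's actual code: the function names `cascade_resolutions` and `upsample_nearest` are hypothetical, and nearest-neighbour upsampling stands in for the point where a real relay stage would take the enlarged image and refine it with further diffusion steps.

```python
def cascade_resolutions(base: int = 512, stages: int = 3) -> list[tuple[int, int]]:
    """Return (width, height) per stage, doubling the side length each time."""
    return [(base * 2**i, base * 2**i) for i in range(stages)]


def upsample_nearest(image: list[list[int]], factor: int = 2) -> list[list[int]]:
    """Nearest-neighbour upsampling of a 2D grid given as a list of rows.

    Hypothetical stand-in for the hand-off between stages: the enlarged
    image is what a later stage would start refining, instead of starting
    from pure noise.
    """
    out = []
    for row in image:
        wide = [px for px in row for _ in range(factor)]  # repeat each pixel horizontally
        out.extend(list(wide) for _ in range(factor))     # repeat each row vertically
    return out


if __name__ == "__main__":
    for width, height in cascade_resolutions():
        print(f"{width}x{height}: {width * height:,} pixels")

    # A 2x2 toy "image" enlarged to 4x4, as one stage hands off to the next.
    print(upsample_nearest([[1, 2], [3, 4]]))
```

Each stage quadruples the pixel count, which is why a cascade starts small: most of the denoising work happens on the cheap low-resolution image.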


According to official evaluations, CogView3 outperforms SDXL, currently the leading open-source text-to-image model, by 77%. Notably, CogView3's inference time is reported to be only about one-tenth of SDXL's, which reflects the Zhipu team's work on model optimization.

CogView3-Plus builds on this foundation. It adopts the DiT (Diffusion Transformer) framework, uses Zero-SNR diffusion noise scheduling, and integrates a text-image joint attention mechanism. According to the team, these changes improve the model's overall performance while significantly reducing training and inference costs. CogView3-Plus also uses a VAE with a 16-dimensional latent space.
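
To give a sense of what a Zero-SNR noise schedule means, here is a minimal sketch of one widely used approach: rescaling an existing beta schedule so that the signal-to-noise ratio at the final timestep is exactly zero. The announcement does not say how CogView3-Plus implements its schedule, so treat this as an assumption for illustration. The linear schedule, its parameters, and all function names below are hypothetical.

```python
import math


def linear_betas(steps: int = 1000, start: float = 1e-4, end: float = 0.02) -> list[float]:
    """A plain linear beta schedule (illustrative values, not from the article)."""
    return [start + (end - start) * i / (steps - 1) for i in range(steps)]


def cumulative_alphas(betas: list[float]) -> list[float]:
    """alpha_bar_t: the running product of (1 - beta) up to each timestep."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def rescale_zero_terminal_snr(betas: list[float]) -> list[float]:
    """Shift and scale sqrt(alpha_bar) so the last value is 0 and the first is unchanged."""
    sqrt_ab = [math.sqrt(a) for a in cumulative_alphas(betas)]
    first, last = sqrt_ab[0], sqrt_ab[-1]
    scale = first / (first - last)
    sqrt_ab = [(s - last) * scale for s in sqrt_ab]
    ab = [s * s for s in sqrt_ab]
    # Recover per-step alphas from the cumulative products, then the betas.
    alphas = [ab[0]] + [ab[i] / ab[i - 1] for i in range(1, len(ab))]
    return [1.0 - a for a in alphas]


def terminal_snr(betas: list[float]) -> float:
    """Signal-to-noise ratio alpha_bar / (1 - alpha_bar) at the final timestep."""
    ab = cumulative_alphas(betas)[-1]
    return ab / (1.0 - ab)


if __name__ == "__main__":
    original = linear_betas()
    rescaled = rescale_zero_terminal_snr(original)
    print("terminal SNR before:", terminal_snr(original))
    print("terminal SNR after: ", terminal_snr(rescaled))
```

With an ordinary schedule, a small amount of signal survives at the last training timestep, while sampling starts from pure noise. Forcing the terminal SNR to zero removes that mismatch between training and inference.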


For developers and researchers who want to explore the technology, the Zhipu Technology team has published the source code and model repositories for CogView3 and CogView3-Plus-3B. The release is expected to speed up development across the AI image generation field and give new applications a solid technical foundation.

With the CogView3 series, the application prospects of text-to-image technology grow broader. From personal creation to commercial design, and from educational assistance to the entertainment industry, the technology is expected to bring significant changes. In the near future, AI-assisted creation may well become the norm, allowing more people to realize their artistic visions easily.

Open-source code repository:

https://github.com/THUDM/CogView3

CogView3-Plus-3B model repositories:

https://huggingface.co/THUDM/CogView3-Plus-3B

https://modelscope.cn/models/ZhipuAI/CogView3-Plus-3B