Tired of searching for open-source image models that understand Chinese? Say goodbye to the limitations of English prompts! Chinese AI giant Zhipu AI has proudly open-sourced its new text-to-image model, CogView4, pushing Chinese image generation technology to new heights! Now, designers, content creators, and even AI art novices can use their native language to master AI image generation!


CogView4's biggest highlight is its incredibly strong understanding of Chinese! No more struggling with translation software; use natural Chinese instructions, and CogView4 will instantly grasp your artistic vision and accurately generate the desired image! Even more impressive, it's the first open-source model that can directly "write" Chinese characters into the image! This is a game-changer for Chinese users, allowing for more authentic creative expression without worrying about text incompatibility.

Even better, CogView4 lifts the usual restrictions on image size and prompt length! Want to generate a massive widescreen poster? No problem! Want a lengthy prompt describing a complex scene? Go ahead! CogView4 handles it all with ease, meeting your wildest creative needs and unleashing your imagination.

CogView4 isn't just style over substance: it took first place on DPG-Bench, an authoritative benchmark for following complex prompts, demonstrating its superior capabilities. This means CogView4 is not only user-friendly but also powerful, offering top-tier image generation quality that meets even the most demanding requirements.

To help more developers and users utilize CogView4, Zhipu AI has also announced plans to open-source supporting ControlNet, ComfyUI, and model fine-tuning tools – essentially providing the complete toolkit! This means you can not only use CogView4's powerful features out-of-the-box but also customize it to create even more personalized and powerful image generation models.

So, how did CogView4 achieve this? Simply put, it boasts several key technological upgrades:

Bilingual Capability Leap: CogView4's "brain" has been upgraded to the more powerful GLM-4 encoder, enabling it to handle both Chinese and English seamlessly. It has also been trained on a massive amount of bilingual text and image data, overcoming the limitations of previous Chinese models that struggled with English, achieving true bilingual fluency.
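As a toy illustration of why the text encoder's vocabulary coverage matters (this is not GLM-4's real tokenizer; the character-level "vocabularies" below are made up for the example), an English-only encoder collapses Chinese characters into unknown tokens, while a bilingual one preserves them:

```python
# Toy sketch: vocabulary coverage and Chinese prompts.
# These character-level "vocabularies" are hypothetical, not GLM-4's tokenizer.

prompt = "一只猫 wearing a spacesuit"  # "a cat wearing a spacesuit"

english_vocab = set("abcdefghijklmnopqrstuvwxyz ")
bilingual_vocab = english_vocab | set("一只猫")

def encode(text, vocab):
    """Map each character to itself if known, else to an <unk> token."""
    return [ch if ch.lower() in vocab else "<unk>" for ch in text]

print(encode(prompt, english_vocab)[:3])    # Chinese characters are lost
print(encode(prompt, bilingual_vocab)[:3])  # Chinese characters survive
```

An encoder that maps every Chinese character to `<unk>` hands the diffusion model nothing to condition on, which is why swapping in a bilingual encoder is the prerequisite for native Chinese prompting.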

Smarter Text Processing: CogView4 uses dynamic text length technology, acting like an intelligent tailor that cuts to fit each prompt, avoiding the token padding waste of traditional fixed-length schemes and yielding a reported 5-30% efficiency gain. This means CogView4 not only understands prompts more accurately but also generates images faster.
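The padding arithmetic behind that saving can be sketched as follows; the per-prompt token counts and the fixed length of 224 are hypothetical numbers chosen only to make the comparison concrete:

```python
# Hypothetical illustration of fixed-length padding vs. dynamic text length.
# The token counts are invented; only the padding arithmetic is real.

def padded_tokens(prompt_lengths, fixed_len):
    """Fixed-length scheme: every prompt is padded to the same length."""
    return sum(fixed_len for _ in prompt_lengths)

def dynamic_tokens(prompt_lengths):
    """Dynamic scheme: each prompt keeps only its own tokens."""
    return sum(prompt_lengths)

lengths = [12, 48, 96, 224]  # a batch of prompts of varying length

fixed = padded_tokens(lengths, fixed_len=224)   # 4 * 224 = 896
dynamic = dynamic_tokens(lengths)               # 12 + 48 + 96 + 224 = 380

savings = 1 - dynamic / fixed
print(f"tokens processed: {fixed} -> {dynamic} ({savings:.0%} saved)")
```

The actual saving depends entirely on how short real prompts are relative to the fixed budget, which is why the reported gain is a range rather than a single number.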

More Flexible Resolution Generation: CogView4 uses hybrid resolution training and two-dimensional rotary position embeddings (2D RoPE), allowing it to handle a wide range of image sizes and aspect ratios, from large high-resolution canvases to small, refined ones. It also adopts a flow-matching diffusion formulation with a parameterized linear dynamic noise schedule, making generation smoother and more controllable.
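A minimal sketch of the flow-matching idea with a linear path, in NumPy; this is the textbook formulation under a linear interpolation schedule, not CogView4's actual implementation:

```python
import numpy as np

# Flow-matching sketch with a linear path between data and noise.
# The model would learn to predict the velocity from (x_t, t); here we
# only show what that velocity target is.

rng = np.random.default_rng(0)

x0 = rng.standard_normal((4, 4))   # stand-in for a clean image latent
eps = rng.standard_normal((4, 4))  # pure Gaussian noise
t = 0.3                            # a point on the schedule in [0, 1]

# Linear interpolation path between the data x0 and the noise eps.
x_t = (1 - t) * x0 + t * eps

# Along this path the velocity is constant: d(x_t)/dt = eps - x0.
# A flow-matching model regresses this target given x_t and t.
v_target = eps - x0

# Sanity check: following the velocity from x_t reaches eps at t = 1
# and recovers x0 at t = 0.
assert np.allclose(x_t + (1 - t) * v_target, eps)
assert np.allclose(x_t - t * v_target, x0)
```

Because the path is a straight line, sampling can take large, stable steps along the learned velocity field, which is one reason flow-matching formulations tend to feel "smoother" than classic noise-prediction diffusion.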

More Refined Training Process: CogView4's training pipeline is meticulously staged, progressing from base-resolution training to multi-resolution training and then fine-tuning on high-quality data, followed by human preference alignment; every step strives for excellence. It also retains the share-param DiT architecture while applying independent adaptive layer normalization to each modality, making the model both more powerful and more efficient.
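Adaptive layer normalization (adaLN), the building block mentioned above, can be sketched as follows; the shapes, the linear projection, and the per-modality weights are illustrative assumptions, not CogView4's real configuration:

```python
import numpy as np

# Sketch of adaptive layer normalization (adaLN): a normalization whose
# scale and shift are predicted from a conditioning vector (e.g. the
# timestep or text embedding). Shapes here are illustrative only.

def layer_norm(x, eps=1e-5):
    """Plain layer norm over the last axis, no learned affine."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def ada_layer_norm(x, cond, w, b):
    """Scale/shift come from a linear projection of the conditioning."""
    scale, shift = np.split(cond @ w + b, 2, axis=-1)
    return layer_norm(x) * (1 + scale) + shift

rng = np.random.default_rng(0)
d = 8
tokens = rng.standard_normal((3, d))  # e.g. a modality's token stream
cond = rng.standard_normal((1, d))    # e.g. timestep/text conditioning

# "Independent" adaLN keeps a separate (w, b) per modality, so text and
# image tokens are modulated by different learned projections.
w_img = rng.standard_normal((d, 2 * d)) * 0.02
b_img = np.zeros(2 * d)
out = ada_layer_norm(tokens, cond, w_img, b_img)
print(out.shape)  # (3, 8)
```

Giving each modality its own modulation parameters lets a shared-parameter transformer treat text and image tokens differently without duplicating the whole backbone.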

Project Address: https://github.com/THUDM/CogView4