Beijing Zhipu Artificial Intelligence Co., Ltd. announced a series of significant technological updates on August 29, 2024, including the release of a new generation of foundational models and new application services.
At the KDD 2024 conference, Zhipu introduced a new generation of foundational models: the language model GLM-4-Plus, the text-to-image model CogView-3-Plus, the image/video understanding model GLM-4V-Plus, and the video generation model CogVideoX. According to Zhipu, these models perform at internationally leading levels in their respective fields.
The GLM-4-Plus model brings comprehensive improvements in language understanding, instruction following, and long-text processing, putting it on par with first-tier models such as GPT-4o. The CogView-3-Plus model replaces the traditional UNet architecture with a Transformer architecture, which improves output quality and brings its performance close to that of top-tier models such as MJ-V6 and FLUX. The GLM-4V-Plus model offers high-quality image and video understanding and is the first general-purpose video understanding model API in China. For CogVideoX, Zhipu followed the release of the 2B version by open-sourcing the 5B version, which performs better and stands out among currently open-source video generation models.
Additionally, Zhipu launched the first video call service for consumer (C-end) users in China on the Qingyan APP. The service spans the text, audio, and video modalities and supports real-time inference, giving users a smooth interactive experience.
Zhipu also announced that the GLM-4-Flash API is now free to use. The API offers advantages in speed and performance, allowing users to build customized models and applications quickly and at no cost. To meet the needs of different users, Zhipu also offers model fine-tuning.
Zhipu states that it will continue working to enable machines to think like humans and to bring more advanced technology and services to users.
Key Updates:
Language Foundation Model GLM-4-Plus: Comprehensive improvements in language understanding, instruction following, and long-text processing, at an internationally leading level.
Text-to-Image Foundation Model CogView-3-Plus: Performance close to that of the current best models, such as MJ-V6 and FLUX.
Image/Video Understanding Foundation Model GLM-4V-Plus: Excellent image understanding capabilities and time-aware video understanding. This model will be available on the open platform (bigmodel.cn) as the first general-purpose video understanding model API in China.
Video Generation Foundation Model CogVideoX: After releasing and open-sourcing the 2B version, the 5B version is also officially open-sourced, with further enhanced performance, making it the best choice among currently open-source video generation models.
Qingyan APP Launches Video Calls: The first video call service open to consumer (C-end) users in China, with cross-modal capabilities across text, audio, and video and real-time inference.
GLM-4-Flash API: Completely free inference service with fine-tuning options available.
Video Call Service Application Link:
https://zhipu-ai.feishu.cn/share/base/form/shrcnqpIx9q5ILEFeT2cPNhyuSf