In the rapidly evolving field of large language models (LLMs), researchers and organizations face numerous challenges: enhancing reasoning capabilities, providing robust multilingual support, and handling complex open-ended tasks. While smaller models are more accessible and cost-effective, they typically underperform larger ones. Developing medium-sized models that balance computational efficiency with strong reasoning and instruction-following abilities has therefore become a clear trend.

Recently, Tsinghua University released GLM4, specifically the GLM-Z1-32B-0414 variant, addressing these challenges. Trained on a massive dataset comprising 15 trillion tokens, GLM4 aims to provide reliable multilingual capabilities and introduces an innovative reasoning strategy called "thinking-in-the-loop".

This release positions GLM4 alongside other prominent models such as DeepSeek Distill, QwQ, and o1-mini, and it is distributed under the permissive MIT license. Notably, despite its 32 billion parameters, GLM4 delivers reasoning-benchmark performance comparable to much larger models such as GPT-4o and DeepSeek-V3, the latter of which has 671 billion parameters.

Technically, GLM-Z1-32B-0414 leverages high-quality training data, including synthetically generated reasoning tasks, to enhance its analytical capabilities. The model integrates advanced techniques like rejection sampling and reinforcement learning (RL) to improve performance on agent-based tasks, coding, function calling, and search-driven question answering.
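To make the data-curation idea concrete, here is a minimal sketch of generic rejection sampling for reasoning data: draw several candidate answers per prompt and keep only those a verifier accepts. The `generate` and `verify` callables are placeholders, not GLM4's actual pipeline.

```python
import random
from typing import Callable, List

def rejection_sample(
    prompts: List[str],
    generate: Callable[[str], str],      # draws one candidate answer from a model
    verify: Callable[[str, str], bool],  # e.g. checks a math answer or runs unit tests
    samples_per_prompt: int = 8,
) -> List[dict]:
    """Keep only (prompt, answer) pairs whose answers pass the verifier.

    Generic rejection-sampling recipe for building high-quality
    supervised data; GLM4's exact pipeline is not public.
    """
    kept = []
    for prompt in prompts:
        for _ in range(samples_per_prompt):
            candidate = generate(prompt)
            if verify(prompt, candidate):
                kept.append({"prompt": prompt, "answer": candidate})
    return kept

# Toy usage with stand-in functions (illustrative only).
demo = rejection_sample(
    ["What is 17 * 24?"],
    generate=lambda p: random.choice(["408", "388", "The answer is 408."]),
    verify=lambda p, a: "408" in a,
    samples_per_prompt=4,
)
print(demo)
```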


Furthermore, its "deep reasoning model" variant, optimized for complex mathematical, logical, and coding tasks, incorporates a cold-start method and extended RL training. A pairwise ranking feedback mechanism was also employed during training to enhance overall reasoning effectiveness.
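The pairwise ranking signal described above is commonly implemented as a Bradley-Terry-style loss over (preferred, rejected) response pairs. The snippet below is a minimal PyTorch sketch of that generic loss, not GLM4's training code; the scalar scores are assumed to come from a reward or ranking model.

```python
import torch
import torch.nn.functional as F

def pairwise_ranking_loss(score_chosen: torch.Tensor,
                          score_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry pairwise loss: pushes the preferred response's score
    above the rejected one's. Inputs are scalar scores, one per pair."""
    return -F.logsigmoid(score_chosen - score_rejected).mean()

# Toy example with made-up scores for three preference pairs.
chosen = torch.tensor([2.1, 0.3, 1.7])
rejected = torch.tensor([1.0, 0.9, -0.2])
print(pairwise_ranking_loss(chosen, rejected))  # smaller loss = better separation
```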

An advanced variant, GLM-Z1-Rumination-32B-0414, introduces a novel "rumination" method, allowing the model to engage in extended, reflective reasoning to tackle open-ended, complex problems like AI-driven city analysis. This variant combines advanced search tools with multi-objective reinforcement learning, significantly improving its practicality in research-intensive tasks and complex retrieval scenarios. To cater to diverse needs, the GLM-Z1-9B-0414 version, with its 9 billion parameters, demonstrates strong mathematical and general reasoning capabilities, showcasing the viability of smaller-scale models.
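Purely as an illustration of what a search-and-reflect ("rumination") loop could look like, the toy skeleton below gathers evidence, checks whether it is sufficient, and only then answers. The `llm` and `search` callables are hypothetical, and the stopping rule is an assumption rather than the published method.

```python
from typing import Callable, List

def ruminate(question: str,
             llm: Callable[[str], str],     # hypothetical chat-completion callable
             search: Callable[[str], str],  # hypothetical search/retrieval tool
             max_rounds: int = 3) -> str:
    """Toy search-and-reflect loop: collect notes, decide if they suffice,
    then produce a final answer. Illustrative only."""
    notes: List[str] = []
    for _ in range(max_rounds):
        query = llm(f"Question: {question}\nNotes so far: {notes}\n"
                    "Propose ONE search query that fills the biggest gap.")
        notes.append(search(query))
        verdict = llm(f"Question: {question}\nNotes: {notes}\n"
                      "Answer YES if the notes are sufficient, otherwise NO.")
        if verdict.strip().upper().startswith("YES"):
            break
    return llm(f"Question: {question}\nNotes: {notes}\nWrite the final answer.")
```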

Benchmark results underscore the strengths of the GLM4 family. GLM-4-32B-0414 performs robustly across multiple benchmarks, comparing favorably against models like GPT-4o, DeepSeek-V3, and Qwen2.5-Max. On the IFEval instruction-following benchmark, GLM4 scored 87.6. On TAU-Bench, a task-automation benchmark, it also did well in the retail (68.7) and airline (51.2) domains. In search-augmented question answering evaluated with SimpleQA, the model scored 88.1.

Additionally, in function calling tasks on the BFCL-v3 benchmark, GLM4 achieved an overall score of 69.6, nearly matching GPT-4o's performance. In real-world code repair scenarios tested via the Moatless framework, GLM4 demonstrated a success rate of 33.8%, highlighting its practical value.

GLM4 demonstrates its potential as an effective LLM family, bridging the performance gap between smaller, accessible models and traditionally much larger ones. The GLM-Z1 series, particularly the 32B variant, exemplifies this balanced approach by providing strong reasoning capabilities while remaining computationally affordable. Its permissive MIT license positions GLM4 as a valuable option for high-performance AI solutions in research and enterprise settings, without the substantial computational overhead of traditional large models.

Hugging Face: https://huggingface.co/THUDM/GLM-Z1-32B-0414
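For quick experimentation, the checkpoint can likely be loaded with the standard transformers API. The sketch below assumes `AutoModelForCausalLM`/`AutoTokenizer` support, a recent transformers release, and enough GPU memory for a 32B model; the prompt and generation settings are illustrative, so check the model card for the recommended usage.

```python
# Minimal loading sketch; see the Hugging Face model card for the
# officially recommended settings and required transformers version.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "THUDM/GLM-Z1-32B-0414"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # a 32B model needs substantial GPU memory
    device_map="auto",
)

messages = [{"role": "user", "content": "Prove that the sum of two even numbers is even."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```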

Key Highlights:

- 🌍 GLM4 is a 32-billion-parameter language model released by Tsinghua University, featuring strong multilingual and reasoning capabilities.

- 📊 The model excels in various benchmark tests, particularly in instruction following and task automation, showcasing performance comparable to much larger models.

- 🚀 GLM4's MIT license makes high-performance AI solutions more accessible, suitable for research and enterprise applications.