Tsinghua‑spun Zhipu AI has spent three years iterating on ChatGLM, a Chinese‑English rival to GPT. Its new report zooms in on the GLM‑4 series, a trio that stretches from a data‑center‑class behemoth to a 9 B‑parameter model you can run at home. The headline: GLM‑4 “closely rivals or outperforms GPT‑4” on marquee leaderboards—while an All Tools variant autonomously fires up external apps to finish harder jobs.
Under the hood
| Piece | Why it matters |
|---|---|
| 10 T‑token corpus (Chinese‑ & English‑heavy, plus 24 other languages) | Gives the model near‑parity across its two main languages—something GPT‑4 still chases in Chinese. |
| Multi‑stage alignment (SFT → RLHF) | Drives instruction following to GPT‑4‑Turbo levels on IFEval without bloating answers. |
| All Tools post‑training | Lets GLM‑4 decide if a prompt needs web search, Python, text‑to‑image, or any user‑defined API—no manual tool triggers. |
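To make the "no manual tool triggers" idea concrete, here is a minimal sketch of what a request to such a model might look like, in the OpenAI‑compatible function‑calling style. The model identifier, tool type strings, and the `query_city_stats` function are illustrative assumptions, not the documented Zhipu API; the point is that a single prompt plus a tool list is all the caller provides—the model picks which tool (if any) to invoke.

```python
import json

# Hypothetical request body in the OpenAI-compatible "tools" style.
# Model name and tool identifiers below are assumptions for illustration.
request_body = {
    "model": "glm-4-all-tools",  # hypothetical model identifier
    "messages": [
        {"role": "user",
         "content": "Plot the population of Beijing over the last decade."}
    ],
    "tools": [
        {"type": "web_search"},        # fetch up-to-date figures
        {"type": "code_interpreter"},  # run Python to draw the plot
        {"type": "function",           # user-defined API (illustrative)
         "function": {
             "name": "query_city_stats",
             "description": "Look up statistics for a Chinese city.",
             "parameters": {
                 "type": "object",
                 "properties": {"city": {"type": "string"}},
                 "required": ["city"],
             }}},
    ],
}

# The caller never says *which* tool to use; the model routes the request.
print(json.dumps(request_body, indent=2))
```

The key contrast with earlier agent stacks is that tool choice happens inside the checkpoint rather than in hand-written dispatch logic around it.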
The SKUs
- GLM‑4 – flagship, ~130 B active params, 128 K context, up to 1 M with sparse attention.
- GLM‑4‑Air – latency‑trimmed 34 B variant tuned for GPU serving.
- GLM‑4‑9B / 9B‑Chat – consumer‑grade checkpoints (128 K / 1 M context) already live on Hugging Face.
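A back-of-envelope calculation shows why sparse attention matters for the 1 M‑token tier: the key/value cache alone grows linearly with context length and becomes prohibitive for dense attention at that scale. The layer count, hidden size, and fp16 storage below are illustrative assumptions (the report does not publish GLM‑4's exact dimensions), but the scaling behavior holds regardless.

```python
def kv_cache_gib(seq_len: int, n_layers: int = 40,
                 hidden: int = 5120, bytes_per: int = 2) -> float:
    """Rough KV-cache size in GiB: 2 tensors (K and V) per layer,
    each of shape (seq_len, hidden), stored at bytes_per per value.
    All dimensions are hypothetical placeholders."""
    return 2 * n_layers * hidden * seq_len * bytes_per / 2**30

print(f"128K context: {kv_cache_gib(128_000):.1f} GiB")
print(f"1M context:   {kv_cache_gib(1_000_000):.1f} GiB")
```

Under these assumed dimensions the cache is already near 100 GiB at 128 K tokens and several hundred GiB at 1 M, which is why long-context serving leans on sparse attention rather than brute force.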
Scorecard highlights
- General reasoning: beats or ties GPT‑4 on MMLU, GSM8K, MATH, BBH, GPQA, and HumanEval.
- Chinese alignment: tops GPT‑4 on AlignBench.
- Long context: matches GPT‑4‑Turbo (128 K) and Claude 3 at 256 K spill‑tests.
- Tool use: in dev‑set trials, GLM‑4 All Tools edges GPT‑4 All Tools in web‑info retrieval and Python‑powered math.
Why it matters
- Bilingual crown – China finally has an open(‑ish) model that doesn’t trade English chops for Mandarin mastery.
- Tool autonomy – A single checkpoint that chooses whether to browse, code, or draw marks a step toward plug‑and‑play agent workflows.
- Open‑source momentum – Previous ChatGLM releases logged 10 M+ Hugging Face downloads in 2023; GLM‑4‑9B is expected to super‑charge that hobbyist wave.
Rapid timeline of the GLM ecosystem
*(timeline figure omitted)* The paper’s timeline shows an 18‑month sprint from GLM‑130B to GLM‑4 All Tools, with side quests into code (CodeGeeX), vision (GLM‑4V‑9B), and agents (AutoWebGLM).
The road ahead
Zhipu AI hints at an MoE‑style GLM‑5 and deeper tool libraries (SQL, vector search, proprietary APIs). For builders already juggling browser calls, Python sandboxes and image pipes, GLM‑4 All Tools may offer a cleaner, unified brain—especially if your product needs to speak both English and Mandarin with equal poise.
Paper link: arXiv 2406.12793 (PDF)