Showing posts with label open‑source AI.

23.7.25

Qwen3‑Coder: Alibaba’s 480‑B Agentic Code Model Aims for One‑Million‑Token Repos

 When Alibaba’s Qwen research group dropped the link to “Qwen3‑Coder: Agentic Coding in the World,” AI Twitter lit up in minutes. The post introduces Qwen3‑Coder‑480B‑A35B‑Instruct, a gargantuan 480‑billion‑parameter Mixture‑of‑Experts (MoE) language model in which only 35 B parameters activate per token, making deployment far leaner than raw size suggests. Released on July 22, 2025 with permissive access points on GitHub, Hugging Face, and ModelScope, the model claims state‑of‑the‑art results in agent‑style coding and tool use—rivaling Anthropic’s Claude 4 Sonnet while remaining fully open‑weight. 

Architecture built for truly big code

The Qwen team doubled down on “scaling in three dimensions.” First, tokens: 7.5 T training tokens with a hefty 70 % code ratio to anchor programming skill while preserving math and general reasoning. Second, context: the model handles a native 256 K‑token window and can stretch to 1 M tokens using YaRN extrapolation, making whole‑repository prompts or week‑long chat traces finally practical. Third, synthetic data: Qwen2.5‑Coder was used to rewrite noisy corpora, boosting baseline cleanliness before fine‑tuning even starts. 
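
The post doesn’t spell out loading parameters, but long‑context serving of Qwen‑family checkpoints is usually a small config change. Below is a minimal sketch using Hugging Face transformers; the repo path, the rope_scaling field names, and the 4× factor are assumptions drawn from typical YaRN setups, not values quoted in the announcement.

```python
# Illustrative sketch: enabling YaRN rope scaling to stretch Qwen3-Coder's
# native 256K window toward 1M tokens. The repo id, field names, and factor
# are assumptions, not values from the Qwen3-Coder release notes.
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-Coder-480B-A35B-Instruct"  # assumed Hugging Face repo path

config = AutoConfig.from_pretrained(model_id)
# YaRN extrapolation: stretch the assumed 262,144-token native window by roughly 4x.
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 262144,
}

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    torch_dtype="auto",
    device_map="auto",
)
```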

Reinforcement learning at industrial scale

Rather than stopping at supervised fine‑tuning, Qwen3‑Coder undergoes two novel RL phases. “Scaling Code RL” turns automated unit‑test generation into millions of execution‑checked training rounds, improving code‑run accuracy and even general abilities. Then comes Agent RL, where 20,000 parallel cloud environments simulate real SWE‑Bench tickets. The model learns to plan, invoke tools, and iterate until tests pass, producing best‑in‑class scores on SWE‑Bench Verified without any test‑time tricks.
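
The RL code itself isn’t public, but the heart of “Scaling Code RL”, scoring a candidate by actually running auto‑generated unit tests against it, fits in a short function. The pytest harness, the file layout, and the binary pass/fail reward below are illustrative assumptions, not Qwen internals.

```python
# Illustrative sketch of an execution-checked reward in the spirit of "Scaling
# Code RL": write the candidate solution and its generated tests to disk, run
# them, and reward only fully passing runs. Harness and layout are assumptions.
import subprocess
import tempfile
from pathlib import Path

def execution_reward(solution_code: str, test_code: str, timeout: int = 30) -> float:
    """Return 1.0 if all auto-generated unit tests pass against the solution, else 0.0."""
    with tempfile.TemporaryDirectory() as workdir:
        # Assumed layout: tests import the candidate via `import solution`.
        Path(workdir, "solution.py").write_text(solution_code)
        Path(workdir, "test_solution.py").write_text(test_code)
        try:
            result = subprocess.run(
                ["pytest", "-q", "--tb=no", "test_solution.py"],
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return 0.0  # hanging or runaway code earns no reward
        return 1.0 if result.returncode == 0 else 0.0
```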

Benchmarks and agentic chops

Early numbers show Qwen3‑Coder topping every open‑source competitor on Agentic Coding, Agentic Browser‑Use, and Agentic Tool‑Use tracks; Alibaba positions it as “comparable to Claude Sonnet 4” in practical autonomy. In short, it doesn’t just spit snippets—it reasons across multi‑file repos, calls compilers, and revises until green checks appear. For developers chasing fully automated pull‑request bots, that’s a milestone. 
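
To make “revises until green checks appear” concrete, here is a rough sketch of the outer loop such an agent harness might run: test, feed the failures to the model, apply its patch, and repeat. The helpers ask_model and apply_patch are hypothetical placeholders, and pytest stands in for whatever build or test command a real repository uses.

```python
# Minimal sketch of a fix-until-green agent loop like the one described above.
# ask_model() and apply_patch() are hypothetical placeholders; the real Qwen
# Code CLI and Agent RL environments are far more elaborate.
import subprocess

def run_tests(repo_dir: str) -> tuple[bool, str]:
    """Run the repo's test suite and return (all_green, combined output)."""
    result = subprocess.run(
        ["pytest", "-q"], cwd=repo_dir, capture_output=True, text=True
    )
    return result.returncode == 0, result.stdout + result.stderr

def fix_until_green(repo_dir: str, ask_model, apply_patch, max_rounds: int = 5) -> bool:
    """Iterate: test, show failures to the model, apply its patch, retest."""
    for _ in range(max_rounds):
        green, log = run_tests(repo_dir)
        if green:
            return True
        patch = ask_model(
            f"The following tests fail:\n{log}\nPropose a unified diff that fixes them."
        )
        apply_patch(repo_dir, patch)
    return run_tests(repo_dir)[0]
```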

Meet Qwen Code—your command‑line copilot

To make those agentic skills tangible, the team open‑sourced Qwen Code, a Node‑based CLI forked from Gemini CLI. With a one‑line npm i -g @qwen-code/qwen-code, users gain a prompt‑driven shell that speaks directly to Qwen3‑Coder via an OpenAI‑compatible endpoint. Prefer other tooling? The blog shows drop‑in guides for Claude Code, Cline, and generic REST calls, so the model can slot into VS Code, Git hooks, or CI pipelines in minutes. 
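
Because the endpoint is OpenAI‑compatible, any standard client library works as well. A minimal sketch with the openai Python package follows; the base URL and model id are placeholders to swap for whatever your provider or self‑hosted server actually exposes.

```python
# Minimal sketch of calling Qwen3-Coder through an OpenAI-compatible endpoint.
# The base_url and model id are placeholders, not values from the post.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-qwen-endpoint.example.com/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="qwen3-coder-480b-a35b-instruct",  # placeholder model id
    messages=[
        {"role": "system", "content": "You are a coding agent. Prefer minimal diffs."},
        {"role": "user", "content": "Write a Python function that parses RFC 3339 timestamps."},
    ],
)
print(response.choices[0].message.content)
```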

Why it matters

Qwen3‑Coder is more than another “bigger‑is‑better” headline. By combining MoE efficiency, million‑token context, and reinforcement learning tuned for agent workflows, Alibaba delivers a bridge between research hype and developer reality. Hobbyists with a single A100 can experiment with 256 K‑token coding agents, while enterprises get an Apache‑friendly alternative to closed, usage‑metered APIs. For AI enthusiasts, it’s an invitation: wire up Qwen3‑Coder to your build system, hand it a failing test, and watch an open model patch your codebase—all without leaving the command line. The age of end‑to‑end agentic coding just took a decisive step forward. 

KAT‑V1 teaches big models when to think—smarter answers, fewer tokens

Large language models excel at reasoning, but they often over‑reason, spewing page‑long chains of thought that waste tokens and slow latency. Kuaishou’s Tongyi Lab says its new KAT‑V1 solves that inefficiency with an AutoThink paradigm that dynamically switches between explicit reasoning and terse replies based on task difficulty. The result: a 40 B‑parameter model that matches or beats much larger rivals on the toughest benchmarks in its class while trimming compute.

Three ingredients behind AutoThink

  • Dual‑regime dataset: a tagging pipeline plus multi‑agent synthesis labels each sample as reasoning or no‑reasoning, creating paired traces for mode training and giving the model a supervised sense of when to think aloud.

  • MTP‑enhanced knowledge distillation: Multi‑Token‑Prediction transfers fine‑grained reasoning skills from a tutor model at far lower pre‑training cost, delivering a fine‑grained signal without billions of tokens.

  • Step‑SRPO RL: reinforcement learning that adds intermediate supervision to GRPO so the agent optimises both mode selection and answer accuracy in one loop, aligning “think vs. skip” decisions with the final reward (a rough sketch follows this list).
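
Kuaishou has not released a toy version of Step‑SRPO alongside the post, but the core idea of scoring the think/no‑think decision as well as the final answer in a single scalar can be sketched roughly as follows; the mode tag and the 0.2/0.8 weighting are illustrative assumptions.

```python
# Rough sketch of the Step-SRPO idea: reward both the mode decision
# ("think" vs. "no-think") and the final answer in one scalar.
# The <think> tag convention and the 0.2/0.8 weights are assumptions.
def step_srpo_reward(response: str, should_think: bool, answer_correct: bool) -> float:
    chose_think = response.lstrip().startswith("<think>")   # assumed mode tag
    mode_reward = 1.0 if chose_think == should_think else 0.0  # intermediate supervision
    answer_reward = 1.0 if answer_correct else 0.0             # final-answer accuracy
    return 0.2 * mode_reward + 0.8 * answer_reward
```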

Benchmark highlights

  • LiveCodeBench Pro (leakage‑controlled): tops all open models and edges past OpenAI o3‑mini.

  • Math, logic & reasoning suites: consistently equals or beats DeepSeek‑R1‑0528 and Qwen3‑235B‑A22B with 40 % fewer active parameters.

  • Token efficiency: AutoThink cuts average response length and thus total token usage (exact numbers vary by task but run tens of percent lower than straight chain‑of‑thought baselines).

Why this matters

  • Token savings without quality loss. AutoThink shows you can claw back compute cost without the typical accuracy drop.

  • Controllable verbosity. Developers can enforce hard token budgets or latency targets by toggling mode thresholds.

  • Scales up. A 200 B Mixture‑of‑Experts version with 40 B active weights is already training and showing bigger gains, hinting at a fresh scaling path that isn’t just “more parameters.”

Open for business

KAT‑V1 weights, Step‑SRPO code, and the dual‑regime dataset are live on Hugging Face, and the model already powers Kwaipilot, Kuaishou’s internal coding copilot, where engineers report faster completions and fewer hallucinations.

AutoThink is a reminder that the next leap in LLM performance may come not from thinking harder—but from knowing when not to think at all.

Paper link: arXiv 2507.08297 (PDF)

 Anyone who has watched today’s end‑to‑end robot policies fail a complex kitchen task knows the weakness: they map pixels to motors with no ...