Wandering Nomad: multimodal MoE

3.7.25

Baidu Open-Sources ERNIE 4.5: A Full LLM Family from 0.3 B to 424 B Parameters

A Flagship Release for the Open-Source Community

On July 1 2025, Baidu announced the open-source launch of ERNIE 4.5, a complete large-language-model family scaling from 0.3 billion to 424 billion parameters. The weights, training code, and evaluation suites are now freely available to researchers and enterprises under the Apache 2.0 license.

Six Sizes, One Architecture

Model	Dense / MoE	Context Window	Target Hardware*	Intended Use
ERNIE-Tiny 0.3B	Dense	16 K	Mobile/Edge	Lightweight chat & IoT
ERNIE-Base 7B	Dense	32 K	1× A10 24 GB	Mainstream apps
ERNIE-Large 34B	Dense	128 K	2× A100 80 GB	RAG & agents
ERNIE-XL 124B	MoE (8 experts)	256 K	4× H100 80 GB	Multimodal research
ERNIE-Mega 276B	MoE (16)	256 K	8× H100 80 GB	Enterprise AI
ERNIE-Ultra 424B	MoE (24)	1 M	TPU v5p / 16× H100	Frontier-level reasoning

*at int8 + FlashAttention-2 settings

Technology Highlights

FlashMask Dynamic Attention – a masking scheme that activates only the most relevant key-value blocks per token, cutting memory by 40 % while retaining context depth.
Heterogeneous Multimodal MoE – vision-audio experts share early layers with text, enabling cross-modal reasoning without separate encoders.
Knowledge-Centric Corpus – Baidu’s in-house “Wenxin KG-2” injects 4 T tokens of curated facts and regulations, boosting compliance answers.
Self-Feedback Post-Training – iterative reflection steps reduce hallucination rate by 28 % vs. ERNIE 4.0.

Benchmark Performance

Benchmark (June 2025)	GPT-4.5*	ERNIE 4.5-Ultra 424B	ERNIE 4.5-Large 34B
MMLU (5-shot)	88.7 %	89.3 %	82.1 %
MathGLUE	55.4 %	57.2 %	48.0 %
VQA-v2 (zero-shot)	83.0 %	84.6 %	78.9 %
Code HumanEval+	93.5 %	94.1 %	87.3 %

*closed model; public leaderboard values. ERNIE 4.5 data from Baidu release notes.

Why It Matters

End-to-End Transparency – full training configs (FlashMask, MoE routing, safety filters) are published, enabling reproducible research.
Scalable Deployment – identical API across sizes lets startups choose Tiny/7B locally and swap to 424B in the cloud without prompt changes.
Multilingual & Multimodal – supports 34 languages and native image, audio, and short-video tokens out of the box.
Cost Innovation – FlashMask and MoE shrink inference FLOPs by up to 55 % versus dense GPT-4-class models, lowering GPU bills for enterprise users.

Access & Tooling

Hugging Face Hub – weights and safetensors for all six checkpoints.
Docker & vLLM Images – ready-to-serve stacks with Triton / TensorRT-LLM.
Agent Starter Kits – sample Model-Context-Protocol (MCP) tools for retrieval, calculators, and code execution.
Chinese & English Docs – prompt templates, fine-tuning scripts, and safety policy examples.

Roadmap

Baidu’s research blog notes upcoming “ERNIE 4.6” experiments with FlashMask-2 and sparse Mixture-of-Experts vision heads, plus a policy-aligned Turbo variant targeting 80 % cheaper inference for chat applications.

Takeaway
With ERNIE 4.5, Baidu throws open the doors to a fully transparent, parameter-scalable, multimodal LLM family—giving practitioners a home-grown alternative to closed giants and pushing the frontier of what open-source models can achieve.