A Flagship Release for the Open-Source Community
On July 1, 2025, Baidu announced the open-source release of ERNIE 4.5, a large-language-model family spanning 0.3 billion to 424 billion parameters. The weights, training code, and evaluation suites are freely available to researchers and enterprises under the Apache 2.0 license.
Six Sizes, One Architecture
| Model | Dense / MoE | Context Window | Target Hardware* | Intended Use |
|---|---|---|---|---|
| ERNIE-Tiny 0.3B | Dense | 16 K | Mobile / edge | Lightweight chat & IoT |
| ERNIE-Base 7B | Dense | 32 K | 1× A10 24 GB | Mainstream apps |
| ERNIE-Large 34B | Dense | 128 K | 2× A100 80 GB | RAG & agents |
| ERNIE-XL 124B | MoE (8 experts) | 256 K | 4× H100 80 GB | Multimodal research |
| ERNIE-Mega 276B | MoE (16 experts) | 256 K | 8× H100 80 GB | Enterprise AI |
| ERNIE-Ultra 424B | MoE (24 experts) | 1 M | TPU v5p / 16× H100 | Frontier-level reasoning |
Technology Highlights
- FlashMask Dynamic Attention – a masking scheme that activates only the most relevant key-value blocks per token, cutting attention memory by 40 % while retaining context depth (a minimal sketch follows this list).
- Heterogeneous Multimodal MoE – vision and audio experts share early layers with text experts, enabling cross-modal reasoning without separate encoders.
- Knowledge-Centric Corpus – Baidu's in-house "Wenxin KG-2" injects 4 T tokens of curated facts and regulations, improving answers on compliance questions.
- Self-Feedback Post-Training – iterative reflection steps reduce the hallucination rate by 28 % versus ERNIE 4.0.
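Baidu does not spell out FlashMask's internals in this post, so the following is only a minimal PyTorch sketch of the general idea described above: score key/value blocks coarsely for each query block and attend to just the top-scoring fraction. The function name `block_sparse_attention` and the `block_size` / `keep_ratio` parameters are illustrative assumptions, not Baidu's API.

```python
# Illustrative block-sparse "dynamic masking" attention: for each query block,
# keep only the top-k most relevant key/value blocks. Names and shapes are
# hypothetical; this is not Baidu's FlashMask implementation.
import torch

def block_sparse_attention(q, k, v, block_size=64, keep_ratio=0.25):
    """q, k, v: [batch, heads, seq_len, head_dim]; seq_len divisible by block_size."""
    b, h, n, d = q.shape
    nb = n // block_size
    # Reshape sequences into blocks: [b, h, nb, block_size, d]
    qb = q.view(b, h, nb, block_size, d)
    kb = k.view(b, h, nb, block_size, d)
    vb = v.view(b, h, nb, block_size, d)

    # Coarse relevance between query blocks and key blocks via mean pooling.
    q_sum = qb.mean(dim=3)                                 # [b, h, nb, d]
    k_sum = kb.mean(dim=3)                                 # [b, h, nb, d]
    block_scores = q_sum @ k_sum.transpose(-1, -2)         # [b, h, nb, nb]

    # Keep only the top-k key/value blocks per query block.
    k_keep = max(1, int(nb * keep_ratio))
    topk = block_scores.topk(k_keep, dim=-1).indices       # [b, h, nb, k_keep]

    # Gather the selected key/value blocks for each query block.
    idx = topk[..., None, None].expand(-1, -1, -1, -1, block_size, d)
    kb_exp = kb[:, :, None].expand(-1, -1, nb, -1, -1, -1)  # [b, h, nb, nb, bs, d]
    vb_exp = vb[:, :, None].expand(-1, -1, nb, -1, -1, -1)
    k_sel = torch.gather(kb_exp, 3, idx).flatten(3, 4)      # [b, h, nb, k_keep*bs, d]
    v_sel = torch.gather(vb_exp, 3, idx).flatten(3, 4)

    # Standard softmax attention restricted to the selected blocks.
    attn = torch.softmax(qb @ k_sel.transpose(-1, -2) / d ** 0.5, dim=-1)
    out = attn @ v_sel                                      # [b, h, nb, block_size, d]
    return out.reshape(b, h, n, d)

q = k = v = torch.randn(1, 8, 512, 64)
print(block_sparse_attention(q, k, v).shape)  # torch.Size([1, 8, 512, 64])
```

Because each query block only materializes a fixed fraction of the key/value blocks, attention memory grows with `keep_ratio` rather than with the full sequence length, which is the rough mechanism behind the claimed savings.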
Benchmark Performance
| Benchmark (June 2025) | GPT-4.5* | ERNIE 4.5-Ultra 424B | ERNIE 4.5-Large 34B |
|---|---|---|---|
| MMLU (5-shot) | 88.7 % | 89.3 % | 82.1 % |
| MathGLUE | 55.4 % | 57.2 % | 48.0 % |
| VQA-v2 (zero-shot) | 83.0 % | 84.6 % | 78.9 % |
| HumanEval+ (code) | 93.5 % | 94.1 % | 87.3 % |
Why It Matters
- End-to-End Transparency – full training configs (FlashMask, MoE routing, safety filters) are published, enabling reproducible research.
- Scalable Deployment – an identical API across sizes lets startups run Tiny/7B locally and swap to 424B in the cloud without changing prompts (see the example after this list).
- Multilingual & Multimodal – supports 34 languages and native image, audio, and short-video tokens out of the box.
- Cost Innovation – FlashMask and MoE routing shrink inference FLOPs by up to 55 % versus dense GPT-4-class models, lowering GPU bills for enterprise users.
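As a concrete illustration of the size-swap point above, the snippet below sends the same chat request to two deployments through an OpenAI-compatible endpoint such as the one vLLM exposes. The base URLs and model identifiers are placeholders, not official names.

```python
# Hypothetical size swap behind one OpenAI-compatible API: a local server for
# the small model, a hosted endpoint for the large one. URLs and model names
# below are placeholders.
from openai import OpenAI

def ask(base_url: str, model: str, question: str) -> str:
    client = OpenAI(base_url=base_url, api_key="EMPTY")  # vLLM ignores the key
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
        temperature=0.2,
    )
    return resp.choices[0].message.content

# Same prompt, different deployment target; no prompt changes required.
local = ask("http://localhost:8000/v1", "ERNIE-4.5-Base-7B", "Summarize MoE routing.")
cloud = ask("https://example-cloud-endpoint/v1", "ERNIE-4.5-Ultra-424B", "Summarize MoE routing.")
```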
Access & Tooling
- Hugging Face Hub – weights and safetensors for all six checkpoints (see the loading sketch after this list).
- Docker & vLLM Images – ready-to-serve stacks with Triton / TensorRT-LLM.
- Agent Starter Kits – sample Model Context Protocol (MCP) tools for retrieval, calculators, and code execution.
- Chinese & English Docs – prompt templates, fine-tuning scripts, and safety policy examples.
Roadmap
Baidu’s research blog notes upcoming “ERNIE 4.6” experiments with FlashMask-2 and sparse Mixture-of-Experts vision heads, plus a policy-aligned Turbo variant targeting 80 % cheaper inference for chat applications.
Takeaway
With ERNIE 4.5, Baidu throws open the doors to a fully transparent, parameter-scalable, multimodal LLM family—giving practitioners a home-grown alternative to closed giants and pushing the frontier of what open-source models can achieve.