GitHub - PaddlePaddle/ERNIE: The official repository for ERNIE 4.5 and ERNIEKit – its industrial-grade development toolkit based on PaddlePaddle.

📑 Blog | 📚 Cookbook | 📑 Paper | 🛠️ Toolkit

Introduction to ERNIE 4.5

We introduce ERNIE 4.5, a new family of large-scale multimodal models comprising 10 distinct variants. The model family consist of Mixture-of-Experts (MoE) models with 47B and 3B active parameters, with the largest model having 424B total parameters, as well as a 0.3B dense model. For the MoE architecture, we propose a novel heterogeneous modality structure, which supports parameter sharing across modalities while also allowing dedicated parameters for each individual modality. This MoE architecture has the advantage to enhance multimodal understanding without compromising, and even improving, performance on text-related tasks. All of our models are trained with optimal efficiency using the PaddlePaddle deep learning framework, which also enables high-performance inference and streamlined deployment for them. We achieve 47% Model FLOPs Utilization (MFU) in our largest ERNIE 4.5 language model pre-training. Experimental results show that our models achieve state-of-the-art performance across multiple text and multimodal benchmarks, especially in instruction following, world knowledge memorization, visual understanding and multimodal reasoning. All models are publicly accessible under Apache 2.0 to support future research and development in the field. Additionally, we open source the development toolkits for ERNIE 4.5, featuring industrial-grade capabilities, resource-efficient training and inference workflows, and multi-hardware compatibility.

ERNIE 4.5

ERNIE 4.5 Models		Model Information
Model Category	Model	Input Modality	Output Modality	Context Window
Large Language Models (LLMs)	ERNIE-4.5-300B-A47B-Base	Text	Text	128K
	ERNIE-4.5-300B-A47B
	ERNIE-4.5-21B-A3B-Base
	ERNIE-4.5-21B-A3B
Vision-Language Models (VLMs)	ERNIE-4.5-VL-424B-A47B-Base	Text/Image/Video	Text
	ERNIE-4.5-VL-424B-A47B
	ERNIE-4.5-VL-28B-A3B-Base
	ERNIE-4.5-VL-28B-A3B
Dense Models	ERNIE-4.5-0.3B-Base	Text	Text
Dense Models	ERNIE-4.5-0.3B	Text	Text

Note: All models (including pre-trained weights and inference code) have been released on 🤗Hugging Face, and AI Studio. Check our blog for more details.

Highlights

Our model family is characterized by three key innovations:

Multimodal Heterogeneous MoE Pre-Training: Our models are jointly trained on both textual and visual modalities to better capture the nuances of multimodal information and improve performance on tasks involving text understanding and generation, image understanding, and cross-modal reasoning. To achieve this without one modality hindering the learning of another, we designed a heterogeneous MoE structure, incorporated modality-isolated routing, and employed router orthogonal loss and multimodal token-balanced loss. These architectural choices ensure that both modalities are effectively represented, allowing for mutual reinforcement during training.
Scaling-Efficient Infrastructure: We propose a novel heterogeneous hybrid parallelism and hierarchical load balancing strategy for efficient training of ERNIE 4.5 models. By using intra-node expert parallelism, memory-efficient pipeline scheduling, FP8 mixed-precision training and finegrained recomputation methods, we achieve remarkable pre-training throughput. For inference, we propose multi-expert parallel collaboration method and convolutional code quantization algorithm to achieve 4-bit/2-bit lossless quantization. Furthermore, we introduce PD disaggregation with dynamic role switching for effective resource utilization to enhance inference performance for ERNIE 4.5 MoE models. Built on PaddlePaddle, ERNIE 4.5 delivers high-performance inference across a wide range of hardware platforms.
Modality-Specific Post-Training: To meet the diverse requirements of real-world applications, we fine-tuned variants of the pre-trained model for specific modalities. Our LLMs are optimized for general-purpose language understanding and generation. The VLMs focuses on visuallanguage understanding and supports both thinking and non-thinking modes. Each model employed a combination of Supervised Fine-tuning (SFT), Direct Preference Optimization (DPO) or a modified reinforcement learning method named Unified Preference Optimization (UPO) for post-training.

Performance and Benchmark Results

ERNIE-4.5-300B-A47B-Base surpasses DeepSeek-V3-671B-A37B-Base on 22 out of 28 benchmarks, demonstrating leading performance across all major capability categories. This underscores the substantial improvements in generalization, reasoning, and knowledge-intensive tasks brought about by scaling up the ERNIE-4.5-Base model relative to other state-of-the-art large models. With a total parameter size of 21B (approximately 70% that of Qwen3-30B), ERNIE-4.5-21B-A3B-Base outperforms Qwen3-30B-A3B-Base on several math and reasoning benchmarks, including BBH and CMATH. ERNIE-4.5-21B-A3B-Base remains highly competitive given its significantly smaller model size, demonstrating notable parameter efficiency and favorable performance trade-offs.

ERNIE-4.5-300B-A47B, the post trained model, demonstrates significant strengths in instruction following and knowledge tasks, as evidenced by the state-of-the-art scores on benchmarks such as IFEval, Multi-IF, SimpleQA, and ChineseSimpleQA. The lightweight model ERNIE-4.5-21B-A3B achieves competitive performance compared to Qwen3-30B-A3B, despite having approximately 30% fewer total parameters.

In the non-thinking mode, ERNIE-4.5-VL exhibits outstanding proficiency in visual perception, document and chart understanding, and visual knowledge, performing strongly across a range of established benchmarks. Under the thinking mode, ERNIE-4.5-VL not only demonstrates enhanced reasoning abilities compared to the non-thinking mode, but also retains the strong perception capabilities of the latter. ERNIE-4.5-VL-424B-A47B delivers consistently strong results across the various multimodal evaluation benchmarks. Its thinking mode offers a distinct advantage on challenging benchmarks such as MathVista, MMMU, and VisualPuzzle, while maintaining competitive performance on perception-focused datasets like CV-Bench and RealWorldQA. The lightweight vision-language model ERNIE-4.5-28B-A3B achieves competitive or even superior performance compared to Qwen2.5-VL-7B and Qwen2.5-VL-32B across most benchmarks, despite using significantly fewer activation parameters. Notably, our lightweight model also supports both thinking and non-thinking modes, offering functionalities consistent with ERNIE-4.5-VL-424B-A47B.

Performace of ERNIE-4.5 pre-trained models

Performance of post-trained model ERNIE-4.5-300B-A47B

Performance of post-trained model ERNIE-4.5-21B-A3B

Performance of post-trained multimodal models in thinking mode

Performance of post-trained multimodal models in non-thinking mode

Model Development

ERNIE 4.5 models are trained and deployed for inference using the PaddlePaddle framework. The full workflow of training, compression, and inference for ERNIE 4.5 is supported through the ERNIEKit and FastDeploy toolkit. The table below details the feature matrix of the ERNIE 4.5 model family for training and inference.

Model	Training	Inference
ERNIE-4.5-300B-A47B-Base	SFT/SFT-LoRA/DPO/DPO-LoRA	BF16 / W4A16C16 / W8A16C16 / FP8
ERNIE-4.5-300B-A47B	SFT/SFT-LoRA/DPO/DPO-LoRA/QAT	BF16 / W4A16C16 / W8A16C16 / W4A8C8 / FP8 / 2Bits
ERNIE-4.5-21B-A3B-Base	SFT/SFT-LoRA/DPO/DPO-LoRA	BF16 / W4A16C16 / W8A16C16 / FP8
ERNIE-4.5-21B-A3B	SFT/SFT-LoRA/DPO/DPO-LoRA	BF16 / W4A16C16 / W8A16C16 / FP8
ERNIE-4.5-VL-424B-A47B-Base	Coming Soon	BF16 / W4A16C16 / W8A16C16 / FP8
ERNIE-4.5-VL-424B-A47B	Coming Soon	BF16 / W4A16C16 / W8A16C16 / FP8
ERNIE-4.5-VL-28B-A3B-Base	Coming Soon	BF16 / W4A16C16 / W8A16C16 / FP8
ERNIE-4.5-VL-28B-A3B	Coming Soon	BF16 / W4A16C16 / W8A16C16 / FP8
ERNIE-4.5-0.3B-Base	SFT/SFT-LoRA/DPO/DPO-LoRA	BF16 / W8A16C16 / FP8
ERNIE-4.5-0.3B	SFT/SFT-LoRA/DPO/DPO-LoRA	BF16 / W8A16C16 / FP8

Note: For different ERNIE 4.5 model, we provide diverse quantization schemes using the notation WxAxCx, where: W indicates weight precision, A indicates activation precision, C indicates KV Cache precision, x represents numerical precision.

ERNIEKit: ERNIE Development Toolkit Based on PaddlePaddle

ERNIEKit is an industrial-grade training and compression development toolkit for ERNIE models based on PaddlePaddle, offering full-cycle development support for the ERNIE 4.5 model family. Key capabilities include:

High-performance pre-training implementation
Full-parameter supervised fine-tuning (SFT)
Direct Preference Optimization (DPO)
Parameter-efficient fine-tuning and alignment (SFT-LoRA/DPO-LoRA)
Quantization-Aware Training (QAT)
Post-Training Quantization (PTQ) [WIP]

Minimum hardware requirements for training each model are documented here.

Quick Start

When you install ERNIEKit successfully, you can start training ERNIE 4.5 models with the following command:

# download model from huggingface
huggingface-cli download baidu/ERNIE-4.5-0.3B-Paddle --local-dir baidu/ERNIE-4.5-0.3B-Paddle
# 8K Sequence Length, SFT
erniekit train examples/configs/ERNIE-4.5-0.3B/sft/run_sft_8k.yaml

For detailed guides on installation, CLI usage, WebUI, multi-node training, and advanced features, please refer to ERNIEKit Training Document.

ERNIEKit WebUI demo:

webui_demo_0630.mp4

FastDeploy：High-performance Inference and Deployment Toolkit for LLMs and VLMs Based on PaddlePaddle

FastDeploy is an inference and deployment toolkit for large language models and visual language models, developed based on PaddlePaddle. It delivers production-ready, easy-to-use multi-hardware deployment solutions with multi-level load-balanced PD disaggregation, comprehensive quantization format support, OpenAI API server and vLLM compatible etc.

For installation please refer to FastDeploy.

Offline Inference

from fastdeploy import LLM, SamplingParams

prompt = "Write me a poem about large language model."
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(model="baidu/ERNIE-4.5-0.3B-Paddle", max_model_len=32768)

outputs = llm.generate(prompt, sampling_params)

Online Serving

python -m fastdeploy.entrypoints.openai.api_server \
    --model "baidu/ERNIE-4.5-0.3B-Paddle" \
    --max-model-len 32768 \
    --port 9904

For more inference and deployment guides, please refer to FastDeploy.

Cookbooks

Discover best-practice guides showcasing ERNIE’s capabilities across multiple domains:

Cookbook	Description	Gradio Demo
Conversation	Building conversational applications.	conversation_demo.py
Simple ERNIE Bot	Creating a lightweight web-based ERNIE Bot.	simple_ernie_bot_demo.py
Web-Search-Enhanced Conversation	Building conversational apps with integrated web search.	web_search_demo.py
Knowledge Retrieval-based Q&A	Building intelligent Q&A systems with private knowledge bases.	knowledge_retrieval_demo.py
Advanced Search	Building article-generation applications using deep information extraction.	advanced_search_demo.py
SFT tutorial	Optimizing task performance through supervised fine-tuning with ERNIEKit.	-
DPO tutorial	Aligning models with human preferences using ERNIEKit.	-
Text Recognition	A Comprehensive Guide to Developing Text Recognition for Non-Chinese and Non-English Languages Using ERNIE and PaddleOCR.	-
Document Translation	Document Translation Practice Based on ERNIE and PaddleOCR.	-
Key Information Extraction	Key Information Extraction in Contract Scenarios Based on ERNIE and PaddleOCR.	-

Community

PaddlePaddle WeChat official account	Join the tech discussion group

License

The ERNIE 4.5 models are provided under the Apache License 2.0. This license permits commercial use, subject to its terms and conditions.

Citation

If you find ERNIE 4.5 useful or wish to use it in your projects, please kindly cite our technical report:

@misc{ernie2025technicalreport,
      title={ERNIE 4.5 Technical Report},
      author={Baidu ERNIE Team},
      year={2025},
      eprint={},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={}
}

Name		Name	Last commit message	Last commit date
Latest commit History 29 Commits
cookbook		cookbook
data_processor		data_processor
docs		docs
ernie		ernie
erniekit		erniekit
examples		examples
requirements/gpu		requirements/gpu
tools		tools
LICENSE		LICENSE
README.md		README.md
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Introduction to ERNIE 4.5

Highlights

Performance and Benchmark Results

Performace of ERNIE-4.5 pre-trained models

Performance of post-trained model ERNIE-4.5-300B-A47B

Performance of post-trained model ERNIE-4.5-21B-A3B

Performance of post-trained multimodal models in thinking mode

Performance of post-trained multimodal models in non-thinking mode

Model Development

ERNIEKit: ERNIE Development Toolkit Based on PaddlePaddle

Quick Start

FastDeploy：High-performance Inference and Deployment Toolkit for LLMs and VLMs Based on PaddlePaddle

Offline Inference

Online Serving

Cookbooks

Community

License

Citation

About

Uh oh!

Releases 6

Uh oh!

Contributors 9

Uh oh!

Languages

License

PaddlePaddle/ERNIE

Folders and files

Latest commit

History

Repository files navigation

Introduction to ERNIE 4.5

Highlights

Performance and Benchmark Results

Performace of ERNIE-4.5 pre-trained models

Performance of post-trained model ERNIE-4.5-300B-A47B

Performance of post-trained model ERNIE-4.5-21B-A3B

Performance of post-trained multimodal models in thinking mode

Performance of post-trained multimodal models in non-thinking mode

Model Development

ERNIEKit: ERNIE Development Toolkit Based on PaddlePaddle

Quick Start

FastDeploy：High-performance Inference and Deployment Toolkit for LLMs and VLMs Based on PaddlePaddle

Offline Inference

Online Serving

Cookbooks

Community

License

Citation

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 6

Uh oh!

Contributors 9

Uh oh!

Languages