In this work, we propose HC-SMoE (Hierarchical Clustering for Sparsely activated Mixture of Experts), a task-agnostic expert merging framework that reduces SMoE model parameters without retraining. Unlike previous methods, HC-SMoE employs hierarchical clustering based on expert outputs, ensuring that the merging process is unaffected by routing decisions. This output-based clustering strategy captures functional similarities between experts, offering an adaptable solution for models with numerous experts. We validate our approach through extensive experiments on eight zero-shot language tasks and demonstrate its effectiveness in large-scale SMoE models like Qwen and Mixtral. Our results show that HC-SMoE consistently achieves strong performance, highlighting its potential for real-world deployment.
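To make the output-based clustering idea concrete, here is a minimal sketch (not the repository's actual implementation): it represents each expert by its average output on a small calibration batch, groups experts with SciPy's average-linkage hierarchical clustering, and averages the weights within each group. The function names, the cosine distance, and the uniform weight averaging are illustrative assumptions.

```python
# Illustrative sketch only -- names, SciPy clustering, and uniform averaging are assumptions.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def cluster_experts_by_output(expert_outputs, num_clusters):
    """expert_outputs: one [num_tokens, hidden_dim] array per expert (calibration batch)."""
    # Summarize each expert by its mean output vector over the calibration tokens.
    features = np.stack([out.mean(axis=0) for out in expert_outputs])
    # Hierarchical (agglomerative) clustering on the output features.
    tree = linkage(features, method="average", metric="cosine")
    return fcluster(tree, t=num_clusters, criterion="maxclust")  # cluster label per expert

def merge_clustered_experts(expert_weights, labels):
    """Average the parameters of all experts assigned to the same cluster."""
    merged = {}
    for cid in np.unique(labels):
        members = [expert_weights[i] for i in np.where(labels == cid)[0]]
        merged[cid] = {name: np.mean([w[name] for w in members], axis=0) for name in members[0]}
    return merged
```

Because the grouping depends only on expert outputs, it is independent of the router's decisions, which is the property emphasized above.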
- [2025/05/01] 🔥 Our paper, Retraining-Free Merging of Sparse Mixture-of-Experts via Hierarchical Clustering, has been accepted by ICML 2025!
- Released the accepted version of the paper.
This repository is built on code from an existing GitHub repository.
- Install the basic packages: `pip install -r requirements.txt`
- Install lm-eval from the lm-evaluation-harness repository.
Please download the C4 training data `c4-train.00000-of-01024.json` from allenai/c4, then save it to `hcsmoe/data/c4-train.00000-of-01024.json`.
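If you would rather script this step, a minimal sketch using `huggingface_hub` is shown below; the exact shard path inside `allenai/c4` and the gzip decompression are assumptions, so adjust them to whatever file you actually fetch.

```python
# Hypothetical download helper: the shard location inside allenai/c4
# ("en/c4-train.00000-of-01024.json.gz") and the gzip step are assumptions.
import gzip
import os
import shutil
from huggingface_hub import hf_hub_download

compressed = hf_hub_download(
    repo_id="allenai/c4",
    repo_type="dataset",
    filename="en/c4-train.00000-of-01024.json.gz",  # assumed path of the shard
)
os.makedirs("hcsmoe/data", exist_ok=True)
with gzip.open(compressed, "rb") as src, open("hcsmoe/data/c4-train.00000-of-01024.json", "wb") as dst:
    shutil.copyfileobj(src, dst)
```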
Alternatively, you can change the data path in `hcsmoe/evaluation/minipile.py`:

```python
DATASETS = {
    'c4': lambda: load_dataset('json', data_files={'train': 'hcsmoe/data/c4-train.00000-of-01024.json'}, trust_remote_code=True),
}
```

We provide run scripts in `scripts/mixtral/run.sh` and `scripts/qwen/run.sh`. Adjust the settings in those files, then run them as follows.
For a detailed description of each argument, please see here.
```bash
bash scripts/mixtral/run.sh
bash scripts/qwen/run.sh
```
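Optionally, you can check that the C4 shard is picked up from the expected location before launching a full run. This is only a suggested sanity check and assumes the `DATASETS` mapping shown above is importable from `hcsmoe/evaluation/minipile.py`:

```python
# Optional sanity check for the C4 data path (assumes DATASETS is importable as shown above).
from hcsmoe.evaluation.minipile import DATASETS

dataset = DATASETS['c4']()         # loads the local JSON shard via datasets.load_dataset
print(dataset['train'][0].keys())  # C4 records typically contain 'text', 'timestamp', 'url'
```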
If you find this work useful, please consider citing our paper:

```bibtex
@inproceedings{chen2025hcsmoe,
  title={Retraining-Free Merging of Sparse Mixture-of-Experts via Hierarchical Clustering},
  author={I-Chun Chen and Hsu-Shen Liu and Wei-Fang Sun and Chen-Hao Chao and Yen-Chang Hsu and Chun-Yi Lee},
  year={2025},
  booktitle={International Conference on Machine Learning (ICML)}
}
```
