In this work, we propose HC-SMoE (Hierarchical Clustering for Sparsely activated Mixture of Experts), a task-agnostic expert merging framework that reduces SMoE model parameters without retraining. Unlike previous methods, HC-SMoE employs hierarchical clustering based on expert outputs, ensuring that the merging process is unaffected by routing decisions. This output-based clustering strategy captures functional similarities between experts, offering an adaptable solution for models with numerous experts. We validate our approach through extensive experiments on eight zero-shot language tasks and demonstrate its effectiveness in large-scale SMoE models like Qwen and Mixtral. Our results show that HC-SMoE consistently achieves strong performance, highlighting its potential for real-world deployment.
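To make the output-based clustering idea concrete, here is a minimal sketch (not the repository's actual implementation): it represents each expert by its average output on a small calibration batch, groups experts with SciPy's average-linkage hierarchical clustering, and averages the weights within each group. The function names, the cosine distance, and the uniform weight averaging are illustrative assumptions.

```python
# Illustrative sketch only -- names, SciPy clustering, and uniform averaging are assumptions.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def cluster_experts_by_output(expert_outputs, num_clusters):
    """expert_outputs: one [num_tokens, hidden_dim] array per expert (calibration batch)."""
    # Summarize each expert by its mean output vector over the calibration tokens.
    features = np.stack([out.mean(axis=0) for out in expert_outputs])
    # Hierarchical (agglomerative) clustering on the output features.
    tree = linkage(features, method="average", metric="cosine")
    return fcluster(tree, t=num_clusters, criterion="maxclust")  # cluster label per expert

def merge_clustered_experts(expert_weights, labels):
    """Average the parameters of all experts assigned to the same cluster."""
    merged = {}
    for cid in np.unique(labels):
        members = [expert_weights[i] for i in np.where(labels == cid)[0]]
        merged[cid] = {name: np.mean([w[name] for w in members], axis=0) for name in members[0]}
    return merged
```

Because the grouping depends only on expert outputs, it is independent of the router's decisions, which is the property emphasized above.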
- [2025/05/01] 🔥 Our paper, Retraining-Free Merging of Sparse Mixture-of-Experts via Hierarchical Clustering, has been accepted by ICML 2025!
- Released the accepted version of the paper.
This repository is built on code from an existing GitHub repository.
- Install the basic packages: `pip install -r requirements.txt`
- Install lm-eval from the lm-evaluation-harness repository.
Please download the C4 training data `c4-train.00000-of-01024.json` from allenai/c4, then save it to `hcsmoe/data/c4-train.00000-of-01024.json`.
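If you would rather script this step, a minimal sketch using `huggingface_hub` is shown below; the exact shard path inside `allenai/c4` and the gzip decompression are assumptions, so adjust them to whatever file you actually fetch.

```python
# Hypothetical download helper: the shard location inside allenai/c4
# ("en/c4-train.00000-of-01024.json.gz") and the gzip step are assumptions.
import gzip
import os
import shutil
from huggingface_hub import hf_hub_download

compressed = hf_hub_download(
    repo_id="allenai/c4",
    repo_type="dataset",
    filename="en/c4-train.00000-of-01024.json.gz",  # assumed path of the shard
)
os.makedirs("hcsmoe/data", exist_ok=True)
with gzip.open(compressed, "rb") as src, open("hcsmoe/data/c4-train.00000-of-01024.json", "wb") as dst:
    shutil.copyfileobj(src, dst)
```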
Alternatively, you can change the data path in `hcsmoe/evaluation/minipile.py`:

```python
DATASETS = {
    'c4': lambda: load_dataset('json', data_files={'train': 'hcsmoe/data/c4-train.00000-of-01024.json'}, trust_remote_code=True),
}
```

We provide run scripts in `scripts/mixtral/run.sh` and `scripts/qwen/run.sh`. Adjust the settings in those files, then run them as follows.
For a detailed description of each argument, please see here.
```bash
bash scripts/mixtral/run.sh
bash scripts/qwen/run.sh
```
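Optionally, you can check that the C4 shard is picked up from the expected location before launching a full run. This is only a suggested sanity check and assumes the `DATASETS` mapping shown above is importable from `hcsmoe/evaluation/minipile.py`:

```python
# Optional sanity check for the C4 data path (assumes DATASETS is importable as shown above).
from hcsmoe.evaluation.minipile import DATASETS

dataset = DATASETS['c4']()         # loads the local JSON shard via datasets.load_dataset
print(dataset['train'][0].keys())  # C4 records typically contain 'text', 'timestamp', 'url'
```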
If you find this work useful, please consider citing our paper:

```bibtex
@inproceedings{chen2025hcsmoe,
  title={Retraining-Free Merging of Sparse Mixture-of-Experts via Hierarchical Clustering},
  author={I-Chun Chen and Hsu-Shen Liu and Wei-Fang Sun and Chen-Hao Chao and Yen-Chang Hsu and Chun-Yi Lee},
  year={2025},
  booktitle={International Conference on Machine Learning (ICML)}
}
```
