This repository contains the supplementary materials for our Mixture of Experts Filter (MoE-F) work, published and presented at ICLR 2025 [OpenReview].
- 🧠 Concept
- 📌 Concrete Example
- 📊 Generating the Results Tables
- 📈 Experts' Performance on the NIFTY Test Split
- 📝 Citing
- 🙏 Acknowledgements
The following is a conceptual flow showing how MoE-F works:
**Dataset**: The NIFTY Financial News Headlines dataset used for this section's experiments is available via HuggingFace.
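If helpful, the split can be pulled directly from the HuggingFace Hub. A minimal sketch, assuming the dataset is hosted under the `raeidsaqur/NIFTY` repo id and exposes a `test` split (substitute the id and split linked from this README if they differ):

```python
# Minimal sketch: load the NIFTY dataset from the HuggingFace Hub.
# The repo id "raeidsaqur/NIFTY" and the "test" split name are assumptions;
# use the identifiers linked from this README if they differ.
from datasets import load_dataset

nifty = load_dataset("raeidsaqur/NIFTY")
print(nifty["test"][0])  # inspect one test-split example
```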
A cross-sectional time-window snapshot makes the filter's behavior easier to see. The diagram below shows the weighting ranks of the 7 experts over the 3 randomly sampled week-long trading windows shown above, each mimicking a different market regime (bull, bear, neutral).
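As a rough illustration of this flow (and explicitly not the paper's algorithm, which derives the gate from a stochastic filter), the sketch below mimics the conceptual loop: each expert makes a prediction, each expert's weight is scaled multiplicatively by the evidence from its realized loss, and the weights are renormalized at every step. All names and the squared-error loss are illustrative choices:

```python
# Illustrative sketch only -- NOT MoE-F itself. MoE-F obtains the expert
# weights from a stochastic filtering-based gate; this toy version mimics
# the conceptual flow with multiplicative (exponential) weight updates.
import numpy as np

def online_gate(expert_preds, targets, eta=1.0):
    """expert_preds: (T, K) predictions from K experts over T steps.
    targets: (T,) realized outcomes. Returns the (T, K) weight trajectory."""
    T, K = expert_preds.shape
    weights = np.full(K, 1.0 / K)        # start from a uniform belief
    history = np.empty((T, K))
    for t in range(T):
        history[t] = weights
        losses = (expert_preds[t] - targets[t]) ** 2   # per-expert loss
        weights = weights * np.exp(-eta * losses)      # reweight by evidence
        weights /= weights.sum()                       # renormalize
    return history
```

Ranking the columns of the returned trajectory at each step gives weighting ranks analogous to those plotted above.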
The `experiments` folder contains all expert models' results on the NIFTY-LM test split.
To generate the main results of the paper (in Table 2), run:
```sh
./generate_results.sh --model_name "OpenAI" --model_variant "gpt-4o" --seed 42 --average "weighted"
```
Substitute `--model_name` and `--model_variant` as desired. The lists below are index-aligned (the i-th name pairs with the i-th variant), and a sweep over all pairs is sketched after them:
```python
model_names = ["Llama-2", "Llama-2", "Meta-Llama-3", "Meta-Llama-3", "Mixtral-8x7B", "dbrx", "OpenAI"]
model_variants = ["7b-chat-hf", "70b-chat-hf", "8B-Instruct", "70B-Instruct", "Instruct-v0.1", "instruct", "gpt-4o"]
```
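A minimal sketch of such a sweep, assuming `generate_results.sh` accepts the same flags for every model pair:

```python
# Hypothetical sweep over all (model_name, model_variant) pairs.
# Assumes generate_results.sh takes the same flags for every model.
import subprocess

model_names = ["Llama-2", "Llama-2", "Meta-Llama-3", "Meta-Llama-3",
               "Mixtral-8x7B", "dbrx", "OpenAI"]
model_variants = ["7b-chat-hf", "70b-chat-hf", "8B-Instruct",
                  "70B-Instruct", "Instruct-v0.1", "instruct", "gpt-4o"]

for name, variant in zip(model_names, model_variants):
    subprocess.run(
        ["./generate_results.sh",
         "--model_name", name,
         "--model_variant", variant,
         "--seed", "42",
         "--average", "weighted"],
        check=True,
    )
```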
📈 Experts' Performance on the NIFTY Test Split
Llama-class models (Llama 2 and Llama 3: 7B, 8B, and 70B variants)
MoE models (Mixtral-8x7B, DBRX) and GPT-4o
For scholarly reference, please cite our paper as:
```bibtex
@article{saqur2024filtered,
  title={Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models},
  author={Raeid Saqur and Anastasis Kratsios and Florian Krach and Yannick Limmer and Jacob-Junqi Tian and John Willes and Blanka Horvath and Frank Rudzicz},
  year={2024},
  eprint={2406.02969},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2406.02969},
}
```
Raeid Saqur (RS) is supported by a Canada NSERC CGS-D Doctoral Grant. Anastasis Kratsios (AK) acknowledges financial support from NSERC Discovery Grant No. RGPIN-2023-04482 and McMaster Startup Funds. RS and AK acknowledge that resources used in preparing this research were provided, in part, by the Province of Ontario, the Government of Canada through CIFAR, and companies sponsoring the Vector Institute (https://vectorinstitute.ai/partnerships/current-partners/). The authors would also like to thank Marshall Wang for help with reference code for the DBRX experiments.