This repository contains the supplementary materials for our Mixture of Experts Filter (MoE-F) work, published and presented at ICLR 2025 [OpenReview].
- 🧠 Concept
- 📌 Concrete Example
- 📊 Generating the Results Tables
- 📈 Experts' Performance on the NIFTY Test Split
- 📝 Citing
- 🙏 Acknowledgements
The following is a conceptual flow showing how MoE-F works:
**Dataset**: The NIFTY Financial News Headlines dataset used for this section's experiments is available via HuggingFace.
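If helpful, the split can be pulled directly from the HuggingFace Hub. A minimal sketch, assuming the dataset is hosted under the `raeidsaqur/NIFTY` repo id and exposes a `test` split (substitute the id and split linked from this README if they differ):

```python
# Minimal sketch: load the NIFTY dataset from the HuggingFace Hub.
# The repo id "raeidsaqur/NIFTY" and the "test" split name are assumptions;
# use the identifiers linked from this README if they differ.
from datasets import load_dataset

nifty = load_dataset("raeidsaqur/NIFTY")
print(nifty["test"][0])  # inspect one test-split example
```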
A cross-sectional time-window snapshot makes the filter's behavior easier to see. The diagram below shows the weighting ranks of the 7 experts over the 3 randomly sampled week-long trading windows shown above, each mimicking a different market regime (bull, bear, neutral).
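As a rough illustration of this flow (and explicitly not the paper's algorithm, which derives the gate from a stochastic filter), the sketch below mimics the conceptual loop: each expert makes a prediction, each expert's weight is scaled multiplicatively by the evidence from its realized loss, and the weights are renormalized at every step. All names and the squared-error loss are illustrative choices:

```python
# Illustrative sketch only -- NOT MoE-F itself. MoE-F obtains the expert
# weights from a stochastic filtering-based gate; this toy version mimics
# the conceptual flow with multiplicative (exponential) weight updates.
import numpy as np

def online_gate(expert_preds, targets, eta=1.0):
    """expert_preds: (T, K) predictions from K experts over T steps.
    targets: (T,) realized outcomes. Returns the (T, K) weight trajectory."""
    T, K = expert_preds.shape
    weights = np.full(K, 1.0 / K)        # start from a uniform belief
    history = np.empty((T, K))
    for t in range(T):
        history[t] = weights
        losses = (expert_preds[t] - targets[t]) ** 2   # per-expert loss
        weights = weights * np.exp(-eta * losses)      # reweight by evidence
        weights /= weights.sum()                       # renormalize
    return history
```

Ranking the columns of the returned trajectory at each step gives weighting ranks analogous to those plotted above.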
The `experiments` folder contains all expert models' results on the NIFTY-LM test split.
To generate the main results of the paper (in Table 2), run:
```sh
./generate_results.sh --model_name "OpenAI" --model_variant "gpt-4o" --seed 42 --average "weighted"
```
Substitute `--model_name` and `--model_variant` as desired. The lists below are index-aligned (the i-th name pairs with the i-th variant), and a sweep over all pairs is sketched after them:
```python
model_names = ["Llama-2", "Llama-2", "Meta-Llama-3", "Meta-Llama-3", "Mixtral-8x7B", "dbrx", "OpenAI"]
model_variants = ["7b-chat-hf", "70b-chat-hf", "8B-Instruct", "70B-Instruct", "Instruct-v0.1", "instruct", "gpt-4o"]
```
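A minimal sketch of such a sweep, assuming `generate_results.sh` accepts the same flags for every model pair:

```python
# Hypothetical sweep over all (model_name, model_variant) pairs.
# Assumes generate_results.sh takes the same flags for every model.
import subprocess

model_names = ["Llama-2", "Llama-2", "Meta-Llama-3", "Meta-Llama-3",
               "Mixtral-8x7B", "dbrx", "OpenAI"]
model_variants = ["7b-chat-hf", "70b-chat-hf", "8B-Instruct",
                  "70B-Instruct", "Instruct-v0.1", "instruct", "gpt-4o"]

for name, variant in zip(model_names, model_variants):
    subprocess.run(
        ["./generate_results.sh",
         "--model_name", name,
         "--model_variant", variant,
         "--seed", "42",
         "--average", "weighted"],
        check=True,
    )
```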
📈 Experts' Performance on the NIFTY Test Split
Llama-class models (Llama 2 and Llama 3: 7B, 8B, and 70B variants)
MoE models (Mixtral-8x7B, DBRX) and GPT-4o
For scholarly reference, please cite our paper as:
```bibtex
@article{saqur2024filtered,
  title={Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models},
  author={Raeid Saqur and Anastasis Kratsios and Florian Krach and Yannick Limmer and Jacob-Junqi Tian and John Willes and Blanka Horvath and Frank Rudzicz},
  year={2024},
  eprint={2406.02969},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2406.02969},
}
```
Raeid Saqur (RS) is supported by a Canada NSERC CGS-D Doctoral Grant. Anastasis Kratsios (AK) acknowledges financial support from NSERC Discovery Grant No. RGPIN-2023-04482 and McMaster Startup Funds. RS and AK acknowledge that resources used in preparing this research were provided, in part, by the Province of Ontario, the Government of Canada through CIFAR, and companies sponsoring the Vector Institute (https://vectorinstitute.ai/partnerships/current-partners/). The authors would also like to thank Marshall Wang for help with reference code for the DBRX experiments.