Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models

This repository provides the supplementary materials for our proposed Mixture of Experts Filter (MoE-F) work, published and presented at ICLR 2025 [OpenReview].

📋 Table of Contents

- 🧠 Concept
- 📌 Concrete Example
- 📊 Generating the Results Tables
- 📈 Experts' Performance on the NIFTY Test Split
- 📝 Citing
- 🙏 Acknowledgements

🧠 Concept

The following is a conceptual flow showing how MoE-F works:

[Figure: MoE-F conceptual flow (Mixture of Experts)]
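
The gist of the diagram: treat "which expert is currently best" as a hidden state that evolves over time, and maintain a belief over the experts that is updated from their streaming prediction losses; that belief becomes the gating weights used to combine the experts' outputs. The snippet below is only a minimal discrete-time sketch of this predict/correct idea, not the paper's exact continuous-time Wonham filter; the function name, the switching kernel, and the exponential loss likelihood are illustrative assumptions.

```python
import numpy as np

def filter_step(weights, losses, switch_prob=0.05, temperature=1.0):
    """One simplified, discrete-time gating update (illustrative only).

    weights     : current probability that each of the N experts is 'best'
    losses      : per-expert prediction losses observed at this step
    switch_prob : mass allowed to switch to another expert between steps
                  (a stand-in for the hidden Markov chain's transition rate)
    """
    n = len(weights)
    # Prediction step: mix the weights through a simple switching kernel.
    kernel = (1 - switch_prob) * np.eye(n) \
        + (switch_prob / (n - 1)) * (np.ones((n, n)) - np.eye(n))
    predicted = kernel.T @ weights
    # Correction step: reweight by how well each expert just performed.
    likelihood = np.exp(-np.asarray(losses) / temperature)
    posterior = predicted * likelihood
    return posterior / posterior.sum()

# Toy run: 7 experts, the third expert consistently incurs the lowest loss.
weights = np.full(7, 1 / 7)
losses = np.array([0.9, 0.8, 0.1, 0.7, 0.6, 0.95, 0.5])
for _ in range(5):
    weights = filter_step(weights, losses)
print(np.round(weights, 3))  # mass concentrates on the low-loss expert
```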

📌 Concrete Example

Financial Market Movement Task

Dataset: The NIFTY Financial News Headlines dataset used for this section's experiments is available via HuggingFace.
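
A minimal sketch for pulling the dataset with the `datasets` library is shown below; the repository id `raeidsaqur/NIFTY` and the split names are assumptions, so check the linked HuggingFace page for the exact identifier.

```python
# Minimal loading sketch; the Hub id "raeidsaqur/NIFTY" and the split names
# are assumptions -- check the dataset page on HuggingFace for the exact id.
from datasets import load_dataset

nifty = load_dataset("raeidsaqur/NIFTY")  # hypothetical repository id
print(nifty)                              # inspect available splits and columns
test_split = nifty["test"]                # split used for the results below
print(test_split[0])
```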

Examining a cross-sectional time-window snapshot makes the filter's behaviour easier to understand.

[Figure: Market movement plot]

Expert Weights Heatmap

The diagram below depicts the weighting ranks of the 7 experts across the 3 randomly sampled week-long trading windows shown above, each mimicking a different market regime (bull, bear, neutral).

[Figure: Expert weights heatmap]

📊 Generating the Results Tables

The experiments folder contains all expert models' results on the NIFTY-LM test split.

To generate the main results of the paper (in Table 2), run:

./generate_results.sh --model_name "OpenAI" --model_variant "gpt-4o" --seed 42 --average "weighted"

Substitute --model_name and --model_variant with any of the paired entries below (a sketch that sweeps over every pair follows the lists):

model_names = ["Llama-2", "Llama-2", "Meta-Llama-3", "Meta-Llama-3", "Mixtral-8x7B", "dbrx", "OpenAI"]
model_variants = ["7b-chat-hf", "70b-chat-hf", "8B-Instruct", "70B-Instruct", "Instruct-v0.1", "instruct", "gpt-4o"]
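
To regenerate every expert's results in one go, a small convenience sketch is shown below; the positional pairing of the two lists (the i-th name goes with the i-th variant) is assumed from their ordering, and the loop simply shells out to generate_results.sh with the same flags as the single-model example above.

```python
# Hypothetical sweep over all expert (model_name, model_variant) pairs.
import subprocess

model_names = ["Llama-2", "Llama-2", "Meta-Llama-3", "Meta-Llama-3",
               "Mixtral-8x7B", "dbrx", "OpenAI"]
model_variants = ["7b-chat-hf", "70b-chat-hf", "8B-Instruct", "70B-Instruct",
                  "Instruct-v0.1", "instruct", "gpt-4o"]

for name, variant in zip(model_names, model_variants):
    subprocess.run(
        ["./generate_results.sh",
         "--model_name", name,
         "--model_variant", variant,
         "--seed", "42",
         "--average", "weighted"],
        check=True,  # stop the sweep if any single run fails
    )
```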

📈 Experts' Performance on the NIFTY Test Split

Llama-class models (Llama-2 7B/70B, Llama-3 8B/70B)

[Figure: Llama-class models' confusion matrices]

MoE models (Mixtral-8x7B, DBRX) and GPT-4o

[Figure: Mixtral-8x7B, DBRX, and GPT-4o confusion matrices]
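
For readers reproducing these figures, the sketch below shows how a single expert's confusion matrix and weighted F1 (matching the --average "weighted" flag above) could be computed with scikit-learn; the three class names and the toy label arrays are assumptions, not the repository's actual evaluation code.

```python
# Illustrative evaluation sketch (not the repo's script): compute one expert's
# confusion matrix and weighted F1. Class names and toy labels are assumptions.
from sklearn.metrics import confusion_matrix, f1_score

labels = ["Rise", "Fall", "Neutral"]
y_true = ["Rise", "Fall", "Neutral", "Rise", "Fall"]      # toy ground truth
y_pred = ["Rise", "Neutral", "Neutral", "Fall", "Fall"]   # toy expert output

print(confusion_matrix(y_true, y_pred, labels=labels))
print(f1_score(y_true, y_pred, labels=labels, average="weighted"))
```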

📝 Citing

For scholarly reference, please cite our paper as:

@article{saqur2024filtered,
      title={Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models}, 
      author={Raeid Saqur and Anastasis Kratsios and Florian Krach and Yannick Limmer and Jacob-Junqi Tian and John Willes and Blanka Horvath and Frank Rudzicz},
      year={2024},
      eprint={2406.02969},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2406.02969}, 
}

🙏 Acknowledgements

Raeid Saqur (RS) is supported by a Canada NSERC CGS-D Doctoral Grant. Anastasis Kratsios (AK) acknowledges financial support from NSERC Discovery Grant No. RGPIN-2023-04482 and their McMaster startup funds. RS and AK acknowledge that resources used in preparing this research were provided, in part, by the Province of Ontario, the Government of Canada through CIFAR, and companies sponsoring the Vector Institute (https://vectorinstitute.ai/partnerships/current-partners/). The authors would also like to thank Marshall Wang for help with the reference code used to compute the DBRX experiments.
