## 🐞 Describe the Bug
When converting models, configuration options that are not part of the architecture config are not imported from the Hugging Face model's `config.json`.
This creates an unexpected and undocumented requirement for manual configuration, which can lead to costly mistakes.
The following critical options are affected:

- `window_size` and `max_window_layers` for models trained with windowed attention (e.g. Qwen 2), see Qwen2 converter #163.
- `router_aux_loss_coef` for MoEs such as Mixtral.
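
For reference, a minimal sketch of where these settings live on the HF side; the model IDs are only examples, and the attributes shown (`sliding_window`, `max_window_layers`, `router_aux_loss_coef`) are the HF config names that would need to be mapped onto the Fast-LLM options above:

```python
# Sketch: inspect the HF config fields that, per this issue, are not carried
# over by the converter. Model IDs below are examples, not requirements.
from transformers import AutoConfig

qwen2 = AutoConfig.from_pretrained("Qwen/Qwen2-7B-Instruct")
# Windowed-attention settings (would map onto Fast-LLM's window_size /
# max_window_layers options).
print(qwen2.sliding_window, qwen2.max_window_layers)

mixtral = AutoConfig.from_pretrained("mistralai/Mixtral-8x7B-v0.1")
# MoE auxiliary loss coefficient.
print(mixtral.router_aux_loss_coef)
```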
Currently, the load-from-HF-model feature suggests seamless integration, but this bug prevents complete and accurate model conversion. Users are likely to assume the conversion will "just work" and may unknowingly train models with incorrect configurations.
## 🔄 Steps to Reproduce
- **Load a HF model using Fast-LLM:** use a HF model that requires non-architecture-specific parameters (e.g., `window_size` for sliding window attention).
- **Observe missing configurations:** check the output model configuration (see the sketch after this list). Notice that parameters not included in the architecture config are missing or set to default values, potentially breaking the model.
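
One rough way to check the second step, assuming the HF checkpoint sits in `qwen2-hf/` and the converted config was exported somewhere readable as YAML; the paths, the flat key layout, and the Fast-LLM option names below are assumptions for illustration, not Fast-LLM's actual output format:

```python
# Sketch of the check in step 2: compare the source config.json with the
# converted config. Paths and the converted-config layout are assumptions.
import json

import yaml  # pip install pyyaml

with open("qwen2-hf/config.json") as f:
    hf_config = json.load(f)

with open("qwen2-fast-llm/config.yaml") as f:
    converted = yaml.safe_load(f)

# HF field name -> (assumed) Fast-LLM option name.
expected = {
    "sliding_window": "window_size",
    "max_window_layers": "max_window_layers",
}
for hf_key, fl_key in expected.items():
    print(f"{hf_key}={hf_config.get(hf_key)} -> {fl_key}={converted.get(fl_key, 'MISSING')}")
```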
## 🎯 Expected Behavior
Fast-LLM should correctly import all relevant configuration options from the Hugging Face `config.json`, not just those in the architecture configuration. This ensures that models are fully converted and behave as expected, without requiring manual intervention or hidden knowledge about configuration quirks.
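
As an illustration of the expected behavior, here is a minimal sketch of the kind of merge the importer should perform after building the architecture config; the field mapping, function name, and dict-based configs are hypothetical, not Fast-LLM's actual API:

```python
# Minimal sketch of the expected import behavior: copy non-architecture
# options from the HF config into the converted config instead of leaving
# them at defaults. Names below are illustrative, not Fast-LLM's real API.
NON_ARCHITECTURE_FIELDS = {
    # HF field name -> (assumed) Fast-LLM option name.
    "sliding_window": "window_size",
    "max_window_layers": "max_window_layers",
    "router_aux_loss_coef": "router_aux_loss_coef",
}

def import_extra_options(hf_config: dict, converted_config: dict) -> dict:
    """Merge HF fields that fall outside the architecture config."""
    for hf_key, target_key in NON_ARCHITECTURE_FIELDS.items():
        if hf_config.get(hf_key) is not None:
            converted_config[target_key] = hf_config[hf_key]
    return converted_config
```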