Add non-architecture Huggingface conversion parameters #166

Open
@tscholak

Description

🐞 Describe the Bug

When converting models, config options that are not included in the architecture config are not imported from the Hugging Face model's config.json.

This creates an unexpected and undocumented requirement for manual configuration, which can lead to costly mistakes.

The following critical options are affected (see the snippet after this list for how they appear upstream):

  • window_size and max_window_layers for models trained with windowed attention (e.g. Qwen 2); see Qwen2 converter #163.
  • router_aux_loss_coef for MoEs such as Mixtral.
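
For reference, these fields exist on the upstream Hugging Face configs but are silently dropped during conversion. A minimal way to inspect them (the checkpoint names below are only examples):

```python
from transformers import AutoConfig

# Example checkpoints; any Qwen2 or Mixtral checkpoint exposes the same fields.
qwen2 = AutoConfig.from_pretrained("Qwen/Qwen2-7B-Instruct")
mixtral = AutoConfig.from_pretrained("mistralai/Mixtral-8x7B-v0.1")

# Windowed-attention options, not part of the architecture config:
print(qwen2.sliding_window, qwen2.max_window_layers, qwen2.use_sliding_window)

# MoE auxiliary-loss coefficient, not part of the architecture config:
print(mixtral.router_aux_loss_coef)
```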

Currently, the load-from-HF-model feature suggests seamless integration, but this bug prevents complete and accurate model conversion. Users are likely to assume the conversion will "just work" and may unknowingly train models with incorrect configurations.

🔄 Steps to Reproduce

  1. Load a HF model using Fast-LLM:
    Use a HF model that requires non-architecture-specific parameters (e.g., window_size for sliding window attention).

  2. Observe missing configurations:
    Check the output model configuration. Notice that parameters not included in the architecture config are missing or set to default values, potentially breaking the model. A rough way to spot the dropped fields is sketched below.
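
One way to confirm what was dropped, independent of the converter internals, is to diff the upstream config.json against the exported Fast-LLM config. The sketch below is hypothetical: the file paths and the assumption that the converted config is available as YAML are illustrative, not Fast-LLM's actual layout.

```python
import json
import yaml  # pyyaml

# Hypothetical paths; adjust to the actual checkpoint and export locations.
with open("qwen2_checkpoint/config.json") as f:
    hf_config = json.load(f)
with open("fast_llm_export/config.yaml") as f:
    converted = yaml.safe_load(f)

def flatten(d, out=None):
    """Collect leaf keys from a nested dict so they can be searched by name."""
    out = {} if out is None else out
    for key, value in d.items():
        if isinstance(value, dict):
            flatten(value, out)
        else:
            out[key] = value
    return out

converted_flat = flatten(converted)

# Non-architecture fields that matter for correctness.
for field in ("sliding_window", "max_window_layers", "router_aux_loss_coef"):
    if field in hf_config and field not in converted_flat:
        print(f"{field}={hf_config[field]!r} is in config.json but missing after conversion")
```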

🎯 Expected Behavior

Fast-LLM should correctly import all relevant configuration options from the Hugging Face config.json, not just those in the architecture configuration. This ensures that models are fully converted and behave as expected, without requiring manual intervention or hidden knowledge about configuration quirks.
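
A possible shape for the fix, shown only as a hypothetical sketch (the key names on the Fast-LLM side and the helper function are illustrative, not the actual converter API): the converter could carry an explicit mapping for non-architecture parameters and apply it alongside the existing architecture mapping.

```python
# Hypothetical mapping from HF config.json keys to Fast-LLM config keys.
# Names on the Fast-LLM side are illustrative only.
NON_ARCHITECTURE_PARAMS = {
    "sliding_window": "window_size",
    "max_window_layers": "max_window_layers",
    "router_aux_loss_coef": "router_aux_loss_coef",
}

def import_non_architecture_params(hf_config: dict, fast_llm_config: dict) -> dict:
    """Copy non-architecture options from the HF config into the Fast-LLM config,
    leaving Fast-LLM defaults in place only when the HF config omits a field."""
    for hf_key, fast_llm_key in NON_ARCHITECTURE_PARAMS.items():
        value = hf_config.get(hf_key)
        if value is not None:
            fast_llm_config[fast_llm_key] = value
    return fast_llm_config
```

Whatever the exact mechanism, the point is that these fields should be imported by default rather than requiring users to know which options to set by hand.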
