
Add Hunyuan dense V1 (Hunyuan-MT-7B) support #4

Open: beshkenadze wants to merge 1 commit into main from hunyuan-mt-7b

@beshkenadze

Owner

What

Adds Tencent's Hunyuan dense V1 architecture (hunyuan_v1_dense / HunYuanDenseV1ForCausalLM), used by Hunyuan-MT-7B (translation-tuned) and Hunyuan-7B-Instruct. Ported from mlx-lm's hunyuan_v1_dense.py.

Architecture

A Llama-family dense transformer, closest to the existing Qwen3 path:

  • Per-head QK RMSNorm (query_layernorm/key_layernorm), GQA, SwiGLU MLP, pre/post RMSNorm blocks.
  • Tied embeddings (sanitize drops the tied lm_head).
  • DynamicNTKAlphaRoPE — the one new piece: rescales the RoPE base once by alpha^(dim/(dim-2)) and reuses the existing freqs-based fast-RoPE path (no sequence-length-dependent resizing). Mirrors mlx-lm.
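The one-time base rescaling described in the last bullet can be sketched in plain Swift. This is a minimal illustration, not the code from RoPEUtils.swift: the function names are hypothetical, it uses `Double` arrays instead of MLX arrays, and it assumes the freqs-based path takes per-pair values of the form base^(i/dim) for even i, as the mlx-lm reference does.

```swift
import Foundation

/// Hypothetical helper: rescale the RoPE base once by alpha^(dim / (dim - 2)).
/// The result depends only on the configuration, never on sequence length.
func scaledRoPEBase(base: Double, alpha: Double, dim: Int) -> Double {
    precondition(dim > 2, "dim must be greater than 2")
    let exponent = Double(dim) / Double(dim - 2)
    return base * pow(alpha, exponent)
}

/// Hypothetical helper: per-pair values scaledBase^(i / dim) for i = 0, 2, 4, ...
/// Assumption: this is the shape of table a freqs-based RoPE kernel consumes.
func ropeFrequencies(base: Double, alpha: Double, dim: Int) -> [Double] {
    let scaled = scaledRoPEBase(base: base, alpha: alpha, dim: dim)
    return stride(from: 0, to: dim, by: 2).map { i in
        pow(scaled, Double(i) / Double(dim))
    }
}
```

Because the table is computed once at construction time, an alpha of 1 leaves the base unchanged and the variant degenerates to ordinary RoPE.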

Changes

  • Libraries/MLXLMCommon/RoPEUtils.swift: DynamicNTKAlphaRoPE.
  • Libraries/MLXLLM/Models/Hunyuan.swift: HunyuanModel / HunyuanConfiguration (flat config decode; accepts both head_dim and attention_head_dim; conditional qk-norm).
  • Libraries/MLXLLM/LLMModelFactory.swift: register hunyuan_v1_dense and the hunyuan alias (the older Instruct checkpoint reports model_type: hunyuan, dense-equivalent with num_experts=1); presets hunyuan_mt_7b_4bit/8bit + hunyuan_7b_instruct_4bit.
  • Tests/MLXLMTests/HunyuanTests.swift: 5 unit tests (config decode, sanitize, tiny forward, dynamic-RoPE alpha, presets).
  • scripts/: hunyuan_reference.sh (CUDA greedy capture) and hunyuan_convert_mlx.sh (MLX 4/8-bit conversion).
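As an illustration of "accepts both head_dim and attention_head_dim", here is a hedged sketch of a flat `Decodable` configuration. The type name, the trimmed field set, the `use_qk_norm` key, and the fallback to `hidden_size / num_attention_heads` are assumptions made for the example; the real HunyuanConfiguration may differ.

```swift
import Foundation

/// Hypothetical, trimmed-down configuration used only for this sketch.
struct SketchHunyuanConfig: Decodable {
    let hiddenSize: Int
    let numAttentionHeads: Int
    let headDim: Int
    let useQKNorm: Bool

    enum CodingKeys: String, CodingKey {
        case hiddenSize = "hidden_size"
        case numAttentionHeads = "num_attention_heads"
        case headDim = "head_dim"
        case attentionHeadDim = "attention_head_dim"  // alternate spelling
        case useQKNorm = "use_qk_norm"                // assumed key name
    }

    init(from decoder: Decoder) throws {
        let c = try decoder.container(keyedBy: CodingKeys.self)
        let hidden = try c.decode(Int.self, forKey: .hiddenSize)
        let heads = try c.decode(Int.self, forKey: .numAttentionHeads)
        let explicit = try c.decodeIfPresent(Int.self, forKey: .headDim)
        let aliased = try c.decodeIfPresent(Int.self, forKey: .attentionHeadDim)

        hiddenSize = hidden
        numAttentionHeads = heads
        // Prefer head_dim, then attention_head_dim, then derive it (assumed fallback).
        headDim = explicit ?? aliased ?? hidden / heads
        // Treat a missing flag as "no qk-norm" (assumed default).
        useQKNorm = try c.decodeIfPresent(Bool.self, forKey: .useQKNorm) ?? false
    }
}
```

Decoding the alias in a custom `init(from:)` keeps the rest of the model code on a single `headDim` property regardless of which key the checkpoint uses.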

Validation

  • 5/5 unit tests pass; package builds.

  • End-to-end byte-identical parity against the mlx-lm reference on a locally-converted Hunyuan-MT-7B-4bit, greedy:

    Prompt: Translate the following segment into Chinese, without additional explanation.\n\nIt's on the house.

    Implementation               Output
    mlx-lm (hunyuan_v1_dense)    这顿饭由我们公司承担费用。
    Swift HunyuanModel           这顿饭由我们公司承担费用。

    Identical — validates config decode, weight loading, qk-norm, GQA, dynamic RoPE, tied embeddings, and the chat template via swift-jinja.
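A byte-identical comparison like the one above could be automated along these lines. This is only a sketch with a hypothetical helper name, not the check used in this PR. It compares UTF-8 bytes rather than using `==` on `String`, because Swift string equality treats canonically equivalent Unicode sequences as equal and would hide such differences.

```swift
/// Hypothetical helper: index of the first differing UTF-8 byte,
/// or nil when the two outputs are byte-identical.
func firstByteMismatch(_ reference: String, _ candidate: String) -> Int? {
    let a = Array(reference.utf8)
    let b = Array(candidate.utf8)
    let shared = min(a.count, b.count)
    for i in 0..<shared where a[i] != b[i] {
        return i
    }
    // Same prefix: identical only if the lengths match too.
    return a.count == b.count ? nil : shared
}
```

Returning the offset of the first mismatch, rather than a plain Boolean, makes it easier to see where a greedy decode starts to diverge from the reference.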

Notes

  • No mlx-community build of Hunyuan-MT-7B exists yet; scripts/hunyuan_convert_mlx.sh produces the 4/8-bit weights the presets point at (tokenizer.json ships with the source, so it loads in swift-transformers).
  • The older mlx-community/Hunyuan-7B-Instruct-4bit ships only hy.tiktoken (no tokenizer.json), so it does not load in swift-transformers; Hunyuan-MT-7B is the intended target.

@beshkenadze

Owner Author

📌 Upstreaming note (ml-explore)

This PR bundles two layers for convenience in the fork, but upstream it must go as two separate PRs — the core RoPE addition lives in MLXLMCommon, the model in MLXLLM, and a maintainer will want them reviewable independently:

  1. Core (MLXLMCommon): DynamicNTKAlphaRoPE in RoPEUtils.swift (a reusable RoPE variant, alongside the existing YarnRoPE/Llama3RoPE/ProportionalRoPE).
  2. Model (MLXLLM): Models/Hunyuan.swift + LLMModelFactory registration (hunyuan_v1_dense + MT-7B presets) + Tests/MLXLMTests/HunyuanTests.swift. Depends on Add TranslateGemma support (Gemma 3 translation template) + tests & prefix-cache benchmark #1.

Each should branch off ml-explore/main independently (PR #2 should land first / be cherry-picked before PR #1).

Tencent's Hunyuan dense V1 architecture (HunYuanDenseV1ForCausalLM), used by the
translation models Hunyuan-MT-7B and Hy-MT2-7B. A Llama-family dense transformer,
closest to the existing Qwen3 path:

- Per-head QK RMSNorm, GQA, SwiGLU MLP, pre/post RMSNorm blocks.
- Tied embeddings (sanitize drops the tied lm_head).
- DynamicNTKAlphaRoPE: rescales the RoPE base once by alpha^(dim/(dim-2)) and reuses
  the existing freqs-based fast-RoPE path (no sequence-length-dependent resizing).
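The "sanitize drops the tied lm_head" step can be pictured as a filter over the checkpoint's weight dictionary. The sketch below is illustrative only: the function name and the `lm_head.` key prefix are assumptions, and it is generic over the tensor type so that it runs without MLX.

```swift
/// Hypothetical sketch of a sanitize step for tied embeddings.
/// When the output projection shares weights with the input embedding,
/// any lm_head entries in the checkpoint are redundant and are dropped.
func sanitizeWeights<Tensor>(
    _ weights: [String: Tensor],
    tieWordEmbeddings: Bool
) -> [String: Tensor] {
    guard tieWordEmbeddings else { return weights }
    // The prefix match also covers quantization side tensors
    // such as lm_head.scales (assumed naming).
    return weights.filter { !$0.key.hasPrefix("lm_head.") }
}
```

Dropping the keys up front avoids a load-time failure from weights that have no matching parameter in a model built without a separate output head.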

Changes:
- Libraries/MLXLMCommon/RoPEUtils.swift: DynamicNTKAlphaRoPE.
- Libraries/MLXLLM/Models/Hunyuan.swift: HunyuanModel / HunyuanConfiguration (flat
  config decode; accepts head_dim or attention_head_dim; conditional qk-norm).
- Libraries/MLXLLM/LLMModelFactory.swift: register hunyuan_v1_dense; presets
  hunyuan_mt_7b_4bit/8bit and hy_mt2_7b_4bit/8bit.
- Tests/MLXLMTests/HunyuanTests.swift: config decode, attention_head_dim alias,
  sanitize, tiny forward, dynamic-RoPE alpha, presets.

Validated end-to-end against the mlx-lm reference (byte-identical greedy) on locally
converted 4-bit weights for both models.
@beshkenadze

Owner Author

✅ Upstreamed (correction to the note above)

Submitted to ml-explore/mlx-swift-lm as two separate PRs — the split is per-feature (each branched off upstream main, not the fork), which is cleaner than the core/model split I sketched earlier:

Both build + pass tests on the current upstream base.
