Add Hunyuan dense V1 (hunyuan_v1_dense): Hunyuan-MT-7B and Hy-MT2-7B#347

Open
beshkenadze wants to merge 1 commit into ml-explore:main from beshkenadze:upstream-hunyuan


What

Adds Tencent's Hunyuan dense V1 architecture (hunyuan_v1_dense / HunYuanDenseV1ForCausalLM), used by the translation models Hunyuan-MT-7B and Hy-MT2-7B. Ported from mlx-lm's hunyuan_v1_dense.py.

Architecture

A Llama-family dense transformer, closest to the existing Qwen3 path:

  • Per-head QK RMSNorm, GQA, SwiGLU MLP, pre/post RMSNorm blocks.
  • Tied embeddings (sanitize drops the tied lm_head).
  • DynamicNTKAlphaRoPE, the one new utility: it rescales the RoPE base once, at construction, by alpha^(dim/(dim-2)) and reuses the existing freqs-based fast-RoPE path (no sequence-length-dependent resizing). It lives alongside YarnRoPE/Llama3RoPE/ProportionalRoPE in RoPEUtils.swift.
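
For reviewers, the base rescaling in the last bullet can be sketched in plain Swift without MLX. This is an illustration only, not the code in RoPEUtils.swift: the names `scaledRoPEBase` and `ropeFreqs` are hypothetical, and the frequency layout (scaled base raised to 2i/dim for each rotated pair) is an assumption about what the freqs-based path consumes.

```swift
import Foundation

/// Rescale the RoPE base once: base * alpha^(dim / (dim - 2)).
/// Hypothetical helper for illustration; not the PR's implementation.
func scaledRoPEBase(base: Double, alpha: Double, dim: Int) -> Double {
    precondition(dim > 2 && dim % 2 == 0, "dim must be even and greater than 2")
    return base * pow(alpha, Double(dim) / Double(dim - 2))
}

/// One value per rotated pair: scaledBase^(2i / dim) for i in 0..<dim/2.
/// Assumption: this is the layout handed to the freqs-based RoPE path.
func ropeFreqs(base: Double, alpha: Double, dim: Int) -> [Double] {
    let scaled = scaledRoPEBase(base: base, alpha: alpha, dim: dim)
    return stride(from: 0, to: dim, by: 2).map { pow(scaled, Double($0) / Double(dim)) }
}
```

Because alpha enters only through the base, the table can be computed once and reused for every sequence length; with alpha = 1 it reduces to the unscaled base.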

Changes

  • Libraries/MLXLMCommon/RoPEUtils.swift: DynamicNTKAlphaRoPE.
  • Libraries/MLXLLM/Models/Hunyuan.swift: HunyuanModel / HunyuanConfiguration (flat config decode; accepts head_dim or attention_head_dim; conditional qk-norm).
  • Libraries/MLXLLM/LLMModelFactory.swift: register hunyuan_v1_dense; presets hunyuan_mt_7b_4bit/8bit and hy_mt2_7b_4bit/8bit.
  • Tests/MLXLMTests/HunyuanTests.swift: config decode, attention_head_dim alias, sanitize, tiny forward, dynamic-RoPE alpha, presets.
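
A minimal sketch of how a flat config decode can accept either `head_dim` or `attention_head_dim`. The type name `HunyuanConfigSketch`, the precedence between the two keys, and the fallback to `hidden_size / num_attention_heads` are assumptions for illustration; the actual HunyuanConfiguration may resolve these differently.

```swift
import Foundation

/// Hypothetical, trimmed-down configuration; not the PR's HunyuanConfiguration.
struct HunyuanConfigSketch: Decodable {
    let hiddenSize: Int
    let attentionHeads: Int
    let headDim: Int

    enum CodingKeys: String, CodingKey {
        case hiddenSize = "hidden_size"
        case attentionHeads = "num_attention_heads"
        case headDim = "head_dim"
        case attentionHeadDim = "attention_head_dim"
    }

    init(from decoder: Decoder) throws {
        let c = try decoder.container(keyedBy: CodingKeys.self)
        let hidden = try c.decode(Int.self, forKey: .hiddenSize)
        let heads = try c.decode(Int.self, forKey: .attentionHeads)
        // Either key may be present; both are optional in this sketch.
        let explicit = try c.decodeIfPresent(Int.self, forKey: .headDim)
        let alias = try c.decodeIfPresent(Int.self, forKey: .attentionHeadDim)
        self.hiddenSize = hidden
        self.attentionHeads = heads
        // Assumed precedence: head_dim, then the alias, then the derived value.
        self.headDim = explicit ?? alias ?? (hidden / heads)
    }
}
```

Decoding `{"hidden_size": 4096, "num_attention_heads": 32, "attention_head_dim": 128}` (illustrative values) with `JSONDecoder` yields `headDim == 128` under these assumptions.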

Validation

  • 6/6 unit tests pass; package builds.

  • End-to-end output is byte-identical to the mlx-lm reference under greedy decoding, on locally converted 4-bit weights for both models:

    Prompt: Translate the following segment into Chinese, without additional explanation.\n\nIt's on the house.

    | Model | mlx-lm | Swift HunyuanModel |
    | --- | --- | --- |
    | Hunyuan-MT-7B-4bit | 这顿饭由我们公司承担费用。 | 这顿饭由我们公司承担费用。 |
    | Hy-MT2-7B-4bit | 这顿算店的。 | 这顿算店的。 |

    The outputs are identical for both models, which exercises config decode, weight loading, qk-norm, GQA, dynamic RoPE, tied embeddings, and the chat template end to end.
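
The tied-embedding handling that this parity check relies on can be sketched as a weight filter. This is a hedged illustration, not the PR's sanitize: the function name `sanitizeSketch`, the `tieWordEmbeddings` flag, and the `lm_head.` key prefix are assumptions about the checkpoint layout.

```swift
/// Hypothetical sanitize step: when embeddings are tied, drop every weight
/// stored under the output head so the embedding matrix is reused instead.
/// Generic over the value type so it runs without MLX arrays.
func sanitizeSketch<T>(_ weights: [String: T], tieWordEmbeddings: Bool) -> [String: T] {
    guard tieWordEmbeddings else { return weights }
    // Assumption: head tensors (including any quantization scales/biases)
    // all share the "lm_head." key prefix.
    return weights.filter { !$0.key.hasPrefix("lm_head.") }
}
```

Filtering by prefix rather than by a single key is a design choice in this sketch so that quantized checkpoints, which may carry extra per-layer tensors, are handled the same way.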

MLX weights: mlx-community/Hunyuan-MT-7B-{4bit,8bit} and mlx-community/Hy-MT2-7B-{4bit,8bit}.
