Add Hunyuan dense V1 (Hunyuan-MT-7B) support by beshkenadze · Pull Request #4 · beshkenadze/mlx-swift-lm

beshkenadze · 2026-06-13T15:22:33Z

What

Adds Tencent's Hunyuan dense V1 architecture (hunyuan_v1_dense / HunYuanDenseV1ForCausalLM), used by Hunyuan-MT-7B (translation-tuned) and Hunyuan-7B-Instruct. Ported from mlx-lm's hunyuan_v1_dense.py.

Architecture

A Llama-family dense transformer, closest to the existing Qwen3 path:

- Per-head QK RMSNorm (query_layernorm/key_layernorm), GQA, SwiGLU MLP, pre/post RMSNorm blocks.
- Tied embeddings (sanitize drops the tied lm_head).
- DynamicNTKAlphaRoPE, the one new piece: it rescales the RoPE base once by alpha^(dim/(dim-2)) and reuses the existing freqs-based fast-RoPE path (no sequence-length-dependent resizing). Mirrors mlx-lm.
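A minimal sketch of the alpha rescaling described above, in plain Swift with Foundation only (no MLX) so the arithmetic can be checked in isolation. The helper names `ntkAlphaScaledBase`, `ropeFreqs`, and `ropeAngle` are hypothetical, not the PR's API. The per-pair divisors assume the `base ** (arange(0, dims, 2) / dims)` form that mlx-lm's freqs-based RoPE variants build once at construction, and the `dims`, `base`, and `alpha` values in the usage lines are illustrative rather than read from a Hunyuan config.json.

```swift
import Foundation

/// Hypothetical helper: the one-time NTK-alpha rescaling
/// base' = base * alpha^(dims / (dims - 2)).
func ntkAlphaScaledBase(base: Double, dims: Int, alpha: Double) -> Double {
    precondition(dims > 2 && dims % 2 == 0, "RoPE dims must be even and > 2")
    let exponent = Double(dims) / Double(dims - 2)
    return base * pow(alpha, exponent)
}

/// Hypothetical helper: per-pair divisors base'^(2i / dims) for i in 0..<dims/2.
/// Computed once; nothing here depends on the sequence length.
func ropeFreqs(scaledBase: Double, dims: Int) -> [Double] {
    stride(from: 0, to: dims, by: 2).map { pow(scaledBase, Double($0) / Double(dims)) }
}

/// Standard RoPE rotation angle for pair `i` at absolute position `position`,
/// i.e. position * base'^(-2i / dims).
func ropeAngle(position: Int, pair i: Int, freqs: [Double]) -> Double {
    Double(position) / freqs[i]
}

// Illustrative values only (assumed, not taken from a real checkpoint).
let dims = 128
let base = 10_000.0
let alpha = 1000.0
let scaled = ntkAlphaScaledBase(base: base, dims: dims, alpha: alpha)
let freqs = ropeFreqs(scaledBase: scaled, dims: dims)
print("scaled base:", scaled)
print("pairs:", freqs.count, "angle of last pair at position 4096:",
      ropeAngle(position: 4096, pair: freqs.count - 1, freqs: freqs))
```

Because alpha enters only through a new base, the divisor table is built once and reused for every position, which is consistent with the "no sequence-length-dependent resizing" note above.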

Changes

- Libraries/MLXLMCommon/RoPEUtils.swift: DynamicNTKAlphaRoPE.
- Libraries/MLXLLM/Models/Hunyuan.swift: HunyuanModel / HunyuanConfiguration (flat config decode; accepts either head_dim or attention_head_dim; conditional qk-norm).
- Libraries/MLXLLM/LLMModelFactory.swift: registers hunyuan_v1_dense and the hunyuan alias (the older Instruct checkpoint reports model_type: hunyuan and is dense-equivalent with num_experts=1); adds presets hunyuan_mt_7b_4bit/8bit and hunyuan_7b_instruct_4bit.
- Tests/MLXLMTests/HunyuanTests.swift: 5 unit tests (config decode, sanitize, tiny forward, dynamic-RoPE alpha, presets).
- scripts/: hunyuan_reference.sh (CUDA greedy capture) and hunyuan_convert_mlx.sh (MLX 4/8-bit conversion).
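The flat config decode with the head_dim/attention_head_dim alias, plus the `hunyuan` model_type alias, could look roughly like the Codable sketch below. `HunyuanConfigSketch` and `isDenseHunyuan` are hypothetical stand-ins, not the PR's `HunyuanConfiguration` or factory code. Apart from `head_dim` and `attention_head_dim`, the key names follow common Hugging Face config naming as an assumption, and `use_qk_norm` (defaulting to true) is an assumed key for the conditional qk-norm. The JSON values are illustrative, not the real Hunyuan-MT-7B config.

```swift
import Foundation

/// Hypothetical, trimmed-down stand-in for HunyuanConfiguration.
/// Key names other than head_dim / attention_head_dim are assumptions.
struct HunyuanConfigSketch: Decodable {
    let modelType: String
    let hiddenSize: Int
    let attentionHeads: Int
    let kvHeads: Int        // GQA: may be fewer KV heads than query heads
    let headDim: Int
    let useQKNorm: Bool     // assumed key driving the conditional qk-norm
    let numExperts: Int?    // reported by the older `hunyuan` checkpoints

    enum CodingKeys: String, CodingKey {
        case modelType = "model_type"
        case hiddenSize = "hidden_size"
        case attentionHeads = "num_attention_heads"
        case kvHeads = "num_key_value_heads"
        case headDim = "head_dim"
        case attentionHeadDim = "attention_head_dim"
        case useQKNorm = "use_qk_norm"
        case numExperts = "num_experts"
    }

    init(from decoder: Decoder) throws {
        let c = try decoder.container(keyedBy: CodingKeys.self)
        let hidden = try c.decode(Int.self, forKey: .hiddenSize)
        let heads = try c.decode(Int.self, forKey: .attentionHeads)
        // Accept either spelling of the head dimension; otherwise derive it.
        let explicitHeadDim = try c.decodeIfPresent(Int.self, forKey: .headDim)
        let aliasHeadDim = try c.decodeIfPresent(Int.self, forKey: .attentionHeadDim)

        modelType = try c.decode(String.self, forKey: .modelType)
        hiddenSize = hidden
        attentionHeads = heads
        // Fall back to plain multi-head attention if no KV head count is given.
        kvHeads = try c.decodeIfPresent(Int.self, forKey: .kvHeads) ?? heads
        headDim = explicitHeadDim ?? aliasHeadDim ?? hidden / heads
        useQKNorm = try c.decodeIfPresent(Bool.self, forKey: .useQKNorm) ?? true
        numExperts = try c.decodeIfPresent(Int.self, forKey: .numExperts)
    }
}

/// Hypothetical routing check: `hunyuan_v1_dense` is always dense; the older
/// `hunyuan` model_type is treated as dense only when num_experts == 1 (or absent).
func isDenseHunyuan(_ config: HunyuanConfigSketch) -> Bool {
    switch config.modelType {
    case "hunyuan_v1_dense": return true
    case "hunyuan": return (config.numExperts ?? 1) == 1
    default: return false
    }
}

// Illustrative values only, not the real Hunyuan-MT-7B config.json.
let sample = """
{"model_type": "hunyuan", "hidden_size": 4096, "num_attention_heads": 32,
 "num_key_value_heads": 8, "attention_head_dim": 128, "num_experts": 1}
"""
do {
    let config = try JSONDecoder().decode(HunyuanConfigSketch.self, from: Data(sample.utf8))
    print(config.headDim, config.kvHeads, isDenseHunyuan(config))  // 128 8 true
} catch {
    print("decode failed:", error)
}
```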

Validation

- 5/5 unit tests pass; the package builds.
- End-to-end, byte-identical parity with the mlx-lm reference under greedy decoding, on a locally converted Hunyuan-MT-7B-4bit:

Prompt: `Translate the following segment into Chinese, without additional explanation.\n\nIt's on the house.`

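| Implementation | Output |
|---|---|
| mlx-lm (`hunyuan_v1_dense`) | 这顿饭由我们公司承担费用。 |
| Swift `HunyuanModel` | 这顿饭由我们公司承担费用。 |

(English gloss of the output: "Our company will cover the cost of this meal.")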

The outputs are identical, which exercises config decode, weight loading, qk-norm, GQA, dynamic RoPE, tied embeddings, and the chat template (rendered via swift-jinja) end to end.

Notes

- No mlx-community build of Hunyuan-MT-7B exists yet; scripts/hunyuan_convert_mlx.sh produces the 4/8-bit weights the presets point at (tokenizer.json ships with the source checkpoint, so it loads in swift-transformers).
- The older mlx-community/Hunyuan-7B-Instruct-4bit ships only hy.tiktoken (no tokenizer.json), so it does not load in Swift; Hunyuan-MT-7B is the intended target.

beshkenadze · 2026-06-13T16:16:54Z

📌 Upstreaming note (ml-explore)

This PR bundles two layers for convenience in the fork, but upstream it must go in as two separate PRs, because the core RoPE addition lives in MLXLMCommon and the model in MLXLLM, and a maintainer will want to review them independently:

- Core (MLXLMCommon): DynamicNTKAlphaRoPE in RoPEUtils.swift (a reusable RoPE variant, alongside the existing YarnRoPE/Llama3RoPE/ProportionalRoPE).
- Model (MLXLLM): Models/Hunyuan.swift + LLMModelFactory registration (hunyuan_v1_dense + MT-7B presets) + Tests/MLXLMTests/HunyuanTests.swift. Depends on #1, "Add TranslateGemma support (Gemma 3 translation template) + tests & prefix-cache benchmark".

Each should branch off ml-explore/main independently (PR #2 should land first, or be cherry-picked before PR #1).

Tencent's Hunyuan dense V1 architecture (HunYuanDenseV1ForCausalLM), used by the translation models Hunyuan-MT-7B and Hy-MT2-7B.

A Llama-family dense transformer, closest to the existing Qwen3 path:
- Per-head QK RMSNorm, GQA, SwiGLU MLP, pre/post RMSNorm blocks.
- Tied embeddings (sanitize drops the tied lm_head).
- DynamicNTKAlphaRoPE: rescales the RoPE base once by alpha^(dim/(dim-2)) and reuses the existing freqs-based fast-RoPE path (no sequence-length-dependent resizing).

Changes:
- Libraries/MLXLMCommon/RoPEUtils.swift: DynamicNTKAlphaRoPE.
- Libraries/MLXLLM/Models/Hunyuan.swift: HunyuanModel / HunyuanConfiguration (flat config decode; accepts head_dim or attention_head_dim; conditional qk-norm).
- Libraries/MLXLLM/LLMModelFactory.swift: register hunyuan_v1_dense; presets hunyuan_mt_7b_4bit/8bit and hy_mt2_7b_4bit/8bit.
- Tests/MLXLMTests/HunyuanTests.swift: config decode, attention_head_dim alias, sanitize, tiny forward, dynamic-RoPE alpha, presets.

Validated end-to-end against the mlx-lm reference (byte-identical greedy output) on locally converted 4-bit weights for both models.
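The tied-embeddings point can be sketched in plain Swift: sanitize drops any checkpoint `lm_head` tensors, the output logits are then computed against the embedding table, and a greedy step takes the argmax. `sanitizeTiedWeights`, `tiedLogits`, and `greedyToken` are hypothetical helpers over plain arrays (not MLX), and the weight keys in the usage lines assume the usual Hugging Face naming.

```swift
/// Hypothetical: drop checkpoint tensors under `lm_head.` when embeddings are tied,
/// since the output projection reuses the input embedding table. Matching on the
/// prefix also removes any side tensors stored under the same module name.
func sanitizeTiedWeights<W>(_ weights: [String: W], tieWordEmbeddings: Bool = true) -> [String: W] {
    guard tieWordEmbeddings else { return weights }
    return weights.filter { !$0.key.hasPrefix("lm_head.") }
}

/// Tied output head: logits[v] = dot(hidden, embedding[v]), i.e. hidden times E transposed.
func tiedLogits(hidden: [Float], embedding: [[Float]]) -> [Float] {
    embedding.map { row in
        zip(row, hidden).reduce(Float(0)) { acc, pair in acc + pair.0 * pair.1 }
    }
}

/// Greedy decoding step: index of the largest logit, or nil for an empty vocab.
func greedyToken(_ logits: [Float]) -> Int? {
    logits.indices.max { logits[$0] < logits[$1] }
}

// Assumed Hugging Face-style keys with toy values.
let checkpoint: [String: [Float]] = [
    "model.embed_tokens.weight": [1, 0, 0, 1, 1, 1],
    "lm_head.weight": [1, 0, 0, 1, 1, 1],
    "model.norm.weight": [1, 1],
]
print(sanitizeTiedWeights(checkpoint).keys.sorted())
// ["model.embed_tokens.weight", "model.norm.weight"]

let embedding: [[Float]] = [[1, 0], [0, 1], [1, 1]]  // 3-token vocab, hidden size 2
let logits = tiedLogits(hidden: [1, 2], embedding: embedding)
print(logits, greedyToken(logits) ?? -1)  // [1.0, 2.0, 3.0] 2
```

Dropping the tensor rather than loading and ignoring it keeps the module tree and the checkpoint keys in agreement, so weight loading can stay strict.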

beshkenadze · 2026-06-13T17:38:48Z

✅ Upstreamed (correction to the note above)

Submitted to ml-explore/mlx-swift-lm as two separate PRs. The split is per feature (each branched off upstream main, not the fork), which is cleaner than the core/model split I sketched earlier:

- Add Hunyuan dense V1 (hunyuan_v1_dense): Hunyuan-MT-7B and Hy-MT2-7B (ml-explore/mlx-swift-lm#347). Contains this PR's contents (model + DynamicNTKAlphaRoPE + presets + tests), kept as one PR because the RoPE utility ships with its only consumer.
- Gemma 3: chunked prompt prefill, skip lm_head on prompt positions (ml-explore/mlx-swift-lm#346). The independent Gemma 3 chunked-prefill speedup.

Both build and pass tests on the current upstream base.

beshkenadze force-pushed the hunyuan-mt-7b branch from 0124bb3 to 264fe11 on June 13, 2026 16:46

beshkenadze force-pushed the hunyuan-mt-7b branch from 264fe11 to c7623b6 on June 13, 2026 17:14

beshkenadze wants to merge 1 commit into main from hunyuan-mt-7b
