Add Hunyuan dense V1 (hunyuan_v1_dense): Hunyuan-MT-7B and Hy-MT2-7B by beshkenadze · Pull Request #347 · ml-explore/mlx-swift-lm

beshkenadze · 2026-06-13T17:37:27Z

What

Adds Tencent's Hunyuan dense V1 architecture (hunyuan_v1_dense / HunYuanDenseV1ForCausalLM), used by the translation models Hunyuan-MT-7B and Hy-MT2-7B. Ported from mlx-lm's hunyuan_v1_dense.py.

Architecture

A Llama-family dense transformer, closest to the existing Qwen3 path:

- Per-head QK RMSNorm, GQA, SwiGLU MLP, pre/post RMSNorm blocks.
- Tied embeddings (sanitize drops the tied lm_head).
- DynamicNTKAlphaRoPE, the one new utility: rescales the RoPE base once by alpha^(dim/(dim-2)) and reuses the existing freqs-based fast-RoPE path (no sequence-length-dependent resizing). Lives alongside YarnRoPE/Llama3RoPE/ProportionalRoPE in RoPEUtils.swift.
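To make the one-time base rescaling concrete, here is a minimal sketch in plain Swift (Foundation only, no MLX). The function names `scaledRoPEBase` and `ropeFreqs` are hypothetical and are not the PR's API, and the numeric inputs are illustrative rather than read from any model config. It also assumes the precomputed table holds scaledBase^(2i/dim) per rotary pair, which is my reading of the mlx-lm reference and may differ from the actual Swift implementation.

```swift
import Foundation

/// Hypothetical helper (not the PR's API): rescale the RoPE base once
/// by alpha^(dims / (dims - 2)). Nothing here depends on sequence length.
func scaledRoPEBase(base: Double, alpha: Double, dims: Int) -> Double {
    precondition(dims > 2 && dims % 2 == 0, "dims must be even and greater than 2")
    let exponent = Double(dims) / Double(dims - 2)
    return base * pow(alpha, exponent)
}

/// Assumed table layout: scaledBase^(2i / dims) for each rotary pair i.
/// The table is computed once and could then be handed to a freqs-based RoPE kernel.
func ropeFreqs(base: Double, alpha: Double, dims: Int) -> [Double] {
    let scaled = scaledRoPEBase(base: base, alpha: alpha, dims: dims)
    return stride(from: 0, to: dims, by: 2).map { pow(scaled, Double($0) / Double(dims)) }
}

// Illustrative values only.
let freqs = ropeFreqs(base: 10_000, alpha: 1_000, dims: 128)
print(freqs.count, freqs.first ?? .nan)
```

Because alpha is fixed at construction time, the table never needs to be rebuilt as the context grows, which is what distinguishes this variant from length-dependent dynamic NTK scaling.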

Changes

- Libraries/MLXLMCommon/RoPEUtils.swift: DynamicNTKAlphaRoPE.
- Libraries/MLXLLM/Models/Hunyuan.swift: HunyuanModel / HunyuanConfiguration (flat config decode; accepts head_dim or attention_head_dim; conditional qk-norm).
- Libraries/MLXLLM/LLMModelFactory.swift: register hunyuan_v1_dense; presets hunyuan_mt_7b_4bit/8bit and hy_mt2_7b_4bit/8bit.
- Tests/MLXLMTests/HunyuanTests.swift: config decode, attention_head_dim alias, sanitize, tiny forward, dynamic-RoPE alpha, presets.
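The config alias and the sanitize step can be sketched roughly as follows. This is not the PR's code: `HunyuanConfigSketch` and `sanitizeSketch` are hypothetical names, the field set is trimmed down, and the JSON key `use_qk_norm`, the `lm_head.` key prefix, and the fallback to hidden_size / num_attention_heads are all assumptions made for illustration.

```swift
import Foundation

/// Hypothetical, trimmed-down configuration (not HunyuanConfiguration itself).
struct HunyuanConfigSketch: Decodable {
    let hiddenSize: Int
    let numAttentionHeads: Int
    let headDim: Int
    let useQKNorm: Bool

    enum CodingKeys: String, CodingKey {
        case hiddenSize = "hidden_size"
        case numAttentionHeads = "num_attention_heads"
        case headDim = "head_dim"
        case attentionHeadDim = "attention_head_dim"  // alias for head_dim
        case useQKNorm = "use_qk_norm"                // assumed key name
    }

    init(from decoder: Decoder) throws {
        let c = try decoder.container(keyedBy: CodingKeys.self)
        let hidden = try c.decode(Int.self, forKey: .hiddenSize)
        let heads = try c.decode(Int.self, forKey: .numAttentionHeads)
        // Prefer head_dim, then the alias, then an assumed derived default.
        let explicit = try c.decodeIfPresent(Int.self, forKey: .headDim)
        let alias = try c.decodeIfPresent(Int.self, forKey: .attentionHeadDim)
        hiddenSize = hidden
        numAttentionHeads = heads
        headDim = explicit ?? alias ?? (hidden / heads)
        // qk-norm is conditional: off unless the config enables it (assumed default).
        useQKNorm = try c.decodeIfPresent(Bool.self, forKey: .useQKNorm) ?? false
    }
}

/// Hypothetical sanitize: with tied embeddings the output projection reuses
/// the embedding matrix, so any lm_head.* entries are dropped before loading.
func sanitizeSketch<T>(_ weights: [String: T], tieWordEmbeddings: Bool) -> [String: T] {
    guard tieWordEmbeddings else { return weights }
    return weights.filter { !$0.key.hasPrefix("lm_head.") }
}

// Toy values chosen so the alias (32) differs from the derived default (64 / 4 = 16).
let json = #"{"hidden_size": 64, "num_attention_heads": 4, "attention_head_dim": 32, "use_qk_norm": true}"#
let config = try! JSONDecoder().decode(HunyuanConfigSketch.self, from: Data(json.utf8))
let kept = sanitizeSketch(["lm_head.weight": 1, "model.embed_tokens.weight": 2],
                          tieWordEmbeddings: true)
print(config.headDim, config.useQKNorm, kept.keys.sorted())
```

Filtering by prefix rather than by the exact key also covers the extra per-layer tensors that quantized checkpoints may carry for the same layer.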

Validation

6/6 unit tests pass; package builds.
End-to-end byte-identical parity vs the mlx-lm reference (greedy) on locally-converted 4-bit weights for both models:

Prompt: Translate the following segment into Chinese, without additional explanation.\n\nIt's on the house.

| model | mlx-lm | Swift `HunyuanModel` |
| --- | --- | --- |
| Hunyuan-MT-7B-4bit | 这顿饭由我们公司承担费用。 | 这顿饭由我们公司承担费用。 |
| Hy-MT2-7B-4bit | 这顿算店的。 | 这顿算店的。 |

The outputs are identical for both models, which validates config decode, weight loading, qk-norm, GQA, dynamic RoPE, tied embeddings, and the chat template.
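For readers who want to repeat a byte-level parity check, one possible approach is sketched below. The helper `isByteIdentical` is hypothetical and not part of the PR or its tests. The point it illustrates is that Swift's `String` equality uses Unicode canonical equivalence, so comparing UTF-8 bytes is the stricter test.

```swift
/// Hypothetical helper: true only if both strings have the same UTF-8 bytes.
func isByteIdentical(_ a: String, _ b: String) -> Bool {
    Array(a.utf8) == Array(b.utf8)
}

// Outputs taken from the table above.
let reference = "这顿算店的。"   // mlx-lm
let candidate = "这顿算店的。"   // Swift HunyuanModel
print(isByteIdentical(reference, candidate))

// Canonically equivalent strings compare equal with ==, but their bytes differ.
let precomposed = "\u{E9}"     // é as a single scalar
let decomposed = "e\u{301}"    // e followed by a combining acute accent
print(precomposed == decomposed, isByteIdentical(precomposed, decomposed))
```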

MLX weights: mlx-community/Hunyuan-MT-7B-{4bit,8bit} and mlx-community/Hy-MT2-7B-{4bit,8bit}.


beshkenadze mentioned this pull request Jun 13, 2026

Add Hunyuan dense V1 (Hunyuan-MT-7B) support beshkenadze/mlx-swift-lm#4

Open

beshkenadze wants to merge 1 commit into ml-explore:main from beshkenadze:upstream-hunyuan


