docs(qwen35): draft TP2 phased design by Mrtroll486 · Pull Request #450 · openinfer-project/openinfer

Mrtroll486 · 2026-06-24T13:27:10Z

Description

Drafts the Qwen3.5 TP2 design as a Qwen3 TP follow-up rather than a new parallel-runtime proposal.

The doc scopes Qwen3.5 tensor parallelism around reusing the existing Qwen3 controller/worker TP runtime, then splits the work into two phases:

- Phase 1: shard full-attention + MLP while keeping linear attention/GDR replicated, so the dense TP path and Qwen3.5 multi-rank runtime can be validated first.
- Phase 2: shard linear attention, conv state, GDR recurrent state, and GDR kernels, using vLLM's Qwen3Next/GDN TP contract as the reference.
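
As a rough illustration of the phase split, the sketch below tabulates which components are sharded or replicated in each phase. The component names, the `Placement` enum, and the `newly_sharded` helper are hypothetical and not taken from the design doc; treating conv state and GDR recurrent state as replicated in Phase 1 is my reading of "keeping linear attention/GDR replicated".

```python
from enum import Enum


class Placement(Enum):
    SHARDED = "sharded"
    REPLICATED = "replicated"


# Hypothetical component names; the real doc may slice these differently.
PHASES = {
    1: {
        "full_attention": Placement.SHARDED,
        "mlp": Placement.SHARDED,
        "linear_attention": Placement.REPLICATED,
        "conv_state": Placement.REPLICATED,
        "gdr_recurrent_state": Placement.REPLICATED,
        "gdr_kernels": Placement.REPLICATED,
    },
    2: {
        "full_attention": Placement.SHARDED,
        "mlp": Placement.SHARDED,
        "linear_attention": Placement.SHARDED,
        "conv_state": Placement.SHARDED,
        "gdr_recurrent_state": Placement.SHARDED,
        "gdr_kernels": Placement.SHARDED,
    },
}


def newly_sharded(phase: int) -> list[str]:
    """Components that become sharded in `phase` but were not sharded before it."""
    previous = PHASES.get(phase - 1, {})
    return sorted(
        name
        for name, placement in PHASES[phase].items()
        if placement is Placement.SHARDED
        and previous.get(name) is not Placement.SHARDED
    )
```

Read this way, `newly_sharded(1)` covers only the dense path, and `newly_sharded(2)` covers exactly the linear-attention/GDR pieces that Phase 1 leaves replicated.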

If this direction looks acceptable and the doc is merged, I plan to open follow-up issues from the phase breakdown in the doc rather than starting with a large implementation PR. I would especially appreciate feedback from the model/runtime owners and maintainers on the phase split, non-goals, acceptance criteria, and the vLLM reference contract before turning the design into implementation tasks.

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update

Checklist

My code follows the style guidelines of this project (see docs/conventions/coding-style.md).
I have performed a self-review of my own code.
I have formatted my commits according to Commitizen conventions.
I have run the local test suite and all tests pass (see CLAUDE.md).

Docs-only; no runtime behavior, kernel code, scheduler code, or tests are changed.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f1433c552b


xiaguan · 2026-06-25T04:13:36Z

Direction looks good — reusing the Qwen3 runtime and splitting dense vs GDR/recurrent is the right call. Two pieces of feedback:

1. Tighten it. The doc is longer than it needs to be. The six separate non-goal lists and the repeated per-phase acceptance lists carry a lot of redundancy — a reader should get the phase split and the partition contract in about half the length.

2. Don't hard-lock TP=2. Qwen3's TP config is already degree-parametric (Config::local_num_attention_heads(tp), local_q_dim(tp), local_intermediate_size(tp), ...), so write the partition contract as formulas in tp rather than baking in 8 / 2048 / 4608. Make "only TP2 is validated first" a test-scope note and fail-closed on indivisible degrees — instead of "do not support TP>2" as an architectural non-goal. Full-attn (16 q / 4 KV heads) is divisibility-clean through TP4. It's the same code, just keeps the door open and matches what Qwen3 already does.
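
A minimal sketch of what a degree-parametric, fail-closed contract could look like, using only the head counts quoted above (16 q heads, 4 KV heads). It is written in Python for brevity; the `AttnConfig` class and its `_split` helper are hypothetical stand-ins, and the method names merely echo the `Config::local_*` helpers mentioned in the comment rather than reproducing the real Qwen3 code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttnConfig:
    # Head counts from the review comment; everything else is illustrative.
    num_attention_heads: int = 16
    num_key_value_heads: int = 4

    @staticmethod
    def _split(total: int, tp: int, what: str) -> int:
        # Fail closed: reject any degree that does not divide evenly,
        # instead of silently padding or replicating.
        if tp < 1 or total % tp != 0:
            raise ValueError(f"{what}={total} is not divisible by tp={tp}")
        return total // tp

    def local_num_attention_heads(self, tp: int) -> int:
        return self._split(self.num_attention_heads, tp, "num_attention_heads")

    def local_num_key_value_heads(self, tp: int) -> int:
        return self._split(self.num_key_value_heads, tp, "num_key_value_heads")
```

With these numbers, `tp` of 1, 2, and 4 pass both checks, while `tp=3` fails on either head count and `tp=8` fails on the KV heads. Some runtimes replicate KV heads once `tp` exceeds the KV head count; this sketch deliberately rejects that case instead.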

Mrtroll486 · 2026-06-25T08:50:49Z

Thanks for the review. I tightened the design doc and changed the TP framing to match the Qwen3 runtime better.

What changed:

- Collapsed the repeated non-goal / acceptance sections into shorter Boundaries and validation-scope notes.
- Rewrote the partition contract in terms of tp formulas instead of hard-coding TP2 local sizes.
- Kept TP=2 only as the first validation target, not as an architectural limit.
- Added fail-closed divisibility requirements for candidate TP degrees.
- Kept the Qwen3.5-specific q/gate head-pair sharding requirement.
- Updated the docs index summary to match the new degree-parametric framing.
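
To make the q/gate head-pair requirement concrete, here is a hedged sketch of the row ownership it implies. It assumes the fused q/gate projection is laid out per head as `head_dim` q rows followed by `head_dim` gate rows; that layout, and both function names, are assumptions for illustration and are not confirmed by this thread.

```python
def fused_q_gate_rows(rank: int, tp: int, num_heads: int, head_dim: int) -> range:
    """Rows of the fused q/gate projection owned by `rank`.

    Assumed layout (not confirmed by the design doc): for each head,
    `head_dim` q rows followed by `head_dim` gate rows.
    """
    if tp < 1 or not 0 <= rank < tp:
        raise ValueError(f"invalid rank={rank} for tp={tp}")
    if num_heads % tp != 0:
        raise ValueError(f"num_heads={num_heads} is not divisible by tp={tp}")
    local_heads = num_heads // tp
    rows_per_head = 2 * head_dim  # q rows + gate rows stay together
    start = rank * local_heads * rows_per_head
    return range(start, start + local_heads * rows_per_head)


def heads_covered(rows: range, head_dim: int) -> tuple[set[int], set[int]]:
    """Return (heads whose q rows appear, heads whose gate rows appear)."""
    rows_per_head = 2 * head_dim
    q_heads = {r // rows_per_head for r in rows if r % rows_per_head < head_dim}
    gate_heads = {r // rows_per_head for r in rows if r % rows_per_head >= head_dim}
    return q_heads, gate_heads
```

Under this assumed layout, sharding by whole heads gives each rank one contiguous slice in which every local head has both its q and its gate rows. If the weights were instead stored as all q rows followed by all gate rows, each rank would need two slices, which is one reason the requirement is worth stating explicitly.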

I left the implementation plan split intact: Phase 1 validates dense full-attn/MLP TP with replicated linear/GDR, and Phase 2 handles sharded linear attention / GDR state using the vLLM Qwen3Next/GDN contract as the reference.

If this is merged, I’ll open the two follow-up implementation issues myself for Phase 1 and Phase 2.

xiaguan · 2026-06-25T15:58:13Z

Thanks for the iterations — the direction is right. But the implementation side still has too many open questions to lock this in as a committed design doc: the scope of which operators actually change, CUDA Graph capture under TP, and how the NCCL group gets set up are all unresolved here.

Let's open an RFC issue and hash these out there before merging a design note. Could you open one (linked to #446) so we can discuss the implementation details properly?

Mrtroll486 · 2026-06-25T16:38:48Z

Thanks for the suggestion. I agree that I moved a bit too quickly toward a design note before the implementation details were fully clarified.

I’ll open an RFC issue and link it to #446, so we can properly discuss the operator scope, CUDA Graph capture under TP, and NCCL group setup before locking in the design.

docs(qwen35): draft TP2 phased design

f1433c5

chatgpt-codex-connector Bot reviewed Jun 24, 2026

Comment thread docs/models/qwen35/tp-design.md Outdated

Comment thread docs/models/qwen35/tp-design.md Outdated

doc(qwen35): removed placeholder commands and clarify TP q-gate sharding

aff47fd

docs(qwen35): tighten TP design around degree-parametric contract

e91b6f9

Mrtroll486 wants to merge 3 commits into openinfer-project:main from Mrtroll486:docs/qwen35-tp-design
