chore(qwen3): single-source split-KV config and pin warmup, measure report GEMMs under the production policy by FeathBow · Pull Request #473 · openinfer-project/openinfer

FeathBow · 2026-06-29T18:00:47Z

Description

Refs #414, #435. Follows #462.

Two cleanups on the Qwen3 decode path, no intended change to production serving.

Before

The split-KV decode config (chunk-size formula, per-request cap, and a split_kv_* label) was spread across runtime constants, a duplicated formula, and a hardcoded split_kv_256x64 label at several call sites, which could silently desync.
qwen3_model_report timed decode projection GEMMs in a fresh, untuned context, so under the default Tuned policy it measured plain GemmEx rather than the algorithm production actually runs.

After

Split-KV config is single-sourced: SplitKvConfig (formula/label/parse) lives in the qwen3 crate (src/split_kv.rs) and openinfer-core carries a typed PagedDecodePath. The trace records the runtime-resolved chunk/cap as attrs instead of a hardcoded label; the report reads them back and fails loudly if they are missing or divergent.
The report routes projection GEMM measures through the production --policy and launch_gemm. Pin and PerToken are measured faithfully; Tuned is flagged unfaithful_gemmex and excluded from the totals, with a pointer to --policy pin. The report records measured_split_kv, and a test asserts that it equals the recorded attrs, so the measure cannot silently re-derive chunk/cap from kv_len. The default output path is policy-keyed.
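
To illustrate what single-sourcing the label buys, here is a minimal sketch of a config type that owns both formatting and parsing of a `split_kv_*` label. It is not the code in src/split_kv.rs: the field names `chunk` and `cap`, and the reading of `split_kv_256x64` as `<chunk>x<cap>`, are assumptions made for the example.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical stand-in for the PR's `SplitKvConfig`.
/// Field names and the `<chunk>x<cap>` label layout are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SplitKvConfig {
    pub chunk: usize,
    pub cap: usize,
}

impl fmt::Display for SplitKvConfig {
    /// The only place that knows how the label is spelled.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "split_kv_{}x{}", self.chunk, self.cap)
    }
}

impl FromStr for SplitKvConfig {
    type Err = String;

    /// Inverse of `Display`; any malformed label is an error, never a default.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let rest = s
            .strip_prefix("split_kv_")
            .ok_or_else(|| format!("not a split-KV label: {s:?}"))?;
        let (chunk, cap) = rest
            .split_once('x')
            .ok_or_else(|| format!("missing 'x' separator in {s:?}"))?;
        let chunk = chunk
            .parse::<usize>()
            .map_err(|e| format!("bad chunk in {s:?}: {e}"))?;
        let cap = cap
            .parse::<usize>()
            .map_err(|e| format!("bad cap in {s:?}: {e}"))?;
        Ok(Self { chunk, cap })
    }
}
```

With this shape, call sites format the label via `to_string()` and read it back via `parse::<SplitKvConfig>()`, so a hardcoded string can no longer drift from the values actually in use.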

Test Env

Test suite passed on sm_89, x86_64.

Type of Change

Chore (non-breaking change which fixes an issue)

xiaguan · 2026-06-30T15:11:18Z

Core move is right: routing the report's projection-GEMM measure through numeric_policy() and replaying split-KV scalars verbatim from synced_split_kv → PagedDecodePath attrs is a clean single-writer/single-reader channel. Problem is the two tests are re-verifying invariants the types already own.

`pin_trace_chunk_size` — delete

The invariant ("trace records the resolved chunk, not a kv_len re-derivation") is already in PagedDecodePath::SplitKv { chunk_size, cap } + the one-write/one-read synced_split_kv. The test is a full GPU + safetensors run verifying a field copy. The large = 8192 pick also hides assumptions about max_position_embeddings that break silently when max_pos / 256 == 8192 / 64.
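
The one-write/one-read channel described here can be sketched roughly as follows. Only `PagedDecodePath::SplitKv { chunk_size, cap }` comes from the discussion; the attribute keys, the string-map representation of trace attrs, the `Unsplit` variant, and the helper names are all hypothetical.

```rust
use std::collections::BTreeMap;

/// Trace attributes as string key/value pairs (hypothetical representation).
pub type Attrs = BTreeMap<String, String>;

// Hypothetical attribute keys, defined once for writer and reader.
const ATTR_CHUNK: &str = "split_kv_chunk_size";
const ATTR_CAP: &str = "split_kv_cap";

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PagedDecodePath {
    SplitKv { chunk_size: usize, cap: usize },
    /// Hypothetical non-split variant, present only for illustration.
    Unsplit,
}

impl PagedDecodePath {
    /// Single writer: records the runtime-resolved scalars verbatim.
    pub fn write_attrs(&self, attrs: &mut Attrs) {
        if let PagedDecodePath::SplitKv { chunk_size, cap } = self {
            attrs.insert(ATTR_CHUNK.to_string(), chunk_size.to_string());
            attrs.insert(ATTR_CAP.to_string(), cap.to_string());
        }
    }
}

fn read_usize(attrs: &Attrs, key: &str) -> Result<usize, String> {
    let raw = attrs
        .get(key)
        .ok_or_else(|| format!("trace is missing attr {key}"))?;
    raw.parse::<usize>()
        .map_err(|e| format!("attr {key}={raw:?} is not a usize: {e}"))
}

/// Single reader: returns (chunk_size, cap) or an error if either is missing.
pub fn read_split_kv(attrs: &Attrs) -> Result<(usize, usize), String> {
    Ok((read_usize(attrs, ATTR_CHUNK)?, read_usize(attrs, ATTR_CAP)?))
}

/// Fails loudly when the measured values diverge from the recorded ones.
pub fn check_measured(attrs: &Attrs, measured: (usize, usize)) -> Result<(), String> {
    let recorded = read_split_kv(attrs)?;
    if recorded == measured {
        Ok(())
    } else {
        Err(format!("split-KV mismatch: recorded {recorded:?}, measured {measured:?}"))
    }
}
```

Because the reader never sees kv_len, it has no way to re-derive chunk/cap; a missing or divergent attr surfaces as an `Err` rather than a silently different number.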

`report_gemm_faithful` — push most of it into types

The Tuned→unfaithful_gemmex, Pin/PerToken→totals, total_is_partial chain is a runtime if over an ad-hoc string check — a contract the type system should own. Dispatch measure_catalog on policy so a Tuned GEMM measure returns Excluded, not LatencyStats; total_is_partial then falls out of by_op, not a post-hoc coverage_rows scan.
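
One way the suggested dispatch might look, as a sketch only: the `Policy` variants and the `unfaithful_gemmex` reason come from the PR text, while `GemmMeasure`, `measure_gemm`, `Totals`, the `mean_us` field, and the op names are hypothetical stand-ins for whatever the report actually uses.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Policy {
    Tuned,
    Pin,
    PerToken,
}

/// Hypothetical stats payload; the real `LatencyStats` likely has more fields.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct LatencyStats {
    pub mean_us: f64,
}

/// A GEMM measure is either a faithful timing or an explicit exclusion.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum GemmMeasure {
    Measured(LatencyStats),
    Excluded { reason: &'static str },
}

/// Dispatch on policy: under Tuned the timing closure is never run,
/// so no unfaithful number can leak into the report.
pub fn measure_gemm(policy: Policy, time_it: impl FnOnce() -> LatencyStats) -> GemmMeasure {
    match policy {
        Policy::Tuned => GemmMeasure::Excluded { reason: "unfaithful_gemmex" },
        Policy::Pin | Policy::PerToken => GemmMeasure::Measured(time_it()),
    }
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Totals {
    pub total_us: f64,
    pub total_is_partial: bool,
}

/// `total_is_partial` falls out of `by_op`: any excluded op makes it true.
pub fn totals(by_op: &BTreeMap<&str, GemmMeasure>) -> Totals {
    let mut total_us = 0.0;
    let mut total_is_partial = false;
    for measure in by_op.values() {
        match measure {
            GemmMeasure::Measured(stats) => total_us += stats.mean_us,
            GemmMeasure::Excluded { .. } => total_is_partial = true,
        }
    }
    Totals { total_us, total_is_partial }
}
```

The exhaustive `match` is the point of the design: adding a policy or a new exclusion reason forces every consumer to handle it at compile time, which is the part of the test the compiler can take over.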

What's left of the test then is pin_served > 0 — a genuine runtime observation the compiler cannot reach, worth its CI cost. Note the current PerToken branch is a false-positive gate: it asserts pin_served == 0 + measured but never observes that PerToken served anything PerToken-specific, so it would pass for a --policy per-token silently falling back to GemmEx.

Minor

schema: 5 jumps 3 versions; note "local-only trials, shipping at 5" or bump one.
AttentionDecodeCase::new(batch, kv_len) is dead and inherits the now-drifted DEFAULT_SPLIT_KV_CONFIG. Delete it.
SplitKvConfig::new is const fn with zero guard; actual_chunk_size panics on new(64, 0). Reject 0 (drop const) or return usize::MAX on zero cap.
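
The two options for the zero-cap case could look roughly like this. The chunk formula used here (ceiling of kv_len / cap, floored at a minimum chunk) is a placeholder chosen only to reproduce a division by cap; the real formula, the field names, and the argument order of `new` are assumptions.

```rust
/// Hypothetical stand-in for the PR's `SplitKvConfig`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SplitKvConfig {
    min_chunk: usize,
    cap: usize,
}

impl SplitKvConfig {
    /// Option A: reject a zero cap at construction, so every
    /// `SplitKvConfig` that exists is safe to divide by.
    pub fn new(min_chunk: usize, cap: usize) -> Option<Self> {
        if cap == 0 {
            None
        } else {
            Some(Self { min_chunk, cap })
        }
    }

    /// Placeholder formula: ceil(kv_len / cap), but never below `min_chunk`.
    /// Cannot divide by zero because `new` rejected cap == 0.
    pub fn actual_chunk_size(&self, kv_len: usize) -> usize {
        let per_split = kv_len / self.cap + usize::from(kv_len % self.cap != 0);
        per_split.max(self.min_chunk)
    }
}

/// Option B: keep construction permissive (and `const`), and treat a
/// zero cap as "one unbounded chunk" by returning `usize::MAX`.
pub const fn chunk_size_or_max(min_chunk: usize, cap: usize, kv_len: usize) -> usize {
    if cap == 0 {
        return usize::MAX;
    }
    let per_split = kv_len / cap + (kv_len % cap != 0) as usize;
    if per_split > min_chunk {
        per_split
    } else {
        min_chunk
    }
}
```

Option A moves the failure to the construction site, where the bad value originates; Option B keeps the `const` constructor but leaves callers to cope with a sentinel chunk size.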

Request changes. Runtime plumbing is good; tests are carrying invariants the types already own.

FeathBow · 2026-07-01T11:03:34Z

thanks for the review :) all above done

FeathBow marked this pull request as draft June 30, 2026 23:31

chore(qwen3): single-source split-KV report measure + per-token serve…

1ea050d

…d gate

FeathBow force-pushed the fix/qwen3-splitkv-single-source branch from 12e3775 to 1ea050d Compare July 1, 2026 10:55

FeathBow marked this pull request as ready for review July 1, 2026 11:03

FeathBow wants to merge 1 commit into openinfer-project:main from FeathBow:fix/qwen3-splitkv-single-source
