chore(qwen3): single-source split-KV config and pin warmup, measure report GEMMs under the production policy #473

Open: FeathBow wants to merge 1 commit into openinfer-project:main from FeathBow:fix/qwen3-splitkv-single-source


@FeathBow

Collaborator

Description

Refs #414, #435. Follows #462.

Two cleanups on the Qwen3 decode path, no intended change to production serving.

Before

  • The split-KV decode config (chunk-size formula, per-request cap, and a split_kv_* label) was spread across runtime constants, a duplicated formula, and a hardcoded split_kv_256x64 label at several call sites, which could silently desync.
  • qwen3_model_report timed decode projection GEMMs in a fresh untuned context, so under the default Tuned policy it measured plain GemmEx, not the algo production runs.

After

  • Split-KV config is single-sourced: SplitKvConfig (formula/label/parse) lives in the qwen3 crate (src/split_kv.rs) and openinfer-core carries a typed PagedDecodePath; the trace records the runtime-resolved chunk/cap as attrs instead of a hardcoded label, which the report reads back, failing loud if missing or divergent.
  • The report routes projection GEMM measures through the production --policy and launch_gemm: Pin/PerToken are measured faithfully; Tuned is flagged unfaithful_gemmex and excluded from the totals (pointing at --policy pin). It records measured_split_kv and a test asserts it equals the recorded attrs, so the measure cannot silently re-derive chunk/cap from kv_len. The default output path is policy-keyed.
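
A minimal sketch of what a single-sourced config with formula, label, and parse could look like. The field names, the meaning of the two numbers in the label, and the chunk formula are all assumptions made for illustration; the PR text does not spell them out, and this is not the code in src/split_kv.rs.

```rust
/// Hypothetical single-source split-KV config.
/// Field names and the formula below are assumptions, not the PR's code.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SplitKvConfig {
    /// Assumed meaning: minimum chunk size in tokens.
    pub min_chunk: usize,
    /// Assumed meaning: maximum number of chunks per request (must be nonzero).
    pub cap: usize,
}

impl SplitKvConfig {
    pub const fn new(min_chunk: usize, cap: usize) -> Self {
        Self { min_chunk, cap }
    }

    /// Assumed formula: grow the chunk so that a request never needs
    /// more than `cap` chunks. Panics if `cap` is zero.
    pub fn chunk_size(&self, kv_len: usize) -> usize {
        self.min_chunk.max(kv_len.div_ceil(self.cap))
    }

    /// The label is derived from the config, never typed by hand at call sites.
    pub fn label(&self) -> String {
        format!("split_kv_{}x{}", self.min_chunk, self.cap)
    }

    /// Inverse of `label`; returns `None` for anything it did not produce.
    pub fn parse(label: &str) -> Option<Self> {
        let rest = label.strip_prefix("split_kv_")?;
        let (min_chunk, cap) = rest.split_once('x')?;
        Some(Self::new(min_chunk.parse().ok()?, cap.parse().ok()?))
    }
}
```

Because `label` and `parse` live next to the formula, a call site that needs the string asks the config for it, which removes the hardcoded-label desync described under Before.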

Test Env

Test suite passed on sm_89, x86_64.

Type of Change

  • Chore (non-breaking change which fixes an issue)

@xiaguan

xiaguan commented Jun 30, 2026

Collaborator

The core move is right: routing the report's projection-GEMM measure through numeric_policy() and replaying the split-KV scalars verbatim from the synced_split_kv / PagedDecodePath attrs is a clean single-writer/single-reader channel. The problem is that the two tests re-verify invariants the types already own.

pin_trace_chunk_size — delete

The invariant ("trace records the resolved chunk, not a kv_len re-derivation") is already in PagedDecodePath::SplitKv { chunk_size, cap } + the one-write/one-read synced_split_kv. The test is a full GPU + safetensors run verifying a field copy. The large = 8192 pick also hides assumptions about max_position_embeddings that break silently when max_pos / 256 == 8192 / 64.
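
To make the one-write/one-read point concrete, here is a rough sketch of a typed path whose resolved scalars are written to trace attrs once and read back once, failing loudly when an attr is missing. Only `PagedDecodePath::SplitKv { chunk_size, cap }` comes from this thread; the `Single` variant, the `Attrs` map, the attr key names, and both method names are hypothetical.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the trace's attribute map.
pub type Attrs = BTreeMap<String, String>;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PagedDecodePath {
    /// Hypothetical non-split variant, included only so the enum has a second arm.
    Single,
    SplitKv { chunk_size: usize, cap: usize },
}

impl PagedDecodePath {
    /// Single writer: records the runtime-resolved scalars verbatim.
    pub fn write_attrs(&self, attrs: &mut Attrs) {
        if let PagedDecodePath::SplitKv { chunk_size, cap } = *self {
            attrs.insert("split_kv_chunk_size".into(), chunk_size.to_string());
            attrs.insert("split_kv_cap".into(), cap.to_string());
        }
    }

    /// Single reader: returns an error instead of re-deriving from kv_len.
    pub fn read_split_kv(attrs: &Attrs) -> Result<Self, String> {
        let get = |key: &str| -> Result<usize, String> {
            attrs
                .get(key)
                .ok_or_else(|| format!("missing trace attr `{key}`"))?
                .parse::<usize>()
                .map_err(|e| format!("bad trace attr `{key}`: {e}"))
        };
        Ok(PagedDecodePath::SplitKv {
            chunk_size: get("split_kv_chunk_size")?,
            cap: get("split_kv_cap")?,
        })
    }
}
```

With this shape the reader has no `kv_len` in scope at all, so a re-derivation cannot creep in without changing the signature.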

report_gemm_faithful — push most of it into types

The Tuned→unfaithful_gemmex, Pin/PerToken→totals, total_is_partial chain is a runtime if over an ad-hoc string check — a contract the type system should own. Dispatch measure_catalog on policy so a Tuned GEMM measure returns Excluded, not LatencyStats; total_is_partial then falls out of by_op, not a post-hoc coverage_rows scan.

What's left of the test then is pin_served > 0 — a genuine runtime observation the compiler cannot reach, worth its CI cost. Note the current PerToken branch is a false-positive gate: it asserts pin_served == 0 + measured but never observes that PerToken served anything PerToken-specific, so it would pass for a --policy per-token silently falling back to GemmEx.
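
One possible shape for pushing the Tuned/Pin/PerToken contract into types, as suggested above. This is a sketch under assumptions: `Policy`, `GemmMeasure`, `measure_gemm`, and `total_mean_us` are illustrative names, `LatencyStats` is reduced to a single field, and the timing closure stands in for whatever the report actually runs.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Policy {
    Tuned,
    Pin,
    PerToken,
}

/// Simplified stand-in for the report's latency record.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct LatencyStats {
    pub mean_us: f64,
}

/// A GEMM measure is either real stats or an explicit exclusion.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum GemmMeasure {
    Measured(LatencyStats),
    Excluded { reason: &'static str },
}

/// Dispatch on policy: a Tuned measure can never produce LatencyStats,
/// and the timing closure is not even run for it.
pub fn measure_gemm(policy: Policy, time_it: impl FnOnce() -> LatencyStats) -> GemmMeasure {
    match policy {
        Policy::Tuned => GemmMeasure::Excluded { reason: "unfaithful_gemmex" },
        Policy::Pin | Policy::PerToken => GemmMeasure::Measured(time_it()),
    }
}

/// Partiality falls out of the per-op map instead of a post-hoc string scan.
pub fn total_is_partial(by_op: &BTreeMap<String, GemmMeasure>) -> bool {
    by_op.values().any(|m| matches!(m, GemmMeasure::Excluded { .. }))
}

/// Excluded entries cannot contribute to the total by construction.
pub fn total_mean_us(by_op: &BTreeMap<String, GemmMeasure>) -> f64 {
    by_op
        .values()
        .filter_map(|m| match m {
            GemmMeasure::Measured(stats) => Some(stats.mean_us),
            GemmMeasure::Excluded { .. } => None,
        })
        .sum()
}
```

This does not address the PerToken false positive noted above; observing that a PerToken-specific path actually served still needs a runtime counter.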

Minor

  • schema: 5 jumps 3 versions; note "local-only trials, shipping at 5" or bump one.
  • AttentionDecodeCase::new(batch, kv_len) is dead and inherits the now-drifted DEFAULT_SPLIT_KV_CONFIG. Delete it.
  • SplitKvConfig::new is const fn with zero guard; actual_chunk_size panics on new(64, 0). Reject 0 (drop const) or return usize::MAX on zero cap.
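
For the zero-cap item, a sketch of the "reject 0" option using `std::num::NonZeroUsize`, so the division in `actual_chunk_size` cannot see a zero. The field names, the argument order of `new`, and the formula are assumptions; only the names `SplitKvConfig::new` and `actual_chunk_size` come from the review.

```rust
use std::num::NonZeroUsize;

/// Hypothetical config whose cap is nonzero by construction.
#[derive(Clone, Copy, PartialEq, Eq)]
pub struct SplitKvConfig {
    chunk: usize,
    cap: NonZeroUsize,
}

impl SplitKvConfig {
    /// Returns `None` for a zero cap instead of deferring a panic to use time.
    pub fn new(chunk: usize, cap: usize) -> Option<Self> {
        Some(Self { chunk, cap: NonZeroUsize::new(cap)? })
    }

    /// Assumed formula: at least `chunk`, and large enough that `kv_len`
    /// fits in at most `cap` chunks. Cannot divide by zero.
    pub fn actual_chunk_size(&self, kv_len: usize) -> usize {
        self.chunk.max(kv_len.div_ceil(self.cap.get()))
    }
}
```

The trade-off is that every caller of `new` must handle the `Option`; the alternative of returning `usize::MAX` on a zero cap keeps the signature but hides the misconfiguration.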

Request changes. Runtime plumbing is good; tests are carrying invariants the types already own.

@FeathBow FeathBow marked this pull request as draft June 30, 2026 23:31
@FeathBow FeathBow force-pushed the fix/qwen3-splitkv-single-source branch from 12e3775 to 1ea050d Compare July 1, 2026 10:55
@FeathBow

FeathBow commented Jul 1, 2026

Collaborator Author

thanks for the review :) all above done

@FeathBow FeathBow marked this pull request as ready for review July 1, 2026 11:03

2 participants