Add variance-normalized KV cache#329

Open
aleroot wants to merge 1 commit into
ml-explore:mainfrom
aleroot:kvarn-inspired-kv-cache

@aleroot aleroot commented Jun 5, 2026

Contributor

Proposed changes

This PR adds an opt-in, KVarN-inspired variance-normalized KV cache that reduces KV-cache memory pressure during long-context generation.

This introduces kvQuantizationStrategy: .varianceNormalized, which keeps completed KV tiles compressed and uses cache-native attention over quantized rotated tiles instead of repeatedly materializing full K/V tensors.
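To illustrate what attending over tiles without materializing full K/V tensors can look like, here is a minimal Python sketch using a streaming (online) softmax. It is not the code in this PR: the function names, the tile layout as `(keys, values)` pairs, and the use of plain float lists in place of quantized rotated tiles are all assumptions made for the example.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend_full(q, keys, values):
    """Reference path: softmax(q . K / sqrt(d)) over every token at once."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    width = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(width)]

def attend_tiled(q, tiles):
    """Streaming path: visit one (keys, values) tile at a time.

    Only a running max, a running denominator, and a running weighted sum
    are kept, so the full K/V tensors are never assembled.
    """
    scale = 1.0 / math.sqrt(len(q))
    running_max = -math.inf
    denom = 0.0
    acc = [0.0] * len(tiles[0][1][0])
    for keys, values in tiles:  # a compressed cache would decode this tile here
        scores = [dot(q, k) * scale for k in keys]
        new_max = max(running_max, max(scores))
        rescale = math.exp(running_max - new_max)  # exp(-inf) == 0.0 on the first tile
        weights = [math.exp(s - new_max) for s in scores]
        denom = denom * rescale + sum(weights)
        acc = [a * rescale + sum(w * v[j] for w, v in zip(weights, values))
               for j, a in enumerate(acc)]
        running_max = new_max
    return [a / denom for a in acc]

# Hypothetical toy data: 4 cached tokens, head dimension 4, value width 2.
q = [0.3, -1.2, 0.8, 0.5]
keys = [[1.0, 0.0, 2.0, -1.0], [0.5, 0.5, -0.5, 1.5],
        [-2.0, 1.0, 0.0, 0.3], [0.1, -0.7, 1.1, 0.9]]
values = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5], [-1.5, 4.0]]
tiles = [(keys[:2], values[:2]), (keys[2:], values[2:])]

full = attend_full(q, keys, values)
tiled = attend_tiled(q, tiles)
```

Both paths produce the same output up to floating-point rounding. Because an orthogonal rotation preserves dot products, a cache that stores rotated keys could rotate the query once and compute the same scores directly on the rotated tiles.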

The cache applies:

  • Hadamard rotation before tile compression
  • log-domain dual-axis variance normalization with clipped scale updates
  • best-imbalance scale tracking
  • asymmetric K/V quantization, e.g. 4-bit K with 2-bit V
  • compact completed-tile storage with fused quantization and variance scales
  • cache-native attention over completed quantized tiles
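The first four steps above can be sketched in plain Python. This is one plausible reading of the list, not the PR's implementation: the function names, the half-step update rule, the `clip` and `iterations` defaults, the definition of "imbalance" as the largest absolute log-RMS over rows and columns, and the min/max uniform quantizer are all assumptions chosen for illustration.

```python
import math

def hadamard_rotate(vec):
    """Orthonormal fast Walsh-Hadamard transform (length must be a power of two)."""
    n = len(vec)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    out = list(vec)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return [x / math.sqrt(n) for x in out]

def log_rms(values, eps=1e-12):
    return math.log(math.sqrt(sum(v * v for v in values) / len(values)) + eps)

def dual_axis_normalize(tile, iterations=8, clip=1.0):
    """Find per-row and per-column log scales that balance the tile's RMS.

    Updates are clipped in the log domain, and the scales with the lowest
    imbalance seen so far are kept (the assumed "best-imbalance tracking").
    """
    rows, cols = len(tile), len(tile[0])
    log_r, log_c = [0.0] * rows, [0.0] * cols
    best = None
    for _ in range(iterations + 1):
        scaled = [[tile[i][j] * math.exp(-log_r[i] - log_c[j]) for j in range(cols)]
                  for i in range(rows)]
        row_dev = [log_rms(row) for row in scaled]
        col_dev = [log_rms([scaled[i][j] for i in range(rows)]) for j in range(cols)]
        imbalance = max(abs(d) for d in row_dev + col_dev)
        if best is None or imbalance < best[0]:
            best = (imbalance, scaled, list(log_r), list(log_c))
        # Half-steps, because the row and column scales are updated together.
        log_r = [r + 0.5 * max(-clip, min(clip, d)) for r, d in zip(log_r, row_dev)]
        log_c = [c + 0.5 * max(-clip, min(clip, d)) for c, d in zip(log_c, col_dev)]
    return best  # (imbalance, normalized_tile, log_row_scales, log_col_scales)

def quantize(values, bits):
    """Uniform min/max quantization of one row to the given bit width."""
    levels = (1 << bits) - 1
    lo, hi = min(values), max(values)
    step = (hi - lo) / levels or 1.0  # avoid a zero step for constant rows
    codes = [min(levels, max(0, round((v - lo) / step))) for v in values]
    return codes, lo, step

def dequantize(codes, lo, step):
    return [lo + c * step for c in codes]

# Hypothetical 4x4 tile with one outlier-heavy channel.
tile = [[0.5, -2.0, 8.0, 0.1], [0.2, -1.0, 4.0, 0.3],
        [1.5, 0.4, -16.0, 0.2], [0.1, 0.3, 2.0, -0.4]]
rotated = [hadamard_rotate(row) for row in tile]
imbalance, normalized, log_r, log_c = dual_axis_normalize(rotated)
k_rows = [quantize(row, bits=4) for row in normalized]  # keys at 4 bits
v_rows = [quantize(row, bits=2) for row in normalized]  # values at 2 bits
```

The normalized Hadamard matrix is symmetric and orthogonal, so applying `hadamard_rotate` twice returns the input. Multiplying each normalized entry by `exp(log_r[i] + log_c[j])` recovers the rotated tile, which is why the scales have to be stored alongside the codes.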

Benefits

  • Reduces KV cache memory pressure for long-context generation.
  • Avoids materializing completed compressed tiles on the decode hot path.
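For a rough sense of scale, the arithmetic below compares an fp16 cache with 4-bit keys and 2-bit values. The model shape (32 layers, 8 KV heads, head dimension 128) and the 32,768-token context are hypothetical numbers picked for the example, and the helper `kv_cache_bytes` is not part of this PR. Per-tile scale storage is ignored by default, so real savings would be somewhat smaller.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, k_bits, v_bits,
                   overhead_bits=0.0):
    """Estimate KV cache size in bytes.

    overhead_bits is an optional per-value allowance for stored scales
    (an assumption; 0.0 means scales are not counted).
    """
    values_per_token = layers * kv_heads * head_dim  # per K, and again per V
    bits_per_token = values_per_token * (k_bits + v_bits + 2 * overhead_bits)
    return tokens * bits_per_token / 8

GIB = 2 ** 30
shape = dict(tokens=32768, layers=32, kv_heads=8, head_dim=128)

fp16_bytes = kv_cache_bytes(k_bits=16, v_bits=16, **shape)
mixed_bytes = kv_cache_bytes(k_bits=4, v_bits=2, **shape)

print(fp16_bytes / GIB)          # 4.0
print(mixed_bytes / GIB)         # 0.75
print(fp16_bytes / mixed_bytes)  # about 5.33
```

Under these assumptions the quantized payload is roughly 5.3 times smaller than fp16, which is where the long-context benefit would come from.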

Checklist

Put an x in the boxes that apply.

  • I have read the CONTRIBUTING document
  • I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the necessary documentation (if needed)

Notes

The current implementation is slower than the default cache, so the intended use case is memory-constrained long-context decoding, not latency-sensitive generation.

Reference: https://arxiv.org/abs/2606.03458

@aleroot aleroot force-pushed the kvarn-inspired-kv-cache branch from 1d4061a to c8c0b5a Compare June 6, 2026 13:59
@aleroot aleroot force-pushed the kvarn-inspired-kv-cache branch from c8c0b5a to 14be156 Compare June 6, 2026 15:29