
fix: skip batch decode masks when unpadded #40

Open
anupsv wants to merge 1 commit into main from fix/gemma4-unpadded-decode-mask


@anupsv anupsv commented Jun 16, 2026


Summary

  • Return .none for unpadded single-token decode in BatchKVCache and BatchRotatingKVCache.
  • Keep explicit array masks when any row has positive left padding.
  • Add regression coverage for both cache types.
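The decision rule in the bullets above can be sketched roughly as follows. This is a minimal, self-contained illustration rather than the code in this PR: `MaskMode`, `decodeMask`, and the `leftPadding` parameter are hypothetical names, the mask is modelled as plain `Bool` rows instead of an MLX array, and the rotating-window behaviour of BatchRotatingKVCache is not modelled.

```swift
// Hypothetical stand-in for the attention mask mode; not the library's actual type.
enum MaskMode: Equatable {
    case none
    case array([[Bool]])  // one row of key-visibility flags per batch row
}

/// Sketch of the rule for single-token decode (n == 1).
/// `leftPadding[i]` is the number of padded key positions at the start of row i.
/// `keyLength` is the number of key positions, including the new token.
func decodeMask(queryLength n: Int, keyLength: Int, leftPadding: [Int]) -> MaskMode {
    precondition(n == 1, "this sketch only covers single-token decode")

    // No row has positive left padding: every key is visible, so no mask is needed.
    if leftPadding.allSatisfy({ $0 <= 0 }) {
        return .none
    }

    // At least one row is padded: keep an explicit mask that hides padded keys.
    let rows = leftPadding.map { pad in
        (0..<keyLength).map { $0 >= pad }
    }
    return .array(rows)
}

let unpadded = decodeMask(queryLength: 1, keyLength: 4, leftPadding: [0, 0])
let padded = decodeMask(queryLength: 1, keyLength: 4, leftPadding: [0, 2])
```

Here `unpadded` takes the no-mask path, while `padded` keeps an explicit mask because the second row has two padded positions.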

Why

BatchKVCache was forcing an explicit all-true mask on the dominant continuous-batching decode path. That differs from the regular KVCache, which uses no mask when n == 1, and the forced mask is consistent with the Gemma 4 4-bit repetition failure mode reported upstream.

Tests

  • swift test --filter ContinuousBatchingTests/testBatchedKVCachesSkipMaskForUnpaddedSingleTokenDecode (failed before fix, passes after)
  • swift test --filter ContinuousBatchingTests/testBatchedKVCachesKeepMaskForPaddedSingleTokenDecode
  • swift test --filter ContinuousBatchingTests

