1 parent e22c472 commit af49b41
tests/layers/vllm/test_attention.py
@@ -30,9 +30,7 @@
 # Number of attention heads (Key/Value) - for Grouped-Query Attention
 NUM_KV_HEADS = 4
 # Dimension of each attention head
-HEAD_DIM = 64
-# Padded head dimension
-PADDED_HEAD_DIM = 64
+HEAD_DIM = 128
 # Total number of blocks in the KV cache
 NUM_BLOCKS = 32
 # Number of tokens per block
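The constants changed above configure a Grouped-Query Attention test: several query heads share each of the `NUM_KV_HEADS` key/value heads, and the KV cache is laid out in fixed-size blocks. A minimal sketch of how these values relate, using NumPy; note that `NUM_Q_HEADS` and `BLOCK_SIZE` are illustrative assumptions not shown in this hunk:

```python
import numpy as np

# Values from the diff above.
NUM_KV_HEADS = 4
HEAD_DIM = 128
NUM_BLOCKS = 32

# Assumed for illustration (not part of this hunk): query-head count must be
# a multiple of NUM_KV_HEADS, and each cache block holds BLOCK_SIZE tokens.
NUM_Q_HEADS = 8
BLOCK_SIZE = 16

# In GQA, each KV head serves a group of query heads.
group_size = NUM_Q_HEADS // NUM_KV_HEADS  # 2 query heads per KV head

# A paged KV cache: one slab each for K and V (leading dim of 2),
# split into NUM_BLOCKS blocks of BLOCK_SIZE tokens.
kv_cache = np.zeros((2, NUM_BLOCKS, BLOCK_SIZE, NUM_KV_HEADS, HEAD_DIM))

# To attend, KV heads are logically repeated to match the query-head count.
seq_len = 10
k = np.zeros((NUM_KV_HEADS, seq_len, HEAD_DIM))   # (kv_heads, seq, head_dim)
k_expanded = np.repeat(k, group_size, axis=0)     # (q_heads, seq, head_dim)
```

With `HEAD_DIM` raised from 64 to 128 (and the separate padded dimension dropped), the head dimension no longer needs padding, which is consistent with the removal of `PADDED_HEAD_DIM`.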