Commit 763f9d7
the fix replicates biases too if they exist (e.g. Qwen) (#328)
The fix also replicates the KV heads of models that have biases in
addition to weights (such as the Qwen family). KV replication doubles
the throughput of Qwen/Qwen2.5-1.5B, which has 2 KV heads, when compiled
with TS4. The script has been tested successfully on Qwen/Qwen2.5-1.5B
and meta-llama/Llama-3.2-1B-Instruct.
Signed-off-by: quic-morteza <[email protected]>
1 parent d7a2772 commit 763f9d7
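The replication described in the commit message can be sketched as follows. This is a minimal pure-Python illustration, not the four lines the commit adds; the function name `replicate_kv_heads`, its parameters, and the list-of-rows layout are assumptions made for the example. It assumes a K or V projection whose output rows are grouped by KV head, and it repeats each head's weight rows and bias entries consecutively, so that grouped-query attention still maps every query head to a copy of its original KV head.

```python
def replicate_kv_heads(weight, bias, num_kv_heads, repeat):
    """Repeat each KV head `repeat` times, keeping copies adjacent.

    weight: list of rows, shape [num_kv_heads * head_dim][hidden] (assumed layout)
    bias:   list of length num_kv_heads * head_dim, or None if the model has no bias
    Hypothetical helper for illustration only.
    """
    rows = len(weight)
    if rows % num_kv_heads != 0:
        raise ValueError("weight rows must be divisible by num_kv_heads")
    if bias is not None and len(bias) != rows:
        raise ValueError("bias length must match the number of weight rows")

    head_dim = rows // num_kv_heads
    new_weight = []
    new_bias = [] if bias is not None else None

    for head in range(num_kv_heads):
        start, end = head * head_dim, (head + 1) * head_dim
        for _ in range(repeat):
            # Copy the weight rows that belong to this head.
            new_weight.extend(list(row) for row in weight[start:end])
            # Copy the matching bias slice too, if the model has one (e.g. Qwen).
            if bias is not None:
                new_bias.extend(bias[start:end])

    return new_weight, new_bias


# Two KV heads with head_dim = 1 and hidden = 2, each replicated twice.
w, b = replicate_kv_heads([[1, 2], [3, 4]], [10, 20], num_kv_heads=2, repeat=2)
print(w)  # [[1, 2], [1, 2], [3, 4], [3, 4]]
print(b)  # [10, 10, 20, 20]
```

Replicating the weights without the bias would leave the projection output with the new number of rows but a bias of the old length, which is the mismatch this kind of fix avoids for models with biased K/V projections.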
1 file changed, 4 additions and 0 deletions: four lines inserted at new lines 66-69, after original line 65.