@JohannesGaessler
Collaborator

Follow-up to #18537.

I was able to solve the technical issues I was having with my Strix Halo system and tested the performance change:

GPU | Model | Microbatch size | Test | t/s b7644 | t/s b7645 | Speedup
Radeon 8060S Graphics llama 8B IQ1_S - 1.5625 bpw 1 pp2048 80.49 80.66 1.00
Radeon 8060S Graphics llama 8B IQ1_S - 1.5625 bpw 2 pp2048 135.34 135.54 1.00
Radeon 8060S Graphics llama 8B IQ1_S - 1.5625 bpw 4 pp2048 198.06 198.68 1.00
Radeon 8060S Graphics llama 8B IQ1_S - 1.5625 bpw 8 pp2048 242.29 243.07 1.00
Radeon 8060S Graphics llama 8B IQ1_S - 1.5625 bpw 16 pp2048 478.25 479.90 1.00
Radeon 8060S Graphics llama 8B IQ1_S - 1.5625 bpw 32 pp2048 655.46 658.93 1.01
Radeon 8060S Graphics llama 8B IQ1_S - 1.5625 bpw 64 pp2048 862.67 865.35 1.00
Radeon 8060S Graphics llama 8B IQ1_S - 1.5625 bpw 128 pp2048 977.95 983.60 1.01
Radeon 8060S Graphics llama 8B IQ1_S - 1.5625 bpw 256 pp2048 1026.28 1022.60 1.00
Radeon 8060S Graphics llama 8B IQ1_S - 1.5625 bpw 512 pp2048 1034.86 1042.04 1.01
Radeon 8060S Graphics llama 8B IQ1_S - 1.5625 bpw 1024 pp2048 1081.21 1093.03 1.01
Radeon 8060S Graphics llama 8B IQ1_S - 1.5625 bpw 2048 pp2048 1088.63 1101.54 1.01
Radeon 8060S Graphics llama 8B IQ2_S - 2.5 bpw 1 pp2048 57.53 57.68 1.00
Radeon 8060S Graphics llama 8B IQ2_S - 2.5 bpw 2 pp2048 100.87 101.06 1.00
Radeon 8060S Graphics llama 8B IQ2_S - 2.5 bpw 4 pp2048 154.50 154.99 1.00
Radeon 8060S Graphics llama 8B IQ2_S - 2.5 bpw 8 pp2048 190.41 190.68 1.00
Radeon 8060S Graphics llama 8B IQ2_S - 2.5 bpw 16 pp2048 282.39 284.73 1.01
Radeon 8060S Graphics llama 8B IQ2_S - 2.5 bpw 32 pp2048 584.52 587.79 1.01
Radeon 8060S Graphics llama 8B IQ2_S - 2.5 bpw 64 pp2048 791.65 791.29 1.00
Radeon 8060S Graphics llama 8B IQ2_S - 2.5 bpw 128 pp2048 832.41 833.40 1.00
Radeon 8060S Graphics llama 8B IQ2_S - 2.5 bpw 256 pp2048 862.35 620.52 0.72
Radeon 8060S Graphics llama 8B IQ2_S - 2.5 bpw 512 pp2048 868.23 795.88 0.92
Radeon 8060S Graphics llama 8B IQ2_S - 2.5 bpw 1024 pp2048 912.68 905.39 0.99
Radeon 8060S Graphics llama 8B IQ2_S - 2.5 bpw 2048 pp2048 917.08 977.26 1.07
Radeon 8060S Graphics llama 8B IQ2_XS - 2.3125 bpw 1 pp2048 60.98 61.03 1.00
Radeon 8060S Graphics llama 8B IQ2_XS - 2.3125 bpw 2 pp2048 104.00 104.47 1.00
Radeon 8060S Graphics llama 8B IQ2_XS - 2.3125 bpw 4 pp2048 156.41 157.16 1.00
Radeon 8060S Graphics llama 8B IQ2_XS - 2.3125 bpw 8 pp2048 186.89 187.82 1.00
Radeon 8060S Graphics llama 8B IQ2_XS - 2.3125 bpw 16 pp2048 277.61 280.44 1.01
Radeon 8060S Graphics llama 8B IQ2_XS - 2.3125 bpw 32 pp2048 590.42 592.67 1.00
Radeon 8060S Graphics llama 8B IQ2_XS - 2.3125 bpw 64 pp2048 779.88 782.53 1.00
Radeon 8060S Graphics llama 8B IQ2_XS - 2.3125 bpw 128 pp2048 805.76 808.56 1.00
Radeon 8060S Graphics llama 8B IQ2_XS - 2.3125 bpw 256 pp2048 837.09 589.52 0.70
Radeon 8060S Graphics llama 8B IQ2_XS - 2.3125 bpw 512 pp2048 848.95 760.39 0.90
Radeon 8060S Graphics llama 8B IQ2_XS - 2.3125 bpw 1024 pp2048 891.73 882.25 0.99
Radeon 8060S Graphics llama 8B IQ2_XS - 2.3125 bpw 2048 pp2048 895.69 964.22 1.08
Radeon 8060S Graphics llama 8B IQ2_XXS - 2.0625 bpw 1 pp2048 47.89 47.88 1.00
Radeon 8060S Graphics llama 8B IQ2_XXS - 2.0625 bpw 2 pp2048 86.35 86.22 1.00
Radeon 8060S Graphics llama 8B IQ2_XXS - 2.0625 bpw 4 pp2048 139.38 139.74 1.00
Radeon 8060S Graphics llama 8B IQ2_XXS - 2.0625 bpw 8 pp2048 169.64 170.55 1.01
Radeon 8060S Graphics llama 8B IQ2_XXS - 2.0625 bpw 16 pp2048 346.54 347.67 1.00
Radeon 8060S Graphics llama 8B IQ2_XXS - 2.0625 bpw 32 pp2048 522.02 524.26 1.00
Radeon 8060S Graphics llama 8B IQ2_XXS - 2.0625 bpw 64 pp2048 796.76 800.68 1.00
Radeon 8060S Graphics llama 8B IQ2_XXS - 2.0625 bpw 128 pp2048 961.07 964.32 1.00
Radeon 8060S Graphics llama 8B IQ2_XXS - 2.0625 bpw 256 pp2048 999.78 1003.37 1.00
Radeon 8060S Graphics llama 8B IQ2_XXS - 2.0625 bpw 512 pp2048 1011.26 1024.84 1.01
Radeon 8060S Graphics llama 8B IQ2_XXS - 2.0625 bpw 1024 pp2048 1064.53 1075.43 1.01
Radeon 8060S Graphics llama 8B IQ2_XXS - 2.0625 bpw 2048 pp2048 1070.38 1085.14 1.01
Radeon 8060S Graphics llama 8B IQ3_S - 3.4375 bpw 1 pp2048 44.60 44.57 1.00
Radeon 8060S Graphics llama 8B IQ3_S - 3.4375 bpw 2 pp2048 81.16 81.12 1.00
Radeon 8060S Graphics llama 8B IQ3_S - 3.4375 bpw 4 pp2048 136.74 136.84 1.00
Radeon 8060S Graphics llama 8B IQ3_S - 3.4375 bpw 8 pp2048 181.99 182.07 1.00
Radeon 8060S Graphics llama 8B IQ3_S - 3.4375 bpw 16 pp2048 322.14 322.08 1.00
Radeon 8060S Graphics llama 8B IQ3_S - 3.4375 bpw 32 pp2048 486.18 487.20 1.00
Radeon 8060S Graphics llama 8B IQ3_S - 3.4375 bpw 64 pp2048 788.96 789.56 1.00
Radeon 8060S Graphics llama 8B IQ3_S - 3.4375 bpw 128 pp2048 977.89 977.91 1.00
Radeon 8060S Graphics llama 8B IQ3_S - 3.4375 bpw 256 pp2048 1018.10 1014.87 1.00
Radeon 8060S Graphics llama 8B IQ3_S - 3.4375 bpw 512 pp2048 1019.21 1021.68 1.00
Radeon 8060S Graphics llama 8B IQ3_S - 3.4375 bpw 1024 pp2048 1072.28 1069.02 1.00
Radeon 8060S Graphics llama 8B IQ3_S - 3.4375 bpw 2048 pp2048 1068.64 1072.60 1.00
Radeon 8060S Graphics llama 8B IQ3_S mix - 3.66 bpw 1 pp2048 44.28 44.30 1.00
Radeon 8060S Graphics llama 8B IQ3_S mix - 3.66 bpw 2 pp2048 79.72 79.67 1.00
Radeon 8060S Graphics llama 8B IQ3_S mix - 3.66 bpw 4 pp2048 131.11 131.06 1.00
Radeon 8060S Graphics llama 8B IQ3_S mix - 3.66 bpw 8 pp2048 170.25 170.14 1.00
Radeon 8060S Graphics llama 8B IQ3_S mix - 3.66 bpw 16 pp2048 335.28 336.16 1.00
Radeon 8060S Graphics llama 8B IQ3_S mix - 3.66 bpw 32 pp2048 499.62 503.04 1.01
Radeon 8060S Graphics llama 8B IQ3_S mix - 3.66 bpw 64 pp2048 795.08 796.11 1.00
Radeon 8060S Graphics llama 8B IQ3_S mix - 3.66 bpw 128 pp2048 977.52 978.78 1.00
Radeon 8060S Graphics llama 8B IQ3_S mix - 3.66 bpw 256 pp2048 1015.43 1015.62 1.00
Radeon 8060S Graphics llama 8B IQ3_S mix - 3.66 bpw 512 pp2048 1021.13 1020.96 1.00
Radeon 8060S Graphics llama 8B IQ3_S mix - 3.66 bpw 1024 pp2048 1069.80 1067.72 1.00
Radeon 8060S Graphics llama 8B IQ3_S mix - 3.66 bpw 2048 pp2048 1068.59 1073.08 1.00
Radeon 8060S Graphics llama 8B IQ3_XS - 3.3 bpw 1 pp2048 49.95 49.87 1.00
Radeon 8060S Graphics llama 8B IQ3_XS - 3.3 bpw 2 pp2048 89.89 90.24 1.00
Radeon 8060S Graphics llama 8B IQ3_XS - 3.3 bpw 4 pp2048 145.94 146.12 1.00
Radeon 8060S Graphics llama 8B IQ3_XS - 3.3 bpw 8 pp2048 187.61 188.10 1.00
Radeon 8060S Graphics llama 8B IQ3_XS - 3.3 bpw 16 pp2048 360.69 360.83 1.00
Radeon 8060S Graphics llama 8B IQ3_XS - 3.3 bpw 32 pp2048 524.34 525.70 1.00
Radeon 8060S Graphics llama 8B IQ3_XS - 3.3 bpw 64 pp2048 822.41 825.63 1.00
Radeon 8060S Graphics llama 8B IQ3_XS - 3.3 bpw 128 pp2048 1005.10 1003.98 1.00
Radeon 8060S Graphics llama 8B IQ3_XS - 3.3 bpw 256 pp2048 1046.09 1043.41 1.00
Radeon 8060S Graphics llama 8B IQ3_XS - 3.3 bpw 512 pp2048 1052.57 1053.20 1.00
Radeon 8060S Graphics llama 8B IQ3_XS - 3.3 bpw 1024 pp2048 1103.97 1102.13 1.00
Radeon 8060S Graphics llama 8B IQ3_XS - 3.3 bpw 2048 pp2048 1104.34 1103.92 1.00
Radeon 8060S Graphics llama 8B IQ3_XXS - 3.0625 bpw 1 pp2048 54.26 54.23 1.00
Radeon 8060S Graphics llama 8B IQ3_XXS - 3.0625 bpw 2 pp2048 96.89 96.86 1.00
Radeon 8060S Graphics llama 8B IQ3_XXS - 3.0625 bpw 4 pp2048 152.05 152.09 1.00
Radeon 8060S Graphics llama 8B IQ3_XXS - 3.0625 bpw 8 pp2048 188.68 188.74 1.00
Radeon 8060S Graphics llama 8B IQ3_XXS - 3.0625 bpw 16 pp2048 372.62 374.57 1.01
Radeon 8060S Graphics llama 8B IQ3_XXS - 3.0625 bpw 32 pp2048 558.53 559.39 1.00
Radeon 8060S Graphics llama 8B IQ3_XXS - 3.0625 bpw 64 pp2048 836.24 838.77 1.00
Radeon 8060S Graphics llama 8B IQ3_XXS - 3.0625 bpw 128 pp2048 1007.78 1008.26 1.00
Radeon 8060S Graphics llama 8B IQ3_XXS - 3.0625 bpw 256 pp2048 1053.00 1013.16 0.96
Radeon 8060S Graphics llama 8B IQ3_XXS - 3.0625 bpw 512 pp2048 1066.48 1031.93 0.97
Radeon 8060S Graphics llama 8B IQ3_XXS - 3.0625 bpw 1024 pp2048 1111.23 1095.85 0.99
Radeon 8060S Graphics llama 8B IQ3_XXS - 3.0625 bpw 2048 pp2048 1111.26 1115.50 1.00
Radeon 8060S Graphics llama 8B IQ4_NL - 4.5 bpw 1 pp2048 49.48 49.47 1.00
Radeon 8060S Graphics llama 8B IQ4_NL - 4.5 bpw 2 pp2048 94.60 94.48 1.00
Radeon 8060S Graphics llama 8B IQ4_NL - 4.5 bpw 4 pp2048 168.98 168.88 1.00
Radeon 8060S Graphics llama 8B IQ4_NL - 4.5 bpw 8 pp2048 249.00 249.22 1.00
Radeon 8060S Graphics llama 8B IQ4_NL - 4.5 bpw 16 pp2048 457.06 457.92 1.00
Radeon 8060S Graphics llama 8B IQ4_NL - 4.5 bpw 32 pp2048 516.60 519.00 1.00
Radeon 8060S Graphics llama 8B IQ4_NL - 4.5 bpw 64 pp2048 944.19 943.83 1.00
Radeon 8060S Graphics llama 8B IQ4_NL - 4.5 bpw 128 pp2048 1072.45 1072.78 1.00
Radeon 8060S Graphics llama 8B IQ4_NL - 4.5 bpw 256 pp2048 1138.03 1137.18 1.00
Radeon 8060S Graphics llama 8B IQ4_NL - 4.5 bpw 512 pp2048 1148.44 1150.13 1.00
Radeon 8060S Graphics llama 8B IQ4_NL - 4.5 bpw 1024 pp2048 1202.56 1191.82 0.99
Radeon 8060S Graphics llama 8B IQ4_NL - 4.5 bpw 2048 pp2048 1194.05 1192.39 1.00
Radeon 8060S Graphics llama 8B IQ4_XS - 4.25 bpw 1 pp2048 52.82 52.85 1.00
Radeon 8060S Graphics llama 8B IQ4_XS - 4.25 bpw 2 pp2048 100.68 100.71 1.00
Radeon 8060S Graphics llama 8B IQ4_XS - 4.25 bpw 4 pp2048 179.94 180.12 1.00
Radeon 8060S Graphics llama 8B IQ4_XS - 4.25 bpw 8 pp2048 260.81 261.26 1.00
Radeon 8060S Graphics llama 8B IQ4_XS - 4.25 bpw 16 pp2048 493.87 494.99 1.00
Radeon 8060S Graphics llama 8B IQ4_XS - 4.25 bpw 32 pp2048 389.10 391.24 1.01
Radeon 8060S Graphics llama 8B IQ4_XS - 4.25 bpw 64 pp2048 947.96 948.85 1.00
Radeon 8060S Graphics llama 8B IQ4_XS - 4.25 bpw 128 pp2048 1088.40 1087.98 1.00
Radeon 8060S Graphics llama 8B IQ4_XS - 4.25 bpw 256 pp2048 1153.12 1152.56 1.00
Radeon 8060S Graphics llama 8B IQ4_XS - 4.25 bpw 512 pp2048 1162.22 1163.81 1.00
Radeon 8060S Graphics llama 8B IQ4_XS - 4.25 bpw 1024 pp2048 1215.60 1207.64 0.99
Radeon 8060S Graphics llama 8B IQ4_XS - 4.25 bpw 2048 pp2048 1202.71 1204.21 1.00
Radeon 8060S Graphics llama 8B Q2_K_S 1 pp2048 65.70 65.86 1.00
Radeon 8060S Graphics llama 8B Q2_K_S 2 pp2048 96.45 96.56 1.00
Radeon 8060S Graphics llama 8B Q2_K_S 4 pp2048 118.90 119.08 1.00
Radeon 8060S Graphics llama 8B Q2_K_S 8 pp2048 113.61 113.57 1.00
Radeon 8060S Graphics llama 8B Q2_K_S 16 pp2048 225.46 227.19 1.01
Radeon 8060S Graphics llama 8B Q2_K_S 32 pp2048 372.21 374.90 1.01
Radeon 8060S Graphics llama 8B Q2_K_S 64 pp2048 521.49 524.02 1.00
Radeon 8060S Graphics llama 8B Q2_K_S 128 pp2048 566.55 569.93 1.01
Radeon 8060S Graphics llama 8B Q2_K_S 256 pp2048 598.39 606.65 1.01
Radeon 8060S Graphics llama 8B Q2_K_S 512 pp2048 628.21 772.53 1.23
Radeon 8060S Graphics llama 8B Q2_K_S 1024 pp2048 682.67 893.04 1.31
Radeon 8060S Graphics llama 8B Q2_K_S 2048 pp2048 688.09 978.68 1.42
Radeon 8060S Graphics llama 8B Q3_K_S 1 pp2048 48.52 48.69 1.00
Radeon 8060S Graphics llama 8B Q3_K_S 2 pp2048 81.08 81.52 1.01
Radeon 8060S Graphics llama 8B Q3_K_S 4 pp2048 113.30 113.65 1.00
Radeon 8060S Graphics llama 8B Q3_K_S 8 pp2048 112.39 112.71 1.00
Radeon 8060S Graphics llama 8B Q3_K_S 16 pp2048 324.66 326.53 1.01
Radeon 8060S Graphics llama 8B Q3_K_S 32 pp2048 628.08 632.08 1.01
Radeon 8060S Graphics llama 8B Q3_K_S 64 pp2048 822.62 824.57 1.00
Radeon 8060S Graphics llama 8B Q3_K_S 128 pp2048 961.21 963.00 1.00
Radeon 8060S Graphics llama 8B Q3_K_S 256 pp2048 1010.43 1010.83 1.00
Radeon 8060S Graphics llama 8B Q3_K_S 512 pp2048 1048.95 1052.41 1.00
Radeon 8060S Graphics llama 8B Q3_K_S 1024 pp2048 1072.77 1073.93 1.00
Radeon 8060S Graphics llama 8B Q3_K_S 2048 pp2048 1073.66 1073.30 1.00
Radeon 8060S Graphics llama 8B Q4_0 1 pp2048 50.08 50.17 1.00
Radeon 8060S Graphics llama 8B Q4_0 2 pp2048 96.16 96.54 1.00
Radeon 8060S Graphics llama 8B Q4_0 4 pp2048 172.93 173.57 1.00
Radeon 8060S Graphics llama 8B Q4_0 8 pp2048 248.61 251.83 1.01
Radeon 8060S Graphics llama 8B Q4_0 16 pp2048 450.84 459.31 1.02
Radeon 8060S Graphics llama 8B Q4_0 32 pp2048 343.86 349.90 1.02
Radeon 8060S Graphics llama 8B Q4_0 64 pp2048 905.11 918.95 1.02
Radeon 8060S Graphics llama 8B Q4_0 128 pp2048 1053.75 1070.02 1.02
Radeon 8060S Graphics llama 8B Q4_0 256 pp2048 1110.67 1126.46 1.01
Radeon 8060S Graphics llama 8B Q4_0 512 pp2048 1119.08 1141.89 1.02
Radeon 8060S Graphics llama 8B Q4_0 1024 pp2048 1175.03 1194.38 1.02
Radeon 8060S Graphics llama 8B Q4_0 2048 pp2048 1172.29 1187.18 1.01
Radeon 8060S Graphics llama 8B Q4_1 1 pp2048 45.03 45.03 1.00
Radeon 8060S Graphics llama 8B Q4_1 2 pp2048 88.58 88.61 1.00
Radeon 8060S Graphics llama 8B Q4_1 4 pp2048 162.98 163.23 1.00
Radeon 8060S Graphics llama 8B Q4_1 8 pp2048 253.76 254.94 1.00
Radeon 8060S Graphics llama 8B Q4_1 16 pp2048 439.88 441.15 1.00
Radeon 8060S Graphics llama 8B Q4_1 32 pp2048 675.67 678.62 1.00
Radeon 8060S Graphics llama 8B Q4_1 64 pp2048 889.72 895.45 1.01
Radeon 8060S Graphics llama 8B Q4_1 128 pp2048 959.38 965.49 1.01
Radeon 8060S Graphics llama 8B Q4_1 256 pp2048 1005.98 1017.63 1.01
Radeon 8060S Graphics llama 8B Q4_1 512 pp2048 1028.11 1040.51 1.01
Radeon 8060S Graphics llama 8B Q4_1 1024 pp2048 1083.72 1092.39 1.01
Radeon 8060S Graphics llama 8B Q4_1 2048 pp2048 1092.66 1096.33 1.00
Radeon 8060S Graphics llama 8B Q4_K_S 1 pp2048 42.71 42.59 1.00
Radeon 8060S Graphics llama 8B Q4_K_S 2 pp2048 69.58 69.36 1.00
Radeon 8060S Graphics llama 8B Q4_K_S 4 pp2048 98.88 98.76 1.00
Radeon 8060S Graphics llama 8B Q4_K_S 8 pp2048 116.96 117.04 1.00
Radeon 8060S Graphics llama 8B Q4_K_S 16 pp2048 458.83 461.10 1.00
Radeon 8060S Graphics llama 8B Q4_K_S 32 pp2048 662.41 667.17 1.01
Radeon 8060S Graphics llama 8B Q4_K_S 64 pp2048 869.02 873.42 1.01
Radeon 8060S Graphics llama 8B Q4_K_S 128 pp2048 988.74 992.93 1.00
Radeon 8060S Graphics llama 8B Q4_K_S 256 pp2048 1037.34 1038.45 1.00
Radeon 8060S Graphics llama 8B Q4_K_S 512 pp2048 1044.03 1047.98 1.00
Radeon 8060S Graphics llama 8B Q4_K_S 1024 pp2048 1098.50 1101.53 1.00
Radeon 8060S Graphics llama 8B Q4_K_S 2048 pp2048 1104.48 1103.43 1.00
Radeon 8060S Graphics llama 8B Q5_0 1 pp2048 42.14 42.25 1.00
Radeon 8060S Graphics llama 8B Q5_0 2 pp2048 81.06 81.45 1.00
Radeon 8060S Graphics llama 8B Q5_0 4 pp2048 148.24 148.21 1.00
Radeon 8060S Graphics llama 8B Q5_0 8 pp2048 226.26 226.33 1.00
Radeon 8060S Graphics llama 8B Q5_0 16 pp2048 398.10 398.76 1.00
Radeon 8060S Graphics llama 8B Q5_0 32 pp2048 294.15 296.09 1.01
Radeon 8060S Graphics llama 8B Q5_0 64 pp2048 866.06 866.26 1.00
Radeon 8060S Graphics llama 8B Q5_0 128 pp2048 1040.95 1043.27 1.00
Radeon 8060S Graphics llama 8B Q5_0 256 pp2048 1104.15 1106.47 1.00
Radeon 8060S Graphics llama 8B Q5_0 512 pp2048 1124.53 1127.04 1.00
Radeon 8060S Graphics llama 8B Q5_0 1024 pp2048 1167.65 1169.41 1.00
Radeon 8060S Graphics llama 8B Q5_0 2048 pp2048 1155.58 1158.23 1.00
Radeon 8060S Graphics llama 8B Q5_1 1 pp2048 36.42 36.53 1.00
Radeon 8060S Graphics llama 8B Q5_1 2 pp2048 71.83 72.02 1.00
Radeon 8060S Graphics llama 8B Q5_1 4 pp2048 135.70 135.90 1.00
Radeon 8060S Graphics llama 8B Q5_1 8 pp2048 220.35 220.84 1.00
Radeon 8060S Graphics llama 8B Q5_1 16 pp2048 306.96 306.97 1.00
Radeon 8060S Graphics llama 8B Q5_1 32 pp2048 540.63 543.56 1.01
Radeon 8060S Graphics llama 8B Q5_1 64 pp2048 795.80 797.04 1.00
Radeon 8060S Graphics llama 8B Q5_1 128 pp2048 912.04 914.19 1.00
Radeon 8060S Graphics llama 8B Q5_1 256 pp2048 973.76 975.76 1.00
Radeon 8060S Graphics llama 8B Q5_1 512 pp2048 1000.48 996.70 1.00
Radeon 8060S Graphics llama 8B Q5_1 1024 pp2048 1054.89 1051.86 1.00
Radeon 8060S Graphics llama 8B Q5_1 2048 pp2048 1058.78 1060.83 1.00
Radeon 8060S Graphics llama 8B Q5_K_S 1 pp2048 38.61 38.78 1.00
Radeon 8060S Graphics llama 8B Q5_K_S 2 pp2048 64.82 64.83 1.00
Radeon 8060S Graphics llama 8B Q5_K_S 4 pp2048 93.97 94.17 1.00
Radeon 8060S Graphics llama 8B Q5_K_S 8 pp2048 113.83 113.88 1.00
Radeon 8060S Graphics llama 8B Q5_K_S 16 pp2048 453.38 454.37 1.00
Radeon 8060S Graphics llama 8B Q5_K_S 32 pp2048 673.74 674.86 1.00
Radeon 8060S Graphics llama 8B Q5_K_S 64 pp2048 882.35 885.74 1.00
Radeon 8060S Graphics llama 8B Q5_K_S 128 pp2048 965.68 969.79 1.00
Radeon 8060S Graphics llama 8B Q5_K_S 256 pp2048 1009.28 1011.65 1.00
Radeon 8060S Graphics llama 8B Q5_K_S 512 pp2048 1025.61 1028.44 1.00
Radeon 8060S Graphics llama 8B Q5_K_S 1024 pp2048 1086.24 1083.86 1.00
Radeon 8060S Graphics llama 8B Q5_K_S 2048 pp2048 1089.73 1091.21 1.00
Radeon 8060S Graphics llama 8B Q6_K 1 pp2048 34.17 34.23 1.00
Radeon 8060S Graphics llama 8B Q6_K 2 pp2048 65.05 65.15 1.00
Radeon 8060S Graphics llama 8B Q6_K 4 pp2048 112.52 112.28 1.00
Radeon 8060S Graphics llama 8B Q6_K 8 pp2048 145.02 144.71 1.00
Radeon 8060S Graphics llama 8B Q6_K 16 pp2048 338.73 339.38 1.00
Radeon 8060S Graphics llama 8B Q6_K 32 pp2048 488.43 490.40 1.00
Radeon 8060S Graphics llama 8B Q6_K 64 pp2048 637.44 635.32 1.00
Radeon 8060S Graphics llama 8B Q6_K 128 pp2048 654.69 654.49 1.00
Radeon 8060S Graphics llama 8B Q6_K 256 pp2048 683.40 573.48 0.84
Radeon 8060S Graphics llama 8B Q6_K 512 pp2048 695.59 755.10 1.09
Radeon 8060S Graphics llama 8B Q6_K 1024 pp2048 735.84 877.38 1.19
Radeon 8060S Graphics llama 8B Q6_K 2048 pp2048 746.75 953.73 1.28
Radeon 8060S Graphics llama 8B Q8_0 1 pp2048 28.30 28.31 1.00
Radeon 8060S Graphics llama 8B Q8_0 2 pp2048 55.89 56.06 1.00
Radeon 8060S Graphics llama 8B Q8_0 4 pp2048 105.92 106.56 1.01
Radeon 8060S Graphics llama 8B Q8_0 8 pp2048 188.80 189.22 1.00
Radeon 8060S Graphics llama 8B Q8_0 16 pp2048 336.63 337.55 1.00
Radeon 8060S Graphics llama 8B Q8_0 32 pp2048 384.23 389.19 1.01
Radeon 8060S Graphics llama 8B Q8_0 64 pp2048 821.79 825.11 1.00
Radeon 8060S Graphics llama 8B Q8_0 128 pp2048 972.84 981.31 1.01
Radeon 8060S Graphics llama 8B Q8_0 256 pp2048 1027.77 1031.53 1.00
Radeon 8060S Graphics llama 8B Q8_0 512 pp2048 1051.68 1054.52 1.00
Radeon 8060S Graphics llama 8B Q8_0 1024 pp2048 1112.74 1110.91 1.00
Radeon 8060S Graphics llama 8B Q8_0 2048 pp2048 1112.56 1112.48 1.00
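
The Speedup column appears to be the b7645 throughput divided by the b7644 throughput, rounded to two decimals. A minimal sketch of that arithmetic follows; the function name `speedup` is hypothetical and not part of llama.cpp:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: ratio of new to old throughput (tokens per second),
// rounded to two decimals like the last column of the table.
inline double speedup(double tps_old, double tps_new) {
    assert(tps_old > 0.0); // a ratio against zero throughput is meaningless
    return std::round(tps_new / tps_old * 100.0) / 100.0;
}
```

For example, the Q2_K_S row at microbatch size 2048 gives 978.68 / 688.09 ≈ 1.42, and the IQ2_S row at microbatch size 256 gives 620.52 / 862.35 ≈ 0.72.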

This PR changes the kernel selection logic to use MMQ if either the hipBLAS path performs worse or its speedup is so small that it is not really worth the increase in memory use.
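
The rule described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: the function name `prefer_mmq`, its parameters, and the 10% threshold are all hypothetical, and the real selection logic works per quantization type and batch size rather than on measured throughput.

```cpp
#include <cassert>

// Hypothetical sketch, not the actual ggml selection function.
// mmq_tps / blas_tps: throughput (tokens per second) of the MMQ and hipBLAS paths.
// min_blas_gain: assumed minimum relative gain that would justify the extra
//                memory used by the hipBLAS path (1.10 is an arbitrary placeholder).
inline bool prefer_mmq(double mmq_tps, double blas_tps, double min_blas_gain = 1.10) {
    assert(mmq_tps > 0.0 && blas_tps > 0.0);
    if (blas_tps <= mmq_tps) {
        return true; // hipBLAS is not faster, so MMQ wins outright
    }
    // hipBLAS is faster, but only switch away from MMQ if the gain is large enough
    return blas_tps / mmq_tps < min_blas_gain;
}
```

With the assumed 10% threshold, `prefer_mmq(1000.0, 1050.0)` returns true because a 5% gain does not justify the extra memory, while `prefer_mmq(700.0, 950.0)` returns false.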

github-actions bot added the labels "Nvidia GPU" (Issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) on Jan 7, 2026
Collaborator

@IMbackK IMbackK left a comment

otherwise lgtm

}

// For some quantization types MMQ can have lower peak TOPS than hipBLAS
// so it's only faster for sufficiently small batch sizes:
Collaborator

extra spaces

Collaborator Author

This is intentional since the sentence is spanning multiple lines.

Collaborator

@IMbackK IMbackK Jan 7, 2026

Grepping around in the codebase, this is not the style used, which makes it a bit awkward, but it's not a big deal.

Contributor

@Beinsezii Beinsezii left a comment

I don't have the chance to test at the moment, but it looks good. Surprised that 3_0 is so much worse in MMQ than everything else.

@IMbackK
Collaborator

IMbackK commented Jan 7, 2026

For CDNA, MMQ is also a mixed bag. Generally gfx1100, CDNA1, and CDNA2 have the best-tuned Tensile kernels, so I think it's more a case of BLAS doing better there than MMQ doing worse.

@Beinsezii
Contributor

Probably a visit to q2/q6 perf would help everyone then.

@IMbackK
Collaborator

IMbackK commented Jan 7, 2026

IIRC from previous discussions, the q2 performance anomaly also exists on CUDA + MMQ. Someone could take a look at those kernels specifically; I haven't because I don't find the q2 variants a very interesting datatype.

@Beinsezii
Contributor

"I haven't because I don't find the q2 variants a very interesting datatype."

For me Q6 is the one that hurts as it's perfect for Mistral 3.2 on 24GiB. Otherwise I probably wouldn't have ever found this problem.

@JohannesGaessler JohannesGaessler merged commit d2ff4e2 into ggml-org:master Jan 10, 2026
75 checks passed