
ggml-cpu: faster AVX2 variant for IQ1_M #12216

Merged (1 commit into ggml-org:master, Mar 7, 2025)

Conversation

remyoudompheng (Contributor)

This is an additional AVX2 optimization for IQ1_M, following #12154.
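For context, the sketch below is not the code from this PR; it is only a minimal, self-contained illustration of the general AVX2 pattern used for low-bit dot products in ggml-style CPU kernels: multiply unsigned 8-bit weights against signed 8-bit activations with `_mm256_maddubs_epi16`, widen the pairwise sums with `_mm256_madd_epi16`, then reduce horizontally. The function name `dot32_avx2` and the flat u8/i8 layout are assumptions for illustration; the real IQ1_M kernel additionally handles the 1.75-bpw packing, block scales, and delta signs.

```c
// Minimal AVX2 dot-product sketch (illustrative only, not the IQ1_M kernel).
#include <immintrin.h>
#include <stdint.h>
#include <stdio.h>

// Dot product of 32 unsigned 8-bit weights (x) with 32 signed 8-bit values (y).
static int32_t dot32_avx2(const uint8_t *x, const int8_t *y) {
    __m256i vx = _mm256_loadu_si256((const __m256i *) x);
    __m256i vy = _mm256_loadu_si256((const __m256i *) y);

    // u8 * s8 -> adjacent pairs summed into signed 16-bit lanes
    __m256i prod16 = _mm256_maddubs_epi16(vx, vy);
    // s16 * 1 -> adjacent pairs summed into signed 32-bit lanes
    __m256i prod32 = _mm256_madd_epi16(prod16, _mm256_set1_epi16(1));

    // horizontal sum of the eight 32-bit lanes
    __m128i lo  = _mm256_castsi256_si128(prod32);
    __m128i hi  = _mm256_extracti128_si256(prod32, 1);
    __m128i sum = _mm_add_epi32(lo, hi);
    sum = _mm_hadd_epi32(sum, sum);
    sum = _mm_hadd_epi32(sum, sum);
    return _mm_cvtsi128_si32(sum);
}

int main(void) {
    uint8_t x[32];
    int8_t  y[32];
    int32_t ref = 0;
    for (int i = 0; i < 32; i++) {
        x[i] = (uint8_t) i;
        y[i] = (int8_t) (i - 16);
        ref += x[i] * y[i];
    }
    printf("avx2 = %d, scalar = %d\n", dot32_avx2(x, y), ref);
    return 0;
}
```

Compile with `-mavx2`; the AVX2 and scalar results should match.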

Benchmarks on AMD Zen 4:

Before
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3]):                  1932 runs -   532.51 us/run - 117.44 MFLOP/run - 220.54 GFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3]):                   980 runs -  1057.22 us/run - 234.88 MFLOP/run - 222.17 GFLOPS

After
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3]):                  2553 runs -   394.78 us/run - 117.44 MFLOP/run - 297.48 GFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=2,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3]):                  1225 runs -   816.50 us/run - 234.88 MFLOP/run - 287.67 GFLOPS
| model | size | params | ngl | test | t/s (master) | t/s (PR) |
| --- | --- | --- | --- | --- | --- | --- |
| qwen2 7B IQ1_M - 1.75 bpw | 1.90 GiB | 7.62 B | 0 | pp512 | 204.84 ± 0.31 | 204.07 ± 0.58 |
| qwen2 7B IQ1_M - 1.75 bpw | 1.90 GiB | 7.62 B | 0 | tg128 | 14.48 ± 0.02 | 17.71 ± 0.18 |

The github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Mar 6, 2025.
slaren (Member) commented on Mar 6, 2025:

On an Intel Core i9-13900K:

| Model | Threads | Test | t/s master | t/s optim-x86-2 | Speedup |
| --- | --- | --- | --- | --- | --- |
| llama 8B IQ1_M - 1.75 bpw | 8 | pp128 | 20.85 | 25.11 | 1.20 |
| llama 8B IQ1_M - 1.75 bpw | 8 | tg32 | 16.34 | 19.54 | 1.20 |
| llama 8B IQ1_M - 1.75 bpw | 16 | pp128 | 29.23 | 35.90 | 1.23 |
| llama 8B IQ1_M - 1.75 bpw | 16 | tg32 | 20.84 | 24.17 | 1.16 |
| llama 8B IQ1_M - 1.75 bpw | 24 | pp128 | 32.41 | 39.21 | 1.21 |
| llama 8B IQ1_M - 1.75 bpw | 24 | tg32 | 22.41 | 25.95 | 1.16 |
| llama 8B IQ1_M - 1.75 bpw | 32 | pp128 | 37.96 | 45.32 | 1.19 |
| llama 8B IQ1_M - 1.75 bpw | 32 | tg32 | 22.78 | 26.94 | 1.18 |

ggerganov merged commit 68d0027 into ggml-org:master on Mar 7, 2025 (47 checks passed).
mglambda pushed a commit to mglambda/llama.cpp that referenced this pull request Mar 8, 2025
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Mar 19, 2025