ggml-cpu: Faster IQ1 mul_mat_vec on AVX2 using BMI2 instructions #12154
Please also add an option to enable it manually, add a check in cpu-feats-x86.cpp, and add it to the CPU variant list in llama.cpp/ggml/src/CMakeLists.txt (lines 308 to 312 in cc473ca).
You could also check for Zen 2 in |
https://github.com/zwegner/zp7 Integrating something like the ZP7 (Zach's Peppy Parallel-Prefix-Popcountin' PEXT/PDEP Polyfill) into llama.cpp could be a smart way to address the performance issues with PDEP and PEXT on AMD Zen 2 and earlier CPUs while maintaining compatibility and efficiency across platforms. Just a polite suggestion. |
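For readers unfamiliar with what a PEXT/PDEP polyfill has to compute, here is a minimal portable sketch of both operations. It uses a naive one-bit-per-iteration loop, not ZP7's parallel-prefix technique, and the function names are hypothetical (they are not part of ZP7 or llama.cpp).

```c
#include <stdint.h>

/* Hypothetical helper: portable equivalent of PEXT.
 * Gathers the bits of src selected by mask into the low bits of the result. */
static uint64_t pext_u64_portable(uint64_t src, uint64_t mask) {
    uint64_t result = 0;
    for (uint64_t out_bit = 1; mask != 0; out_bit <<= 1) {
        uint64_t lowest = mask & (~mask + 1); /* isolate lowest set mask bit */
        if (src & lowest) {
            result |= out_bit;
        }
        mask &= mask - 1; /* clear that mask bit */
    }
    return result;
}

/* Hypothetical helper: portable equivalent of PDEP.
 * Scatters the low bits of src to the positions selected by mask. */
static uint64_t pdep_u64_portable(uint64_t src, uint64_t mask) {
    uint64_t result = 0;
    for (uint64_t in_bit = 1; mask != 0; in_bit <<= 1) {
        uint64_t lowest = mask & (~mask + 1); /* isolate lowest set mask bit */
        if (src & in_bit) {
            result |= lowest;
        }
        mask &= mask - 1; /* clear that mask bit */
    }
    return result;
}
```

The loop runs once per set bit in the mask, so it is only meant to illustrate the semantics; a real polyfill would need something faster to be worthwhile in a hot loop.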
Update with CMakeLists changes (no Zen 2 specific case, maybe a separate PR can add AMD microarchitectures). |
Looks good, thanks. It would also be necessary to add a llama.cpp/ggml/src/ggml-cpu/ggml-cpu.cpp Line 488 in 1a24c46
I suspect that MSVC will enable BMI2 with llama.cpp/ggml/src/ggml-cpu/CMakeLists.txt Lines 209 to 212 in a3db575
I can check for you if you don't have access to a machine with MSVC. |
Done. |
13900k:
|
…l-org#12154)
* ggml-cpu: Faster IQ1 mul_mat_vec on AVX2 using BMI2 instructions
* cmake: Add GGML_BMI2 build option
* ggml: enable BMI2 on relevant CPU variants
* ggml-cpu: include BMI2 in backend score
* ggml-cpu: register BMI2 in ggml_backend_cpu_get_features
* ggml-cpu: add __BMI2__ define when using MSVC
Hello @slaren, @remyoudompheng, it seems that after this PR the x86 AVX2 build with MSVC is failing:
cmake command:
Do you have any recommendation on how to fix this issue? |
Never mind, I just disabled BMI2 support on Win32 using |
Hey guys, I'm having issues with this commit and I don't know why. I put all the relevant information, and what I could find, in the issue. I tried compiling with various CUDA versions and worked my way to the current commit. |
Just a heads up: I can confirm that the BMI2 detection is probably wrong, because it is forcing BMI2 on a non-BMI2 CPU. |
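As a way to double-check what a given CPU actually reports, here is a small sketch of a runtime BMI2 query. BMI2 support is reported in CPUID leaf 7, subleaf 0, EBX bit 8. The sketch assumes GCC or Clang on x86 (it relies on `__get_cpuid_count` from `<cpuid.h>`); the helper names are hypothetical and this is not the check implemented in cpu-feats-x86.cpp.

```c
#include <stdint.h>

#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
#include <cpuid.h>
#define CAN_QUERY_CPUID 1
#else
#define CAN_QUERY_CPUID 0
#endif

/* BMI2 is reported in CPUID leaf 7, subleaf 0, EBX bit 8. */
static int ebx_reports_bmi2(uint32_t ebx) {
    return (int)((ebx >> 8) & 1u);
}

/* Hypothetical helper: returns 1 if the running CPU reports BMI2, else 0. */
static int cpu_has_bmi2(void) {
#if CAN_QUERY_CPUID
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        return 0; /* leaf 7 is not available on this CPU */
    }
    return ebx_reports_bmi2(ebx);
#else
    return 0; /* assumption: treat non-x86 or other compilers as "no BMI2" */
#endif
}
```

Note that a binary compiled with `-mbmi2` may already contain BMI2 instructions outside any guarded path, so a runtime check like this only helps if the BMI2 code lives in a separately compiled variant.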
AFAIK the CPU backend does not contain any x86 BMI2 instructions yet.
Is it fine to introduce code using BMI2 instructions?
Is it fine to simply use `__BMI2__` since the "NATIVE" build is now the standard?
Some numbers on Zen 4 (new code is about 50% faster)
Note that some old CPUs (AMD Zen 2 and older) support BMI2 but emulate the instructions in microcode, resulting in catastrophic slowdowns: owners of such hardware would need to manually disable BMI2 in the compiler using `-mno-bmi2`.
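To illustrate the kind of compile-time guard being discussed, here is a minimal sketch that uses `_pdep_u64` when the compiler defines `__BMI2__` and falls back to a plain loop otherwise (for example when building with `-mno-bmi2`). The function `spread_bits_to_bytes` and the `HAVE_BMI2_PATH` macro are hypothetical illustrations, not the code from this PR.

```c
#include <stdint.h>

#if defined(__BMI2__) && (defined(__x86_64__) || defined(_M_X64))
#include <immintrin.h>
#define HAVE_BMI2_PATH 1
#else
#define HAVE_BMI2_PATH 0
#endif

/* Hypothetical example: expand each of the 8 input bits into the lowest bit
 * of its own byte, so bit i of `bits` ends up at bit 8*i of the result. */
static uint64_t spread_bits_to_bytes(uint8_t bits) {
#if HAVE_BMI2_PATH
    /* One PDEP deposits the 8 source bits at the 8 mask positions. */
    return _pdep_u64((uint64_t)bits, 0x0101010101010101ULL);
#else
    /* Portable fallback used when BMI2 is disabled at compile time. */
    uint64_t out = 0;
    for (int i = 0; i < 8; ++i) {
        out |= (uint64_t)((bits >> i) & 1u) << (8 * i);
    }
    return out;
#endif
}
```

Both paths return the same value, so the choice only affects speed; on CPUs where PDEP is microcoded, the fallback path is the one you would want selected.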