Description
Expected Behavior
Passing the oneMKL flags through CMAKE_ARGS and installing llama-cpp-python via pip should finish successfully, since these flags are supported by llama.cpp:
https://github.com/ggerganov/llama.cpp#intel-onemkl
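For reference, the full invocation expected to succeed (sourcing the oneAPI environment first, as the direct llama.cpp build further below also does) would look like this:
source /opt/intel/oneapi/setvars.sh
CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=Intel10_64lp -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DLLAMA_NATIVE=ON" \
FORCE_CMAKE=1 pip install llama-cpp-python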
Current Behavior
Passing the same flags through CMAKE_ARGS to the pip installation produces an error:
CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=Intel10_64lp -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DLLAMA_NATIVE=ON" FORCE_CMAKE=1 \ pip install llama-cpp-python
[20/22] : && /opt/intel/oneapi/compiler/2024.0/bin/icpx -fPIC -O3 -DNDEBUG -shared -Wl,-soname,libllava.so -o vendor/llama.cpp/examples/llava/libllava.so vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/llava.cpp.o vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/clip.cpp.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o -Wl,-rpath,/tmp/tmp0dmd36dn/build/vendor/llama.cpp: vendor/llama.cpp/libllama.so /opt/intel/oneapi/mkl/2024.0/lib/libmkl_intel_lp64.so /opt/intel/oneapi/mkl/2024.0/lib/libmkl_intel_thread.so /opt/intel/oneapi/mkl/2024.0/lib/libmkl_core.so /opt/intel/oneapi/compiler/2024.0/lib/libiomp5.so -lm -ldl && : FAILED: vendor/llama.cpp/examples/llava/libllava.so : && /opt/intel/oneapi/compiler/2024.0/bin/icpx -fPIC -O3 -DNDEBUG -shared -Wl,-soname,libllava.so -o vendor/llama.cpp/examples/llava/libllava.so vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/llava.cpp.o vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/clip.cpp.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o -Wl,-rpath,/tmp/tmp0dmd36dn/build/vendor/llama.cpp: vendor/llama.cpp/libllama.so /opt/intel/oneapi/mkl/2024.0/lib/libmkl_intel_lp64.so /opt/intel/oneapi/mkl/2024.0/lib/libmkl_intel_thread.so /opt/intel/oneapi/mkl/2024.0/lib/libmkl_core.so /opt/intel/oneapi/compiler/2024.0/lib/libiomp5.so -lm -ldl && : vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o: file not recognized: file format not recognized icpx: error: linker command failed with exit code 1 (use -v to see invocation)
Environment and Context
Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except under certain specific conditions.
- Physical (or virtual) hardware you are using, e.g. for Linux:
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Vendor ID: GenuineIntel
Model name: 13th Gen Intel(R) Core(TM) i5-1340P
CPU family: 6
Model: 186
Thread(s) per core: 1
Core(s) per socket: 12
Socket(s): 1
Stepping: 2
BogoMIPS: 4377.60
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves avx_vnni arat vnmi umip pku ospke waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize flush_l1d arch_capabilities
Virtualization features:
Virtualization: VT-x
Hypervisor vendor: KVM
Virtualization type: full
Caches (sum of all):
L1d: 384 KiB (12 instances)
L1i: 384 KiB (12 instances)
L2: 48 MiB (12 instances)
L3: 16 MiB (1 instance)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-11
Vulnerabilities:
Gather data sampling: Not affected
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Not affected
Retbleed: Not affected
Spec rstack overflow: Not affected
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Enhanced / Automatic IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Srbds: Not affected
Tsx async abort: Not affected
- Operating System, e.g. for Linux:
$ uname -a
Linux ladex 6.7.2-1.el9.elrepo.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Jan 25 23:07:22 EST 2024 x86_64 x86_64 x86_64 GNU/Linux
- SDK version, e.g. for Linux:
$ python3 --version
Python 3.11.7
$ make --version
GNU Make 4.3
$ g++ --version
g++ (GCC) 12.2.1 20221121 (Red Hat 12.2.1-7)
$ cmake --version
cmake version 3.20.2
$ icx --version
Intel(R) oneAPI DPC++/C++ Compiler 2024.0.2 (2024.0.2.20231213)
Building llama.cpp directly with oneAPI works fine and performs about 2x better than with BLIS and roughly 2.8x better than a clean (non-customized) build via "pip install llama-cpp-python".
Commands used to build llama.cpp directly with oneAPI:
source /opt/intel/oneapi/setvars.sh
cmake .. -DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=Intel10_64lp -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DLLAMA_NATIVE=ON
cmake --build . --config Release
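A quick smoke test of the resulting binary (the model path below is a placeholder for any local GGUF model; main was the name of llama.cpp's example binary at this version):
./bin/main -m /path/to/model.gguf -p "Hello" -n 32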