Description
I'm trying to install llama-cpp-python with CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python,
but the build fails with the error below. Any suggestions?
× Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [71 lines of output]
*** scikit-build-core 0.7.1 using CMake 3.22.1 (wheel)
*** Configuring CMake...
loading initial cache file /tmp/tmph92nnsrw/build/CMakeInit.txt
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.34.1")
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found CUDAToolkit: /usr/local/cuda/include (found version "11.6.55")
-- cuBLAS found
-- The CUDA compiler identification is NVIDIA 11.6.55
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Using CUDA architectures: 52;61;70
-- CUDA host compiler is GNU 11.4.0
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
INSTALL TARGETS - target llama has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
INSTALL TARGETS - target llama has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/tmph92nnsrw/build
*** Building project with Ninja...
[1/23] cd /tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp && /usr/bin/cmake -DMSVC= -DCMAKE_C_COMPILER_VERSION=11.4.0 -DCMAKE_C_COMPILER_ID=GNU -DCMAKE_VS_PLATFORM_NAME= -DCMAKE_C_COMPILER=/usr/bin/cc -P /tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/common/../scripts/gen-build-info-cpp.cmake
-- Found Git: /usr/bin/git (found version "2.34.1")
[2/23] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -O3 -DNDEBUG -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -std=gnu++11 -MD -MT vendor/llama.cpp/common/CMakeFiles/build_info.dir/build-info.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/build_info.dir/build-info.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/build_info.dir/build-info.cpp.o -c /tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/common/build-info.cpp
[3/23] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/common/. -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/. -O3 -DNDEBUG -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -std=gnu++11 -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/console.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/console.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/console.cpp.o -c /tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/common/console.cpp
[4/23] /usr/bin/cc -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/. -isystem /usr/local/cuda/include -O3 -DNDEBUG -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -march=native -std=gnu11 -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o.d -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o -c /tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/ggml-alloc.c
[5/23] /usr/bin/c++ -DLLAMA_BUILD -DLLAMA_SHARED -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/examples/llava/. -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/examples/llava/../.. -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/examples/llava/../../common -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/. -isystem /usr/local/cuda/include -O3 -DNDEBUG -fPIC -Wno-cast-qual -MD -MT vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/llava.cpp.o -MF vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/llava.cpp.o.d -o vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/llava.cpp.o -c /tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/examples/llava/llava.cpp
[6/23] /usr/bin/cc -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/. -isystem /usr/local/cuda/include -O3 -DNDEBUG -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -march=native -std=gnu11 -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o.d -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o -c /tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/ggml-backend.c
[7/23] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/common/. -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/. -O3 -DNDEBUG -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -std=gnu++11 -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/sampling.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/sampling.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/sampling.cpp.o -c /tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/common/sampling.cpp
[8/23] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/common/. -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/. -O3 -DNDEBUG -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -std=gnu++11 -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/grammar-parser.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/grammar-parser.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/grammar-parser.cpp.o -c /tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/common/grammar-parser.cpp
[9/23] /usr/bin/c++ -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/common/. -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/. -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/examples/llava/. -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/examples/llava/../.. -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/examples/llava/../../common -O3 -DNDEBUG -MD -MT vendor/llama.cpp/examples/llava/CMakeFiles/llava-cli.dir/llava-cli.cpp.o -MF vendor/llama.cpp/examples/llava/CMakeFiles/llava-cli.dir/llava-cli.cpp.o.d -o vendor/llama.cpp/examples/llava/CMakeFiles/llava-cli.dir/llava-cli.cpp.o -c /tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/examples/llava/llava-cli.cpp
[10/23] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/. -isystem=/usr/local/cuda/include -O3 -DNDEBUG --generate-code=arch=compute_52,code=[compute_52,sm_52] --generate-code=arch=compute_61,code=[compute_61,sm_61] --generate-code=arch=compute_70,code=[compute_70,sm_70] -Xcompiler=-fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -use_fast_math -Wno-pedantic -Xcompiler "-Wno-array-bounds -Wno-format-truncation -Wextra-semi" -march=native -std=c++11 -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o.d -x cu -c /tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/ggml-cuda.cu -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o
FAILED: vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o
/usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/. -isystem=/usr/local/cuda/include -O3 -DNDEBUG --generate-code=arch=compute_52,code=[compute_52,sm_52] --generate-code=arch=compute_61,code=[compute_61,sm_61] --generate-code=arch=compute_70,code=[compute_70,sm_70] -Xcompiler=-fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -use_fast_math -Wno-pedantic -Xcompiler "-Wno-array-bounds -Wno-format-truncation -Wextra-semi" -march=native -std=c++11 -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o.d -x cu -c /tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/ggml-cuda.cu -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o
/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/ggml-cuda.cu(626): error: identifier "__hmax2" is undefined
/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/ggml-cuda.cu(5462): error: identifier "__hmax2" is undefined
/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/ggml-cuda.cu(5474): error: identifier "__hmax" is undefined
/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/ggml-cuda.cu(5481): error: identifier "__hmax" is undefined
4 errors detected in the compilation of "/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/ggml-cuda.cu".
[11/23] /usr/bin/cc -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/. -isystem /usr/local/cuda/include -O3 -DNDEBUG -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -march=native -std=gnu11 -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o.d -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o -c /tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/ggml-quants.c
[12/23] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/common/. -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/. -O3 -DNDEBUG -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -std=gnu++11 -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/train.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/train.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/train.cpp.o -c /tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/common/train.cpp
[13/23] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/common/. -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/. -O3 -DNDEBUG -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -std=gnu++11 -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/common.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/common.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/common.cpp.o -c /tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/common/common.cpp
[14/23] /usr/bin/cc -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/. -isystem /usr/local/cuda/include -O3 -DNDEBUG -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -march=native -std=gnu11 -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o.d -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o -c /tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/ggml.c
[15/23] /usr/bin/c++ -DLLAMA_BUILD -DLLAMA_SHARED -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/examples/llava/. -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/examples/llava/../.. -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/examples/llava/../../common -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/. -isystem /usr/local/cuda/include -O3 -DNDEBUG -fPIC -Wno-cast-qual -MD -MT vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/clip.cpp.o -MF vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/clip.cpp.o.d -o vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/clip.cpp.o -c /tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/examples/llava/clip.cpp
[16/23] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -DLLAMA_BUILD -DLLAMA_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dllama_EXPORTS -I/tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/. -isystem /usr/local/cuda/include -O3 -DNDEBUG -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -std=gnu++11 -MD -MT vendor/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o -MF vendor/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o.d -o vendor/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o -c /tmp/pip-install-lps8mve2/llama-cpp-python_f08f076e92114539aaa45aee27b04713/vendor/llama.cpp/llama.cpp
ninja: build stopped: subcommand failed.
*** CMake build failed
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects
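For anyone hitting this: the failing step is [10/23], where nvcc 11.6.55 compiles ggml-cuda.cu. __hmax and __hmax2 are half-precision max intrinsics that, as far as I can tell, are only reliably available from the CUDA 11.7 headers onward (upstream llama.cpp later guarded these call sites behind CUDART_VERSION >= 11070), so a CUDA 11.6 toolkit cannot build this revision of the vendored llama.cpp. Upgrading the CUDA toolkit to 11.7+, or pinning an older llama-cpp-python release that predates these intrinsics, should avoid the error. If neither is an option, a local patch along the following lines might work. This is only a sketch under the assumption that fp32 fallbacks are acceptable; local_hmax and local_hmax2 are names I made up for illustration, not part of llama.cpp:

```cuda
// Hypothetical fallback for toolkits older than CUDA 11.7, where __hmax/__hmax2
// may be undeclared. Upstream llama.cpp shipped a similar CUDART_VERSION guard
// later; this is a sketch of the idea, not the project's exact patch.
#include <cuda_fp16.h>

#if defined(CUDART_VERSION) && CUDART_VERSION < 11070
static __device__ __forceinline__ __half local_hmax(const __half a, const __half b) {
    // Compare in fp32 to avoid relying on half intrinsics missing pre-11.7.
    return __float2half(fmaxf(__half2float(a), __half2float(b)));
}
static __device__ __forceinline__ __half2 local_hmax2(const __half2 a, const __half2 b) {
    __half2 r;
    r.x = local_hmax(a.x, b.x);  // elementwise max, low half
    r.y = local_hmax(a.y, b.y);  // elementwise max, high half
    return r;
}
// Redirect the missing intrinsics to the fallbacks (safe here only because
// CUDA 11.6 does not declare them, per the compile errors above).
#define __hmax  local_hmax
#define __hmax2 local_hmax2
#endif
```

Routing through fmaxf in fp32 sidesteps the missing half intrinsics at a small speed cost on the affected code paths; the upstream fix takes essentially the same approach behind a version guard.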