# Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [x] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- [x] I carefully followed the README.md.
- [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- [x] I reviewed the Discussions, and have a new bug or useful enhancement to share.
# Expected Behavior
`llama-cpp-python` builds successfully with CMake, i.e. `pip install` with `CMAKE_ARGS='-DLLAMA_CUBLAS=on'` completes.
# Current Behavior
Building `llama-cpp-python` 0.2.29 with CMake fails: `nvcc` stops while compiling `ggml-cuda.cu` with `ggml.h(309): error: identifier "half" is undefined` (full log below).
# Environment and Context
Docker container on aarch64 (sbsa-linux): CUDA 12.1.105, GCC 11.3.0, CMake 3.28.1, Python 3.10.
# Failure Information (for bugs)
# Steps to Reproduce
Run:

```sh
CMAKE_ARGS='-DLLAMA_CUBLAS=on' pip install --force-reinstall --no-cache-dir llama-cpp-python
```

This installs the current latest version, 0.2.29, and fails as shown below. The failure can be worked around by pinning to an earlier version:

```sh
CMAKE_ARGS='-DLLAMA_CUBLAS=on' pip install --force-reinstall --no-cache-dir llama-cpp-python==0.2.27
```
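My guess at the root cause, based on the log below: on aarch64 the vendored `ggml.h` (line 309) typedefs `ggml_fp16_t` to CUDA's `half` type when compiled by `nvcc`, but `half` is only declared in `<cuda_fp16.h>`, which apparently is not included at that point. A minimal sketch of what the relevant section of `ggml.h` might look like with the include added follows; the exact preprocessor conditions here are my assumption, not copied from the 0.2.29 sources:

```c
#include <stdint.h>

/* Sketch of vendor/llama.cpp/ggml.h around line 309; the #if conditions
 * are assumed, not verbatim from the vendored file. */
#if defined(__ARM_NEON) && defined(__CUDACC__)
    /* nvcc rejects __fp16 on the host side, so CUDA's half is used
     * instead. half is declared only in <cuda_fp16.h>; without this
     * include, nvcc reports 'identifier "half" is undefined' --
     * exactly the error in the log below. */
    #include <cuda_fp16.h>
    typedef half ggml_fp16_t;
#elif defined(__ARM_NEON)
    typedef __fp16 ggml_fp16_t;   /* native 16-bit float type on ARM */
#else
    typedef uint16_t ggml_fp16_t; /* fp16 bits stored in a plain integer */
#endif
```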
# Failure Logs

```
=> ERROR [7/8] RUN CMAKE_ARGS='-DLLAMA_CUBLAS=on' pip install --force-reinstall --no-cache-dir llama-cpp-python 58.8s
------
> [7/8] RUN CMAKE_ARGS='-DLLAMA_CUBLAS=on' pip install --force-reinstall --no-cache-dir llama-cpp-python:
0.865 Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
2.003 Collecting llama-cpp-python
2.160 Downloading llama_cpp_python-0.2.29.tar.gz (9.5 MB)
2.887 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 9.5/9.5 MB 13.1 MB/s eta 0:00:00
3.241 Installing build dependencies: started
10.39 Installing build dependencies: finished with status 'done'
10.39 Getting requirements to build wheel: started
10.51 Getting requirements to build wheel: finished with status 'done'
10.51 Installing backend dependencies: started
38.70 Installing backend dependencies: finished with status 'done'
38.70 Preparing metadata (pyproject.toml): started
38.80 Preparing metadata (pyproject.toml): finished with status 'done'
39.70 Collecting typing-extensions>=4.5.0 (from llama-cpp-python)
39.72 Downloading typing_extensions-4.9.0-py3-none-any.whl.metadata (3.0 kB)
40.65 Collecting numpy>=1.20.0 (from llama-cpp-python)
40.67 Downloading numpy-1.26.3-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.metadata (62 kB)
40.67 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 62.7/62.7 kB 24.8 MB/s eta 0:00:00
41.52 Collecting diskcache>=5.6.1 (from llama-cpp-python)
41.54 Downloading diskcache-5.6.3-py3-none-any.whl.metadata (20 kB)
41.56 Downloading diskcache-5.6.3-py3-none-any.whl (45 kB)
41.57 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 45.5/45.5 kB 26.1 MB/s eta 0:00:00
41.59 Downloading numpy-1.26.3-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (14.2 MB)
43.16 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.2/14.2 MB 8.6 MB/s eta 0:00:00
43.18 Downloading typing_extensions-4.9.0-py3-none-any.whl (32 kB)
43.20 Building wheels for collected packages: llama-cpp-python
43.20 Building wheel for llama-cpp-python (pyproject.toml): started
57.60 Building wheel for llama-cpp-python (pyproject.toml): finished with status 'error'
57.62 error: subprocess-exited-with-error
57.62
57.62 × Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
57.62 │ exit code: 1
57.62 ╰─> [77 lines of output]
57.62 *** scikit-build-core 0.7.1 using CMake 3.28.1 (wheel)
57.62 *** Configuring CMake...
57.62 loading initial cache file /tmp/tmp3fkbwnax/build/CMakeInit.txt
57.62 -- The C compiler identification is GNU 11.3.0
57.62 -- The CXX compiler identification is GNU 11.3.0
57.62 -- Detecting C compiler ABI info
57.62 -- Detecting C compiler ABI info - done
57.62 -- Check for working C compiler: /usr/bin/cc - skipped
57.62 -- Detecting C compile features
57.62 -- Detecting C compile features - done
57.62 -- Detecting CXX compiler ABI info
57.62 -- Detecting CXX compiler ABI info - done
57.62 -- Check for working CXX compiler: /usr/bin/c++ - skipped
57.62 -- Detecting CXX compile features
57.62 -- Detecting CXX compile features - done
57.62 -- Found Git: /usr/bin/git (found version "2.34.1")
57.62 -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
57.62 -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
57.62 -- Found Threads: TRUE
57.62 -- Found CUDAToolkit: /usr/local/cuda/targets/sbsa-linux/include (found version "12.1.105")
57.62 -- cuBLAS found
57.62 -- The CUDA compiler identification is NVIDIA 12.1.105
57.62 -- Detecting CUDA compiler ABI info
57.62 -- Detecting CUDA compiler ABI info - done
57.62 -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
57.62 -- Detecting CUDA compile features
57.62 -- Detecting CUDA compile features - done
57.62 -- Using CUDA architectures: 52;61;70
57.62 -- CUDA host compiler is GNU 11.3.0
57.62
57.62 -- CMAKE_SYSTEM_PROCESSOR: aarch64
57.62 -- ARM detected
57.62 -- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E
57.62 -- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E - Failed
57.62 CMake Warning (dev) at CMakeLists.txt:21 (install):
57.62 Target llama has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
57.62 This warning is for project developers. Use -Wno-dev to suppress it.
57.62
57.62 CMake Warning (dev) at CMakeLists.txt:30 (install):
57.62 Target llama has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
57.62 This warning is for project developers. Use -Wno-dev to suppress it.
57.62
57.62 -- Configuring done (2.9s)
57.62 -- Generating done (0.0s)
57.62 -- Build files have been written to: /tmp/tmp3fkbwnax/build
57.62 *** Building project with Ninja...
57.62 Change Dir: '/tmp/tmp3fkbwnax/build'
57.62
57.62 Run Build Command(s): /tmp/pip-build-env-mb7_4c15/normal/local/lib/python3.10/dist-packages/ninja/data/bin/ninja -v
57.62 [1/23] cd /tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp && /tmp/pip-build-env-mb7_4c15/normal/local/lib/python3.10/dist-packages/cmake/data/bin/cmake -DMSVC= -DCMAKE_C_COMPILER_VERSION=11.3.0 -DCMAKE_C_COMPILER_ID=GNU -DCMAKE_VS_PLATFORM_NAME= -DCMAKE_C_COMPILER=/usr/bin/cc -P /tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/common/../scripts/gen-build-info-cpp.cmake
57.62 -- Found Git: /usr/bin/git (found version "2.34.1")
57.62 [2/23] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -O3 -DNDEBUG -std=gnu++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Wno-array-bounds -Wno-format-truncation -Wextra-semi -MD -MT vendor/llama.cpp/common/CMakeFiles/build_info.dir/build-info.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/build_info.dir/build-info.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/build_info.dir/build-info.cpp.o -c /tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/common/build-info.cpp
57.62 [3/23] /usr/bin/cc -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/sbsa-linux/include -O3 -DNDEBUG -std=gnu11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o.d -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o -c /tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/ggml-alloc.c
57.62 [4/23] /usr/bin/c++ -DLLAMA_BUILD -DLLAMA_SHARED -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/examples/llava/. -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/examples/llava/../.. -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/examples/llava/../../common -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/sbsa-linux/include -O3 -DNDEBUG -fPIC -Wno-cast-qual -MD -MT vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/llava.cpp.o -MF vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/llava.cpp.o.d -o vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/llava.cpp.o -c /tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/examples/llava/llava.cpp
57.62 [5/23] /usr/bin/cc -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/sbsa-linux/include -O3 -DNDEBUG -std=gnu11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o.d -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o -c /tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/ggml-backend.c
57.62 [6/23] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/common/. -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Wno-array-bounds -Wno-format-truncation -Wextra-semi -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/console.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/console.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/console.cpp.o -c /tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/common/console.cpp
57.62 [7/23] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/common/. -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Wno-array-bounds -Wno-format-truncation -Wextra-semi -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/sampling.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/sampling.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/sampling.cpp.o -c /tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/common/sampling.cpp
57.62 [8/23] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/common/. -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Wno-array-bounds -Wno-format-truncation -Wextra-semi -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/grammar-parser.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/grammar-parser.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/grammar-parser.cpp.o -c /tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/common/grammar-parser.cpp
57.62 [9/23] /usr/bin/c++ -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/common/. -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/. -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/examples/llava/. -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/examples/llava/../.. -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/examples/llava/../../common -O3 -DNDEBUG -MD -MT vendor/llama.cpp/examples/llava/CMakeFiles/llava-cli.dir/llava-cli.cpp.o -MF vendor/llama.cpp/examples/llava/CMakeFiles/llava-cli.dir/llava-cli.cpp.o.d -o vendor/llama.cpp/examples/llava/CMakeFiles/llava-cli.dir/llava-cli.cpp.o -c /tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/examples/llava/llava-cli.cpp
57.62 [10/23] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/sbsa-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_52,code=[compute_52,sm_52]" "--generate-code=arch=compute_61,code=[compute_61,sm_61]" "--generate-code=arch=compute_70,code=[compute_70,sm_70]" -Xcompiler=-fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -use_fast_math -Wno-pedantic -Xcompiler "-Wno-array-bounds -Wno-format-truncation -Wextra-semi" -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o.d -x cu -c /tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/ggml-cuda.cu -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o
57.62 FAILED: vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o
57.62 /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/sbsa-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_52,code=[compute_52,sm_52]" "--generate-code=arch=compute_61,code=[compute_61,sm_61]" "--generate-code=arch=compute_70,code=[compute_70,sm_70]" -Xcompiler=-fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -use_fast_math -Wno-pedantic -Xcompiler "-Wno-array-bounds -Wno-format-truncation -Wextra-semi" -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o.d -x cu -c /tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/ggml-cuda.cu -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o
57.62 /tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/ggml.h(309): error: identifier "half" is undefined
57.62 typedef half ggml_fp16_t;
57.62 ^
57.62
57.62 1 error detected in the compilation of "/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/ggml-cuda.cu".
57.62 [11/23] /usr/bin/cc -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/sbsa-linux/include -O3 -DNDEBUG -std=gnu11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o.d -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o -c /tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/ggml-quants.c
57.62 [12/23] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/common/. -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Wno-array-bounds -Wno-format-truncation -Wextra-semi -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/train.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/train.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/train.cpp.o -c /tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/common/train.cpp
57.62 [13/23] /usr/bin/cc -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/sbsa-linux/include -O3 -DNDEBUG -std=gnu11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o.d -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o -c /tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/ggml.c
57.62 [14/23] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/common/. -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Wno-array-bounds -Wno-format-truncation -Wextra-semi -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/common.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/common.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/common.cpp.o -c /tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/common/common.cpp
57.62 [15/23] /usr/bin/c++ -DLLAMA_BUILD -DLLAMA_SHARED -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/examples/llava/. -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/examples/llava/../.. -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/examples/llava/../../common -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/sbsa-linux/include -O3 -DNDEBUG -fPIC -Wno-cast-qual -MD -MT vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/clip.cpp.o -MF vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/clip.cpp.o.d -o vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/clip.cpp.o -c /tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/examples/llava/clip.cpp
57.62 [16/23] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -DLLAMA_BUILD -DLLAMA_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dllama_EXPORTS -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/sbsa-linux/include -O3 -DNDEBUG -std=gnu++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Wno-array-bounds -Wno-format-truncation -Wextra-semi -MD -MT vendor/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o -MF vendor/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o.d -o vendor/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o -c /tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/llama.cpp
57.62 ninja: build stopped: subcommand failed.
57.62
57.62
57.62 *** CMake build failed
57.62 [end of output]
57.62
57.62 note: This error originates from a subprocess, and is likely not a problem with pip.
57.62 ERROR: Failed building wheel for llama-cpp-python
57.62 Failed to build llama-cpp-python
57.62 ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects
------
ERROR: failed to solve: executor failed running [/bin/sh -c CMAKE_ARGS='-DLLAMA_CUBLAS=on' pip install --force-reinstall --no-cache-dir llama-cpp-python]: exit code: 1
make: *** [build-test] Error 1
```
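For what it's worth, the same `nvcc` error can be reproduced outside the build with a few lines of CUDA. This is a standalone sketch; the file name and kernel are made up for illustration, not taken from llama.cpp:

```c
// half_repro.cu -- hypothetical standalone check, not part of llama.cpp.
// Compile with: nvcc -c half_repro.cu
//
// 'half' is declared in <cuda_fp16.h>. Deleting the include below makes
// nvcc 12.1 fail with the same message as in the log above:
//   error: identifier "half" is undefined
#include <cuda_fp16.h>

typedef half my_fp16_t;  // mirrors the ggml_fp16_t typedef in ggml.h

__global__ void zero_fill(my_fp16_t *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = __float2half(0.0f);  // conversion also lives in cuda_fp16.h
}
```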