# Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [x] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- [x] I carefully followed the README.md.
- [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- [x] I reviewed the Discussions, and have a new bug or useful enhancement to share.
# Expected Behavior
`llama-cpp-python` builds successfully with CMake, i.e. `pip install` with `CMAKE_ARGS='-DLLAMA_CUBLAS=on'` completes.
# Current Behavior
Building `llama-cpp-python` 0.2.29 with CMake fails: `nvcc` stops while compiling `ggml-cuda.cu` with `ggml.h(309): error: identifier "half" is undefined` (full log below).
# Environment and Context
Docker container on aarch64 (sbsa-linux): CUDA 12.1.105, GCC 11.3.0, CMake 3.28.1, Python 3.10.
# Failure Information (for bugs)
# Steps to Reproduce
Run:

```sh
CMAKE_ARGS='-DLLAMA_CUBLAS=on' pip install --force-reinstall --no-cache-dir llama-cpp-python
```

This installs the current latest version, 0.2.29, and fails as shown below. The failure can be worked around by pinning to an earlier version:

```sh
CMAKE_ARGS='-DLLAMA_CUBLAS=on' pip install --force-reinstall --no-cache-dir llama-cpp-python==0.2.27
```
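My guess at the root cause, based on the log below: on aarch64 the vendored `ggml.h` (line 309) typedefs `ggml_fp16_t` to CUDA's `half` type when compiled by `nvcc`, but `half` is only declared in `<cuda_fp16.h>`, which apparently is not included at that point. A minimal sketch of what the relevant section of `ggml.h` might look like with the include added follows; the exact preprocessor conditions here are my assumption, not copied from the 0.2.29 sources:

```c
#include <stdint.h>

/* Sketch of vendor/llama.cpp/ggml.h around line 309; the #if conditions
 * are assumed, not verbatim from the vendored file. */
#if defined(__ARM_NEON) && defined(__CUDACC__)
    /* nvcc rejects __fp16 on the host side, so CUDA's half is used
     * instead. half is declared only in <cuda_fp16.h>; without this
     * include, nvcc reports 'identifier "half" is undefined' --
     * exactly the error in the log below. */
    #include <cuda_fp16.h>
    typedef half ggml_fp16_t;
#elif defined(__ARM_NEON)
    typedef __fp16 ggml_fp16_t;   /* native 16-bit float type on ARM */
#else
    typedef uint16_t ggml_fp16_t; /* fp16 bits stored in a plain integer */
#endif
```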
# Failure Logs

```
=> ERROR [7/8] RUN CMAKE_ARGS='-DLLAMA_CUBLAS=on' pip install --force-reinstall --no-cache-dir llama-cpp-python 58.8s
------
> [7/8] RUN CMAKE_ARGS='-DLLAMA_CUBLAS=on' pip install --force-reinstall --no-cache-dir llama-cpp-python:
0.865 Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
2.003 Collecting llama-cpp-python
2.160 Downloading llama_cpp_python-0.2.29.tar.gz (9.5 MB)
2.887 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 9.5/9.5 MB 13.1 MB/s eta 0:00:00
3.241 Installing build dependencies: started
10.39 Installing build dependencies: finished with status 'done'
10.39 Getting requirements to build wheel: started
10.51 Getting requirements to build wheel: finished with status 'done'
10.51 Installing backend dependencies: started
38.70 Installing backend dependencies: finished with status 'done'
38.70 Preparing metadata (pyproject.toml): started
38.80 Preparing metadata (pyproject.toml): finished with status 'done'
39.70 Collecting typing-extensions>=4.5.0 (from llama-cpp-python)
39.72 Downloading typing_extensions-4.9.0-py3-none-any.whl.metadata (3.0 kB)
40.65 Collecting numpy>=1.20.0 (from llama-cpp-python)
40.67 Downloading numpy-1.26.3-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.metadata (62 kB)
40.67 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 62.7/62.7 kB 24.8 MB/s eta 0:00:00
41.52 Collecting diskcache>=5.6.1 (from llama-cpp-python)
41.54 Downloading diskcache-5.6.3-py3-none-any.whl.metadata (20 kB)
41.56 Downloading diskcache-5.6.3-py3-none-any.whl (45 kB)
41.57 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 45.5/45.5 kB 26.1 MB/s eta 0:00:00
41.59 Downloading numpy-1.26.3-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (14.2 MB)
43.16 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.2/14.2 MB 8.6 MB/s eta 0:00:00
43.18 Downloading typing_extensions-4.9.0-py3-none-any.whl (32 kB)
43.20 Building wheels for collected packages: llama-cpp-python
43.20 Building wheel for llama-cpp-python (pyproject.toml): started
57.60 Building wheel for llama-cpp-python (pyproject.toml): finished with status 'error'
57.62 error: subprocess-exited-with-error
57.62
57.62 × Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
57.62 │ exit code: 1
57.62 ╰─> [77 lines of output]
57.62 *** scikit-build-core 0.7.1 using CMake 3.28.1 (wheel)
57.62 *** Configuring CMake...
57.62 loading initial cache file /tmp/tmp3fkbwnax/build/CMakeInit.txt
57.62 -- The C compiler identification is GNU 11.3.0
57.62 -- The CXX compiler identification is GNU 11.3.0
57.62 -- Detecting C compiler ABI info
57.62 -- Detecting C compiler ABI info - done
57.62 -- Check for working C compiler: /usr/bin/cc - skipped
57.62 -- Detecting C compile features
57.62 -- Detecting C compile features - done
57.62 -- Detecting CXX compiler ABI info
57.62 -- Detecting CXX compiler ABI info - done
57.62 -- Check for working CXX compiler: /usr/bin/c++ - skipped
57.62 -- Detecting CXX compile features
57.62 -- Detecting CXX compile features - done
57.62 -- Found Git: /usr/bin/git (found version "2.34.1")
57.62 -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
57.62 -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
57.62 -- Found Threads: TRUE
57.62 -- Found CUDAToolkit: /usr/local/cuda/targets/sbsa-linux/include (found version "12.1.105")
57.62 -- cuBLAS found
57.62 -- The CUDA compiler identification is NVIDIA 12.1.105
57.62 -- Detecting CUDA compiler ABI info
57.62 -- Detecting CUDA compiler ABI info - done
57.62 -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
57.62 -- Detecting CUDA compile features
57.62 -- Detecting CUDA compile features - done
57.62 -- Using CUDA architectures: 52;61;70
57.62 -- CUDA host compiler is GNU 11.3.0
57.62
57.62 -- CMAKE_SYSTEM_PROCESSOR: aarch64
57.62 -- ARM detected
57.62 -- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E
57.62 -- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E - Failed
57.62 CMake Warning (dev) at CMakeLists.txt:21 (install):
57.62 Target llama has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
57.62 This warning is for project developers. Use -Wno-dev to suppress it.
57.62
57.62 CMake Warning (dev) at CMakeLists.txt:30 (install):
57.62 Target llama has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
57.62 This warning is for project developers. Use -Wno-dev to suppress it.
57.62
57.62 -- Configuring done (2.9s)
57.62 -- Generating done (0.0s)
57.62 -- Build files have been written to: /tmp/tmp3fkbwnax/build
57.62 *** Building project with Ninja...
57.62 Change Dir: '/tmp/tmp3fkbwnax/build'
57.62
57.62 Run Build Command(s): /tmp/pip-build-env-mb7_4c15/normal/local/lib/python3.10/dist-packages/ninja/data/bin/ninja -v
57.62 [1/23] cd /tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp && /tmp/pip-build-env-mb7_4c15/normal/local/lib/python3.10/dist-packages/cmake/data/bin/cmake -DMSVC= -DCMAKE_C_COMPILER_VERSION=11.3.0 -DCMAKE_C_COMPILER_ID=GNU -DCMAKE_VS_PLATFORM_NAME= -DCMAKE_C_COMPILER=/usr/bin/cc -P /tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/common/../scripts/gen-build-info-cpp.cmake
57.62 -- Found Git: /usr/bin/git (found version "2.34.1")
57.62 [2/23] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -O3 -DNDEBUG -std=gnu++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Wno-array-bounds -Wno-format-truncation -Wextra-semi -MD -MT vendor/llama.cpp/common/CMakeFiles/build_info.dir/build-info.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/build_info.dir/build-info.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/build_info.dir/build-info.cpp.o -c /tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/common/build-info.cpp
57.62 [3/23] /usr/bin/cc -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/sbsa-linux/include -O3 -DNDEBUG -std=gnu11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o.d -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o -c /tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/ggml-alloc.c
57.62 [4/23] /usr/bin/c++ -DLLAMA_BUILD -DLLAMA_SHARED -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/examples/llava/. -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/examples/llava/../.. -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/examples/llava/../../common -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/sbsa-linux/include -O3 -DNDEBUG -fPIC -Wno-cast-qual -MD -MT vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/llava.cpp.o -MF vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/llava.cpp.o.d -o vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/llava.cpp.o -c /tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/examples/llava/llava.cpp
57.62 [5/23] /usr/bin/cc -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/sbsa-linux/include -O3 -DNDEBUG -std=gnu11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o.d -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o -c /tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/ggml-backend.c
57.62 [6/23] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/common/. -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Wno-array-bounds -Wno-format-truncation -Wextra-semi -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/console.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/console.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/console.cpp.o -c /tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/common/console.cpp
57.62 [7/23] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/common/. -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Wno-array-bounds -Wno-format-truncation -Wextra-semi -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/sampling.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/sampling.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/sampling.cpp.o -c /tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/common/sampling.cpp
57.62 [8/23] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/common/. -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Wno-array-bounds -Wno-format-truncation -Wextra-semi -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/grammar-parser.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/grammar-parser.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/grammar-parser.cpp.o -c /tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/common/grammar-parser.cpp
57.62 [9/23] /usr/bin/c++ -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/common/. -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/. -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/examples/llava/. -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/examples/llava/../.. -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/examples/llava/../../common -O3 -DNDEBUG -MD -MT vendor/llama.cpp/examples/llava/CMakeFiles/llava-cli.dir/llava-cli.cpp.o -MF vendor/llama.cpp/examples/llava/CMakeFiles/llava-cli.dir/llava-cli.cpp.o.d -o vendor/llama.cpp/examples/llava/CMakeFiles/llava-cli.dir/llava-cli.cpp.o -c /tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/examples/llava/llava-cli.cpp
57.62 [10/23] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/sbsa-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_52,code=[compute_52,sm_52]" "--generate-code=arch=compute_61,code=[compute_61,sm_61]" "--generate-code=arch=compute_70,code=[compute_70,sm_70]" -Xcompiler=-fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -use_fast_math -Wno-pedantic -Xcompiler "-Wno-array-bounds -Wno-format-truncation -Wextra-semi" -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o.d -x cu -c /tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/ggml-cuda.cu -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o
57.62 FAILED: vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o
57.62 /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/sbsa-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_52,code=[compute_52,sm_52]" "--generate-code=arch=compute_61,code=[compute_61,sm_61]" "--generate-code=arch=compute_70,code=[compute_70,sm_70]" -Xcompiler=-fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -use_fast_math -Wno-pedantic -Xcompiler "-Wno-array-bounds -Wno-format-truncation -Wextra-semi" -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o.d -x cu -c /tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/ggml-cuda.cu -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-cuda.cu.o
57.62 /tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/ggml.h(309): error: identifier "half" is undefined
57.62 typedef half ggml_fp16_t;
57.62 ^
57.62
57.62 1 error detected in the compilation of "/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/ggml-cuda.cu".
57.62 [11/23] /usr/bin/cc -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/sbsa-linux/include -O3 -DNDEBUG -std=gnu11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o.d -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o -c /tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/ggml-quants.c
57.62 [12/23] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/common/. -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Wno-array-bounds -Wno-format-truncation -Wextra-semi -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/train.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/train.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/train.cpp.o -c /tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/common/train.cpp
57.62 [13/23] /usr/bin/cc -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/sbsa-linux/include -O3 -DNDEBUG -std=gnu11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -MD -MT vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o -MF vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o.d -o vendor/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o -c /tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/ggml.c
57.62 [14/23] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/common/. -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/. -O3 -DNDEBUG -std=gnu++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Wno-array-bounds -Wno-format-truncation -Wextra-semi -MD -MT vendor/llama.cpp/common/CMakeFiles/common.dir/common.cpp.o -MF vendor/llama.cpp/common/CMakeFiles/common.dir/common.cpp.o.d -o vendor/llama.cpp/common/CMakeFiles/common.dir/common.cpp.o -c /tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/common/common.cpp
57.62 [15/23] /usr/bin/c++ -DLLAMA_BUILD -DLLAMA_SHARED -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/examples/llava/. -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/examples/llava/../.. -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/examples/llava/../../common -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/sbsa-linux/include -O3 -DNDEBUG -fPIC -Wno-cast-qual -MD -MT vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/clip.cpp.o -MF vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/clip.cpp.o.d -o vendor/llama.cpp/examples/llava/CMakeFiles/llava.dir/clip.cpp.o -c /tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/examples/llava/clip.cpp
57.62 [16/23] /usr/bin/c++ -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -DLLAMA_BUILD -DLLAMA_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dllama_EXPORTS -I/tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/. -isystem /usr/local/cuda/targets/sbsa-linux/include -O3 -DNDEBUG -std=gnu++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Wno-array-bounds -Wno-format-truncation -Wextra-semi -MD -MT vendor/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o -MF vendor/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o.d -o vendor/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o -c /tmp/pip-install-6r2a80i4/llama-cpp-python_046dbbf33d364353b4b1039cd6021ed3/vendor/llama.cpp/llama.cpp
57.62 ninja: build stopped: subcommand failed.
57.62
57.62
57.62 *** CMake build failed
57.62 [end of output]
57.62
57.62 note: This error originates from a subprocess, and is likely not a problem with pip.
57.62 ERROR: Failed building wheel for llama-cpp-python
57.62 Failed to build llama-cpp-python
57.62 ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects
------
ERROR: failed to solve: executor failed running [/bin/sh -c CMAKE_ARGS='-DLLAMA_CUBLAS=on' pip install --force-reinstall --no-cache-dir llama-cpp-python]: exit code: 1
make: *** [build-test] Error 1
```
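For what it's worth, the same `nvcc` error can be reproduced outside the build with a few lines of CUDA. This is a standalone sketch; the file name and kernel are made up for illustration, not taken from llama.cpp:

```c
// half_repro.cu -- hypothetical standalone check, not part of llama.cpp.
// Compile with: nvcc -c half_repro.cu
//
// 'half' is declared in <cuda_fp16.h>. Deleting the include below makes
// nvcc 12.1 fail with the same message as in the log above:
//   error: identifier "half" is undefined
#include <cuda_fp16.h>

typedef half my_fp16_t;  // mirrors the ggml_fp16_t typedef in ggml.h

__global__ void zero_fill(my_fp16_t *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = __float2half(0.0f);  // conversion also lives in cuda_fp16.h
}
```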