
Commit 0f64857

abhilash1910, NeoZhangJianyu, luoyu-intel, airMeng, and ggerganov authored
ggml : add unified SYCL backend for Intel GPUs (ggml-org#2690)
* first update for migration
* update init_cublas
* add debug functio, commit all help code
* step 1
* step 2
* step3 add fp16, slower 31->28
* add GGML_LIST_DEVICE function
* step 5 format device and print
* step6, enhance error check, remove CUDA macro, enhance device id to fix none-zero id issue
* support main device is non-zero
* step7 add debug for code path, rm log
* step 8, rename all macro & func from cuda by sycl
* fix error of select non-zero device, format device list
* ren ggml-sycl.hpp -> ggml-sycl.h
* clear CMAKE to rm unused lib and options
* correct queue: rm dtct:get_queue
* add print tensor function to debug
* fix error: wrong result in 658746bb26702e50f2c59c0e4ada8e9da6010481
* summary dpct definition in one header file to replace folder:dpct
* refactor device log
* mv dpct definition from folder dpct to ggml-sycl.h
* update readme, refactor build script
* fix build with sycl
* set nthread=1 when sycl, increase performance
* add run script, comment debug code
* add ls-sycl-device tool
* add ls-sycl-device, rm unused files
* rm rear space
* dos2unix
* Update README_sycl.md
* fix return type
* remove sycl version from include path
* restore rm code to fix hang issue
* add syc and link for sycl readme
* rm original sycl code before refactor
* fix code err
* add know issue for pvc hang issue
* enable SYCL_F16 support
* align pr4766
* check for sycl blas, better performance
* cleanup 1
* remove extra endif
* add build&run script, clean CMakefile, update guide by review comments
* rename macro to intel hardware
* editor config format
* format fixes
* format fixes
* editor format fix
* Remove unused headers
* skip build sycl tool for other code path
* replace tab by space
* fix blas matmul function
* fix mac build
* restore hip dependency
* fix conflict
* ren as review comments
* mv internal function to .cpp file
* export funciton print_sycl_devices(), mv class dpct definition to source file
* update CI/action for sycl code, fix CI error of repeat/dup
* fix action ID format issue
* rm unused strategy
* enable llama_f16 in ci
* fix conflict
* fix build break on MacOS, due to CI of MacOS depend on external ggml, instead of internal ggml
* fix ci cases for unsupported data type
* revert unrelated changed in cuda cmake, remove useless nommq, fix typo of GGML_USE_CLBLAS_SYCL
* revert hip cmake changes
* fix indent
* add prefix in func name
* revert no mmq
* rm cpu blas duplicate
* fix no_new_line
* fix src1->type==F16 bug.
* pass batch offset for F16 src1
* fix batch error
* fix wrong code
* revert sycl checking in test-sampling
* pass void as arguments of ggml_backend_sycl_print_sycl_devices
* remove extra blank line in test-sampling
* revert setting n_threads in sycl
* implement std::isinf for icpx with fast math.
* Update ci/run.sh (Co-authored-by: Georgi Gerganov <[email protected]>)
* Update examples/sycl/run-llama2.sh (Co-authored-by: Georgi Gerganov <[email protected]>)
* Update examples/sycl/run-llama2.sh (Co-authored-by: Georgi Gerganov <[email protected]>)
* Update CMakeLists.txt (Co-authored-by: Georgi Gerganov <[email protected]>)
* Update CMakeLists.txt (Co-authored-by: Georgi Gerganov <[email protected]>)
* Update CMakeLists.txt (Co-authored-by: Georgi Gerganov <[email protected]>)
* Update CMakeLists.txt (Co-authored-by: Georgi Gerganov <[email protected]>)
* add copyright and MIT license declare
* update the cmd example

---------

Co-authored-by: jianyuzh <[email protected]>
Co-authored-by: luoyu-intel <[email protected]>
Co-authored-by: Meng, Hengyu <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
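Taken together, the log above amounts to: port the CUDA kernels to SYCL (ggml-sycl.cpp / ggml-sycl.h), gate the backend behind a new LLAMA_SYCL CMake option, and wire up CI and docs. A minimal configure-and-build sketch, mirroring the CI job added below (assuming oneAPI is installed at the default /opt/intel/oneapi prefix):

```bash
# Minimal SYCL build of llama.cpp, mirroring the CI job in this commit.
# Assumes the oneAPI DPC++ compiler and MKL are installed at the default prefix.
source /opt/intel/oneapi/setvars.sh   # puts icx/icpx and MKL on the PATH
mkdir build && cd build
cmake -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx ..
cmake --build . --config Release -j "$(nproc)"
```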
1 parent b764b8f commit 0f64857

22 files changed: +15,764 −24 lines

.github/workflows/build.yml (+41)

```diff
@@ -143,6 +143,47 @@ jobs:
           cd build
           ctest -L main --verbose
 
+  ubuntu-22-cmake-sycl:
+    runs-on: ubuntu-22.04
+
+    continue-on-error: true
+
+    steps:
+      - uses: actions/checkout@v2
+
+      - name: add oneAPI to apt
+        shell: bash
+        run: |
+          cd /tmp
+          wget https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB
+          sudo apt-key add GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB
+          rm GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB
+          sudo add-apt-repository "deb https://apt.repos.intel.com/oneapi all main"
+
+      - name: install oneAPI dpcpp compiler
+        shell: bash
+        run: |
+          sudo apt update
+          sudo apt install intel-oneapi-compiler-dpcpp-cpp
+
+      - name: install oneAPI MKL library
+        shell: bash
+        run: |
+          sudo apt install intel-oneapi-mkl-devel
+
+      - name: Clone
+        id: checkout
+        uses: actions/checkout@v3
+
+      - name: Build
+        id: cmake_build
+        run: |
+          source /opt/intel/oneapi/setvars.sh
+          mkdir build
+          cd build
+          cmake -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx ..
+          cmake --build . --config Release -j $(nproc)
+
   # TODO: build with LLAMA_NO_METAL because test-backend-ops fail on "Apple Paravirtual device" and I don't know
   # how to debug it.
   # ref: https://github.com/ggerganov/llama.cpp/actions/runs/7131777249/job/19420981052#step:5:1124
```
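To reproduce the CI environment on a local Ubuntu 22.04 machine, the same package setup condenses to the sketch below (a sketch only: apt-key is deprecated on newer Ubuntu releases, and Intel's package names can change between oneAPI versions):

```bash
# One-time oneAPI setup on Ubuntu 22.04, condensed from the CI job above.
cd /tmp
wget https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB
sudo apt-key add GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB
rm GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB
sudo add-apt-repository "deb https://apt.repos.intel.com/oneapi all main"
sudo apt update
sudo apt install intel-oneapi-compiler-dpcpp-cpp intel-oneapi-mkl-devel
```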

CMakeLists.txt (+41 −5)

```diff
@@ -1,5 +1,6 @@
 cmake_minimum_required(VERSION 3.14) # for add_link_options and implicit target directories.
 project("llama.cpp" C CXX)
+include(CheckIncludeFileCXX)
 
 set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
 
@@ -103,6 +104,8 @@ option(LLAMA_METAL_NDEBUG "llama: disable Metal debugging"
 option(LLAMA_METAL_SHADER_DEBUG "llama: compile Metal with -fno-fast-math" OFF)
 option(LLAMA_MPI "llama: use MPI" OFF)
 option(LLAMA_QKK_64 "llama: use super-block size of 64 for k-quants" OFF)
+option(LLAMA_SYCL "llama: use SYCL" OFF)
+option(LLAMA_SYCL_F16 "llama: use 16 bit floats for sycl calculations" OFF)
 
 option(LLAMA_BUILD_TESTS "llama: build tests" ${LLAMA_STANDALONE})
 option(LLAMA_BUILD_EXAMPLES "llama: build examples" ${LLAMA_STANDALONE})
@@ -121,8 +124,12 @@ include(${CMAKE_CURRENT_SOURCE_DIR}/scripts/build-info.cmake)
 #
 # Compile flags
 #
+if (LLAMA_SYCL)
+    set(CMAKE_CXX_STANDARD 17)
+else()
+    set(CMAKE_CXX_STANDARD 11)
+endif()
 
-set(CMAKE_CXX_STANDARD 11)
 set(CMAKE_CXX_STANDARD_REQUIRED true)
 set(CMAKE_C_STANDARD 11)
 set(CMAKE_C_STANDARD_REQUIRED true)
@@ -454,6 +461,32 @@ if (LLAMA_HIPBLAS)
     endif()
 endif()
 
+
+if (LLAMA_SYCL)
+    if ( NOT DEFINED ENV{ONEAPI_ROOT})
+        message(FATAL_ERROR "Not detect ENV {ONEAPI_ROOT}, please install oneAPI & source it, like: source /opt/intel/oneapi/setvars.sh")
+    endif()
+    #todo: AOT
+
+    find_package(IntelSYCL REQUIRED)
+    if (LLAMA_SYCL_F16)
+        add_compile_definitions(GGML_SYCL_F16)
+    endif()
+    add_compile_definitions(GGML_USE_SYCL)
+
+    add_compile_options(-I./) #include DPCT
+    add_compile_options(-I/${SYCL_INCLUDE_DIR})
+
+    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-narrowing")
+    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -O3")
+    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fsycl -L${MKLROOT}/lib")
+
+    set(GGML_HEADERS_SYCL ggml.h ggml-sycl.h)
+    set(GGML_SOURCES_SYCL ggml-sycl.cpp)
+
+    set(LLAMA_EXTRA_LIBS ${LLAMA_EXTRA_LIBS} sycl OpenCL mkl_core pthread m dl mkl_sycl_blas mkl_intel_ilp64 mkl_tbb_thread)
+endif()
+
 function(get_flags CCID CCVER)
     set(C_FLAGS "")
     set(CXX_FLAGS "")
@@ -479,10 +512,12 @@ function(get_flags CCID CCVER)
             list(APPEND CXX_FLAGS -Wextra-semi)
         endif()
     elseif (CCID MATCHES "Intel")
-        # enable max optimization level when using Intel compiler
-        set(C_FLAGS -ipo -O3 -static -fp-model=fast -flto -fno-stack-protector)
-        set(CXX_FLAGS -ipo -O3 -static -fp-model=fast -flto -fno-stack-protector)
-        add_link_options(-fuse-ld=lld -static-intel)
+        if (NOT LLAMA_SYCL)
+            # enable max optimization level when using Intel compiler
+            set(C_FLAGS -ipo -O3 -static -fp-model=fast -flto -fno-stack-protector)
+            set(CXX_FLAGS -ipo -O3 -static -fp-model=fast -flto -fno-stack-protector)
+            add_link_options(-fuse-ld=lld -static-intel)
+        endif()
     endif()
 
     set(GF_C_FLAGS ${C_FLAGS} PARENT_SCOPE)
@@ -799,6 +834,7 @@ add_library(ggml OBJECT
             ${GGML_SOURCES_METAL}  ${GGML_HEADERS_METAL}
             ${GGML_SOURCES_MPI}    ${GGML_HEADERS_MPI}
             ${GGML_SOURCES_EXTRA}  ${GGML_HEADERS_EXTRA}
+            ${GGML_SOURCES_SYCL}   ${GGML_HEADERS_SYCL}
             )
 
 target_include_directories(ggml PUBLIC . ${LLAMA_EXTRA_INCLUDES})
```
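The new LLAMA_SYCL_F16 option simply adds the GGML_SYCL_F16 compile definition shown above, and the ONEAPI_ROOT guard makes configuration fail fast unless setvars.sh has been sourced. A hedged configure sketch with the FP16 path enabled:

```bash
# Optional FP16 SYCL path: defines GGML_SYCL_F16 at compile time.
# Must run in a sourced oneAPI shell, or the ONEAPI_ROOT check aborts configure.
source /opt/intel/oneapi/setvars.sh
cmake -B build -DLLAMA_SYCL=ON -DLLAMA_SYCL_F16=ON \
      -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
cmake --build build --config Release -j "$(nproc)"
```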

README.md (+10 −1)

````diff
@@ -63,7 +63,7 @@ The main goal of `llama.cpp` is to run the LLaMA model using 4-bit integer quant
 - AVX, AVX2 and AVX512 support for x86 architectures
 - Mixed F16 / F32 precision
 - 2-bit, 3-bit, 4-bit, 5-bit, 6-bit and 8-bit integer quantization support
-- CUDA, Metal and OpenCL GPU backend support
+- CUDA, Metal, OpenCL, SYCL GPU backend support
 
 The original implementation of `llama.cpp` was [hacked in an evening](https://github.com/ggerganov/llama.cpp/issues/33#issuecomment-1465108022).
 Since then, the project has improved significantly thanks to many contributions. This project is mainly for educational purposes and serves
@@ -599,6 +599,15 @@ Building the program with BLAS support may lead to some performance improvements
 
   You can get a list of platforms and devices from the `clinfo -l` command, etc.
 
+- #### SYCL
+
+  SYCL is a higher-level programming model to improve programming productivity on various hardware accelerators.
+
+  llama.cpp based on SYCL is used to support Intel GPU (Data Center Max series, Flex series, Arc series, Built-in GPU and iGPU).
+
+  For detailed info, please refer to [llama.cpp for SYCL](README_sycl.md).
+
+
 ### Prepare Data & Run
 
 ```bash
````
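As a quick sanity check before running on an Intel GPU, the SYCL devices visible to the runtime can be enumerated. `sycl-ls` ships with oneAPI, while `ls-sycl-device` is the small helper tool this commit adds (the binary path below is an assumption, so check your build tree):

```bash
# List SYCL platforms and devices (the SYCL analogue of `clinfo -l`).
source /opt/intel/oneapi/setvars.sh
sycl-ls                      # oneAPI's own device enumerator
./build/bin/ls-sycl-device   # helper added by this commit (assumed path)
```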
