sycl : reviewing the backend documentation #13544

Open · Alcpz wants to merge 7 commits into master

Conversation

Collaborator

@Alcpz Alcpz commented May 14, 2025

Small updates to the docs and examples of the SYCL backend.
I removed the seed from the examples, as its purpose wasn't clear; please raise any objections if you have them.

Also added a mention of the SYCL docker images that are being built in the CI (in docker.md).

@github-actions github-actions bot added the documentation, examples, and SYCL labels May 14, 2025
Collaborator

@NeoZhangJianyu NeoZhangJianyu left a comment

Please don't remove llama2 from the example guide and example/sycl/scripts.
Please don't remove "-s 0" from the scripts.

They are important to me for tracking the quality of the SYCL backend.

Recently, I used them to check the accuracy in the reorder feature PRs.
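
For context, a minimal sketch of the kind of fixed-seed run this refers to; the model path and prompt below are placeholders, not taken from the scripts:

```sh
# "-s 0" pins the RNG seed, so repeated runs sample the same tokens and
# output quality can be compared across backend changes.
./build/bin/llama-cli -m models/llama-2-7b.Q4_0.gguf \
    -p "Building a website can be done in 10 simple steps:" \
    -n 400 -e -ngl 99 -s 0
```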

| GGML_SYCL_GRAPH | ON *(default)* \|OFF *(Optional)* | Enable build with [SYCL Graph extension](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc). |
| CMAKE_C_COMPILER | `icx` *(Linux)*, `icx/cl` *(Windows)* | Set `icx` compiler for SYCL code path. |
| CMAKE_CXX_COMPILER | `icpx` *(Linux)*, `icx` *(Windows)* | Set `icpx/icx` compiler for SYCL code path. |

* The FP32 codepath used to perform better on quantized models, but the latest results show similar text generation performance. Check both `GGML_SYCL_F16` ON and OFF on your system, but take into account that FP32 reduces prompt processing performance.
Collaborator

It's not clear to the user which data type is recommended: FP32 or FP16.

Collaborator Author

I've updated the description. So far all the models I've checked have the same performance in FP16 vs FP32. I'm hesitant to change the default build from FP32 to FP16 though.
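
For reference, a minimal sketch of how the two configurations could be compared, using the options from the table above; the build directory names and exact flag set are assumptions, not part of this PR:

```sh
# FP32 build (GGML_SYCL_F16 defaults to OFF)
cmake -B build-fp32 -DGGML_SYCL=ON \
      -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
cmake --build build-fp32 --config Release -j

# FP16 build for comparison
cmake -B build-fp16 -DGGML_SYCL=ON -DGGML_SYCL_F16=ON \
      -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
cmake --build build-fp16 --config Release -j
```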

@@ -250,7 +252,7 @@ sycl-ls

- **Intel GPU**

When targeting an intel GPU, the user should expect one or more level-zero devices among the available SYCL devices. Please make sure that at least one GPU is present, for instance [`level_zero:gpu`] in the sample output below:
When targeting an intel GPU, the user should expect one or more level-zero devices among the available SYCL devices. Please make sure that at least one GPU is present, for instance `[level_zero:gpu]` in the sample output below:
Contributor

This seems to convey that Level Zero is absolutely required. Could we maybe rephrase it to something along the lines of:
"One can check whether SYCL detects a GPU via `sycl-ls` and should see their device listed in its output"?

# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: MIT

export ONEAPI_DEVICE_SELECTOR="level_zero:0"
Contributor

Any reason we are selecting this device in particular? Is this required?
I would say let the runtime choose whatever it prefers.

Collaborator

Without this, the code will choose all Intel GPUs in the PC.
In the iGPU + dGPU case, having the two GPUs work together reduces performance.
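
For context, `ONEAPI_DEVICE_SELECTOR` is the standard oneAPI way to restrict which devices the SYCL runtime exposes; a rough sketch of the two behaviours discussed here (the device index is illustrative):

```sh
# Pin the run to a single Level Zero GPU (index 0), as the script does,
# so an iGPU + dGPU system does not split the work across both GPUs.
export ONEAPI_DEVICE_SELECTOR="level_zero:0"

# Or unset it and let the runtime expose every available device.
unset ONEAPI_DEVICE_SELECTOR
```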

Comment on lines +110 to +111
| Intel built-in Arc GPU | Support | built-in Arc GPU in Meteor Lake, Arrow Lake, Lunar Lake |
| Intel iGPU | Support | iGPU in 13700k, 13400, i5-1250P, i7-1260P, i7-1165G7 |
Contributor

These two lines convey more or less the same thing, don't they?
Maybe let's remove the lower one, so that we needn't keep appending to it time and again.
Also, it's better to just mention the architecture than a very specific CPU model, as that covers all the cases for the series.

Contributor

@AD2605 AD2605 May 15, 2025

I would also remove the word "Arc" from above, as technically B580 and Lunar Lake are not part of the Arc series.
Let's rename that column from "Intel Arc Series" to, say, "Intel Discrete GPUs"?
Similarly for the one below, "Intel Built-in Arc GPU" to just "Intel iGPUs".

Collaborator

Listing the detailed models helps users check support easily.
The first line covers the new iGPUs based on the Arc GPU architecture; they are more powerful than those in the next line.
The next line covers the older iGPU architecture. Some users still hope to run LLMs on it, and supporting it in the SYCL backend makes the ecosystem on Intel GPUs better.

"Arc" is the common commercial name of Intel GPUs.
B580's full name is Intel Arc B580.
So "Arc" is correct here.

@@ -750,7 +767,7 @@ use 1 SYCL GPUs: [0] with Max compute units:512

## Q&A

- Error: `error while loading shared libraries: libsycl.so.7: cannot open shared object file: No such file or directory`.
- Error: `error while loading shared libraries: libsycl.so.8: cannot open shared object file: No such file or directory`.
Contributor

Suggested change
- Error: `error while loading shared libraries: libsycl.so.8: cannot open shared object file: No such file or directory`.
- Error: `error while loading shared libraries: libsycl.so: cannot open shared object file: No such file or directory`.

That way we needn't update it with every release; just mentioning libsycl.so is sufficient.
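
For reference, the usual remedy for this error is to load the oneAPI environment before running; a sketch assuming the default install location:

```sh
# Makes the oneAPI runtime libraries (including libsycl.so) visible to the
# dynamic loader. Adjust the path if oneAPI is installed elsewhere.
source /opt/intel/oneapi/setvars.sh
```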

@call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" intel64 --force


.\build\bin\llama-cli.exe -m models\Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf -p %INPUT2% -n 400 -e -ngl 33
Contributor

Suggested change
.\build\bin\llama-cli.exe -m models\Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf -p %INPUT2% -n 400 -e -ngl 33
.\build\bin\llama-cli.exe -m models\Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf -p %INPUT2% -n 400 -e -ngl 99
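
For context, `-ngl` sets how many model layers are offloaded to the GPU; a value like 99 is simply shorthand for "offload everything". A sketch of the equivalent Linux invocation, with a placeholder prompt:

```sh
# Any -ngl value at or above the model's layer count (here, 99) offloads
# all layers to the GPU.
./build/bin/llama-cli -m models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf \
    -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 99
```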

Collaborator

@NeoZhangJianyu NeoZhangJianyu left a comment

@Alcpz
Thank you very much for your understanding and support recently!

I spent 3 days finishing the migration from CUDA to SYCL code and getting llama.cpp to run on an Intel GPU.
I spent another 3 weeks re-migrating the code and building up the SYCL backend.
But I was asked to merge that work into another PR; that's why you will see my ID in the first PR commit.

Then I spent 2 months finishing the key tasks:

  1. Support one-click builds on Linux/Windows.
  2. Develop the scripts in the example/sycl folder to quickly build and try the SYCL backend.
  3. Support multiple GPUs for PVCs, and support iGPU+dGPU as well.
  4. Draft SYCL.md as a guide.
  5. Add the SYCL backend to CI.

This PR reminds me of that history.

I maintain the SYCL backend in my spare time.
I can't contribute and review more because I have less and less spare time.

I hope the SYCL backend becomes stable and available as an official product, instead of a POC of new technology.
I use private PCs to test and track the quality of the SYCL backend. There is no powerful CI for the SYCL backend, so its quality is easily broken by PRs.

Reviewing PRs and discussing takes more time than coding, and it's not my strong point.

So, next I will focus on:

  1. Answering issues about the SYCL backend.
  2. Continuing to track the quality.
  3. Other ecosystem work around the llama.cpp SYCL backend.
  4. Supporting developers of the SYCL backend.
  5. Maybe contributing code to the SYCL backend.

Thank you and the other developers!
