Documentation: CLBlast support in Kubernetes, to enable AMD and Intel iGPU #1182
Replies: 3 comments
-
This looks amazing. I am far more familiar with Docker Compose in my homelab but can't seem to find an example of a compose file with CLBlast set up. Any tips (I see you mentioned your setup could be modified for Docker)?
-
I migrated the issue to a discussion because it is really helpful; more people can see it here.
-
Spawned off of #404
This is a runbook for enabling CLBlast in Kubernetes; with a bit of work it can be applied to Docker as well. This will enable AMD GPUs and Intel iGPUs.
The main steps that need to be done are:
1. Enable GPU passthrough
2. Install the OpenCL drivers and CLBlast
3. Configure GPU offloading
4. Set the build environment variables
To jump to the end, this is a working helm release for LocalAI which contains everything except step 1: https://github.com/lenaxia/home-ops-prod/blob/5039ba39489347e2753e7a333d53664dc3f8daf7/cluster/apps/home/localai/app/helm-release.yaml
Step 1: Enable GPU passthrough (Intel iGPU)
This is done through three helm releases which combine to automatically identify what features are available on a given node and label each node accordingly. In the case of Intel iGPUs, it also enables resource requests for GPU. If you have other ways of tagging your nodes with GPU resources, then that should work too.
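As an illustration (not taken from the linked release), once node-feature discovery and the Intel GPU device plugin are in place, a pod can request the iGPU through an extended resource. The resource name below assumes the Intel GPU device plugin, which exposes the iGPU as `gpu.intel.com/i915`; adapt it to however your nodes are labelled.

```yaml
# Hedged sketch: assumes the Intel GPU device plugin is installed and
# exposes the iGPU as the extended resource gpu.intel.com/i915.
apiVersion: v1
kind: Pod
metadata:
  name: localai-gpu-test
spec:
  containers:
    - name: localai
      image: quay.io/go-skynet/local-ai:latest
      resources:
        limits:
          gpu.intel.com/i915: 1  # claim one iGPU on the scheduled node
```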
Step 2: Install OpenCL drivers and CLBlast
Reference the latest Intel OpenCL driver and installation instructions here: https://github.com/intel/compute-runtime/releases
In order to get your helm release to automatically install these drivers, you can utilize the pod lifecycle postStart option:
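A sketch of such a postStart hook is below. The package names are assumptions for a Debian/Ubuntu-based image; substitute the exact driver packages from the intel/compute-runtime release you are targeting.

```yaml
# Hedged sketch of a postStart lifecycle hook that installs the OpenCL
# runtime when the container starts. Package names are illustrative for a
# Debian/Ubuntu base image; use the packages from the releases page above.
lifecycle:
  postStart:
    exec:
      command:
        - /bin/sh
        - -c
        - >-
          apt-get update &&
          apt-get install -y ocl-icd-libopencl1 clinfo
```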
Step 3: Configure GPU offloading
As defined here: https://github.com/go-skynet/LocalAI/blob/cdf0a6e7667e1fb3412951f078aaf017a6fd6437/api/config.go#L35, each model should contain a `gpu_layers` configuration that defines how many layers should be offloaded. In the case of Vicuna, the Model Library yaml can be found here: https://raw.githubusercontent.com/go-skynet/model-gallery/main/vicuna.yaml. Under `config_file`, add a `gpu_layers` entry (e.g. `gpu_layers: 20`; the right layer count depends on the model and your GPU memory).

Step 4: Environment Variables
Set both `BUILD_TYPE` and `LLAMA_CLBLAST` in order to ensure that the pod is built with CLBlast support.

Run a query
In order to verify that the GPU is being used, check your pod logs; you should see lines showing OpenCL/CLBlast initializing and selecting your GPU.
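Putting Steps 2–4 together, the container section of the helm values might look like the sketch below. The image tag, package names, resource name, and env values are assumptions for illustration; compare against the working release linked at the top.

```yaml
# Hedged sketch combining the pieces above; all values are illustrative.
containers:
  - name: localai
    image: quay.io/go-skynet/local-ai:latest
    env:
      - name: BUILD_TYPE      # assumed value selecting the CLBlast backend
        value: clblas
      - name: LLAMA_CLBLAST
        value: "1"
    resources:
      limits:
        gpu.intel.com/i915: 1  # requires the Intel GPU device plugin
    lifecycle:
      postStart:
        exec:
          command:
            - /bin/sh
            - -c
            - apt-get update && apt-get install -y ocl-icd-libopencl1
```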