
llama-cpp-python bindings not working for multiple GPUs  #1310

@y6t4

Description

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • [Yes] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • [Yes] I carefully followed the README.md.
  • [Yes] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • [Yes] I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

I have two RTX 2070s. llama.cpp and llama-cpp-python work when using the CPU, but I want to use both GPUs to perform inference (that is, split larger models between the two). Having followed the instructions for CUDA GPU build for llama.cpp and llama-cpp-python, and having written a Python script that explicitly enables GPU usage, I expect llama-cpp-python to use the GPUs for inference.
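Concretely, the multi-GPU setup I expect to work looks roughly like the following (a minimal sketch; the model path is a placeholder, and `n_gpu_layers`/`tensor_split`/`main_gpu` are the documented `Llama` constructor parameters for offloading and splitting layers across devices):

```python
# Sketch of the multi-GPU configuration I expect to work (model path is a placeholder).
# tensor_split gives the fraction of the model to place on each device;
# two identical RTX 2070s suggest an even 50/50 split.
tensor_split = [0.5, 0.5]

def build_llm(model_path: str):
    from llama_cpp import Llama  # deferred import so the sketch reads without the package
    return Llama(
        model_path=model_path,
        n_gpu_layers=-1,           # offload every layer to the GPUs
        tensor_split=tensor_split, # split tensors across the two devices
        main_gpu=0,                # device for scratch buffers and small tensors
        verbose=True,              # prints the CUDA/ggml init lines, useful for debugging
    )

# Usage: llm = build_llm("/path/to/model.gguf"); llm("Q: ...", max_tokens=32)
```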

Current Behavior

I have described the current behavior in detail, across several scenarios, in this llama.cpp issue. In short: no matter what I do, the GPUs are not used for inference.
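For context, the install I have been testing was built from source with CUDA enabled, along these lines (note the CMake flag name varies with the bundled llama.cpp version; older releases used `-DLLAMA_CUBLAS=on`):

```shell
# Force a from-source rebuild of llama-cpp-python with CUDA enabled.
# The flag name depends on the bundled llama.cpp version (LLAMA_CUBLAS on older ones).
CMAKE_ARGS="-DLLAMA_CUDA=on" FORCE_CMAKE=1 \
  pip install --force-reinstall --no-cache-dir llama-cpp-python
```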

Environment and Context

I have given the details of my setup in the aforementioned llama.cpp issue, including environment variables, terminal commands for setup, the Python script being used, full outputs, etc.
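One sanity check that may narrow this down (the helper name here is mine; `llama_supports_gpu_offload` is, as far as I can tell, the binding llama-cpp-python exposes for the corresponding llama.cpp C function) is to ask the installed wheel whether it was compiled with GPU offload at all:

```python
# Hypothetical helper: returns True if the installed llama-cpp-python wheel
# was compiled with GPU offload support, False for a CPU-only build.
def gpu_offload_available() -> bool:
    import llama_cpp  # deferred import so this file parses without the package
    return bool(llama_cpp.llama_supports_gpu_offload())

# Usage: print(gpu_offload_available())
# False would explain why the GPUs stay idle regardless of n_gpu_layers.
```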

$ lscpu

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         43 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  16
  On-line CPU(s) list:   0-15
Vendor ID:               AuthenticAMD
  Model name:            AMD Ryzen 7 2700X Eight-Core Processor
    CPU family:          23
    Model:               8
    Thread(s) per core:  2
    Core(s) per socket:  8
    Socket(s):           1
    Stepping:            2
    Frequency boost:     enabled
    CPU max MHz:         3700.0000
    CPU min MHz:         2200.0000
    BogoMIPS:            7385.27
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate ssbd ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca sev sev_es
Virtualization features: 
  Virtualisation:        AMD-V
Caches (sum of all):     
  L1d:                   256 KiB (8 instances)
  L1i:                   512 KiB (8 instances)
  L2:                    4 MiB (8 instances)
  L3:                    16 MiB (2 instances)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-15
Vulnerabilities:         
  Gather data sampling:  Not affected
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Not affected
  Retbleed:              Mitigation; untrained return thunk; SMT vulnerable
  Spec rstack overflow:  Mitigation; Safe RET
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Not affected
  Tsx async abort:       Not affected

$ uname -a

Linux me-System-Product-Name 6.5.0-26-generic #26~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Mar 12 10:22:43 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

  • SDK version, e.g. for Linux:

$ python3 --version
Python 3.10.12

$ make --version
GNU Make 4.3
Built for x86_64-pc-linux-gnu
Copyright (C) 1988-2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

$ g++ --version
g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

In the aforementioned llama.cpp issue we have so far been unable to fix the problem, and one of the respondents recommended that I open an issue here (llama-cpp-python). Any help would be greatly appreciated.

Labels: question (further information is requested)