Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [Yes] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- [Yes] I carefully followed the README.md.
- [Yes] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- [Yes] I reviewed the Discussions, and have a new bug or useful enhancement to share.
Expected Behavior
I have two RTX 2070s. llama.cpp and llama-cpp-python work when using the CPU, but I want to use both GPUs for inference (that is, split larger models between the two cards). Having followed the CUDA build instructions for both llama.cpp and llama-cpp-python, and having written a Python script that explicitly enables GPU offloading (sketched below), I expect llama-cpp-python to use the GPUs for inference.
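For reference, here is a minimal sketch of the kind of script I am using. The model path is a placeholder and the `n_gpu_layers` / `tensor_split` values are illustrative; llama-cpp-python was installed with CUDA enabled (e.g. `CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python --force-reinstall --no-cache-dir`, per the README).

```python
from llama_cpp import Llama

# Placeholder path -- adjust to a local GGUF model file.
MODEL_PATH = "./models/model.gguf"

llm = Llama(
    model_path=MODEL_PATH,
    n_gpu_layers=-1,          # offload all layers to the GPUs
    tensor_split=[0.5, 0.5],  # split tensors evenly across the two RTX 2070s
    verbose=True,             # print backend/offload information at load time
)

output = llm("Q: Name the planets in the solar system. A:", max_tokens=64)
print(output["choices"][0]["text"])
```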
Current Behavior
I have given detailed/extensive descriptions of the current behavior for various scenarios in this llama.cpp issue. In short, no matter what I do, the GPUs are not being used; a quick way to check offload support is sketched below.
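The low-level bindings expose llama.cpp's `llama_supports_gpu_offload()`, which reports whether the installed wheel was compiled with GPU offload support at all (a sketch, assuming a recent enough llama-cpp-python version that includes this binding):

```python
import llama_cpp

# False here would mean the installed wheel was built without CUDA support,
# i.e. the CMAKE_ARGS were not picked up during pip install.
print(llama_cpp.llama_supports_gpu_offload())
```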
Environment and Context
I have given the details of my setup in the aforementioned llama.cpp issue, including environment variables, terminal commands for setup, the Python script being used, full outputs, etc.
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 43 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: AuthenticAMD
Model name: AMD Ryzen 7 2700X Eight-Core Processor
CPU family: 23
Model: 8
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
Stepping: 2
Frequency boost: enabled
CPU max MHz: 3700.0000
CPU min MHz: 2200.0000
BogoMIPS: 7385.27
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate ssbd ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca sev sev_es
Virtualization features:
Virtualization: AMD-V
Caches (sum of all):
L1d: 256 KiB (8 instances)
L1i: 512 KiB (8 instances)
L2: 4 MiB (8 instances)
L3: 16 MiB (2 instances)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-15
Vulnerabilities:
Gather data sampling: Not affected
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Not affected
Retbleed: Mitigation; untrained return thunk; SMT vulnerable
Spec rstack overflow: Mitigation; Safe RET
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
Srbds: Not affected
Tsx async abort: Not affected
$ uname -a
Linux me-System-Product-Name 6.5.0-26-generic #26~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Mar 12 10:22:43 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
- SDK version, e.g. for Linux:
$ python3 --version
Python 3.10.12

$ make --version
GNU Make 4.3
Built for x86_64-pc-linux-gnu
Copyright (C) 1988-2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

$ g++ --version
g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
In the aforementioned llama.cpp issue, we have so far been unable to fix the problem, and one of the respondents recommended that I open an issue here in llama-cpp-python. I would greatly appreciate any help in resolving this.