Only use 4 CPU threads in P2P worker cluster #3410
Hey @titogrima - LocalAI doesn't set any threads when running in p2p mode.
Hi! I checked the llama.cpp repo https://github.com/ggerganov/llama.cpp but I don't see any existing issue about this problem. If LocalAI doesn't set any thread count in p2p mode, maybe it's better to open an issue in the llama.cpp repo. Thanks, and sorry for my English!
Also worth noting: you can pass any command-line options of llama.cpp from LocalAI with LOCALAI_EXTRA_LLAMA_CPP_ARGS.
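A minimal .env fragment for this, sketched from the variable name quoted in this thread (assumption: LOCALAI_EXTRA_LLAMA_CPP_ARGS forwards the string verbatim to the spawned llama.cpp binaries, which must actually accept the flag):

```shell
# .env fragment (sketch): forward extra flags to the embedded llama.cpp
# processes. The value must be a flag the target binary accepts -- as
# reported below, the rpc-server rejected --threads.
LOCALAI_EXTRA_LLAMA_CPP_ARGS="--threads 12"
```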
Hi! I tried this: LOCALAI_EXTRA_LLAMA_CPP_ARGS=--threads 7

11:12AM INF Starting llama-cpp-rpc-server on '127.0.0.1:35291' options:

and it fails to boot with the threads option.
Well, reading the llama.cpp code, the "problem" is in the rpc-server code. Thanks for your help!!
I recompiled ggml after changing the GGML_DEFAULT_N_THREADS variable in /build/backend/cpp/llama/llama.cpp/ggml/include/ggml.h, and it works. Regards!
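The header change described above can be sketched as a small shell patch. This is a hypothetical sketch, assuming ggml.h defines GGML_DEFAULT_N_THREADS as 4 (its value in ggml sources around this release) and using the container path quoted in the comment; ggml must be rebuilt afterwards for the change to take effect:

```shell
# Raise ggml's compile-time default thread count from 4 to 12 (sketch).
# GGML_H defaults to the path quoted in this thread; override as needed.
GGML_H="${GGML_H:-/build/backend/cpp/llama/llama.cpp/ggml/include/ggml.h}"
sed -i 's/#define GGML_DEFAULT_N_THREADS 4/#define GGML_DEFAULT_N_THREADS 12/' "$GGML_H"
grep '#define GGML_DEFAULT_N_THREADS' "$GGML_H"   # confirm the new value
```

Note this is a workaround, not a fix: it bakes the thread count into the binary at build time instead of making rpc-server honor a runtime flag.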
LocalAI version:
v2.20.1-ffmpeg-core docker image for two workers and latest-aio-cpu for master
Environment, CPU architecture, OS, and Version:
P2P cluster lab on Docker machines with heterogeneous CPUs (AMD64 and ARM):
Linux clusteria1 6.6.45-0-virt #1-Alpine SMP PREEMPT_DYNAMIC 2024-08-13 08:10:32 aarch64 Linux
8 CPUs, 7 GB RAM; runs one worker
Linux ia 6.1.0-23-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.99-1 (2024-07-15) x86_64 GNU/Linux
12 CPUs, 10 GB RAM; runs the master and one worker
Describe the bug
P2P worker mode works fine, but inference always uses only 4 CPU threads on each node.
I tried an env file with LOCALAI_THREADS=12 and --threads 12 on the 12-CPU node, and LOCALAI_THREADS=7 and --threads 7 on the 8-CPU node.
I also tried the THREADS variable in the env file.
If I run only the master without workers, the THREADS variable works without any problem.
To Reproduce
Launch a P2P worker cluster and set a thread count other than 4.
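For concreteness, a reproduction sketch, assuming the `local-ai worker p2p-llama-cpp-rpc` subcommand from LocalAI's P2P documentation (the token is a placeholder; whether LOCALAI_THREADS is honored here is exactly what this issue disputes):

```shell
# Sketch of launching one P2P worker (command name per LocalAI's P2P docs;
# treat flags/variables as assumptions, not verified against v2.20.1).
export TOKEN="<token-printed-by-the-master-node>"   # placeholder
# Request 12 threads; per this report the rpc-server still uses 4.
LOCALAI_THREADS=12 local-ai worker p2p-llama-cpp-rpc
```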
Expected behavior
Each node should use the number of threads defined.
Logs
Logs from one worker
create_backend: using CPU backend
Starting RPC server on 127.0.0.1:37885, backend memory: 9936 MB
^C@@@@@
Skipping rebuild
@@@@@
If you are experiencing issues with the pre-compiled builds, try setting REBUILD=true
If you are still experiencing issues with the build, try setting CMAKE_ARGS and disable the instructions set as needed:
CMAKE_ARGS="-DGGML_F16C=OFF -DGGML_AVX512=OFF -DGGML_AVX2=OFF -DGGML_FMA=OFF"
see the documentation at: https://localai.io/basics/build/index.html
Note: See also #288
@@@@@
CPU info:
model name : AMD Ryzen 9 5900X 12-Core Processor
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw perfctr_core ssbd ibrs ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero xsaveerptr wbnoinvd arat npt lbrv nrip_save tsc_scale vmcb_clean flushbyasid pausefilter pfthreshold v_vmsave_vmload vgif umip pku ospke vaes vpclmulqdq rdpid fsrm arch_capabilities
CPU: AVX found OK
CPU: AVX2 found OK
CPU: no AVX512 found
@@@@@
9:22PM INF env file found, loading environment variables from file envFile=.env
9:22PM DBG Setting logging to debug
9:22PM DBG Extracting backend assets files to /tmp/localai/backend_data
{"level":"INFO","time":"2024-08-26T21:22:10.977Z","caller":"config/config.go:288","message":"connmanager disabled\n"}
{"level":"INFO","time":"2024-08-26T21:22:10.977Z","caller":"config/config.go:292","message":" go-libp2p resource manager protection disabled"}
9:22PM INF Starting llama-cpp-rpc-server on '127.0.0.1:34015'
{"level":"INFO","time":"2024-08-26T21:22:10.978Z","caller":"node/node.go:118","message":" Starting EdgeVPN network"}
create_backend: using CPU backend
Starting RPC server on 127.0.0.1:34015, backend memory: 9936 MB
2024/08/26 21:22:10 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 7168 kiB, got: 416 kiB). See https://github.com/quic-go/quic-go/wiki/UDP-Buffer-Sizes for details.
{"level":"INFO","time":"2024-08-26T21:22:10.987Z","caller":"node/node.go:172","message":" Node ID: 12D3KooWFvq7aNHpre5tyQDZN9Gn2tZh84E3Vf9tfBuCmB5ULJSB"}
{"level":"INFO","time":"2024-08-26T21:22:10.987Z","caller":"node/node.go:173","message":" Node Addresses: [/ip4/127.0.0.1/tcp/41065 /ip4/127.0.0.1/udp/43346/quic-v1/webtransport/certhash/uEiA46crpiIhxfL7skSKai7WxlGHkv8mZNXzAYoogm_qhow/certhash/uEiCt91_kaygLCTKWpqX6PEOTzb617BIH7KHDTRrw_eyurw /ip4/127.0.0.1/udp/47629/webrtc-direct/certhash/uEiDbmMPnLfeQJBvFRcfp-zDNXx-_CjljBg0ia3Nr20Xs7g /ip4/127.0.0.1/udp/59911/quic-v1 /ip4/192.168.XX.XX/tcp/41065 /ip4/192.168.XX.XX/udp/43346/quic-v1/webtransport/certhash/uEiA46crpiIhxfL7skSKai7WxlGHkv8mZNXzAYoogm_qhow/certhash/uEiCt91_kaygLCTKWpqX6PEOTzb617BIH7KHDTRrw_eyurw /ip4/192.168.XX.XX/udp/47629/webrtc-direct/certhash/uEiDbmMPnLfeQJBvFRcfp-zDNXx-_CjljBg0ia3Nr20Xs7g /ip4/192.168.XX.XX/udp/59911/quic-v1 /ip6/::1/tcp/33785 /ip6/::1/udp/46892/webrtc-direct/certhash/uEiDbmMPnLfeQJBvFRcfp-zDNXx-_CjljBg0ia3Nr20Xs7g /ip6/::1/udp/49565/quic-v1/webtransport/certhash/uEiA46crpiIhxfL7skSKai7WxlGHkv8mZNXzAYoogm_qhow/certhash/uEiCt91_kaygLCTKWpqX6PEOTzb617BIH7KHDTRrw_eyurw /ip6/::1/udp/59078/quic-v1 /ip6/fda7:761c:127e:4::26/tcp/33785 /ip6/fda7:761c:127e:4::26/udp/46892/webrtc-direct/certhash/uEiDbmMPnLfeQJBvFRcfp-zDNXx-_CjljBg0ia3Nr20Xs7g /ip6/fda7:761c:XXXX:XX::XX/udp/49565/quic-v1/webtransport/certhash/uEiA46crpiIhxfL7skSKai7WxlGHkv8mZNXzAYoogm_qhow/certhash/uEiCt91_kaygLCTKWpqX6PEOTzb617BIH7KHDTRrw_eyurw /ip6/fda7:761c:XXX:XX::XX/udp/59078/quic-v1]"}
{"level":"INFO","time":"2024-08-26T21:22:10.987Z","caller":"discovery/dht.go:104","message":" Bootstrapping DHT"}
Accepted client connection, free_mem=10418868224, total_mem=10418868224
Client connection closed
Accepted client connection, free_mem=10418868224, total_mem=10418868224
Client connection closed
Accepted client connection, free_mem=10418868224, total_mem=10418868224
Client connection closed
Additional context