GPU config help #1058
thebrahman
started this conversation in General
Replies: 1 comment 2 replies
-
That yaml file would not work; it needs to be formatted like https://localai.io/howtos/easy-model-import-downloaded/
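For illustration, a minimal model YAML in that format could look like the sketch below. The field names follow the LocalAI model-config convention from the linked howto; the gpt-3.5-turbo alias, the file name, and the gpu_layers value are assumptions for illustration, not taken from the original post.

# models/llama-2-7b-chat.yaml -- hypothetical example, values are illustrative
name: gpt-3.5-turbo        # alias used in API requests (assumed)
backend: llama
context_size: 512
f16: true                  # enable 16-bit memory
gpu_layers: 35             # layers to offload to the GPU (35 total per the debug log in the original post)
parameters:
  model: llama-2-7b-chat.ggmlv3.q4_K_M.bin   # file name from the debug log

If a config like this is picked up, the "Loading model with options" GRPC line in the debug output should show NGPULayers reflecting the configured value rather than 0.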
-
I am struggling to get models to run on my 4090. My OS is Windows and I am running Docker. It recognises my GPU, but doesn't offload any layers.
I followed this guide to set up:
https://localai.io/howtos/easy-setup-docker-gpu/
I have this yaml file in the models folder:
Terminal output:
2023-09-15 00:05:22 localai-api-1 | 2:05PM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:llama-2-7b-chat.ggmlv3.q4_K_M.bin ContextSize:512 Seed:0 NBatch:512 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:2 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/models/llama-2-7b-chat.ggmlv3.q4_K_M.bin Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 Tokenizer: LoraBase: LoraAdapter: NoMulMatQ:false AudioPath:}
2023-09-15 00:05:22 localai-api-1 | 2:05PM DBG GRPC(llama-2-7b-chat.ggmlv3.q4_K_M.bin-127.0.0.1:43703): stderr ggml_init_cublas: found 1 CUDA devices:
2023-09-15 00:05:22 localai-api-1 | 2:05PM DBG GRPC(llama-2-7b-chat.ggmlv3.q4_K_M.bin-127.0.0.1:43703): stderr Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9
03): stderr llama_model_load_internal: ggml ctx size = 3891.33 MB
2023-09-15 00:05:24 localai-api-1 | 2:05PM DBG GRPC(llama-2-7b-chat.ggmlv3.q4_K_M.bin-127.0.0.1:43703): stderr WARNING: failed to allocate 3891.33 MB of pinned memory: out of memory
2023-09-15 00:05:24 localai-api-1 | 2:05PM DBG GRPC(llama-2-7b-chat.ggmlv3.q4_K_M.bin-127.0.0.1:43703): stderr llama_model_load_internal: using CUDA for GPU acceleration
2023-09-15 00:05:24 localai-api-1 | 2:05PM DBG GRPC(llama-2-7b-chat.ggmlv3.q4_K_M.bin-127.0.0.1:43703): stderr llama_model_load_internal: mem required = 4193.33 MB (+ 512.00 MB per state)
2023-09-15 00:05:24 localai-api-1 | 2:05PM DBG GRPC(llama-2-7b-chat.ggmlv3.q4_K_M.bin-127.0.0.1:43703): stderr llama_model_load_internal: offloading 0 repeating layers to GPU
2023-09-15 00:05:24 localai-api-1 | 2:05PM DBG GRPC(llama-2-7b-chat.ggmlv3.q4_K_M.bin-127.0.0.1:43703): stderr llama_model_load_internal: offloaded 0/35 layers to GPU
2023-09-15 00:05:24 localai-api-1 | 2:05PM DBG GRPC(llama-2-7b-chat.ggmlv3.q4_K_M.bin-127.0.0.1:43703): stderr llama_model_load_internal: total VRAM used: 288 MB
2023-09-15 00:05:29 localai-api-1 | 2:05PM DBG GRPC(llama-2-7b-chat.ggmlv3.q4_K_M.bin-127.0.0.1:43703): stderr llama_new_context_with_model: kv self size = 512.00 MB
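On the Docker side, the ggml_init_cublas line above shows the container can already see the RTX 4090, so the GPU passthrough from the easy-setup-docker-gpu guide appears to be working and the missing offload is most likely down to the model YAML. For completeness, a typical Compose GPU reservation (standard docker-compose syntax, not copied from the guide; image tag and volume path are assumptions) looks roughly like:

services:
  api:
    image: quay.io/go-skynet/local-ai:latest   # assumption: use the CUDA-enabled image/tag from the guide
    volumes:
      - ./models:/models                       # folder holding the model file and its yaml
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]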