Name and Version
Current master, 80a02aa
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
version: 4850 (ea002810)
built with cc (GCC) 14.2.1 20250128 for x86_64-pc-linux-gnu
Operating systems
Linux
GGML backends
CPU, CUDA
Hardware
NVIDIA RTX 3060 12 GB VRAM and an AMD Ryzen 9 7900
Models
Mistral Small 3, quantized to Q4_K_M
Problem description & steps to reproduce
llama-server segfaults in ggml_compute_forward_dup_same_cont (inside __memcpy_avx512_unaligned_erms) after a couple of concurrent requests when --parallel 4 is passed. The crash does not happen when parallel processing is disabled (i.e. with the flag removed).
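A concurrent client along the following lines should generate the triggering load (a minimal sketch: the /completion endpoint, port 8080, and n_predict value match the server invocation in the log below, but the prompt body is a placeholder, not the original input):

// repro_parallel.c - fire four concurrent /completion requests at llama-server.
// Sketch under the assumptions above; build: cc repro_parallel.c -lcurl -lpthread
#include <curl/curl.h>
#include <pthread.h>
#include <stdio.h>

#define N_CLIENTS 4  // matches --parallel 4

static void *send_request(void *arg) {
    (void) arg;
    CURL *curl = curl_easy_init();
    if (!curl) return NULL;
    struct curl_slist *hdrs = curl_slist_append(NULL, "Content-Type: application/json");
    // Placeholder prompt; n_predict mirrors the server's --n-predict 100.
    const char *body = "{\"prompt\": \"Summarize the plot of Hamlet.\", \"n_predict\": 100}";
    curl_easy_setopt(curl, CURLOPT_URL, "http://localhost:8080/completion");
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, hdrs);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body);
    CURLcode res = curl_easy_perform(curl);
    if (res != CURLE_OK) {
        fprintf(stderr, "request failed: %s\n", curl_easy_strerror(res));
    }
    curl_slist_free_all(hdrs);
    curl_easy_cleanup(curl);
    return NULL;
}

int main(void) {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    pthread_t t[N_CLIENTS];
    for (int i = 0; i < N_CLIENTS; i++) pthread_create(&t[i], NULL, send_request, NULL);
    for (int i = 0; i < N_CLIENTS; i++) pthread_join(t[i], NULL);
    curl_global_cleanup();
    return 0;
}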
First Bad Commit
No response
Relevant log output
$ gdb --args ./build/bin/llama-server -m /ollama/data/ollama/models/blobs/sha256-102a747c137683e81d431dab05d8f2158df4ab6f162f8f9019425a43d51e0e9f --port 8080 -ngl 30 --temp 0.15 -c 20000 -ctk q4_0 -ctv q4_0 -t 12 --batch-size 512 -fa --grammar-file grammar.gbnf --n-predict 100 --no-context-shift --parallel 4
__memcpy_avx512_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:461
461 VMOVU -VEC_SIZE(%rsi, %rdx), %VMM(5)
(gdb) bt
#0 __memcpy_avx512_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:461
#1 0x00007ffff78ef70e in ggml_compute_forward_dup_same_cont (params=0x7ffefffd77b0, dst=0x55555a175410)
at /home/tmp/llamacpp/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:3120
#2 0x00007ffff78f40d8 in ggml_compute_forward_dup_bytes (params=0x7ffefffd77b0, dst=0x55555a175410)
at /home/tmp/llamacpp/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:4067
#3 0x00007ffff78f501e in ggml_compute_forward_dup (params=0x7ffefffd77b0, dst=0x55555a175410)
at /home/tmp/llamacpp/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:4260
#4 0x00007ffff790c3e8 in ggml_compute_forward_cpy (params=0x7ffefffd77b0, dst=0x55555a175410)
at /home/tmp/llamacpp/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:9611
#5 0x00007ffff791f93a in ggml_compute_forward (params=0x7ffefffd77b0, tensor=0x55555a175410)
at /home/tmp/llamacpp/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:14195
#6 0x00007ffff79215d8 in ggml_graph_compute_thread (data=0x55555d8445f0)
at /home/tmp/llamacpp/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:15203
#7 0x00007ffff7921f0c in ggml_graph_compute._omp_fn.0(void) ()
at /home/tmp/llamacpp/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:15478
#8 0x00007ffff7f0b637 in gomp_thread_start (xdata=<optimized out>) at /usr/src/debug/gcc/gcc/libgomp/team.c:129
#9 0x00007ffff40a370a in start_thread (arg=<optimized out>) at pthread_create.c:448
#10 0x00007ffff4127aac in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
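In case it helps triage, the tensor involved can be inspected from the same gdb session (a sketch; the field names follow the public ggml_tensor struct, and dst is the argument visible at frame 1):

(gdb) frame 1
(gdb) print dst->type
(gdb) print *dst
(gdb) print *dst->src[0]
(gdb) print ggml_nbytes(dst)
(gdb) print ggml_nbytes(dst->src[0])

If ggml_nbytes(dst) and ggml_nbytes(dst->src[0]) disagree, the copy in ggml_compute_forward_dup_same_cont would run past the smaller buffer.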
The offending lines: llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c, lines 3120 to 3123 at commit 80a02aa.
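For context on why a size mismatch there is fatal: the copy is partitioned across compute threads by offset ranges, and each thread issues one unchecked memcpy over its slice. The sketch below is illustrative only (it mimics the shape of that code, not the actual ggml source); if the computed byte count exceeds what the destination buffer actually holds, the trailing slice becomes exactly the out-of-bounds memcpy seen in frame #0:

// partitioned_copy.c - illustrative sketch of a thread-partitioned memcpy
// in the style of ggml_compute_forward_dup_same_cont (NOT the real ggml code).
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

// Copy the byte range assigned to worker `ith` of `nth`.
static void copy_slice(char *dst, const char *src, size_t nbytes, int ith, int nth) {
    const size_t dr = (nbytes + nth - 1) / nth; // bytes per worker, rounded up
    const size_t i0 = dr * ith;                 // this worker's start offset
    const size_t i1 = MIN(i0 + dr, nbytes);     // this worker's end offset
    if (i0 < i1) {
        memcpy(dst + i0, src + i0, i1 - i0);    // unchecked against dst's real size
    }
}

int main(void) {
    const size_t nbytes = 4096;
    char *src = malloc(nbytes);
    char *dst = malloc(nbytes); // correct case: dst sized for the whole copy
    memset(src, 0xAB, nbytes);

    for (int ith = 0; ith < 4; ith++) {
        copy_slice(dst, src, nbytes, ith, 4);
    }
    printf("last byte copied: %s\n", dst[nbytes - 1] == (char) 0xAB ? "yes" : "no");

    // If nbytes were derived from a tensor whose backing buffer is smaller
    // (e.g. a stale dst in a reused graph under --parallel), the last worker's
    // memcpy would run past the allocation and segfault, as in the backtrace.
    free(src);
    free(dst);
    return 0;
}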