
Eval bug: Segfault in ggml_compute_forward_dup_bytes #12354

Closed
@cpg314

Description


Name and Version

Current master, 80a02aa

ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
version: 4850 (ea002810)
built with cc (GCC) 14.2.1 20250128 for x86_64-pc-linux-gnu

Operating systems

Linux

GGML backends

CPU, CUDA

Hardware

NVIDIA GeForce RTX 3060 (12 GB VRAM) and an AMD Ryzen 9 7900

Models

Mistral Small 3, quantized to Q4_K_M

Problem description & steps to reproduce

llama-server segfaults in ggml_compute_forward_dup_same_cont (__memcpy_avx512_unaligned_erms) after a couple of concurrent inputs when --parallel 4 is passed. This does not happen when parallel processing is disabled, i.e. when the flag is removed; a sketch of sending such concurrent inputs is shown below.
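For reference, "a couple of concurrent inputs" can be approximated by firing a few completion requests at the server at once. The report does not show the client that was actually used, so this is only an illustration using the standard llama-server /completion endpoint with an arbitrary prompt, against the server command shown in the log output further down:

# assumes the server from the reproduction command is listening on port 8080
for i in 1 2 3 4; do
  curl -s http://localhost:8080/completion \
    --header "Content-Type: application/json" \
    --data '{"prompt": "Write a short sentence about GPUs.", "n_predict": 100}' &
done
wait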

First Bad Commit

No response

Relevant log output

$ gdb --args ./build/bin/llama-server -m /ollama/data/ollama/models/blobs/sha256-102a747c137683e81d431dab05d8f2158df4ab6f162f8f9019425a43d51e0e9f --port 8080 -ngl 30  --temp 0.15 -c 20000  -ctk q4_0 -ctv q4_0 -t 12 --batch-size 512 -fa --grammar-file grammar.gbnf --n-predict 100 --no-context-shift  --parallel 4
__memcpy_avx512_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:461
461             VMOVU   -VEC_SIZE(%rsi, %rdx), %VMM(5)
(gdb) bt
#0  __memcpy_avx512_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:461
#1  0x00007ffff78ef70e in ggml_compute_forward_dup_same_cont (params=0x7ffefffd77b0, dst=0x55555a175410)
    at /home/tmp/llamacpp/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:3120
#2  0x00007ffff78f40d8 in ggml_compute_forward_dup_bytes (params=0x7ffefffd77b0, dst=0x55555a175410)
    at /home/tmp/llamacpp/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:4067
#3  0x00007ffff78f501e in ggml_compute_forward_dup (params=0x7ffefffd77b0, dst=0x55555a175410)
    at /home/tmp/llamacpp/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:4260
#4  0x00007ffff790c3e8 in ggml_compute_forward_cpy (params=0x7ffefffd77b0, dst=0x55555a175410)
    at /home/tmp/llamacpp/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:9611
#5  0x00007ffff791f93a in ggml_compute_forward (params=0x7ffefffd77b0, tensor=0x55555a175410)
    at /home/tmp/llamacpp/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:14195
#6  0x00007ffff79215d8 in ggml_graph_compute_thread (data=0x55555d8445f0)
    at /home/tmp/llamacpp/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:15203
#7  0x00007ffff7921f0c in ggml_graph_compute._omp_fn.0(void) ()
    at /home/tmp/llamacpp/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:15478
#8  0x00007ffff7f0b637 in gomp_thread_start (xdata=<optimized out>) at /usr/src/debug/gcc/gcc/libgomp/team.c:129
#9  0x00007ffff40a370a in start_thread (arg=<optimized out>) at pthread_create.c:448
#10 0x00007ffff4127aac in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

The crash occurs in these lines of ggml_compute_forward_dup_same_cont (ggml/src/ggml-cpu/ggml-cpu.c:3120 in the backtrace):

memcpy(
    ((char *)  dst->data + ie0*nb0),
    ((char *) src0->data + ie0*nb0),
    (ie1 - ie0) * nb0);
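
For context, the ie0/ie1 range used by that memcpy comes from splitting the elements to copy across threads. The standalone program below paraphrases that split so the indexing can be inspected in isolation; it is not the exact ggml-cpu.c code, and the names ne, dr, ith and nth are assumptions inferred from the variables visible in the snippet and backtrace.

/* Standalone paraphrase of the per-thread copy split suggested by the
 * quoted snippet; for illustration only, not the upstream implementation. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Copy the chunk of elements assigned to thread `ith` out of `nth` threads. */
static void copy_chunk(char *dst, const char *src,
                       int ne, size_t nb0, int ith, int nth) {
    const int dr  = (ne + nth - 1) / nth; // elements per thread, rounded up
    const int ie0 = dr * ith;             // first element for this thread
    const int ie1 = MIN(ie0 + dr, ne);    // one past the last element

    if (ie0 < ie1) {
        // This is the shape of the memcpy in the backtrace: if ie1*nb0 ever
        // exceeds the real allocation behind dst/src, it runs out of bounds.
        memcpy(dst + ie0*nb0, src + ie0*nb0, (ie1 - ie0)*nb0);
    }
}

int main(void) {
    const int    ne  = 1000; // element count (arbitrary for the demo)
    const size_t nb0 = 4;    // bytes per element (arbitrary for the demo)
    const int    nth = 12;   // matches -t 12 from the report

    char *src = malloc(ne*nb0);
    char *dst = malloc(ne*nb0);
    if (!src || !dst) return 1;
    memset(src, 0xAB, ne*nb0);

    for (int ith = 0; ith < nth; ++ith) {
        copy_chunk(dst, src, ne, nb0, ith, nth);
    }
    printf("copied %zu bytes: %s\n", ne*nb0,
           memcmp(src, dst, ne*nb0) == 0 ? "ok" : "mismatch");
    free(src);
    free(dst);
    return 0;
}

In this toy version the per-thread chunks exactly cover the buffer end to end; an out-of-bounds fault like the one in the backtrace means the computed byte range and the actual allocation disagree.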
