Name and Version
Current master, 80a02aa
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
version: 4850 (ea002810)
built with cc (GCC) 14.2.1 20250128 for x86_64-pc-linux-gnu
Operating systems
Linux
GGML backends
CPU, CUDA
Hardware
NVIDIA RTX 3060 (12 GB VRAM) and an AMD Ryzen 9 7900
Models
Mistral Small 3, quantized to Q4_K_M
Problem description & steps to reproduce
llama-server segfaults in ggml_compute_forward_dup_same_cont (__memcpy_avx512_unaligned_erms) after a couple of concurrent inputs when --parallel 4 is passed. The crash does not happen when parallel processing is disabled (i.e., when the flag is removed).
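To make "a couple of concurrent inputs" concrete, here is a minimal repro sketch in C that fires four requests at the server in parallel. It assumes the server started with the gdb command in the log output below (port 8080); POST /completion with the prompt and n_predict fields is the stock llama-server API, while the prompt text and the count of four requests are arbitrary choices for this sketch.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

// send one completion request to the local llama-server and drain the reply
static void * fire(void * arg) {
    (void) arg;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return NULL; }

    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_port   = htons(8080); // port used in the llama-server command
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
    if (connect(fd, (struct sockaddr *) &addr, sizeof(addr)) != 0) {
        perror("connect");
        close(fd);
        return NULL;
    }

    const char * body = "{\"prompt\": \"Write a haiku about segfaults.\", \"n_predict\": 64}";
    char req[512];
    snprintf(req, sizeof(req),
        "POST /completion HTTP/1.1\r\n"
        "Host: 127.0.0.1:8080\r\n"
        "Content-Type: application/json\r\n"
        "Content-Length: %zu\r\n"
        "Connection: close\r\n\r\n%s",
        strlen(body), body);
    if (write(fd, req, strlen(req)) < 0) perror("write");

    char buf[4096];
    while (read(fd, buf, sizeof(buf)) > 0) { /* drain the response */ }
    close(fd);
    return NULL;
}

int main(void) {
    // four concurrent clients, matching --parallel 4
    pthread_t th[4];
    for (int i = 0; i < 4; i++) pthread_create(&th[i], NULL, fire, NULL);
    for (int i = 0; i < 4; i++) pthread_join(th[i], NULL);
    return 0;
}

Build with cc -pthread repro.c -o repro and run it (possibly a few times) while the server is up; any HTTP client issuing parallel requests behaves the same.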
First Bad Commit
No response
Relevant log output
$ gdb --args ./build/bin/llama-server -m /ollama/data/ollama/models/blobs/sha256-102a747c137683e81d431dab05d8f2158df4ab6f162f8f9019425a43d51e0e9f --port 8080 -ngl 30 --temp 0.15 -c 20000 -ctk q4_0 -ctv q4_0 -t 12 --batch-size 512 -fa --grammar-file grammar.gbnf --n-predict 100 --no-context-shift --parallel 4
__memcpy_avx512_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:461
461 VMOVU -VEC_SIZE(%rsi, %rdx), %VMM(5)
(gdb) bt
#0 __memcpy_avx512_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:461
#1 0x00007ffff78ef70e in ggml_compute_forward_dup_same_cont (params=0x7ffefffd77b0, dst=0x55555a175410)
at /home/tmp/llamacpp/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:3120
#2 0x00007ffff78f40d8 in ggml_compute_forward_dup_bytes (params=0x7ffefffd77b0, dst=0x55555a175410)
at /home/tmp/llamacpp/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:4067
#3 0x00007ffff78f501e in ggml_compute_forward_dup (params=0x7ffefffd77b0, dst=0x55555a175410)
at /home/tmp/llamacpp/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:4260
#4 0x00007ffff790c3e8 in ggml_compute_forward_cpy (params=0x7ffefffd77b0, dst=0x55555a175410)
at /home/tmp/llamacpp/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:9611
#5 0x00007ffff791f93a in ggml_compute_forward (params=0x7ffefffd77b0, tensor=0x55555a175410)
at /home/tmp/llamacpp/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:14195
#6 0x00007ffff79215d8 in ggml_graph_compute_thread (data=0x55555d8445f0)
at /home/tmp/llamacpp/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:15203
#7 0x00007ffff7921f0c in ggml_graph_compute._omp_fn.0(void) ()
at /home/tmp/llamacpp/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:15478
#8 0x00007ffff7f0b637 in gomp_thread_start (xdata=<optimized out>) at /usr/src/debug/gcc/gcc/libgomp/team.c:129
#9 0x00007ffff40a370a in start_thread (arg=<optimized out>) at pthread_create.c:448
#10 0x00007ffff4127aac in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
The faulting lines: llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c, lines 3120 to 3123 in 80a02aa.
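For context, here is a sketch of ggml_compute_forward_dup_same_cont reconstructed from current llama.cpp sources; the exact contents of lines 3120 to 3123 at 80a02aa may differ slightly, but they correspond to the memcpy call at the end of the function.

// Sketch of ggml_compute_forward_dup_same_cont() from ggml/src/ggml-cpu/ggml-cpu.c,
// reconstructed from current sources (exact contents at 80a02aa may differ).
static void ggml_compute_forward_dup_same_cont(
        const struct ggml_compute_params * params,
        struct ggml_tensor * dst) {

    const struct ggml_tensor * src0 = dst->src[0];

    GGML_ASSERT(ggml_nelements(dst) == ggml_nelements(src0));
    GGML_ASSERT(ggml_is_contiguous(dst) && ggml_is_contiguous(src0));
    GGML_ASSERT(src0->type == dst->type);

    const size_t nb0 = ggml_type_size(src0->type); // bytes per block

    const int ith = params->ith; // thread index
    const int nth = params->nth; // number of threads

    // parallelize by blocks: each thread copies one contiguous slice
    const int nk = ggml_nelements(src0)/ggml_blck_size(src0->type);
    const int dr = (nk + nth - 1) / nth; // blocks per thread, rounded up
    const int k0 = dr * ith;
    const int k1 = MIN(k0 + dr, nk);

    if (k0 < k1) {
        // frame #1 of the backtrace faults inside this memcpy
        memcpy(
            ((char *)  dst->data + k0*nb0),
            ((char *) src0->data + k0*nb0),
            (k1 - k0) * nb0);
    }
}

If this matches 80a02aa, a fault inside that memcpy means src0->data or dst->data does not cover (k1 - k0)*nb0 valid bytes for this thread's slice; given -ctk/-ctv q4_0, this is presumably a copy into or out of the quantized KV cache.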