ngxson (Collaborator) commented Sep 20, 2025

Ref discussion: #16095 (comment)

Also add a test case in test-backend-ops.
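
For context, a minimal sketch of the pattern such a test exercises, assuming the `offset` parameter in the test log means RoPE applied to a view that starts partway into each row (this is a sketch, not the PR's exact test code; `ggml_view_3d`, `ggml_rope_ext`, and `GGML_ROPE_TYPE_NEOX` are the real ggml API, and the shapes mirror the log below: ne_a=[128,64,2,1], n_dims=64, offset=64):

```cpp
#include "ggml.h"

// sketch only: RoPE over a non-contiguous view with a column offset
static struct ggml_tensor * rope_with_offset(
        struct ggml_context * ctx,
        struct ggml_tensor  * a,      // F32, ne = [128, 64, 2, 1] as in the log
        struct ggml_tensor  * pos) {  // I32 positions, length a->ne[2]
    const int n_dims = 64, offset = 64;
    // 64 columns per row, starting at element 64 -> byte offset 64*sizeof(float)
    struct ggml_tensor * v = ggml_view_3d(ctx, a,
            n_dims, a->ne[1], a->ne[2],
            a->nb[1], a->nb[2],
            offset * ggml_element_size(a));
    // mode=2 in the log is NEOX-style RoPE
    return ggml_rope_ext(ctx, v, pos, NULL,
            n_dims, GGML_ROPE_TYPE_NEOX, /*n_ctx_orig*/ 0,
            /*freq_base*/ 10000.0f, /*freq_scale*/ 1.0f,
            /*ext_factor*/ 0.0f, /*attn_factor*/ 1.0f,
            /*beta_fast*/ 0.0f, /*beta_slow*/ 0.0f);
}
```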

@github-actions bot added the testing and examples labels on Sep 20, 2025
ngxson (Collaborator, Author) commented Sep 20, 2025

Tests passed on Metal backend:

[vision] OK:   llama-mtmd-cli ggml-org/SmolVLM-500M-Instruct-GGUF:Q8_0
[vision] OK:   llama-mtmd-cli ggml-org/SmolVLM2-2.2B-Instruct-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli ggml-org/SmolVLM2-500M-Video-Instruct-GGUF:Q8_0
[vision] OK:   llama-mtmd-cli ggml-org/gemma-3-4b-it-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli THUDM/glm-edge-v-5b-gguf:Q4_K_M
[vision] OK:   llama-mtmd-cli second-state/Llava-v1.5-7B-GGUF:Q2_K
[vision] OK:   llama-mtmd-cli cjpais/llava-1.6-mistral-7b-gguf:Q3_K_M
[vision] OK:   llama-mtmd-cli ibm-research/granite-vision-3.2-2b-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli second-state/MiniCPM-Llama3-V-2_5-GGUF:Q2_K
[vision] OK:   llama-mtmd-cli openbmb/MiniCPM-V-2_6-gguf:Q2_K
[vision] OK:   llama-mtmd-cli openbmb/MiniCPM-o-2_6-gguf:Q4_0
[vision] OK:   llama-mtmd-cli bartowski/Qwen2-VL-2B-Instruct-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli ggml-org/Qwen2.5-VL-3B-Instruct-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli ggml-org/InternVL2_5-1B-GGUF:Q8_0
[vision] OK:   llama-mtmd-cli ggml-org/InternVL3-1B-Instruct-GGUF:Q8_0
[vision] OK:   llama-mtmd-cli ggml-org/Qwen2.5-Omni-3B-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli ggml-org/LFM2-VL-450M-GGUF:Q8_0
[audio]  OK:   llama-mtmd-cli ggml-org/ultravox-v0_5-llama-3_2-1b-GGUF:Q8_0
[audio]  OK:   llama-mtmd-cli ggml-org/Qwen2.5-Omni-3B-GGUF:Q4_K_M
[audio]  OK:   llama-mtmd-cli ggml-org/Voxtral-Mini-3B-2507-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli ggml-org/pixtral-12b-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli ggml-org/Mistral-Small-3.1-24B-Instruct-2503-GGUF
[vision] OK:   llama-mtmd-cli ggml-org/Qwen2-VL-2B-Instruct-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli ggml-org/Qwen2-VL-7B-Instruct-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli ggml-org/Qwen2.5-VL-3B-Instruct-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli ggml-org/Qwen2.5-VL-7B-Instruct-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli ggml-org/InternVL3-8B-Instruct-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli ggml-org/InternVL3-14B-Instruct-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli ggml-org/Qwen2.5-Omni-7B-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli ggml-org/Kimi-VL-A3B-Thinking-2506-GGUF:Q4_K_M
[audio]  OK:   llama-mtmd-cli ggml-org/ultravox-v0_5-llama-3_1-8b-GGUF:Q4_K_M
[audio]  OK:   llama-mtmd-cli ggml-org/Qwen2.5-Omni-7B-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli ggml-org/Qwen2.5-VL-72B-Instruct-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli ggml-org/Llama-4-Scout-17B-16E-Instruct-GGUF:IQ1_S

ngxson (Collaborator, Author) commented Sep 20, 2025

Also passed on CUDA backend:

[vision] OK:   llama-mtmd-cli ggml-org/SmolVLM-500M-Instruct-GGUF:Q8_0
[vision] OK:   llama-mtmd-cli ggml-org/SmolVLM2-2.2B-Instruct-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli ggml-org/SmolVLM2-500M-Video-Instruct-GGUF:Q8_0
[vision] OK:   llama-mtmd-cli ggml-org/gemma-3-4b-it-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli THUDM/glm-edge-v-5b-gguf:Q4_K_M
[vision] OK:   llama-mtmd-cli second-state/Llava-v1.5-7B-GGUF:Q2_K
[vision] OK:   llama-mtmd-cli cjpais/llava-1.6-mistral-7b-gguf:Q3_K_M
[vision] OK:   llama-mtmd-cli ibm-research/granite-vision-3.2-2b-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli second-state/MiniCPM-Llama3-V-2_5-GGUF:Q2_K
[vision] OK:   llama-mtmd-cli openbmb/MiniCPM-V-2_6-gguf:Q2_K
[vision] OK:   llama-mtmd-cli openbmb/MiniCPM-o-2_6-gguf:Q4_0
[vision] OK:   llama-mtmd-cli bartowski/Qwen2-VL-2B-Instruct-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli ggml-org/Qwen2.5-VL-3B-Instruct-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli ggml-org/InternVL2_5-1B-GGUF:Q8_0
[vision] OK:   llama-mtmd-cli ggml-org/InternVL3-1B-Instruct-GGUF:Q8_0
[vision] OK:   llama-mtmd-cli ggml-org/Qwen2.5-Omni-3B-GGUF:Q4_K_M
[vision] OK:   llama-mtmd-cli ggml-org/LFM2-VL-450M-GGUF:Q8_0
[audio]  OK:   llama-mtmd-cli ggml-org/ultravox-v0_5-llama-3_2-1b-GGUF:Q8_0
[audio]  OK:   llama-mtmd-cli ggml-org/Qwen2.5-Omni-3B-GGUF:Q4_K_M
[audio]  OK:   llama-mtmd-cli ggml-org/Voxtral-Mini-3B-2507-GGUF:Q4_K_M

@ngxson marked this pull request as ready for review on September 20, 2025 at 11:55
@ngxson requested review from ggerganov and CISC on September 20, 2025 at 11:55
CISC (Collaborator) commented Sep 20, 2025

Hmmm, this breaks Pixtral at least; it's seeing nonsense.

ngxson (Collaborator, Author) commented Sep 20, 2025

@CISC which backend are you using? For now I can't test Pixtral on CUDA.

theo77186 commented Sep 20, 2025

Possibly CUDA, as this PR also breaks Pixtral 12B for me (RTX 3060, CUDA backend).
Edit 1: CUDA test-backend-ops results:

Failed cases
[ROPE] NMSE = 1.564746124 > 0.000000100   ROPE(type=f32,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=0,v=0): FAIL
[ROPE] NMSE = 1.673478461 > 0.000000100   ROPE(type=f32,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=0,v=1): FAIL
[ROPE] NMSE = 1.902755476 > 0.000000100   ROPE(type=f32,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=1,v=0): FAIL
[ROPE] NMSE = 1.485322469 > 0.000000100   ROPE(type=f32,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=1,v=1): FAIL
[ROPE] NMSE = 1.681313375 > 0.000000100   ROPE(type=f16,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=0,v=0): FAIL
[ROPE] NMSE = 1.803845030 > 0.000000100   ROPE(type=f16,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=0,v=1): FAIL
[ROPE] NMSE = 1.694055747 > 0.000000100   ROPE(type=f16,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=1,v=0): FAIL
[ROPE] NMSE = 1.433777271 > 0.000000100   ROPE(type=f16,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.000000,ff=1,v=1): FAIL
[ROPE] NMSE = 1.865425286 > 0.000000100   ROPE(type=f32,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.424500,ff=0,v=0): FAIL
[ROPE] NMSE = 2.093839867 > 0.000000100   ROPE(type=f32,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.424500,ff=0,v=1): FAIL
[ROPE] NMSE = 1.886700974 > 0.000000100   ROPE(type=f32,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.424500,ff=1,v=0): FAIL
[ROPE] NMSE = 2.008959773 > 0.000000100   ROPE(type=f32,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.424500,ff=1,v=1): FAIL
[ROPE] NMSE = 1.950763331 > 0.000000100   ROPE(type=f16,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.424500,ff=0,v=0): FAIL
[ROPE] NMSE = 1.830177796 > 0.000000100   ROPE(type=f16,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.424500,ff=0,v=1): FAIL
[ROPE] NMSE = 1.994590249 > 0.000000100   ROPE(type=f16,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.424500,ff=1,v=0): FAIL
[ROPE] NMSE = 1.837849172 > 0.000000100   ROPE(type=f16,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.000000,ef=0.000000,af=1.424500,ff=1,v=1): FAIL
[ROPE] NMSE = 1.473079035 > 0.000000100   ROPE(type=f32,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.000000,ef=0.746500,af=1.000000,ff=0,v=0): FAIL
[ROPE] NMSE = 1.524445895 > 0.000000100   ROPE(type=f32,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.000000,ef=0.746500,af=1.000000,ff=0,v=1): FAIL
[ROPE] NMSE = 1.552798178 > 0.000000100   ROPE(type=f32,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.000000,ef=0.746500,af=1.000000,ff=1,v=0): FAIL
[ROPE] NMSE = 1.500256582 > 0.000000100   ROPE(type=f32,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.000000,ef=0.746500,af=1.000000,ff=1,v=1): FAIL
[ROPE] NMSE = 1.458912577 > 0.000000100   ROPE(type=f16,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.000000,ef=0.746500,af=1.000000,ff=0,v=0): FAIL
[ROPE] NMSE = 1.661911994 > 0.000000100   ROPE(type=f16,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.000000,ef=0.746500,af=1.000000,ff=0,v=1): FAIL
[ROPE] NMSE = 1.701647462 > 0.000000100   ROPE(type=f16,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.000000,ef=0.746500,af=1.000000,ff=1,v=0): FAIL
[ROPE] NMSE = 1.650559507 > 0.000000100   ROPE(type=f16,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.000000,ef=0.746500,af=1.000000,ff=1,v=1): FAIL
[ROPE] NMSE = 2.066614532 > 0.000000100   ROPE(type=f32,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.000000,ef=0.746500,af=1.424500,ff=0,v=0): FAIL
[ROPE] NMSE = 1.932990766 > 0.000000100   ROPE(type=f32,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.000000,ef=0.746500,af=1.424500,ff=0,v=1): FAIL
[ROPE] NMSE = 1.907465734 > 0.000000100   ROPE(type=f32,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.000000,ef=0.746500,af=1.424500,ff=1,v=0): FAIL
[ROPE] NMSE = 2.006598435 > 0.000000100   ROPE(type=f32,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.000000,ef=0.746500,af=1.424500,ff=1,v=1): FAIL
[ROPE] NMSE = 2.049808000 > 0.000000100   ROPE(type=f16,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.000000,ef=0.746500,af=1.424500,ff=0,v=0): FAIL
[ROPE] NMSE = 1.768807242 > 0.000000100   ROPE(type=f16,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.000000,ef=0.746500,af=1.424500,ff=0,v=1): FAIL
[ROPE] NMSE = 1.918092557 > 0.000000100   ROPE(type=f16,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.000000,ef=0.746500,af=1.424500,ff=1,v=0): FAIL
[ROPE] NMSE = 1.837483170 > 0.000000100   ROPE(type=f16,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.000000,ef=0.746500,af=1.424500,ff=1,v=1): FAIL
[ROPE] NMSE = 1.660523632 > 0.000000100   ROPE(type=f32,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.424500,ef=0.000000,af=1.000000,ff=0,v=0): FAIL
[ROPE] NMSE = 1.475247025 > 0.000000100   ROPE(type=f32,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.424500,ef=0.000000,af=1.000000,ff=0,v=1): FAIL
[ROPE] NMSE = 1.663181522 > 0.000000100   ROPE(type=f32,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.424500,ef=0.000000,af=1.000000,ff=1,v=0): FAIL
[ROPE] NMSE = 1.853815226 > 0.000000100   ROPE(type=f32,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.424500,ef=0.000000,af=1.000000,ff=1,v=1): FAIL
[ROPE] NMSE = 1.582442336 > 0.000000100   ROPE(type=f16,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.424500,ef=0.000000,af=1.000000,ff=0,v=0): FAIL
[ROPE] NMSE = 1.798423730 > 0.000000100   ROPE(type=f16,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.424500,ef=0.000000,af=1.000000,ff=0,v=1): FAIL
[ROPE] NMSE = 1.727213102 > 0.000000100   ROPE(type=f16,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.424500,ef=0.000000,af=1.000000,ff=1,v=0): FAIL
[ROPE] NMSE = 1.688991798 > 0.000000100   ROPE(type=f16,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.424500,ef=0.000000,af=1.000000,ff=1,v=1): FAIL
[ROPE] NMSE = 1.920576857 > 0.000000100   ROPE(type=f32,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.424500,ef=0.000000,af=1.424500,ff=0,v=0): FAIL
[ROPE] NMSE = 2.042724797 > 0.000000100   ROPE(type=f32,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.424500,ef=0.000000,af=1.424500,ff=0,v=1): FAIL
[ROPE] NMSE = 2.005006117 > 0.000000100   ROPE(type=f32,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.424500,ef=0.000000,af=1.424500,ff=1,v=0): FAIL
[ROPE] NMSE = 2.088975956 > 0.000000100   ROPE(type=f32,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.424500,ef=0.000000,af=1.424500,ff=1,v=1): FAIL
[ROPE] NMSE = 2.196888671 > 0.000000100   ROPE(type=f16,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.424500,ef=0.000000,af=1.424500,ff=0,v=0): FAIL
[ROPE] NMSE = 2.015039856 > 0.000000100   ROPE(type=f16,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.424500,ef=0.000000,af=1.424500,ff=0,v=1): FAIL
[ROPE] NMSE = 2.062078679 > 0.000000100   ROPE(type=f16,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.424500,ef=0.000000,af=1.424500,ff=1,v=0): FAIL
[ROPE] NMSE = 1.416624280 > 0.000000100   ROPE(type=f16,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.424500,ef=0.000000,af=1.424500,ff=1,v=1): FAIL
[ROPE] NMSE = 1.730256274 > 0.000000100   ROPE(type=f32,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.424500,ef=0.746500,af=1.000000,ff=0,v=0): FAIL
[ROPE] NMSE = 1.648190325 > 0.000000100   ROPE(type=f32,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.424500,ef=0.746500,af=1.000000,ff=0,v=1): FAIL
[ROPE] NMSE = 1.422733465 > 0.000000100   ROPE(type=f32,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.424500,ef=0.746500,af=1.000000,ff=1,v=0): FAIL
[ROPE] NMSE = 1.656913958 > 0.000000100   ROPE(type=f32,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.424500,ef=0.746500,af=1.000000,ff=1,v=1): FAIL
[ROPE] NMSE = 1.416131407 > 0.000000100   ROPE(type=f16,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.424500,ef=0.746500,af=1.000000,ff=0,v=0): FAIL
[ROPE] NMSE = 1.765720316 > 0.000000100   ROPE(type=f16,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.424500,ef=0.746500,af=1.000000,ff=0,v=1): FAIL
[ROPE] NMSE = 1.675386975 > 0.000000100   ROPE(type=f16,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.424500,ef=0.746500,af=1.000000,ff=1,v=0): FAIL
[ROPE] NMSE = 1.539980859 > 0.000000100   ROPE(type=f16,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.424500,ef=0.746500,af=1.000000,ff=1,v=1): FAIL
[ROPE] NMSE = 1.831464701 > 0.000000100   ROPE(type=f32,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.424500,ef=0.746500,af=1.424500,ff=0,v=0): FAIL
[ROPE] NMSE = 2.233042777 > 0.000000100   ROPE(type=f32,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.424500,ef=0.746500,af=1.424500,ff=0,v=1): FAIL
[ROPE] NMSE = 1.944985188 > 0.000000100   ROPE(type=f32,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.424500,ef=0.746500,af=1.424500,ff=1,v=0): FAIL
[ROPE] NMSE = 1.978410006 > 0.000000100   ROPE(type=f32,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.424500,ef=0.746500,af=1.424500,ff=1,v=1): FAIL
[ROPE] NMSE = 2.031789269 > 0.000000100   ROPE(type=f16,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.424500,ef=0.746500,af=1.424500,ff=0,v=0): FAIL
[ROPE] NMSE = 1.952964872 > 0.000000100   ROPE(type=f16,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.424500,ef=0.746500,af=1.424500,ff=0,v=1): FAIL
[ROPE] NMSE = 1.971218422 > 0.000000100   ROPE(type=f16,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.424500,ef=0.746500,af=1.424500,ff=1,v=0): FAIL
[ROPE] NMSE = 2.017971459 > 0.000000100   ROPE(type=f16,ne_a=[128,64,2,1],n_dims=64,offset=64,mode=2,n_ctx=512,fs=1.424500,ef=0.746500,af=1.424500,ff=1,v=1): FAIL
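
For reference, a run like the following should reproduce the failures above, assuming the usual test-backend-ops flags (-o filters by op, -b selects a backend; the backend name is machine-specific):

```sh
./build/bin/test-backend-ops test -o ROPE -b CUDA0
```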

CISC (Collaborator) commented Sep 20, 2025

Yes, CUDA.

Edit: it's broken on CPU too, so I don't think it's a backend bug.

Review comment from a Collaborator on the changed code:

    ggml_row_size(cur->type, n_dim*n_head),
    0);
    first = ggml_rope_ext(
    cur = ggml_rope_ext(

So, this is the problem: you're creating a new tensor with only the first half RoPEd.

Unfortunately I don't think you can just use the inplace variant here either, as you'll mess up the original tensor.
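
A minimal sketch of the pitfall being described (cur, n_dim, and n_head appear in the excerpt above; n_pos, pos_h, and freq_base are assumed names): ggml ops are functional, so RoPE on a view yields a fresh tensor and leaves the viewed data untouched, while the _inplace variant writes through the view into the original tensor:

```cpp
// view of the first half of each row of cur (strides as in the excerpt above)
struct ggml_tensor * first = ggml_view_3d(ctx, cur,
        n_dim/2, n_head, n_pos,
        ggml_row_size(cur->type, n_dim),
        ggml_row_size(cur->type, n_dim*n_head),
        0);

// creates a NEW tensor holding only the RoPEd first half;
// cur itself (including its second half) is not updated
struct ggml_tensor * roped = ggml_rope_ext(ctx, first, pos_h, NULL,
        n_dim/2, GGML_ROPE_TYPE_NEOX, 0,
        freq_base, 1.0f, 0.0f, 1.0f, 0.0f, 0.0f);

// ggml_rope_ext_inplace would instead write through the view into cur,
// mutating the original tensor for every other consumer -- the concern above
```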
