
Invalid probability vector error with AMD iGPU on Vulkan backend Environment #2596

Open
gn64 opened this issue Nov 29, 2024 · 3 comments · May be fixed by #2604

Comments


gn64 commented Nov 29, 2024

Environment
OS: Windows 11
CPU: AMD Ryzen 7 7840u
GPU: AMD Radeon 780M (iGPU)
Model: ggml-tiny.bin
whisper.cpp: Both latest version from main branch and 1.7.2

Issue Description
When using Release build with Vulkan backend on AMD GPU, the output becomes garbled (showing timestamps with exclamation marks) and the output changes between runs. To investigate this issue, I switched to Debug build which revealed an underlying problem with probability calculations.

Steps to Reproduce
First with Release build:

cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release
.\main.exe -m .\ggml-tiny.bin -l ja .\jfk.wav

Output from Release build:

whisper_init_from_file_with_params_no_state: loading model from '.\ggml-tiny.bin'
whisper_init_with_params_no_state: use gpu    = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw        = 0
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon(TM) 780M (AMD proprietary driver) | uma: 1 | fp16: 1 | warp size: 64
register_backend: registered backend Vulkan (1 devices)
register_device: registered device Vulkan0 (AMD Radeon(TM) 780M)
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (AMD Ryzen 7 7840U w/ Radeon  780M Graphics     )
whisper_init_with_params_no_state: backends   = 2
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head  = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 384
whisper_model_load: n_text_head   = 6
whisper_model_load: n_text_layer  = 4
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 1 (tiny)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs       = 99
ggml_vulkan: Compiling shaders.............................Done!
whisper_model_load:  Vulkan0 total size =    77.11 MB
whisper_model_load: model size    =   77.11 MB
whisper_backend_init_gpu: using Vulkan backend
whisper_init_state: kv self size  =    3.15 MB
whisper_init_state: kv cross size =    9.44 MB
whisper_init_state: kv pad  size  =    2.36 MB
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 1)
ggml_gallocr_reserve_n: reallocating Vulkan0 buffer from size 0.00 MiB to 11.08 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 0.92 MiB
whisper_init_state: compute buffer (conv)   =   14.15 MB
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
ggml_gallocr_reserve_n: reallocating Vulkan0 buffer from size 0.00 MiB to 60.29 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 0.00 MiB
whisper_init_state: compute buffer (encode) =   64.79 MB
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
ggml_gallocr_reserve_n: reallocating Vulkan0 buffer from size 0.00 MiB to 2.20 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 0.00 MiB
whisper_init_state: compute buffer (cross)  =    3.88 MB
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 1)
ggml_gallocr_reserve_n: reallocating Vulkan0 buffer from size 0.00 MiB to 89.95 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 0.88 MiB
whisper_init_state: compute buffer (decode) =   96.81 MB

system_info: n_threads = 4 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0 | CANN = 0

main: processing '.\jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = ja, task = transcribe, timestamps = 1 ...

PS C:\Users\HidetoshiMATSUO\Desktop\whisper.cpp\test1> .\main.exe -m .\ggml-tiny.bin -l ja .\jfk.wav
whisper_init_from_file_with_params_no_state: loading model from '.\ggml-tiny.bin'
whisper_init_with_params_no_state: use gpu    = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw        = 0
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon(TM) 780M (AMD proprietary driver) | uma: 1 | fp16: 1 | warp size: 64
whisper_init_with_params_no_state: backends   = 2
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head  = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 384
whisper_model_load: n_text_head   = 6
whisper_model_load: n_text_layer  = 4
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 1 (tiny)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs       = 99
ggml_vulkan: Compiling shaders.............................Done!
whisper_model_load:  Vulkan0 total size =    77.11 MB
whisper_model_load: model size    =   77.11 MB
whisper_backend_init_gpu: using Vulkan backend
whisper_init_state: kv self size  =    3.15 MB
whisper_init_state: kv cross size =    9.44 MB
whisper_init_state: kv pad  size  =    2.36 MB
whisper_init_state: compute buffer (conv)   =   14.15 MB
whisper_init_state: compute buffer (encode) =   64.79 MB
whisper_init_state: compute buffer (cross)  =    3.88 MB
whisper_init_state: compute buffer (decode) =   96.81 MB

system_info: n_threads = 4 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0 | CANN = 0

main: processing '.\jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = ja, task = transcribe, timestamps = 1 ..

[00:00:00.000 --> 00:00:30.000]  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!


whisper_print_timings:     load time =    63.88 ms
whisper_print_timings:     fallbacks =   5 p /   0 h
whisper_print_timings:      mel time =     5.75 ms
whisper_print_timings:   sample time =  3073.97 ms /  6600 runs (    0.47 ms per run)
whisper_print_timings:   encode time =    92.47 ms /     1 runs (   92.47 ms per run)
whisper_print_timings:   decode time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:   batchd time = 13938.27 ms /  6588 runs (    2.12 ms per run)
whisper_print_timings:   prompt time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:    total time = 17224.50 ms

Then with Debug build to investigate:

cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Debug
.\main.exe -m .\ggml-tiny.bin -l ja .\jfk.wav
# Results in assertion error

Error in Debug Build

Debug Assertion Failed!
Program: ...whisper.dll
File: C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.41.34120\include\random
Line: 4924
Expression: invalid probability vector for discrete_distribution

This issue appears to be related to whisper.cpp#2400.
I thought it might be connected to llama.cpp#10434, so I tried applying the same fix, but it didn't improve the situation. Any suggestions on how to resolve this would be appreciated.


DickyQi commented Nov 30, 2024

A MacBook Pro 16-inch (2019) with an AMD 5500M has the same issue with Vulkan loader 1.3.302.
The input audio is Chinese, but the language is auto-detected as af (p = 0.010000) and there is no output.
Test log below:

ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon Pro 5500M (MoltenVK) | uma: 0 | fp16: 1 | warp size: 64
whisper_init_with_params_no_state: devices = 3
whisper_init_with_params_no_state: backends = 3
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 384
whisper_model_load: n_text_head = 6
whisper_model_load: n_text_layer = 4
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 1 (tiny)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs = 99
whisper_default_buffer_type: using device Vulkan0 (AMD Radeon Pro 5500M)
ggml_vulkan: Compiling shaders..............................Done!
whisper_model_load: Vulkan0 total size = 77.11 MB
whisper_model_load: model size = 77.11 MB
whisper_backend_init_gpu: using Vulkan0 backend
whisper_backend_init: using BLAS backend
whisper_init_state: kv self size = 3.15 MB
whisper_init_state: kv cross size = 9.44 MB
whisper_init_state: kv pad size = 2.36 MB
whisper_init_state: compute buffer (conv) = 14.15 MB
whisper_init_state: compute buffer (encode) = 64.79 MB
whisper_init_state: compute buffer (cross) = 3.88 MB
whisper_init_state: compute buffer (decode) = 96.81 MB

system_info: n_threads = 4 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | COREML = 0 | OPENVINO = 0 |

main: processing '/Volumes/Share/Streams/audio/voices/c4.wav' (160000 samples, 10.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = auto, task = transcribe, timestamps = 1 ...

whisper_full_with_state: auto-detected language: af (p = 0.010000)

whisper_print_timings: load time = 368.12 ms
whisper_print_timings: fallbacks = 5 p / 0 h
whisper_print_timings: mel time = 8.73 ms
whisper_print_timings: sample time = 5.37 ms / 30 runs ( 0.18 ms per run)
whisper_print_timings: encode time = 218.31 ms / 2 runs ( 109.15 ms per run)
whisper_print_timings: decode time = 6.32 ms / 1 runs ( 6.32 ms per run)
whisper_print_timings: batchd time = 97.96 ms / 18 runs ( 5.44 ms per run)
whisper_print_timings: prompt time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: total time = 734.51 ms


DickyQi commented Nov 30, 2024

BTW, if I revert to the 1.7.2 release (git: 6266a9f), the result is OK and the output is correct. The same code on Linux with a 1080 Ti Vulkan backend is also correct.


gn64 commented Dec 2, 2024

The issue was fixed in my environment by changing the line

const uint rowy = rowx % p.KY;

to

const uint rowy = (p.KY > 0) ? (rowx % p.KY) : 0;

in the void soft_max(uint num_iters) function in ggml-vulkan/vulkan-shaders/soft_max.comp.
This guards against a division by zero when p.KY is 0.

gn64 linked a pull request (#2604) on Dec 2, 2024 that will close this issue.