
Invalid probability vector error with AMD iGPU on Vulkan backend Environment #2596

Open
gn64 opened this issue Nov 29, 2024 · 3 comments · May be fixed by #2604

Comments


gn64 commented Nov 29, 2024

Environment
OS: Windows 11
CPU: AMD Ryzen 7 7840u
GPU: AMD Radeon 780M (iGPU)
Model: ggml-tiny.bin
whisper.cpp: Both latest version from main branch and 1.7.2

Issue Description
When using Release build with Vulkan backend on AMD GPU, the output becomes garbled (showing timestamps with exclamation marks) and the output changes between runs. To investigate this issue, I switched to Debug build which revealed an underlying problem with probability calculations.

Steps to Reproduce
First with Release build:

cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release
.\main.exe -m .\ggml-tiny.bin -l ja .\jfk.wav

Output from Release build:

whisper_init_from_file_with_params_no_state: loading model from '.\ggml-tiny.bin'
whisper_init_with_params_no_state: use gpu    = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw        = 0
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon(TM) 780M (AMD proprietary driver) | uma: 1 | fp16: 1 | warp size: 64
register_backend: registered backend Vulkan (1 devices)
register_device: registered device Vulkan0 (AMD Radeon(TM) 780M)
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (AMD Ryzen 7 7840U w/ Radeon  780M Graphics     )
whisper_init_with_params_no_state: backends   = 2
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head  = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 384
whisper_model_load: n_text_head   = 6
whisper_model_load: n_text_layer  = 4
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 1 (tiny)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs       = 99
ggml_vulkan: Compiling shaders.............................Done!
whisper_model_load:  Vulkan0 total size =    77.11 MB
whisper_model_load: model size    =   77.11 MB
whisper_backend_init_gpu: using Vulkan backend
whisper_init_state: kv self size  =    3.15 MB
whisper_init_state: kv cross size =    9.44 MB
whisper_init_state: kv pad  size  =    2.36 MB
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 1)
ggml_gallocr_reserve_n: reallocating Vulkan0 buffer from size 0.00 MiB to 11.08 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 0.92 MiB
whisper_init_state: compute buffer (conv)   =   14.15 MB
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
ggml_gallocr_reserve_n: reallocating Vulkan0 buffer from size 0.00 MiB to 60.29 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 0.00 MiB
whisper_init_state: compute buffer (encode) =   64.79 MB
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 0)
ggml_gallocr_reserve_n: reallocating Vulkan0 buffer from size 0.00 MiB to 2.20 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 0.00 MiB
whisper_init_state: compute buffer (cross)  =    3.88 MB
ggml_backend_sched_alloc_splits: failed to allocate graph, reserving (backend_ids_changed = 1)
ggml_gallocr_reserve_n: reallocating Vulkan0 buffer from size 0.00 MiB to 89.95 MiB
ggml_gallocr_reserve_n: reallocating CPU buffer from size 0.00 MiB to 0.88 MiB
whisper_init_state: compute buffer (decode) =   96.81 MB

system_info: n_threads = 4 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0 | CANN = 0

main: processing '.\jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = ja, task = transcribe, timestamps = 1 ...

PS C:\Users\HidetoshiMATSUO\Desktop\whisper.cpp\test1> .\main.exe -m .\ggml-tiny.bin -l ja .\jfk.wav
whisper_init_from_file_with_params_no_state: loading model from '.\ggml-tiny.bin'
whisper_init_with_params_no_state: use gpu    = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw        = 0
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon(TM) 780M (AMD proprietary driver) | uma: 1 | fp16: 1 | warp size: 64
whisper_init_with_params_no_state: backends   = 2
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head  = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 384
whisper_model_load: n_text_head   = 6
whisper_model_load: n_text_layer  = 4
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 1 (tiny)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs       = 99
ggml_vulkan: Compiling shaders.............................Done!
whisper_model_load:  Vulkan0 total size =    77.11 MB
whisper_model_load: model size    =   77.11 MB
whisper_backend_init_gpu: using Vulkan backend
whisper_init_state: kv self size  =    3.15 MB
whisper_init_state: kv cross size =    9.44 MB
whisper_init_state: kv pad  size  =    2.36 MB
whisper_init_state: compute buffer (conv)   =   14.15 MB
whisper_init_state: compute buffer (encode) =   64.79 MB
whisper_init_state: compute buffer (cross)  =    3.88 MB
whisper_init_state: compute buffer (decode) =   96.81 MB

system_info: n_threads = 4 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0 | CANN = 0

main: processing '.\jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = ja, task = transcribe, timestamps = 1 ..

[00:00:00.000 --> 00:00:30.000]  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!


whisper_print_timings:     load time =    63.88 ms
whisper_print_timings:     fallbacks =   5 p /   0 h
whisper_print_timings:      mel time =     5.75 ms
whisper_print_timings:   sample time =  3073.97 ms /  6600 runs (    0.47 ms per run)
whisper_print_timings:   encode time =    92.47 ms /     1 runs (   92.47 ms per run)
whisper_print_timings:   decode time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:   batchd time = 13938.27 ms /  6588 runs (    2.12 ms per run)
whisper_print_timings:   prompt time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:    total time = 17224.50 ms

Then with Debug build to investigate:

cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Debug
.\main.exe -m .\ggml-tiny.bin -l ja .\jfk.wav
# Results in assertion error

Error in Debug Build

Debug Assertion Failed!
Program: ...whisper.dll
File: C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.41.34120\include\random
Line: 4924
Expression: invalid probability vector for discrete_distribution

This issue appears to be related to whisper.cpp#2400.
I thought it might be connected to llama.cpp#10434, so I tried applying the same fix, but it didn't improve the situation. Any suggestions on how to resolve this would be appreciated.


DickyQi commented Nov 30, 2024

A MacBook Pro 16-inch (2019) with an AMD 5500M has the same issue with Vulkan loader 1.3.302.
The input audio is Chinese, but the language is auto-detected as af (p = 0.010000) and there is no output.
Test log below:

ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon Pro 5500M (MoltenVK) | uma: 0 | fp16: 1 | warp size: 64
whisper_init_with_params_no_state: devices = 3
whisper_init_with_params_no_state: backends = 3
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 384
whisper_model_load: n_text_head = 6
whisper_model_load: n_text_layer = 4
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 1 (tiny)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs = 99
whisper_default_buffer_type: using device Vulkan0 (AMD Radeon Pro 5500M)
ggml_vulkan: Compiling shaders..............................Done!
whisper_model_load: Vulkan0 total size = 77.11 MB
whisper_model_load: model size = 77.11 MB
whisper_backend_init_gpu: using Vulkan0 backend
whisper_backend_init: using BLAS backend
whisper_init_state: kv self size = 3.15 MB
whisper_init_state: kv cross size = 9.44 MB
whisper_init_state: kv pad size = 2.36 MB
whisper_init_state: compute buffer (conv) = 14.15 MB
whisper_init_state: compute buffer (encode) = 64.79 MB
whisper_init_state: compute buffer (cross) = 3.88 MB
whisper_init_state: compute buffer (decode) = 96.81 MB

system_info: n_threads = 4 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | COREML = 0 | OPENVINO = 0 |

main: processing '/Volumes/Share/Streams/audio/voices/c4.wav' (160000 samples, 10.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = auto, task = transcribe, timestamps = 1 ...

whisper_full_with_state: auto-detected language: af (p = 0.010000)

whisper_print_timings: load time = 368.12 ms
whisper_print_timings: fallbacks = 5 p / 0 h
whisper_print_timings: mel time = 8.73 ms
whisper_print_timings: sample time = 5.37 ms / 30 runs ( 0.18 ms per run)
whisper_print_timings: encode time = 218.31 ms / 2 runs ( 109.15 ms per run)
whisper_print_timings: decode time = 6.32 ms / 1 runs ( 6.32 ms per run)
whisper_print_timings: batchd time = 97.96 ms / 18 runs ( 5.44 ms per run)
whisper_print_timings: prompt time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: total time = 734.51 ms


DickyQi commented Nov 30, 2024

BTW, if I revert to the 1.7.2 release (git: 6266a9f), the result is OK and the output is correct. The same code on Linux with a 1080 Ti Vulkan backend is also correct.


gn64 commented Dec 2, 2024

The issue was fixed in my environment by changing the line

const uint rowy = rowx % p.KY;

to

const uint rowy = (p.KY > 0) ? (rowx % p.KY) : 0;

in the void soft_max(uint num_iters) function in ggml-vulkan/vulkan-shaders/soft_max.comp.
This guards against a division by zero when p.KY is 0.

gn64 linked a pull request (#2604) on Dec 2, 2024 that will close this issue.