Transcript Generation Stuck at Intermittent Time Marks #2597

Open

OmarKhaled511 opened this issue Nov 30, 2024 · 0 comments

It’s working well, but sometimes while it’s generating the transcript it gets stuck at minute 4, 7, or 10 and never finishes. I’ve tried every solution I could think of: lowering the temperature, reducing the number of threads, and splitting the long 30-minute WAV file into smaller parts (I even tried 5-minute chunks), but the same issue keeps happening.
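
For context, the mitigations above correspond to commands along these lines (a sketch, not the exact invocations; the chunk filenames are illustrative, the split assumes ffmpeg is available, and -t/--threads and -tp/--temperature are whisper.cpp's thread-count and sampling-temperature flags):

# split the 30-minute WAV into 5-minute (300 s) chunks
ffmpeg -i output.wav -f segment -segment_time 300 -c copy chunk_%03d.wav

# retry one chunk with fewer threads and a fixed low temperature
./bin/main -f chunk_000.wav -m models/models/ggml-large-v2.bin -t 2 -tp 0.0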

This is my log:

./bin/main -f output.wav -m models/models/ggml-large-v2.bin  --ov-e-device GPU
whisper_init_from_file_with_params_no_state: loading model from 'models/models/ggml-large-v2.bin'
whisper_init_with_params_no_state: use gpu    = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw        = 0
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4070 Ti, compute capability 8.9, VMM: yes
whisper_init_with_params_no_state: devices    = 2
whisper_init_with_params_no_state: backends   = 2
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head  = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1280
whisper_model_load: n_text_head   = 20
whisper_model_load: n_text_layer  = 32
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 5 (large)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs       = 99
whisper_default_buffer_type: using device CUDA0 (NVIDIA GeForce RTX 4070 Ti)
whisper_model_load:    CUDA0 total size =  3093.99 MB
whisper_model_load: model size    = 3093.99 MB
whisper_backend_init_gpu: using CUDA0 backend
whisper_init_state: kv self size  =   83.89 MB
whisper_init_state: kv cross size =  251.66 MB
whisper_init_state: kv pad  size  =    7.86 MB
whisper_init_state: compute buffer (conv)   =   35.65 MB
whisper_init_state: compute buffer (encode) =  212.29 MB
whisper_init_state: compute buffer (cross)  =    9.25 MB
whisper_init_state: compute buffer (decode) =  100.02 MB

system_info: n_threads = 4 / 20 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | COREML = 0 | OPENVINO = 0 |

main: processing 'output.wav' (25758918 samples, 1609.9 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...