zero-size array to reduction operation maximum which has no identity None in vllm asr #242

@zshnb

Description

I started the vLLM server; the startup log is below:

============================================================
  VibeVoice vLLM ASR Server - One-Click Deployment
============================================================

============================================================
  Downloading model: microsoft/VibeVoice-ASR
============================================================

Fetching 17 files: 100%|███████████████████████████████████████████████████████████████████████████████| 17/17 [00:00<00:00, 200289.80it/s]

============================================================
  ✅ Model downloaded successfully!
  📁 Path: /root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944
============================================================


============================================================
  Generating tokenizer files
============================================================

=== Generating VibeVoice tokenizer files to /root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944 ===

Downloading vocab.json from Qwen/Qwen2.5-7B...
/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/huggingface_hub/file_download.py:982: UserWarning: `local_dir_use_symlinks` parameter is deprecated and will be ignored. The process to download files to a local folder has been updated and do not rely on symlinks anymore. You only need to pass a destination folder as`local_dir`.
For more details, check out https://huggingface.co/docs/huggingface_hub/main/en/guides/download#download-files-to-local-folder.
  warnings.warn(
Downloading merges.txt from Qwen/Qwen2.5-7B...
Downloading tokenizer.json from Qwen/Qwen2.5-7B...
Downloading tokenizer_config.json from Qwen/Qwen2.5-7B...
Patched /root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944/tokenizer_config.json
Patched /root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944/tokenizer.json
Generated /root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944/added_tokens.json
Generated /root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944/special_tokens_map.json

✅ All 6 tokenizer files generated in /root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944

============================================================
  Starting vLLM server on port 8000
============================================================

(APIServer pid=2309206) INFO 02-12 22:38:37 [api_server.py:1272] vLLM API server version 0.14.1
(APIServer pid=2309206) INFO 02-12 22:38:37 [utils.py:263] non-default args: {'model_tag': '/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944', 'chat_template_content_format': 'openai', 'model': '/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944', 'trust_remote_code': True, 'dtype': 'bfloat16', 'allowed_local_media_path': '/root/podcast-asr-service', 'max_model_len': 65536, 'enforce_eager': True, 'served_model_name': ['vibevoice'], 'gpu_memory_utilization': 0.8, 'enable_prefix_caching': False, 'max_num_batched_tokens': 32768, 'max_num_seqs': 64, 'enable_chunked_prefill': True}
(APIServer pid=2309206) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=2309206) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=2309206) INFO 02-12 22:38:37 [model.py:530] Resolved architecture: VibeVoiceForASRTraining
(APIServer pid=2309206) INFO 02-12 22:38:37 [model.py:1866] Downcasting torch.float32 to torch.bfloat16.
(APIServer pid=2309206) INFO 02-12 22:38:37 [model.py:1545] Using max model len 65536
(APIServer pid=2309206) INFO 02-12 22:38:37 [scheduler.py:229] Chunked prefill is enabled with max_num_batched_tokens=32768.
(APIServer pid=2309206) INFO 02-12 22:38:37 [vllm.py:630] Asynchronous scheduling is enabled.
(APIServer pid=2309206) INFO 02-12 22:38:37 [vllm.py:637] Disabling NCCL for DP synchronization when using async scheduling.
(APIServer pid=2309206) WARNING 02-12 22:38:37 [vllm.py:665] Enforce eager set, overriding optimization level to -O0
(APIServer pid=2309206) INFO 02-12 22:38:37 [vllm.py:765] Cudagraph is disabled under eager mode
(APIServer pid=2309206) The tokenizer you are loading from '/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue.
(APIServer pid=2309206) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(EngineCore_DP0 pid=2309990) INFO 02-12 22:38:43 [core.py:97] Initializing a V1 LLM engine (v0.14.1) with config: model='/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944', speculative_config=None, tokenizer='/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=65536, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, enable_return_routed_experts=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False, enable_mfu_metrics=False, enable_mm_processor_stats=False, enable_logging_iteration_details=False), seed=0, served_model_name=vibevoice, enable_prefix_caching=False, enable_chunked_prefill=True, pooler_config=None, compilation_config={'level': None, 'mode': <CompilationMode.NONE: 0>, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['all'], 'splitting_ops': [], 'compile_mm_encoder': False, 'compile_sizes': [], 'compile_ranges_split_points': [32768], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.NONE: 0>, 
'cudagraph_num_of_warmups': 0, 'cudagraph_capture_sizes': [], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'eliminate_noops': False, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': 0, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>, 'evaluate_guards': False, 'assume_32_bit_indexing': True}, 'local_cache_dir': None}
(EngineCore_DP0 pid=2309990) The tokenizer you are loading from '/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue.
(EngineCore_DP0 pid=2309990) INFO 02-12 22:38:44 [parallel_state.py:1214] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://10.3.194.79:46259 backend=nccl
(EngineCore_DP0 pid=2309990) INFO 02-12 22:38:44 [parallel_state.py:1425] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank N/A
(EngineCore_DP0 pid=2309990) /root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/transformers/audio_utils.py:525: UserWarning: At least one mel filter has all zero values. The value for `num_mel_filters` (128) may be set too high. Or, the value for `num_frequency_bins` (201) may be set too low.
(EngineCore_DP0 pid=2309990)   warnings.warn(
(EngineCore_DP0 pid=2309990) INFO 02-12 22:38:44 [gpu_model_runner.py:3808] Starting to load model /root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944...
(EngineCore_DP0 pid=2309990) `torch_dtype` is deprecated! Use `dtype` instead!
(EngineCore_DP0 pid=2309990) INFO 02-12 22:38:44 [vllm.py:630] Asynchronous scheduling is enabled.
(EngineCore_DP0 pid=2309990) WARNING 02-12 22:38:44 [vllm.py:672] Inductor compilation was disabled by user settings, optimizations settings that are only active during inductor compilation will be ignored.
(EngineCore_DP0 pid=2309990) INFO 02-12 22:38:44 [vllm.py:765] Cudagraph is disabled under eager mode
(EngineCore_DP0 pid=2309990) /root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/tvm_ffi/_optional_torch_c_dlpack.py:174: UserWarning: Failed to JIT torch c dlpack extension, EnvTensorAllocator will not be enabled.
(EngineCore_DP0 pid=2309990) We recommend installing via `pip install torch-c-dlpack-ext`
(EngineCore_DP0 pid=2309990)   warnings.warn(
(EngineCore_DP0 pid=2309990) INFO 02-12 22:38:45 [cuda.py:351] Using FLASH_ATTN attention backend out of potential backends: ('FLASH_ATTN', 'FLASHINFER', 'TRITON_ATTN', 'FLEX_ATTENTION')
(EngineCore_DP0 pid=2309990) [VibeVoice] Converted acoustic_tokenizer to torch.float32 (was torch.bfloat16)
(EngineCore_DP0 pid=2309990) [VibeVoice] Converted semantic_tokenizer to torch.float32 (was torch.bfloat16)
(EngineCore_DP0 pid=2309990) [VibeVoice] Converted acoustic_connector to torch.float32 (was torch.bfloat16)
(EngineCore_DP0 pid=2309990) [VibeVoice] Converted semantic_connector to torch.float32 (was torch.bfloat16)
Loading safetensors checkpoint shards:   0% Completed | 0/8 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  12% Completed | 1/8 [00:00<00:02,  2.92it/s]
Loading safetensors checkpoint shards:  25% Completed | 2/8 [00:00<00:01,  3.35it/s]
Loading safetensors checkpoint shards:  38% Completed | 3/8 [00:00<00:01,  3.01it/s]
Loading safetensors checkpoint shards:  50% Completed | 4/8 [00:01<00:01,  3.97it/s]
Loading safetensors checkpoint shards:  62% Completed | 5/8 [00:01<00:00,  3.92it/s]
Loading safetensors checkpoint shards:  75% Completed | 6/8 [00:01<00:00,  3.81it/s]
Loading safetensors checkpoint shards:  88% Completed | 7/8 [00:01<00:00,  3.76it/s]
Loading safetensors checkpoint shards: 100% Completed | 8/8 [00:02<00:00,  3.75it/s]
Loading safetensors checkpoint shards: 100% Completed | 8/8 [00:02<00:00,  3.65it/s]
(EngineCore_DP0 pid=2309990) 
(EngineCore_DP0 pid=2309990) INFO 02-12 22:38:47 [default_loader.py:291] Loading weights took 2.21 seconds
(EngineCore_DP0 pid=2309990) INFO 02-12 22:38:47 [gpu_model_runner.py:3905] Model loading took 18.22 GiB memory and 2.694686 seconds
(EngineCore_DP0 pid=2309990) INFO 02-12 22:38:48 [gpu_model_runner.py:4715] Encoder cache will be initialized with a budget of 32768 tokens, and profiled with 145 audio items of the maximum feature size.
(EngineCore_DP0 pid=2309990) /root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/transformers/audio_utils.py:525: UserWarning: At least one mel filter has all zero values. The value for `num_mel_filters` (128) may be set too high. Or, the value for `num_frequency_bins` (201) may be set too low.
(EngineCore_DP0 pid=2309990)   warnings.warn(
(EngineCore_DP0 pid=2309990) INFO 02-12 22:39:22 [gpu_worker.py:358] Available KV cache memory: 12.52 GiB
(EngineCore_DP0 pid=2309990) INFO 02-12 22:39:22 [kv_cache_utils.py:1305] GPU KV cache size: 234,432 tokens
(EngineCore_DP0 pid=2309990) INFO 02-12 22:39:22 [kv_cache_utils.py:1310] Maximum concurrency for 65,536 tokens per request: 3.58x
(EngineCore_DP0 pid=2309990) INFO 02-12 22:39:22 [core.py:273] init engine (profile, create kv cache, warmup model) took 34.80 seconds
(EngineCore_DP0 pid=2309990) WARNING 02-12 22:39:22 [vllm.py:672] Inductor compilation was disabled by user settings, optimizations settings that are only active during inductor compilation will be ignored.
(EngineCore_DP0 pid=2309990) INFO 02-12 22:39:22 [vllm.py:765] Cudagraph is disabled under eager mode
(APIServer pid=2309206) INFO 02-12 22:39:23 [api_server.py:1014] Supported tasks: ['generate', 'transcription']
(APIServer pid=2309206) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=2309206) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=2309206) INFO 02-12 22:39:23 [serving_chat.py:182] Warming up chat template processing...
(APIServer pid=2309206) INFO 02-12 22:39:23 [serving_chat.py:218] Chat template warmup completed in 29.0ms
(APIServer pid=2309206) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=2309206) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=2309206) INFO 02-12 22:39:23 [speech_to_text.py:138] Warming up audio preprocessing libraries...
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] Audio preprocessing warmup failed (non-fatal): %s. First request may experience higher latency.
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] Traceback (most recent call last):
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]   File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/transformers/feature_extraction_utils.py", line 529, in get_feature_extractor_dict
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]     resolved_feature_extractor_file = resolved_feature_extractor_files[0]
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]                                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] IndexError: list index out of range
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] 
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] During handling of the above exception, another exception occurred:
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] 
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] Traceback (most recent call last):
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]   File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/vllm/entrypoints/openai/speech_to_text.py", line 152, in _warmup_audio_preprocessing
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]     processor = cached_processor_from_config(self.model_config)
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]   File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/vllm/transformers_utils/processor.py", line 251, in cached_processor_from_config
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]     return cached_get_processor_without_dynamic_kwargs(
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]   File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/vllm/transformers_utils/processor.py", line 210, in cached_get_processor_without_dynamic_kwargs
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]     processor = cached_get_processor(
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]                 ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]   File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/vllm/transformers_utils/processor.py", line 120, in get_processor
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]     processor = AutoProcessor.from_pretrained(
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]   File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/transformers/models/auto/processing_auto.py", line 401, in from_pretrained
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]     return PROCESSOR_MAPPING[type(config)].from_pretrained(pretrained_model_name_or_path, **kwargs)
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]   File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/transformers/processing_utils.py", line 1394, in from_pretrained
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]     args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, **kwargs)
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]   File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/transformers/processing_utils.py", line 1453, in _get_arguments_from_pretrained
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]     args.append(attribute_class.from_pretrained(pretrained_model_name_or_path, **kwargs))
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]   File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/transformers/feature_extraction_utils.py", line 382, in from_pretrained
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]     feature_extractor_dict, kwargs = cls.get_feature_extractor_dict(pretrained_model_name_or_path, **kwargs)
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]   File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/transformers/feature_extraction_utils.py", line 536, in get_feature_extractor_dict
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]     raise OSError(
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] OSError: Can't load feature extractor for '/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure '/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944' is the correct path to a directory containing a preprocessor_config.json file
(APIServer pid=2309206) INFO 02-12 22:39:23 [speech_to_text.py:201] Warming up multimodal input processor...
(APIServer pid=2309206) INFO 02-12 22:39:23 [speech_to_text.py:234] Input processor warmup completed in 0.00s
(APIServer pid=2309206) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=2309206) INFO 02-12 22:39:23 [speech_to_text.py:138] Warming up audio preprocessing libraries...
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] Audio preprocessing warmup failed (non-fatal): %s. First request may experience higher latency.
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] Traceback (most recent call last):
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]   File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/transformers/feature_extraction_utils.py", line 529, in get_feature_extractor_dict
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]     resolved_feature_extractor_file = resolved_feature_extractor_files[0]
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]                                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] IndexError: list index out of range
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] 
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] During handling of the above exception, another exception occurred:
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] 
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] Traceback (most recent call last):
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]   File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/vllm/entrypoints/openai/speech_to_text.py", line 152, in _warmup_audio_preprocessing
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]     processor = cached_processor_from_config(self.model_config)
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]   File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/vllm/transformers_utils/processor.py", line 251, in cached_processor_from_config
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]     return cached_get_processor_without_dynamic_kwargs(
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]   File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/vllm/transformers_utils/processor.py", line 210, in cached_get_processor_without_dynamic_kwargs
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]     processor = cached_get_processor(
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]                 ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]   File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/vllm/transformers_utils/processor.py", line 120, in get_processor
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]     processor = AutoProcessor.from_pretrained(
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]   File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/transformers/models/auto/processing_auto.py", line 401, in from_pretrained
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]     return PROCESSOR_MAPPING[type(config)].from_pretrained(pretrained_model_name_or_path, **kwargs)
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]   File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/transformers/processing_utils.py", line 1394, in from_pretrained
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]     args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, **kwargs)
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]   File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/transformers/processing_utils.py", line 1453, in _get_arguments_from_pretrained
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]     args.append(attribute_class.from_pretrained(pretrained_model_name_or_path, **kwargs))
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]   File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/transformers/feature_extraction_utils.py", line 382, in from_pretrained
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]     feature_extractor_dict, kwargs = cls.get_feature_extractor_dict(pretrained_model_name_or_path, **kwargs)
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]   File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/transformers/feature_extraction_utils.py", line 536, in get_feature_extractor_dict
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]     raise OSError(
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] OSError: Can't load feature extractor for '/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure '/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944' is the correct path to a directory containing a preprocessor_config.json file
(APIServer pid=2309206) INFO 02-12 22:39:23 [speech_to_text.py:201] Warming up multimodal input processor...
(APIServer pid=2309206) INFO 02-12 22:39:23 [speech_to_text.py:234] Input processor warmup completed in 0.00s
(APIServer pid=2309206) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=2309206) INFO 02-12 22:39:23 [api_server.py:1346] Starting vLLM API server 0 on http://0.0.0.0:8000
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:38] Available routes are:
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /openapi.json, Methods: GET, HEAD
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /docs, Methods: GET, HEAD
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /docs/oauth2-redirect, Methods: GET, HEAD
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /redoc, Methods: GET, HEAD
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /scale_elastic_ep, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /is_scaling_elastic_ep, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /tokenize, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /detokenize, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /inference/v1/generate, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /pause, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /resume, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /is_paused, Methods: GET
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /metrics, Methods: GET
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /health, Methods: GET
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /load, Methods: GET
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /v1/models, Methods: GET
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /version, Methods: GET
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /v1/responses, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /v1/responses/{response_id}, Methods: GET
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /v1/responses/{response_id}/cancel, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /v1/messages, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /v1/chat/completions, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /v1/completions, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /v1/audio/transcriptions, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /v1/audio/translations, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /ping, Methods: GET
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /ping, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /invocations, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /classify, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /v1/embeddings, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /score, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /v1/score, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /rerank, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /v1/rerank, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /v2/rerank, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /pooling, Methods: POST
(APIServer pid=2309206) INFO:     Started server process [2309206]
(APIServer pid=2309206) INFO:     Waiting for application startup.
(APIServer pid=2309206) INFO:     Application startup complete.
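For context on the non-fatal warmup errors above: `AutoProcessor.from_pretrained` fails because the snapshot directory contains no `preprocessor_config.json` (only the tokenizer files generated by the deployment script). A minimal sketch of that situation, using a temporary directory rather than the real cache path:

```python
import os
import tempfile

# Simulate the snapshot directory from the log: the six generated tokenizer
# files are present, but no preprocessor_config.json for the feature extractor.
snap = tempfile.mkdtemp()
for name in ["tokenizer.json", "tokenizer_config.json", "vocab.json",
             "merges.txt", "added_tokens.json", "special_tokens_map.json"]:
    open(os.path.join(snap, name), "w").close()

# This is the file the OSError in the warmup traceback says is missing.
missing = not os.path.exists(os.path.join(snap, "preprocessor_config.json"))
print("preprocessor_config.json missing:", missing)
```

This only illustrates what the loader sees; whether the file should be generated by the script or shipped with the model is the open question.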

Then I sent a request with a local audio file:

Loading audio from: /root/podcast-asr-service/tmp/698b401ee329a4d9e48af376/segments/1.m4a
Audio duration: 1800.02 seconds
Audio size: 29111558 bytes

Sending request to http://localhost:8000/v1/chat/completions (Streaming Mode)...
Prompt: This is a 1800.02 seconds audio, please transcribe it with these keys: Start time, End time, Speaker ID, Content

Finally, I got this error:

(APIServer pid=2309206) /root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/numpy/_core/fromnumeric.py:3860: RuntimeWarning: Mean of empty slice.
(APIServer pid=2309206)   return _methods._mean(a, axis=axis, dtype=dtype,
(APIServer pid=2309206) /root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/numpy/_core/_methods.py:145: RuntimeWarning: invalid value encountered in divide
(APIServer pid=2309206)   ret = ret.dtype.type(ret / rcount)
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] Error in preprocessing prompt inputs
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] Traceback (most recent call last):
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335]   File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/vllm/entrypoints/openai/serving_chat.py", line 313, in create_chat_completion
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335]     conversation, engine_prompts = await self._preprocess_chat(
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335]                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335]   File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/vllm/entrypoints/openai/serving_engine.py", line 1234, in _preprocess_chat
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335]     mm_data = await mm_data_future
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335]               ^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335]   File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/vllm/entrypoints/chat_utils.py", line 819, in all_mm_data
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335]     modality: await asyncio.gather(*coros)
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335]   File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/vllm/multimodal/utils.py", line 242, in fetch_audio_async
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335]     return await self.load_from_url_async(
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335]   File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/vllm/multimodal/utils.py", line 208, in load_from_url_async
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335]     return await future
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335]            ^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335]   File "/usr/lib64/python3.12/concurrent/futures/thread.py", line 58, in run
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335]     result = self.fn(*self.args, **self.kwargs)
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335]   File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/vllm/multimodal/utils.py", line 117, in _load_data_url
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335]     return media_io.load_base64(media_type, data)
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335]   File "/root/podcast-asr-service/toolchain/lib/VibeVoice/vllm_plugin/model.py", line 85, in load_base64
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335]     return _ffmpeg_load_bytes(base64.b64decode(data), media_type=media_type)
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335]   File "/root/podcast-asr-service/toolchain/lib/VibeVoice/vllm_plugin/model.py", line 60, in _ffmpeg_load_bytes
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335]     audio = normalizer(audio)
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335]             ^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335]   File "/root/podcast-asr-service/toolchain/lib/VibeVoice/vibevoice/processor/audio_utils.py", line 216, in __call__
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335]     audio, _ = self.avoid_clipping(audio)
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335]                ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335]   File "/root/podcast-asr-service/toolchain/lib/VibeVoice/vibevoice/processor/audio_utils.py", line 195, in avoid_clipping
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335]     max_val = np.max(np.abs(audio))
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335]               ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335]   File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/numpy/_core/fromnumeric.py", line 3164, in max
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335]     return _wrapreduction(a, np.maximum, 'max', axis, None, out,
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335]   File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/numpy/_core/fromnumeric.py", line 86, in _wrapreduction
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335]     return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] ValueError: zero-size array to reduction operation maximum which has no identity

How can I solve this issue?
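For context, the `ValueError` (and the "Mean of empty slice" warning just before it) both point to the decoded audio array being empty, i.e. the ffmpeg decode of the `.m4a` input likely produced zero samples before `avoid_clipping` ran. A minimal reproduction, plus a hypothetical guard sketch (the real `avoid_clipping` signature and return values in `audio_utils.py` may differ):

```python
import numpy as np

# Reproduce the failure: reducing over a zero-size array has no identity.
audio = np.array([], dtype=np.float32)
try:
    np.max(np.abs(audio))
except ValueError as e:
    print(e)  # zero-size array to reduction operation maximum which has no identity

# Hypothetical guard: skip normalization when the decoded buffer is empty.
# (Illustrative only; not the actual vibevoice avoid_clipping implementation.)
def avoid_clipping_safe(audio: np.ndarray, threshold: float = 0.99):
    if audio.size == 0:
        return audio, 1.0  # nothing to normalize
    max_val = np.max(np.abs(audio))
    if max_val > threshold:
        audio = audio * (threshold / max_val)
    return audio, max_val
```

If the guard is reached, the real fix is upstream: checking why `_ffmpeg_load_bytes` returned no samples for this file (codec support, truncated input, or the base64 payload) would address the root cause rather than the symptom.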
