-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Open
Description
I started vllm server, the log is below
============================================================
VibeVoice vLLM ASR Server - One-Click Deployment
============================================================
============================================================
Downloading model: microsoft/VibeVoice-ASR
============================================================
Fetching 17 files: 100%|███████████████████████████████████████████████████████████████████████████████| 17/17 [00:00<00:00, 200289.80it/s]
============================================================
✅ Model downloaded successfully!
📁 Path: /root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944
============================================================
============================================================
Generating tokenizer files
============================================================
=== Generating VibeVoice tokenizer files to /root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944 ===
Downloading vocab.json from Qwen/Qwen2.5-7B...
/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/huggingface_hub/file_download.py:982: UserWarning: `local_dir_use_symlinks` parameter is deprecated and will be ignored. The process to download files to a local folder has been updated and do not rely on symlinks anymore. You only need to pass a destination folder as`local_dir`.
For more details, check out https://huggingface.co/docs/huggingface_hub/main/en/guides/download#download-files-to-local-folder.
warnings.warn(
Downloading merges.txt from Qwen/Qwen2.5-7B...
Downloading tokenizer.json from Qwen/Qwen2.5-7B...
Downloading tokenizer_config.json from Qwen/Qwen2.5-7B...
Patched /root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944/tokenizer_config.json
Patched /root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944/tokenizer.json
Generated /root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944/added_tokens.json
Generated /root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944/special_tokens_map.json
✅ All 6 tokenizer files generated in /root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944
============================================================
Starting vLLM server on port 8000
============================================================
(APIServer pid=2309206) INFO 02-12 22:38:37 [api_server.py:1272] vLLM API server version 0.14.1
(APIServer pid=2309206) INFO 02-12 22:38:37 [utils.py:263] non-default args: {'model_tag': '/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944', 'chat_template_content_format': 'openai', 'model': '/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944', 'trust_remote_code': True, 'dtype': 'bfloat16', 'allowed_local_media_path': '/root/podcast-asr-service', 'max_model_len': 65536, 'enforce_eager': True, 'served_model_name': ['vibevoice'], 'gpu_memory_utilization': 0.8, 'enable_prefix_caching': False, 'max_num_batched_tokens': 32768, 'max_num_seqs': 64, 'enable_chunked_prefill': True}
(APIServer pid=2309206) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=2309206) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=2309206) INFO 02-12 22:38:37 [model.py:530] Resolved architecture: VibeVoiceForASRTraining
(APIServer pid=2309206) INFO 02-12 22:38:37 [model.py:1866] Downcasting torch.float32 to torch.bfloat16.
(APIServer pid=2309206) INFO 02-12 22:38:37 [model.py:1545] Using max model len 65536
(APIServer pid=2309206) INFO 02-12 22:38:37 [scheduler.py:229] Chunked prefill is enabled with max_num_batched_tokens=32768.
(APIServer pid=2309206) INFO 02-12 22:38:37 [vllm.py:630] Asynchronous scheduling is enabled.
(APIServer pid=2309206) INFO 02-12 22:38:37 [vllm.py:637] Disabling NCCL for DP synchronization when using async scheduling.
(APIServer pid=2309206) WARNING 02-12 22:38:37 [vllm.py:665] Enforce eager set, overriding optimization level to -O0
(APIServer pid=2309206) INFO 02-12 22:38:37 [vllm.py:765] Cudagraph is disabled under eager mode
(APIServer pid=2309206) The tokenizer you are loading from '/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue.
(APIServer pid=2309206) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(EngineCore_DP0 pid=2309990) INFO 02-12 22:38:43 [core.py:97] Initializing a V1 LLM engine (v0.14.1) with config: model='/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944', speculative_config=None, tokenizer='/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=65536, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, enable_return_routed_experts=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False, enable_mfu_metrics=False, enable_mm_processor_stats=False, enable_logging_iteration_details=False), seed=0, served_model_name=vibevoice, enable_prefix_caching=False, enable_chunked_prefill=True, pooler_config=None, compilation_config={'level': None, 'mode': <CompilationMode.NONE: 0>, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['all'], 'splitting_ops': [], 'compile_mm_encoder': False, 'compile_sizes': [], 'compile_ranges_split_points': [32768], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.NONE: 0>, 'cudagraph_num_of_warmups': 0, 'cudagraph_capture_sizes': [], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'eliminate_noops': False, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': 0, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>, 'evaluate_guards': False, 'assume_32_bit_indexing': True}, 'local_cache_dir': None}
(EngineCore_DP0 pid=2309990) The tokenizer you are loading from '/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue.
(EngineCore_DP0 pid=2309990) INFO 02-12 22:38:44 [parallel_state.py:1214] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://10.3.194.79:46259 backend=nccl
(EngineCore_DP0 pid=2309990) INFO 02-12 22:38:44 [parallel_state.py:1425] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank N/A
(EngineCore_DP0 pid=2309990) /root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/transformers/audio_utils.py:525: UserWarning: At least one mel filter has all zero values. The value for `num_mel_filters` (128) may be set too high. Or, the value for `num_frequency_bins` (201) may be set too low.
(EngineCore_DP0 pid=2309990) warnings.warn(
(EngineCore_DP0 pid=2309990) INFO 02-12 22:38:44 [gpu_model_runner.py:3808] Starting to load model /root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944...
(EngineCore_DP0 pid=2309990) `torch_dtype` is deprecated! Use `dtype` instead!
(EngineCore_DP0 pid=2309990) INFO 02-12 22:38:44 [vllm.py:630] Asynchronous scheduling is enabled.
(EngineCore_DP0 pid=2309990) WARNING 02-12 22:38:44 [vllm.py:672] Inductor compilation was disabled by user settings, optimizations settings that are only active during inductor compilation will be ignored.
(EngineCore_DP0 pid=2309990) INFO 02-12 22:38:44 [vllm.py:765] Cudagraph is disabled under eager mode
(EngineCore_DP0 pid=2309990) /root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/tvm_ffi/_optional_torch_c_dlpack.py:174: UserWarning: Failed to JIT torch c dlpack extension, EnvTensorAllocator will not be enabled.
(EngineCore_DP0 pid=2309990) We recommend installing via `pip install torch-c-dlpack-ext`
(EngineCore_DP0 pid=2309990) warnings.warn(
(EngineCore_DP0 pid=2309990) INFO 02-12 22:38:45 [cuda.py:351] Using FLASH_ATTN attention backend out of potential backends: ('FLASH_ATTN', 'FLASHINFER', 'TRITON_ATTN', 'FLEX_ATTENTION')
(EngineCore_DP0 pid=2309990) [VibeVoice] Converted acoustic_tokenizer to torch.float32 (was torch.bfloat16)
(EngineCore_DP0 pid=2309990) [VibeVoice] Converted semantic_tokenizer to torch.float32 (was torch.bfloat16)
(EngineCore_DP0 pid=2309990) [VibeVoice] Converted acoustic_connector to torch.float32 (was torch.bfloat16)
(EngineCore_DP0 pid=2309990) [VibeVoice] Converted semantic_connector to torch.float32 (was torch.bfloat16)
Loading safetensors checkpoint shards: 0% Completed | 0/8 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 12% Completed | 1/8 [00:00<00:02, 2.92it/s]
Loading safetensors checkpoint shards: 25% Completed | 2/8 [00:00<00:01, 3.35it/s]
Loading safetensors checkpoint shards: 38% Completed | 3/8 [00:00<00:01, 3.01it/s]
Loading safetensors checkpoint shards: 50% Completed | 4/8 [00:01<00:01, 3.97it/s]
Loading safetensors checkpoint shards: 62% Completed | 5/8 [00:01<00:00, 3.92it/s]
Loading safetensors checkpoint shards: 75% Completed | 6/8 [00:01<00:00, 3.81it/s]
Loading safetensors checkpoint shards: 88% Completed | 7/8 [00:01<00:00, 3.76it/s]
Loading safetensors checkpoint shards: 100% Completed | 8/8 [00:02<00:00, 3.75it/s]
Loading safetensors checkpoint shards: 100% Completed | 8/8 [00:02<00:00, 3.65it/s]
(EngineCore_DP0 pid=2309990)
(EngineCore_DP0 pid=2309990) INFO 02-12 22:38:47 [default_loader.py:291] Loading weights took 2.21 seconds
(EngineCore_DP0 pid=2309990) INFO 02-12 22:38:47 [gpu_model_runner.py:3905] Model loading took 18.22 GiB memory and 2.694686 seconds
(EngineCore_DP0 pid=2309990) INFO 02-12 22:38:48 [gpu_model_runner.py:4715] Encoder cache will be initialized with a budget of 32768 tokens, and profiled with 145 audio items of the maximum feature size.
(EngineCore_DP0 pid=2309990) /root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/transformers/audio_utils.py:525: UserWarning: At least one mel filter has all zero values. The value for `num_mel_filters` (128) may be set too high. Or, the value for `num_frequency_bins` (201) may be set too low.
(EngineCore_DP0 pid=2309990) warnings.warn(
(EngineCore_DP0 pid=2309990) INFO 02-12 22:39:22 [gpu_worker.py:358] Available KV cache memory: 12.52 GiB
(EngineCore_DP0 pid=2309990) INFO 02-12 22:39:22 [kv_cache_utils.py:1305] GPU KV cache size: 234,432 tokens
(EngineCore_DP0 pid=2309990) INFO 02-12 22:39:22 [kv_cache_utils.py:1310] Maximum concurrency for 65,536 tokens per request: 3.58x
(EngineCore_DP0 pid=2309990) INFO 02-12 22:39:22 [core.py:273] init engine (profile, create kv cache, warmup model) took 34.80 seconds
(EngineCore_DP0 pid=2309990) WARNING 02-12 22:39:22 [vllm.py:672] Inductor compilation was disabled by user settings, optimizations settings that are only active during inductor compilation will be ignored.
(EngineCore_DP0 pid=2309990) INFO 02-12 22:39:22 [vllm.py:765] Cudagraph is disabled under eager mode
(APIServer pid=2309206) INFO 02-12 22:39:23 [api_server.py:1014] Supported tasks: ['generate', 'transcription']
(APIServer pid=2309206) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=2309206) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=2309206) INFO 02-12 22:39:23 [serving_chat.py:182] Warming up chat template processing...
(APIServer pid=2309206) INFO 02-12 22:39:23 [serving_chat.py:218] Chat template warmup completed in 29.0ms
(APIServer pid=2309206) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=2309206) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=2309206) INFO 02-12 22:39:23 [speech_to_text.py:138] Warming up audio preprocessing libraries...
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] Audio preprocessing warmup failed (non-fatal): %s. First request may experience higher latency.
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] Traceback (most recent call last):
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/transformers/feature_extraction_utils.py", line 529, in get_feature_extractor_dict
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] resolved_feature_extractor_file = resolved_feature_extractor_files[0]
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] IndexError: list index out of range
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] During handling of the above exception, another exception occurred:
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] Traceback (most recent call last):
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/vllm/entrypoints/openai/speech_to_text.py", line 152, in _warmup_audio_preprocessing
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] processor = cached_processor_from_config(self.model_config)
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/vllm/transformers_utils/processor.py", line 251, in cached_processor_from_config
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] return cached_get_processor_without_dynamic_kwargs(
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/vllm/transformers_utils/processor.py", line 210, in cached_get_processor_without_dynamic_kwargs
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] processor = cached_get_processor(
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/vllm/transformers_utils/processor.py", line 120, in get_processor
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] processor = AutoProcessor.from_pretrained(
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/transformers/models/auto/processing_auto.py", line 401, in from_pretrained
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] return PROCESSOR_MAPPING[type(config)].from_pretrained(pretrained_model_name_or_path, **kwargs)
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/transformers/processing_utils.py", line 1394, in from_pretrained
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, **kwargs)
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/transformers/processing_utils.py", line 1453, in _get_arguments_from_pretrained
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] args.append(attribute_class.from_pretrained(pretrained_model_name_or_path, **kwargs))
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/transformers/feature_extraction_utils.py", line 382, in from_pretrained
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] feature_extractor_dict, kwargs = cls.get_feature_extractor_dict(pretrained_model_name_or_path, **kwargs)
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/transformers/feature_extraction_utils.py", line 536, in get_feature_extractor_dict
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] raise OSError(
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] OSError: Can't load feature extractor for '/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure '/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944' is the correct path to a directory containing a preprocessor_config.json file
(APIServer pid=2309206) INFO 02-12 22:39:23 [speech_to_text.py:201] Warming up multimodal input processor...
(APIServer pid=2309206) INFO 02-12 22:39:23 [speech_to_text.py:234] Input processor warmup completed in 0.00s
(APIServer pid=2309206) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=2309206) INFO 02-12 22:39:23 [speech_to_text.py:138] Warming up audio preprocessing libraries...
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] Audio preprocessing warmup failed (non-fatal): %s. First request may experience higher latency.
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] Traceback (most recent call last):
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/transformers/feature_extraction_utils.py", line 529, in get_feature_extractor_dict
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] resolved_feature_extractor_file = resolved_feature_extractor_files[0]
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] IndexError: list index out of range
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] During handling of the above exception, another exception occurred:
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177]
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] Traceback (most recent call last):
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/vllm/entrypoints/openai/speech_to_text.py", line 152, in _warmup_audio_preprocessing
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] processor = cached_processor_from_config(self.model_config)
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/vllm/transformers_utils/processor.py", line 251, in cached_processor_from_config
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] return cached_get_processor_without_dynamic_kwargs(
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/vllm/transformers_utils/processor.py", line 210, in cached_get_processor_without_dynamic_kwargs
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] processor = cached_get_processor(
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/vllm/transformers_utils/processor.py", line 120, in get_processor
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] processor = AutoProcessor.from_pretrained(
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/transformers/models/auto/processing_auto.py", line 401, in from_pretrained
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] return PROCESSOR_MAPPING[type(config)].from_pretrained(pretrained_model_name_or_path, **kwargs)
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/transformers/processing_utils.py", line 1394, in from_pretrained
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, **kwargs)
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/transformers/processing_utils.py", line 1453, in _get_arguments_from_pretrained
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] args.append(attribute_class.from_pretrained(pretrained_model_name_or_path, **kwargs))
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/transformers/feature_extraction_utils.py", line 382, in from_pretrained
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] feature_extractor_dict, kwargs = cls.get_feature_extractor_dict(pretrained_model_name_or_path, **kwargs)
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/transformers/feature_extraction_utils.py", line 536, in get_feature_extractor_dict
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] raise OSError(
(APIServer pid=2309206) ERROR 02-12 22:39:23 [speech_to_text.py:177] OSError: Can't load feature extractor for '/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure '/root/.cache/huggingface/hub/models--microsoft--VibeVoice-ASR/snapshots/d0c9efdb8d614685062c04425d91e01b6f37d944' is the correct path to a directory containing a preprocessor_config.json file
(APIServer pid=2309206) INFO 02-12 22:39:23 [speech_to_text.py:201] Warming up multimodal input processor...
(APIServer pid=2309206) INFO 02-12 22:39:23 [speech_to_text.py:234] Input processor warmup completed in 0.00s
(APIServer pid=2309206) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=2309206) INFO 02-12 22:39:23 [api_server.py:1346] Starting vLLM API server 0 on http://0.0.0.0:8000
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:38] Available routes are:
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /openapi.json, Methods: GET, HEAD
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /docs, Methods: GET, HEAD
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /docs/oauth2-redirect, Methods: GET, HEAD
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /redoc, Methods: GET, HEAD
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /scale_elastic_ep, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /is_scaling_elastic_ep, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /tokenize, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /detokenize, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /inference/v1/generate, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /pause, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /resume, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /is_paused, Methods: GET
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /metrics, Methods: GET
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /health, Methods: GET
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /load, Methods: GET
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /v1/models, Methods: GET
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /version, Methods: GET
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /v1/responses, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /v1/responses/{response_id}, Methods: GET
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /v1/responses/{response_id}/cancel, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /v1/messages, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /v1/chat/completions, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /v1/completions, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /v1/audio/transcriptions, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /v1/audio/translations, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /ping, Methods: GET
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /ping, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /invocations, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /classify, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /v1/embeddings, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /score, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /v1/score, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /rerank, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /v1/rerank, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /v2/rerank, Methods: POST
(APIServer pid=2309206) INFO 02-12 22:39:23 [launcher.py:46] Route: /pooling, Methods: POST
(APIServer pid=2309206) INFO: Started server process [2309206]
(APIServer pid=2309206) INFO: Waiting for application startup.
(APIServer pid=2309206) INFO: Application startup complete.
then I run with a local audio file
Loading audio from: /root/podcast-asr-service/tmp/698b401ee329a4d9e48af376/segments/1.m4a
Audio duration: 1800.02 seconds
Audio size: 29111558 bytes
Sending request to http://localhost:8000/v1/chat/completions (Streaming Mode)...
Prompt: This is a 1800.02 seconds audio, please transcribe it with these keys: Start time, End time, Speaker ID, Content
finally I got error
(APIServer pid=2309206) /root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/numpy/_core/fromnumeric.py:3860: RuntimeWarning: Mean of empty slice.
(APIServer pid=2309206) return _methods._mean(a, axis=axis, dtype=dtype,
(APIServer pid=2309206) /root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/numpy/_core/_methods.py:145: RuntimeWarning: invalid value encountered in divide
(APIServer pid=2309206) ret = ret.dtype.type(ret / rcount)
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] Error in preprocessing prompt inputs
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] Traceback (most recent call last):
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/vllm/entrypoints/openai/serving_chat.py", line 313, in create_chat_completion
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] conversation, engine_prompts = await self._preprocess_chat(
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/vllm/entrypoints/openai/serving_engine.py", line 1234, in _preprocess_chat
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] mm_data = await mm_data_future
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] ^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/vllm/entrypoints/chat_utils.py", line 819, in all_mm_data
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] modality: await asyncio.gather(*coros)
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/vllm/multimodal/utils.py", line 242, in fetch_audio_async
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] return await self.load_from_url_async(
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/vllm/multimodal/utils.py", line 208, in load_from_url_async
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] return await future
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] ^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] File "/usr/lib64/python3.12/concurrent/futures/thread.py", line 58, in run
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] result = self.fn(*self.args, **self.kwargs)
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/vllm/multimodal/utils.py", line 117, in _load_data_url
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] return media_io.load_base64(media_type, data)
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] File "/root/podcast-asr-service/toolchain/lib/VibeVoice/vllm_plugin/model.py", line 85, in load_base64
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] return _ffmpeg_load_bytes(base64.b64decode(data), media_type=media_type)
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] File "/root/podcast-asr-service/toolchain/lib/VibeVoice/vllm_plugin/model.py", line 60, in _ffmpeg_load_bytes
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] audio = normalizer(audio)
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] ^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] File "/root/podcast-asr-service/toolchain/lib/VibeVoice/vibevoice/processor/audio_utils.py", line 216, in __call__
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] audio, _ = self.avoid_clipping(audio)
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] File "/root/podcast-asr-service/toolchain/lib/VibeVoice/vibevoice/processor/audio_utils.py", line 195, in avoid_clipping
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] max_val = np.max(np.abs(audio))
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/numpy/_core/fromnumeric.py", line 3164, in max
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] return _wrapreduction(a, np.maximum, 'max', axis, None, out,
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] File "/root/podcast-asr-service/toolchain/.venv/lib64/python3.12/site-packages/numpy/_core/fromnumeric.py", line 86, in _wrapreduction
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2309206) ERROR 02-12 23:05:42 [serving_chat.py:335] ValueError: zero-size array to reduction operation maximum which has no identity
how can I solve this issue?
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels