Status: Closed
Labels: Major Bug, bug (Something isn't working), high priority, new feature bug
Description
Your current environment
Environment collected with python collect_env.py, against the following commits:
- vllm: c1378b8
- vllm-ascend: f78db0894660f3e64afb29b204aeb204806ffe08
- llm-service: 5c37e8dbc71bfefd0c0fc2e00cca219221000e21
🐛 Describe the bug
Run the following command to reproduce the error:
ASCEND_RT_VISIBLE_DEVICES=0,1,2,3 python3 -m vllm.entrypoints.openai.api_server --model Qwen3-VL-30B-A3B-Instruct/ --gpu-memory-utilization 0.9 --port 13808 --enforce-eager --enable-request-id-headers --no-enable-prefix-caching --max-num-batched-tokens 18000 --max-num-seqs 128 --tensor-parallel-size 4 --max-model-len 18000
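The scheduler dump below shows the hang occurs while serving concurrent multimodal chat completions (temperature=0.0, seed=77, max_tokens=2048). As a minimal client sketch, assuming the OpenAI-compatible endpoint started above and a placeholder image URL, one such request looks like:

```bash
# Minimal sketch of one of the failing multimodal requests; the image URL is
# a placeholder, and the sampling values mirror the SamplingParams in the dump.
curl http://localhost:13808/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen3-VL-30B-A3B-Instruct/",
    "temperature": 0.0,
    "seed": 77,
    "max_tokens": 2048,
    "messages": [{
      "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": "https://example.com/sample.jpg"}},
        {"type": "text", "text": "Describe this image."}
      ]
    }]
  }'
```

Note that the dump records 31 running requests, so the hang reproduces under sustained concurrent load rather than from a single call.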
Error output:
(EngineCore_DP0 pid=1532787) INFO 12-12 12:28:11 [shm_broadcast.py:466] No available shared memory broadcast block found in 60 seconds. This typically happens when some processes are hanging or doing some time-consuming work (e.g. compilation).
(EngineCore_DP0 pid=1532787) INFO 12-12 12:29:11 [shm_broadcast.py:466] No available shared memory broadcast block found in 60 seconds. This typically happens when some processes are hanging or doing some time-consuming work (e.g. compilation).
(EngineCore_DP0 pid=1532787) INFO 12-12 12:30:11 [shm_broadcast.py:466] No available shared memory broadcast block found in 60 seconds. This typically happens when some processes are hanging or doing some time-consuming work (e.g. compilation).
(EngineCore_DP0 pid=1532787) INFO 12-12 12:31:11 [shm_broadcast.py:466] No available shared memory broadcast block found in 60 seconds. This typically happens when some processes are hanging or doing some time-consuming work (e.g. compilation).
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:69] Dumping input data for V1 LLM engine (v0.11.0) with config: model='/data/models/Qwen3-VL-30B-A3B-Instruct/', speculative_config=None, tokenizer='/data/models/Qwen3-VL-30B-A3B-Instruct/', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=10000, download_dir=None, load_format=auto, tensor_parallel_size=4, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=True, kv_cache_dtype=auto, device_config=npu, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=/data/models/Qwen3-VL-30B-A3B-Instruct/, enable_prefix_caching=False, chunked_prefill_enabled=True, pooler_config=None, compilation_config={"level":0,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":["all"],"splitting_ops":null,"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":0,"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":0,"local_cache_dir":null},
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] Dumping scheduler output for model execution: SchedulerOutput(scheduled_new_reqs=[NewRequestData(req_id=chatcmpl-47adc6f12a004694a5457d72b73910fd,prompt_token_ids_len=826,mm_features=[MultiModalFeatureSpec(data={'image_grid_thw': MultiModalFieldElem(modality='image', key='image_grid_thw', data=tensor([ 1, 50, 64]), field=MultiModalBatchedField()), 'pixel_values': MultiModalFieldElem(modality='image', key='pixel_values', data=tensor([[0.9922, 0.9922, 0.9844, ..., 1.0000, 1.0000, 1.0000],
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [1.0000, 1.0000, 1.0000, ..., 1.0000, 1.0000, 1.0000],
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [1.0000, 1.0000, 1.0000, ..., 1.0000, 1.0000, 1.0000],
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] ...,
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [1.0000, 1.0000, 1.0000, ..., 1.0000, 1.0000, 1.0000],
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [1.0000, 1.0000, 1.0000, ..., 1.0000, 1.0000, 1.0000],
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [1.0000, 1.0000, 1.0000, ..., 1.0000, 1.0000, 1.0000]],
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] dtype=torch.bfloat16), field=MultiModalFlatField(slices=[[slice(0, 3200, None)]], dim=0))}, modality='image', identifier='9aa5cfebd71d6edf0bd31a6e929c7e67f1720101d2b4ac0c98bd154948865243', mm_position=PlaceholderRange(offset=4, length=800, is_embed=None))],sampling_params=SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, top_k=0, min_p=0.0, seed=77, stop=[], stop_token_ids=[151643], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=2048, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, structured_outputs=None, extra_args=None),block_ids=([698, 997, 998, 999, 1000, 893, 894],),num_computed_tokens=0,lora_request=None,prompt_embeds_shape=None), NewRequestData(req_id=chatcmpl-ce8258fc5ef34634a97b50bcefcf369f,prompt_token_ids_len=794,mm_features=[MultiModalFeatureSpec(data={'image_grid_thw': MultiModalFieldElem(modality='image', key='image_grid_thw', data=tensor([ 1, 48, 64]), field=MultiModalBatchedField()), 'pixel_values': MultiModalFieldElem(modality='image', key='pixel_values', data=tensor([[-0.0275, -0.0275, -0.0510, ..., -0.6172, -0.5938, -0.5625],
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [-0.0669, -0.0354, -0.0275, ..., -0.6641, -0.6641, -0.6250],
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [-0.0825, -0.0981, -0.0825, ..., -0.5938, -0.5859, -0.5859],
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] ...,
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [-0.7500, -0.7344, -0.7344, ..., -0.9375, -0.9375, -0.9375],
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [-0.7266, -0.7578, -0.7812, ..., -0.9375, -0.9141, -0.8828],
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [-0.8047, -0.7891, -0.7266, ..., -0.9141, -1.0000, -1.0000]],
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] dtype=torch.bfloat16), field=MultiModalFlatField(slices=[[slice(0, 3072, None)]], dim=0))}, modality='image', identifier='b7a2987df2467b785f90a460fe2bfd5510ec2d313a4aa69de2cc487d5d5c0b2f', mm_position=PlaceholderRange(offset=4, length=768, is_embed=None))],sampling_params=SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, top_k=0, min_p=0.0, seed=77, stop=[], stop_token_ids=[151643], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=2048, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, structured_outputs=None, extra_args=None),block_ids=([895, 896, 897, 898, 899, 1213, 1214],),num_computed_tokens=0,lora_request=None,prompt_embeds_shape=None), NewRequestData(req_id=chatcmpl-f7e2b6dc767c4340b7597b8b81e8b97c,prompt_token_ids_len=702,mm_features=[MultiModalFeatureSpec(data={'image_grid_thw': MultiModalFieldElem(modality='image', key='image_grid_thw', data=tensor([ 1, 42, 64]), field=MultiModalBatchedField()), 'pixel_values': MultiModalFieldElem(modality='image', key='pixel_values', data=tensor([[ 0.1611, 0.1611, 0.1611, ..., -0.1533, -0.1533, -0.1533],
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [ 0.1846, 0.1846, 0.1846, ..., -0.1299, -0.1216, -0.1216],
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [ 0.1846, 0.1924, 0.1924, ..., -0.1377, -0.1299, -0.1216],
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] ...,
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [ 0.0039, -0.0039, 0.0039, ..., -0.2871, -0.2637, -0.2871],
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [ 0.0275, 0.0275, 0.0275, ..., -0.2793, -0.2871, -0.2871],
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [-0.0039, 0.0118, 0.0197, ..., -0.2871, -0.2949, -0.3105]],
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] dtype=torch.bfloat16), field=MultiModalFlatField(slices=[[slice(0, 2688, None)]], dim=0))}, modality='image', identifier='1d41b91d62557f18556549c9fc67c39935645f1cec7c277dcfa194bd13eff451', mm_position=PlaceholderRange(offset=4, length=672, is_embed=None))],sampling_params=SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, top_k=0, min_p=0.0, seed=77, stop=[], stop_token_ids=[151643], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=2048, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, structured_outputs=None, extra_args=None),block_ids=([1215, 1216, 1198, 1199, 1228, 1229],),num_computed_tokens=0,lora_request=None,prompt_embeds_shape=None), NewRequestData(req_id=chatcmpl-7cbc224c6b424eae9e884e3e93ffdb67,prompt_token_ids_len=793,mm_features=[MultiModalFeatureSpec(data={'image_grid_thw': MultiModalFieldElem(modality='image', key='image_grid_thw', data=tensor([ 1, 48, 64]), field=MultiModalBatchedField()), 'pixel_values': MultiModalFieldElem(modality='image', key='pixel_values', data=tensor([[-0.0275, -0.0275, -0.0510, ..., -0.6172, -0.5938, -0.5625],
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [-0.0669, -0.0354, -0.0275, ..., -0.6641, -0.6641, -0.6250],
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [-0.0825, -0.0981, -0.0825, ..., -0.5938, -0.5859, -0.5859],
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] ...,
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [-0.7500, -0.7344, -0.7344, ..., -0.9375, -0.9375, -0.9375],
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [-0.7266, -0.7578, -0.7812, ..., -0.9375, -0.9141, -0.8828],
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [-0.8047, -0.7891, -0.7266, ..., -0.9141, -1.0000, -1.0000]],
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] dtype=torch.bfloat16), field=MultiModalFlatField(slices=[[slice(0, 3072, None)]], dim=0))}, modality='image', identifier='b7a2987df2467b785f90a460fe2bfd5510ec2d313a4aa69de2cc487d5d5c0b2f', mm_position=PlaceholderRange(offset=4, length=768, is_embed=None))],sampling_params=SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, top_k=0, min_p=0.0, seed=77, stop=[], stop_token_ids=[151643], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=2048, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, structured_outputs=None, extra_args=None),block_ids=([1230, 1231, 1211, 1212, 1256, 1257, 1258],),num_computed_tokens=0,lora_request=None,prompt_embeds_shape=None), NewRequestData(req_id=chatcmpl-c00326c2f3504405bde8fb442cfebfdd,prompt_token_ids_len=796,mm_features=[MultiModalFeatureSpec(data={'image_grid_thw': MultiModalFieldElem(modality='image', key='image_grid_thw', data=tensor([ 1, 48, 64]), field=MultiModalBatchedField()), 'pixel_values': MultiModalFieldElem(modality='image', key='pixel_values', data=tensor([[-0.0275, 0.0039, -0.0118, ..., 0.2002, 0.1689, 0.1768],
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [ 0.3105, 0.3105, 0.2949, ..., 0.1060, 0.0825, 0.0669],
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [-0.0118, 0.0197, 0.0275, ..., 0.1924, 0.2002, 0.2002],
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] ...,
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [ 0.3965, 0.4043, 0.4355, ..., -0.1377, -0.1455, -0.1455],
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [ 0.3965, 0.3887, 0.3887, ..., -0.1689, -0.1846, -0.1689],
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [ 0.3887, 0.4121, 0.3965, ..., -0.2637, -0.3652, -0.4199]],
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] dtype=torch.bfloat16), field=MultiModalFlatField(slices=[[slice(0, 3072, None)]], dim=0))}, modality='image', identifier='90d4b40e1aaccc35741d4adbc632bfe125d33bb8b65acdfdbd94b30b77683768', mm_position=PlaceholderRange(offset=4, length=768, is_embed=None))],sampling_params=SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, top_k=0, min_p=0.0, seed=77, stop=[], stop_token_ids=[151643], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=2048, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, structured_outputs=None, extra_args=None),block_ids=([1259, 1260, 1261, 1318, 1319, 1293, 1294],),num_computed_tokens=0,lora_request=None,prompt_embeds_shape=None), NewRequestData(req_id=chatcmpl-47920a3a434246af9e1cb483cdb94f8b,prompt_token_ids_len=798,mm_features=[MultiModalFeatureSpec(data={'image_grid_thw': MultiModalFieldElem(modality='image', key='image_grid_thw', data=tensor([ 1, 48, 64]), field=MultiModalBatchedField()), 'pixel_values': MultiModalFieldElem(modality='image', key='pixel_values', data=tensor([[-0.0275, 0.0039, -0.0118, ..., 0.2002, 0.1689, 0.1768],
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [ 0.3105, 0.3105, 0.2949, ..., 0.1060, 0.0825, 0.0669],
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [-0.0118, 0.0197, 0.0275, ..., 0.1924, 0.2002, 0.2002],
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] ...,
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [ 0.3965, 0.4043, 0.4355, ..., -0.1377, -0.1455, -0.1455],
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [ 0.3965, 0.3887, 0.3887, ..., -0.1689, -0.1846, -0.1689],
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [ 0.3887, 0.4121, 0.3965, ..., -0.2637, -0.3652, -0.4199]],
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] dtype=torch.bfloat16), field=MultiModalFlatField(slices=[[slice(0, 3072, None)]], dim=0))}, modality='image', identifier='90d4b40e1aaccc35741d4adbc632bfe125d33bb8b65acdfdbd94b30b77683768', mm_position=PlaceholderRange(offset=4, length=768, is_embed=None))],sampling_params=SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, top_k=0, min_p=0.0, seed=77, stop=[], stop_token_ids=[151643], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=2048, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, structured_outputs=None, extra_args=None),block_ids=([1295, 1296, 1322, 1323, 1324, 1314, 1315],),num_computed_tokens=0,lora_request=None,prompt_embeds_shape=None)], scheduled_cached_reqs=CachedRequestData(req_ids=['chatcmpl-5ce7be5891984269afcb9e69951c03a2', 'chatcmpl-38c0f3b0324e41fa80445e94ea376ad8', 'chatcmpl-bb7abdd8cd9646c8b19061db190d478e', 'chatcmpl-5e48126b1bff4cbf9b32f95235878c00', 'chatcmpl-63e6f87bdb4d46cdb195647acaaf31af', 'chatcmpl-8a2a96bbe96f4a03be158c97a591ae78', 'chatcmpl-98ddce561aff4485a62193d238d4bdb5', 'chatcmpl-4504192b0b6c4e9c83d8e1f3457f0a54', 'chatcmpl-a9ffdec7f4a445b780d4fd36ff417dd3', 'chatcmpl-cb3b31082f624b1a970b7523566157a6', 'chatcmpl-91da0eeda2e045aba9a9caee61c07f82', 'chatcmpl-75bd4088b3264e0aa13976728b6e0886', 'chatcmpl-c19032e7951e4ecea3aed5667da5c39e', 'chatcmpl-9d0bdd59b5764c7eb87ca01e2c536900', 'chatcmpl-aa5b82a1b0ae4799b9c2e496f6f5244f', 'chatcmpl-5a5ff59d47f447e2a2262a825088a41d', 'chatcmpl-99b235a23fb440cb84ad6776bf95c36a', 'chatcmpl-5d82771fba9442c9955f47e6069c980c', 'chatcmpl-6066ac058d7c42459299ac31f8eb1ba9', 'chatcmpl-25f63f598dbb42bdb1eff1306be9ea1b', 'chatcmpl-09aed0366a37455385c03583047a6cf7', 'chatcmpl-df72cdd54a8046e8ba98634046335f26', 'chatcmpl-1bbc501aa189461da831e3321f73bac3', 'chatcmpl-17217b7df00d49f4939e9b8ee3acc0eb', 'chatcmpl-ee661349c5ba4328ab83b6cc5a8882dc'], resumed_from_preemption=[false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false], new_token_ids=[], new_block_ids=[null, null, null, null, null, null, null, null, null, null, null, [[697]], null, null, null, null, null, null, null, null, null, null, null, null, null], num_computed_tokens=[706, 706, 670, 799, 797, 702, 705, 797, 796, 704, 700, 768, 765, 604, 798, 797, 700, 697, 797, 800, 799, 797, 700, 697, 832]), num_scheduled_tokens={chatcmpl-5d82771fba9442c9955f47e6069c980c: 1, chatcmpl-bb7abdd8cd9646c8b19061db190d478e: 1, chatcmpl-6066ac058d7c42459299ac31f8eb1ba9: 1, chatcmpl-38c0f3b0324e41fa80445e94ea376ad8: 1, chatcmpl-99b235a23fb440cb84ad6776bf95c36a: 1, chatcmpl-c19032e7951e4ecea3aed5667da5c39e: 1, chatcmpl-17217b7df00d49f4939e9b8ee3acc0eb: 1, chatcmpl-4504192b0b6c4e9c83d8e1f3457f0a54: 1, chatcmpl-7cbc224c6b424eae9e884e3e93ffdb67: 793, chatcmpl-ce8258fc5ef34634a97b50bcefcf369f: 794, chatcmpl-5e48126b1bff4cbf9b32f95235878c00: 1, chatcmpl-a9ffdec7f4a445b780d4fd36ff417dd3: 1, chatcmpl-8a2a96bbe96f4a03be158c97a591ae78: 1, chatcmpl-63e6f87bdb4d46cdb195647acaaf31af: 1, chatcmpl-df72cdd54a8046e8ba98634046335f26: 1, chatcmpl-47adc6f12a004694a5457d72b73910fd: 826, chatcmpl-09aed0366a37455385c03583047a6cf7: 1, chatcmpl-47920a3a434246af9e1cb483cdb94f8b: 798, chatcmpl-98ddce561aff4485a62193d238d4bdb5: 1, chatcmpl-cb3b31082f624b1a970b7523566157a6: 1, chatcmpl-5ce7be5891984269afcb9e69951c03a2: 1, chatcmpl-ee661349c5ba4328ab83b6cc5a8882dc: 1, chatcmpl-25f63f598dbb42bdb1eff1306be9ea1b: 1, chatcmpl-c00326c2f3504405bde8fb442cfebfdd: 796, chatcmpl-aa5b82a1b0ae4799b9c2e496f6f5244f: 1, chatcmpl-f7e2b6dc767c4340b7597b8b81e8b97c: 702, chatcmpl-9d0bdd59b5764c7eb87ca01e2c536900: 1, chatcmpl-91da0eeda2e045aba9a9caee61c07f82: 1, chatcmpl-1bbc501aa189461da831e3321f73bac3: 1, chatcmpl-75bd4088b3264e0aa13976728b6e0886: 1, chatcmpl-5a5ff59d47f447e2a2262a825088a41d: 1}, total_num_scheduled_tokens=4734, scheduled_spec_decode_tokens={}, scheduled_encoder_inputs={chatcmpl-f7e2b6dc767c4340b7597b8b81e8b97c: [0], chatcmpl-ce8258fc5ef34634a97b50bcefcf369f: [0], chatcmpl-c00326c2f3504405bde8fb442cfebfdd: [0]}, num_common_prefix_blocks=[0], finished_req_ids=['chatcmpl-0b6eb56742694a359f96602a12e54a18'], free_encoder_mm_hashes=['4305c2c3017eea60f2497b664cbf3c6ee327aa09cdf75fcbb496db1f15a45d20', 'b2c42b958566d9d27241c6580c7b60225559d5d7049bb2623e088e517d27d0da', '2608f628d1f799b1e16abe874a9e70af7f065fb9654a7c602e4d7bc03fba8e78'], structured_output_request_ids={}, grammar_bitmask=null, kv_connector_metadata=null, ec_connector_metadata=null)
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:79] Dumping scheduler stats: SchedulerStats(num_running_reqs=31, num_waiting_reqs=0, step_counter=0, current_wave=0, kv_cache_usage=0.0613608748481167, prefix_cache_stats=PrefixCacheStats(reset=False, requests=0, queries=0, hits=0), spec_decoding_stats=None, kv_connector_stats=None, num_corrupted_reqs=0)
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710] EngineCore encountered a fatal error.
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710] Traceback (most recent call last):
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 264, in collective_rpc
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]     result = get_response(w, dequeue_timeout,
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 244, in get_response
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]     status, result = w.worker_response_mq.dequeue(
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]   File "/vllm-workspace/vllm/vllm/distributed/device_communicators/shm_broadcast.py", line 511, in dequeue
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]     with self.acquire_read(timeout, cancel, indefinite) as buf:
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]   File "/usr/local/python3.11.13/lib/python3.11/contextlib.py", line 137, in __enter__
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]     return next(self.gen)
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]            ^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]   File "/vllm-workspace/vllm/vllm/distributed/device_communicators/shm_broadcast.py", line 460, in acquire_read
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]     raise TimeoutError
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710] TimeoutError
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710] The above exception was the direct cause of the following exception:
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710] Traceback (most recent call last):
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 701, in run_engine_core
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]     engine_core.run_busy_loop()
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 728, in run_busy_loop
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]     self._process_engine_step()
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 754, in _process_engine_step
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]     outputs, model_executed = self.step_fn()
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]                               ^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 284, in step
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]     model_output = self.execute_model_with_error_logging(
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 270, in execute_model_with_error_logging
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]     raise err
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 261, in execute_model_with_error_logging
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]     return model_fn(scheduler_output)
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]            ^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 181, in execute_model
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]     (output, ) = self.collective_rpc(
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]                  ^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 273, in collective_rpc
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]     raise TimeoutError(f"RPC call to {method} timed out.") from e
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710] TimeoutError: RPC call to execute_model timed out.
(Worker_TP1 pid=1532925) INFO 12-12 12:32:11 [multiproc_executor.py:558] Parent process exited, terminating worker
(Worker_TP0 pid=1532924) INFO 12-12 12:32:11 [multiproc_executor.py:558] Parent process exited, terminating worker
(Worker_TP2 pid=1532926) INFO 12-12 12:32:11 [multiproc_executor.py:558] Parent process exited, terminating worker
(APIServer pid=1532514) ERROR 12-12 12:32:11 [async_llm.py:480] AsyncLLM output_handler failed.
(APIServer pid=1532514) ERROR 12-12 12:32:11 [async_llm.py:480] Traceback (most recent call last):
(APIServer pid=1532514) ERROR 12-12 12:32:11 [async_llm.py:480]   File "/vllm-workspace/vllm/vllm/v1/engine/async_llm.py", line 439, in output_handler
(APIServer pid=1532514) ERROR 12-12 12:32:11 [async_llm.py:480]     outputs = await engine_core.get_output_async()
(APIServer pid=1532514) ERROR 12-12 12:32:11 [async_llm.py:480]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1532514) ERROR 12-12 12:32:11 [async_llm.py:480]   File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 846, in get_output_async
(APIServer pid=1532514) ERROR 12-12 12:32:11 [async_llm.py:480]     raise self._format_exception(outputs) from None
(APIServer pid=1532514) ERROR 12-12 12:32:11 [async_llm.py:480] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
(Worker_TP3 pid=1532927) INFO 12-12 12:32:11 [multiproc_executor.py:558] Parent process exited, terminating worker
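As a debugging sketch (not part of the original report): the traceback shows the engine's collective_rpc timing out in shm_broadcast.acquire_read, meaning at least one TP worker stopped responding while executing the model. Capturing stack dumps from the worker processes before the parent tears them down can localize where they hang, for example with py-spy:

```bash
# Hypothetical diagnostic, assuming py-spy is installed (pip install py-spy).
# The PIDs are the Worker_TP0..TP3 processes from the log above; --native
# also shows non-Python frames, useful when a worker is stuck in a collective.
for pid in 1532924 1532925 1532926 1532927; do
  py-spy dump --pid "$pid" --native
done
```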
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.