
[Bug]: With prefill and decode merged (PD-merged) and TP=4 serving Qwen3-VL-30B-A3B-Instruct, sending the textvqa-subset dataset at concurrency 32 causes requests to hang, and the instance exits abnormally after 5 minutes #183

@yenuo26

Description


Your current environment

vllm commit link: c1378b8
vllm-ascend commit link: f78db0894660f3e64afb29b204aeb204806ffe08
llm-service commit link: 5c37e8dbc71bfefd0c0fc2e00cca219221000e21

🐛 Describe the bug

Run the following command to reproduce the error:

ASCEND_RT_VISIBLE_DEVICES=0,1,2,3 python3 -m vllm.entrypoints.openai.api_server \
    --model Qwen3-VL-30B-A3B-Instruct/ \
    --gpu-memory-utilization 0.9 \
    --port 13808 \
    --enforce-eager \
    --enable-request-id-headers \
    --no-enable-prefix-caching \
    --max-num-batched-tokens 18000 \
    --max-num-seqs 128 \
    --no-enable-prefix-caching \
    --tensor-parallel-size 4 \
    --max-model-len 18000
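To exercise the server the same way the benchmark does, a client has to keep 32 multimodal chat-completions requests in flight against the OpenAI-compatible endpoint above. The sketch below is a minimal, hypothetical load generator (helper names `build_payload` / `send_one` are mine, not from any benchmark tool); it assumes the port 13808 and sampling settings from the launch command and logs, and you would substitute real (question, image bytes) pairs from textvqa-subset.

```python
import base64
import json
from concurrent.futures import ThreadPoolExecutor
from urllib import request

ENDPOINT = "http://127.0.0.1:13808/v1/chat/completions"  # port from the launch command
CONCURRENCY = 32  # concurrency at which the hang was observed


def build_payload(question: str, image_b64: str) -> dict:
    """Build one multimodal chat-completions request body (greedy decoding,
    max_tokens=2048, matching the SamplingParams seen in the dump)."""
    return {
        "model": "Qwen3-VL-30B-A3B-Instruct/",
        "temperature": 0.0,
        "max_tokens": 2048,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
                {"type": "text", "text": question},
            ],
        }],
    }


def send_one(sample: tuple) -> int:
    """POST one request; returns the HTTP status code."""
    question, image_bytes = sample
    body = build_payload(question, base64.b64encode(image_bytes).decode())
    req = request.Request(ENDPOINT, data=json.dumps(body).encode(),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req, timeout=600) as resp:
        return resp.status


def run_load(samples: list) -> list:
    """Drive the server with CONCURRENCY requests in flight at once."""
    with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
        return list(pool.map(send_one, samples))
```

Calling `run_load` with 100+ samples from textvqa-subset should reproduce the sustained 32-way concurrency described in the title.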
Error output: (EngineCore_DP0 pid=1532787) INFO 12-12 12:28:11 [shm_broadcast.py:466] No available shared memory broadcast block found in 60 seconds. This typically happens when some processes are hanging or doing some time-consuming work (e.g. compilation). (EngineCore_DP0 pid=1532787) INFO 12-12 12:29:11 [shm_broadcast.py:466] No available shared memory broadcast block found in 60 seconds. This typically happens when some processes are hanging or doing some time-consuming work (e.g. compilation). (EngineCore_DP0 pid=1532787) INFO 12-12 12:30:11 [shm_broadcast.py:466] No available shared memory broadcast block found in 60 seconds. This typically happens when some processes are hanging or doing some time-consuming work (e.g. compilation). (EngineCore_DP0 pid=1532787) INFO 12-12 12:31:11 [shm_broadcast.py:466] No available shared memory broadcast block found in 60 seconds. This typically happens when some processes are hanging or doing some time-consuming work (e.g. compilation). (EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:69] Dumping input data for V1 LLM engine (v0.11.0) with config: model='/data/models/Qwen3-VL-30B-A3B-Instruct/', speculative_config=None, tokenizer='/data/models/Qwen3-VL-30B-A3B-Instruct/', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=10000, download_dir=None, load_format=auto, tensor_parallel_size=4, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=True, kv_cache_dtype=auto, device_config=npu, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, 
served_model_name=/data/models/Qwen3-VL-30B-A3B-Instruct/, enable_prefix_caching=False, chunked_prefill_enabled=True, pooler_config=None, compilation_config={"level":0,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":["all"],"splitting_ops":null,"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":0,"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":0,"local_cache_dir":null}, (EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] Dumping scheduler output for model execution: SchedulerOutput(scheduled_new_reqs=[NewRequestData(req_id=chatcmpl-47adc6f12a004694a5457d72b73910fd,prompt_token_ids_len=826,mm_features=[MultiModalFeatureSpec(data={'image_grid_thw': MultiModalFieldElem(modality='image', key='image_grid_thw', data=tensor([ 1, 50, 64]), field=MultiModalBatchedField()), 'pixel_values': MultiModalFieldElem(modality='image', key='pixel_values', data=tensor([[0.9922, 0.9922, 0.9844, ..., 1.0000, 1.0000, 1.0000], (EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [1.0000, 1.0000, 1.0000, ..., 1.0000, 1.0000, 1.0000], (EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [1.0000, 1.0000, 1.0000, ..., 1.0000, 1.0000, 1.0000], (EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] ..., (EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [1.0000, 1.0000, 1.0000, ..., 1.0000, 1.0000, 1.0000], (EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [1.0000, 1.0000, 1.0000, ..., 1.0000, 1.0000, 1.0000], (EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [1.0000, 1.0000, 1.0000, ..., 1.0000, 1.0000, 1.0000]], (EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] dtype=torch.bfloat16), 
field=MultiModalFlatField(slices=[[slice(0, 3200, None)]], dim=0))}, modality='image', identifier='9aa5cfebd71d6edf0bd31a6e929c7e67f1720101d2b4ac0c98bd154948865243', mm_position=PlaceholderRange(offset=4, length=800, is_embed=None))],sampling_params=SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, top_k=0, min_p=0.0, seed=77, stop=[], stop_token_ids=[151643], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=2048, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, structured_outputs=None, extra_args=None),block_ids=([698, 997, 998, 999, 1000, 893, 894],),num_computed_tokens=0,lora_request=None,prompt_embeds_shape=None), NewRequestData(req_id=chatcmpl-ce8258fc5ef34634a97b50bcefcf369f,prompt_token_ids_len=794,mm_features=[MultiModalFeatureSpec(data={'image_grid_thw': MultiModalFieldElem(modality='image', key='image_grid_thw', data=tensor([ 1, 48, 64]), field=MultiModalBatchedField()), 'pixel_values': MultiModalFieldElem(modality='image', key='pixel_values', data=tensor([[-0.0275, -0.0275, -0.0510, ..., -0.6172, -0.5938, -0.5625], (EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [-0.0669, -0.0354, -0.0275, ..., -0.6641, -0.6641, -0.6250], (EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [-0.0825, -0.0981, -0.0825, ..., -0.5938, -0.5859, -0.5859], (EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] ..., (EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [-0.7500, -0.7344, -0.7344, ..., -0.9375, -0.9375, -0.9375], (EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [-0.7266, -0.7578, -0.7812, ..., -0.9375, -0.9141, -0.8828], (EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [-0.8047, -0.7891, -0.7266, ..., -0.9141, -1.0000, -1.0000]], (EngineCore_DP0 pid=1532787) ERROR 12-12 
12:32:11 [dump_input.py:76] dtype=torch.bfloat16), field=MultiModalFlatField(slices=[[slice(0, 3072, None)]], dim=0))}, modality='image', identifier='b7a2987df2467b785f90a460fe2bfd5510ec2d313a4aa69de2cc487d5d5c0b2f', mm_position=PlaceholderRange(offset=4, length=768, is_embed=None))],sampling_params=SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, top_k=0, min_p=0.0, seed=77, stop=[], stop_token_ids=[151643], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=2048, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, structured_outputs=None, extra_args=None),block_ids=([895, 896, 897, 898, 899, 1213, 1214],),num_computed_tokens=0,lora_request=None,prompt_embeds_shape=None), NewRequestData(req_id=chatcmpl-f7e2b6dc767c4340b7597b8b81e8b97c,prompt_token_ids_len=702,mm_features=[MultiModalFeatureSpec(data={'image_grid_thw': MultiModalFieldElem(modality='image', key='image_grid_thw', data=tensor([ 1, 42, 64]), field=MultiModalBatchedField()), 'pixel_values': MultiModalFieldElem(modality='image', key='pixel_values', data=tensor([[ 0.1611, 0.1611, 0.1611, ..., -0.1533, -0.1533, -0.1533], (EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [ 0.1846, 0.1846, 0.1846, ..., -0.1299, -0.1216, -0.1216], (EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [ 0.1846, 0.1924, 0.1924, ..., -0.1377, -0.1299, -0.1216], (EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] ..., (EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [ 0.0039, -0.0039, 0.0039, ..., -0.2871, -0.2637, -0.2871], (EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [ 0.0275, 0.0275, 0.0275, ..., -0.2793, -0.2871, -0.2871], (EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [-0.0039, 0.0118, 0.0197, ..., -0.2871, -0.2949, -0.3105]], 
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] dtype=torch.bfloat16), field=MultiModalFlatField(slices=[[slice(0, 2688, None)]], dim=0))}, modality='image', identifier='1d41b91d62557f18556549c9fc67c39935645f1cec7c277dcfa194bd13eff451', mm_position=PlaceholderRange(offset=4, length=672, is_embed=None))],sampling_params=SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, top_k=0, min_p=0.0, seed=77, stop=[], stop_token_ids=[151643], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=2048, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, structured_outputs=None, extra_args=None),block_ids=([1215, 1216, 1198, 1199, 1228, 1229],),num_computed_tokens=0,lora_request=None,prompt_embeds_shape=None), NewRequestData(req_id=chatcmpl-7cbc224c6b424eae9e884e3e93ffdb67,prompt_token_ids_len=793,mm_features=[MultiModalFeatureSpec(data={'image_grid_thw': MultiModalFieldElem(modality='image', key='image_grid_thw', data=tensor([ 1, 48, 64]), field=MultiModalBatchedField()), 'pixel_values': MultiModalFieldElem(modality='image', key='pixel_values', data=tensor([[-0.0275, -0.0275, -0.0510, ..., -0.6172, -0.5938, -0.5625], (EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [-0.0669, -0.0354, -0.0275, ..., -0.6641, -0.6641, -0.6250], (EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [-0.0825, -0.0981, -0.0825, ..., -0.5938, -0.5859, -0.5859], (EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] ..., (EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [-0.7500, -0.7344, -0.7344, ..., -0.9375, -0.9375, -0.9375], (EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [-0.7266, -0.7578, -0.7812, ..., -0.9375, -0.9141, -0.8828], (EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [-0.8047, 
-0.7891, -0.7266, ..., -0.9141, -1.0000, -1.0000]], (EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] dtype=torch.bfloat16), field=MultiModalFlatField(slices=[[slice(0, 3072, None)]], dim=0))}, modality='image', identifier='b7a2987df2467b785f90a460fe2bfd5510ec2d313a4aa69de2cc487d5d5c0b2f', mm_position=PlaceholderRange(offset=4, length=768, is_embed=None))],sampling_params=SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, top_k=0, min_p=0.0, seed=77, stop=[], stop_token_ids=[151643], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=2048, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, structured_outputs=None, extra_args=None),block_ids=([1230, 1231, 1211, 1212, 1256, 1257, 1258],),num_computed_tokens=0,lora_request=None,prompt_embeds_shape=None), NewRequestData(req_id=chatcmpl-c00326c2f3504405bde8fb442cfebfdd,prompt_token_ids_len=796,mm_features=[MultiModalFeatureSpec(data={'image_grid_thw': MultiModalFieldElem(modality='image', key='image_grid_thw', data=tensor([ 1, 48, 64]), field=MultiModalBatchedField()), 'pixel_values': MultiModalFieldElem(modality='image', key='pixel_values', data=tensor([[-0.0275, 0.0039, -0.0118, ..., 0.2002, 0.1689, 0.1768], (EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [ 0.3105, 0.3105, 0.2949, ..., 0.1060, 0.0825, 0.0669], (EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [-0.0118, 0.0197, 0.0275, ..., 0.1924, 0.2002, 0.2002], (EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] ..., (EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [ 0.3965, 0.4043, 0.4355, ..., -0.1377, -0.1455, -0.1455], (EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [ 0.3965, 0.3887, 0.3887, ..., -0.1689, -0.1846, -0.1689], (EngineCore_DP0 pid=1532787) ERROR 12-12 
12:32:11 [dump_input.py:76] [ 0.3887, 0.4121, 0.3965, ..., -0.2637, -0.3652, -0.4199]], (EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] dtype=torch.bfloat16), field=MultiModalFlatField(slices=[[slice(0, 3072, None)]], dim=0))}, modality='image', identifier='90d4b40e1aaccc35741d4adbc632bfe125d33bb8b65acdfdbd94b30b77683768', mm_position=PlaceholderRange(offset=4, length=768, is_embed=None))],sampling_params=SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, top_k=0, min_p=0.0, seed=77, stop=[], stop_token_ids=[151643], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=2048, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, structured_outputs=None, extra_args=None),block_ids=([1259, 1260, 1261, 1318, 1319, 1293, 1294],),num_computed_tokens=0,lora_request=None,prompt_embeds_shape=None), NewRequestData(req_id=chatcmpl-47920a3a434246af9e1cb483cdb94f8b,prompt_token_ids_len=798,mm_features=[MultiModalFeatureSpec(data={'image_grid_thw': MultiModalFieldElem(modality='image', key='image_grid_thw', data=tensor([ 1, 48, 64]), field=MultiModalBatchedField()), 'pixel_values': MultiModalFieldElem(modality='image', key='pixel_values', data=tensor([[-0.0275, 0.0039, -0.0118, ..., 0.2002, 0.1689, 0.1768], (EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [ 0.3105, 0.3105, 0.2949, ..., 0.1060, 0.0825, 0.0669], (EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [-0.0118, 0.0197, 0.0275, ..., 0.1924, 0.2002, 0.2002], (EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] ..., (EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [ 0.3965, 0.4043, 0.4355, ..., -0.1377, -0.1455, -0.1455], (EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [ 0.3965, 0.3887, 0.3887, ..., -0.1689, -0.1846, -0.1689], 
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] [ 0.3887, 0.4121, 0.3965, ..., -0.2637, -0.3652, -0.4199]], (EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:76] dtype=torch.bfloat16), field=MultiModalFlatField(slices=[[slice(0, 3072, None)]], dim=0))}, modality='image', identifier='90d4b40e1aaccc35741d4adbc632bfe125d33bb8b65acdfdbd94b30b77683768', mm_position=PlaceholderRange(offset=4, length=768, is_embed=None))],sampling_params=SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, top_k=0, min_p=0.0, seed=77, stop=[], stop_token_ids=[151643], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=2048, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, structured_outputs=None, extra_args=None),block_ids=([1295, 1296, 1322, 1323, 1324, 1314, 1315],),num_computed_tokens=0,lora_request=None,prompt_embeds_shape=None)], scheduled_cached_reqs=CachedRequestData(req_ids=['chatcmpl-5ce7be5891984269afcb9e69951c03a2', 'chatcmpl-38c0f3b0324e41fa80445e94ea376ad8', 'chatcmpl-bb7abdd8cd9646c8b19061db190d478e', 'chatcmpl-5e48126b1bff4cbf9b32f95235878c00', 'chatcmpl-63e6f87bdb4d46cdb195647acaaf31af', 'chatcmpl-8a2a96bbe96f4a03be158c97a591ae78', 'chatcmpl-98ddce561aff4485a62193d238d4bdb5', 'chatcmpl-4504192b0b6c4e9c83d8e1f3457f0a54', 'chatcmpl-a9ffdec7f4a445b780d4fd36ff417dd3', 'chatcmpl-cb3b31082f624b1a970b7523566157a6', 'chatcmpl-91da0eeda2e045aba9a9caee61c07f82', 'chatcmpl-75bd4088b3264e0aa13976728b6e0886', 'chatcmpl-c19032e7951e4ecea3aed5667da5c39e', 'chatcmpl-9d0bdd59b5764c7eb87ca01e2c536900', 'chatcmpl-aa5b82a1b0ae4799b9c2e496f6f5244f', 'chatcmpl-5a5ff59d47f447e2a2262a825088a41d', 'chatcmpl-99b235a23fb440cb84ad6776bf95c36a', 'chatcmpl-5d82771fba9442c9955f47e6069c980c', 'chatcmpl-6066ac058d7c42459299ac31f8eb1ba9', 'chatcmpl-25f63f598dbb42bdb1eff1306be9ea1b', 
'chatcmpl-09aed0366a37455385c03583047a6cf7', 'chatcmpl-df72cdd54a8046e8ba98634046335f26', 'chatcmpl-1bbc501aa189461da831e3321f73bac3', 'chatcmpl-17217b7df00d49f4939e9b8ee3acc0eb', 'chatcmpl-ee661349c5ba4328ab83b6cc5a8882dc'], resumed_from_preemption=[false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false], new_token_ids=[], new_block_ids=[null, null, null, null, null, null, null, null, null, null, null, [[697]], null, null, null, null, null, null, null, null, null, null, null, null, null], num_computed_tokens=[706, 706, 670, 799, 797, 702, 705, 797, 796, 704, 700, 768, 765, 604, 798, 797, 700, 697, 797, 800, 799, 797, 700, 697, 832]), num_scheduled_tokens={chatcmpl-5d82771fba9442c9955f47e6069c980c: 1, chatcmpl-bb7abdd8cd9646c8b19061db190d478e: 1, chatcmpl-6066ac058d7c42459299ac31f8eb1ba9: 1, chatcmpl-38c0f3b0324e41fa80445e94ea376ad8: 1, chatcmpl-99b235a23fb440cb84ad6776bf95c36a: 1, chatcmpl-c19032e7951e4ecea3aed5667da5c39e: 1, chatcmpl-17217b7df00d49f4939e9b8ee3acc0eb: 1, chatcmpl-4504192b0b6c4e9c83d8e1f3457f0a54: 1, chatcmpl-7cbc224c6b424eae9e884e3e93ffdb67: 793, chatcmpl-ce8258fc5ef34634a97b50bcefcf369f: 794, chatcmpl-5e48126b1bff4cbf9b32f95235878c00: 1, chatcmpl-a9ffdec7f4a445b780d4fd36ff417dd3: 1, chatcmpl-8a2a96bbe96f4a03be158c97a591ae78: 1, chatcmpl-63e6f87bdb4d46cdb195647acaaf31af: 1, chatcmpl-df72cdd54a8046e8ba98634046335f26: 1, chatcmpl-47adc6f12a004694a5457d72b73910fd: 826, chatcmpl-09aed0366a37455385c03583047a6cf7: 1, chatcmpl-47920a3a434246af9e1cb483cdb94f8b: 798, chatcmpl-98ddce561aff4485a62193d238d4bdb5: 1, chatcmpl-cb3b31082f624b1a970b7523566157a6: 1, chatcmpl-5ce7be5891984269afcb9e69951c03a2: 1, chatcmpl-ee661349c5ba4328ab83b6cc5a8882dc: 1, chatcmpl-25f63f598dbb42bdb1eff1306be9ea1b: 1, chatcmpl-c00326c2f3504405bde8fb442cfebfdd: 796, chatcmpl-aa5b82a1b0ae4799b9c2e496f6f5244f: 1, chatcmpl-f7e2b6dc767c4340b7597b8b81e8b97c: 702, 
chatcmpl-9d0bdd59b5764c7eb87ca01e2c536900: 1, chatcmpl-91da0eeda2e045aba9a9caee61c07f82: 1, chatcmpl-1bbc501aa189461da831e3321f73bac3: 1, chatcmpl-75bd4088b3264e0aa13976728b6e0886: 1, chatcmpl-5a5ff59d47f447e2a2262a825088a41d: 1}, total_num_scheduled_tokens=4734, scheduled_spec_decode_tokens={}, scheduled_encoder_inputs={chatcmpl-f7e2b6dc767c4340b7597b8b81e8b97c: [0], chatcmpl-ce8258fc5ef34634a97b50bcefcf369f: [0], chatcmpl-c00326c2f3504405bde8fb442cfebfdd: [0]}, num_common_prefix_blocks=[0], finished_req_ids=['chatcmpl-0b6eb56742694a359f96602a12e54a18'], free_encoder_mm_hashes=['4305c2c3017eea60f2497b664cbf3c6ee327aa09cdf75fcbb496db1f15a45d20', 'b2c42b958566d9d27241c6580c7b60225559d5d7049bb2623e088e517d27d0da', '2608f628d1f799b1e16abe874a9e70af7f065fb9654a7c602e4d7bc03fba8e78'], structured_output_request_ids={}, grammar_bitmask=null, kv_connector_metadata=null, ec_connector_metadata=null) (EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [dump_input.py:79] Dumping scheduler stats: SchedulerStats(num_running_reqs=31, num_waiting_reqs=0, step_counter=0, current_wave=0, kv_cache_usage=0.0613608748481167, prefix_cache_stats=PrefixCacheStats(reset=False, requests=0, queries=0, hits=0), spec_decoding_stats=None, kv_connector_stats=None, num_corrupted_reqs=0) (EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710] EngineCore encountered a fatal error. 
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710] Traceback (most recent call last):
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 264, in collective_rpc
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]     result = get_response(w, dequeue_timeout,
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 244, in get_response
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]     status, result = w.worker_response_mq.dequeue(
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]   File "/vllm-workspace/vllm/vllm/distributed/device_communicators/shm_broadcast.py", line 511, in dequeue
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]     with self.acquire_read(timeout, cancel, indefinite) as buf:
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]   File "/usr/local/python3.11.13/lib/python3.11/contextlib.py", line 137, in __enter__
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]     return next(self.gen)
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]            ^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]   File "/vllm-workspace/vllm/vllm/distributed/device_communicators/shm_broadcast.py", line 460, in acquire_read
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]     raise TimeoutError
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710] TimeoutError
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710] The above exception was the direct cause of the following exception:
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710] Traceback (most recent call last):
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 701, in run_engine_core
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]     engine_core.run_busy_loop()
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 728, in run_busy_loop
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]     self._process_engine_step()
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 754, in _process_engine_step
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]     outputs, model_executed = self.step_fn()
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]                               ^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 284, in step
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]     model_output = self.execute_model_with_error_logging(
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 270, in execute_model_with_error_logging
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]     raise err
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 261, in execute_model_with_error_logging
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]     return model_fn(scheduler_output)
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]            ^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 181, in execute_model
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]     (output, ) = self.collective_rpc(
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]                  ^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 273, in collective_rpc
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710]     raise TimeoutError(f"RPC call to {method} timed out.") from e
(EngineCore_DP0 pid=1532787) ERROR 12-12 12:32:11 [core.py:710] TimeoutError: RPC call to execute_model timed out.
(Worker_TP1 pid=1532925) INFO 12-12 12:32:11 [multiproc_executor.py:558] Parent process exited, terminating worker
(Worker_TP0 pid=1532924) INFO 12-12 12:32:11 [multiproc_executor.py:558] Parent process exited, terminating worker
(Worker_TP2 pid=1532926) INFO 12-12 12:32:11 [multiproc_executor.py:558] Parent process exited, terminating worker
(APIServer pid=1532514) ERROR 12-12 12:32:11 [async_llm.py:480] AsyncLLM output_handler failed.
(APIServer pid=1532514) ERROR 12-12 12:32:11 [async_llm.py:480] Traceback (most recent call last):
(APIServer pid=1532514) ERROR 12-12 12:32:11 [async_llm.py:480]   File "/vllm-workspace/vllm/vllm/v1/engine/async_llm.py", line 439, in output_handler
(APIServer pid=1532514) ERROR 12-12 12:32:11 [async_llm.py:480]     outputs = await engine_core.get_output_async()
(APIServer pid=1532514) ERROR 12-12 12:32:11 [async_llm.py:480]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1532514) ERROR 12-12 12:32:11 [async_llm.py:480]   File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 846, in get_output_async
(APIServer pid=1532514) ERROR 12-12 12:32:11 [async_llm.py:480]     raise self._format_exception(outputs) from None
(APIServer pid=1532514) ERROR 12-12 12:32:11 [async_llm.py:480] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
(Worker_TP3 pid=1532927) INFO 12-12 12:32:11 [multiproc_executor.py:558] Parent process exited, terminating worker
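The failure signature above is a watchdog pattern: the shared-memory message-queue reader warns every 60 seconds that no broadcast block is available, and after roughly 5 minutes the pending `collective_rpc` gives up with a TimeoutError, which kills the engine. As a rough illustration only (a simplified, hypothetical sketch, not vLLM's actual `shm_broadcast.acquire_read` implementation), the pattern looks like this:

```python
import time


def acquire_read(ready, warn_interval=60.0, total_timeout=300.0):
    """Poll `ready` (a zero-argument callable standing in for 'a shared-memory
    block became readable'). Emit a warning every `warn_interval` seconds and
    raise TimeoutError once `total_timeout` elapses -- mirroring the observed
    log: four 60 s warnings, then a fatal timeout after ~5 minutes."""
    start = time.monotonic()
    last_warn = start
    while not ready():
        now = time.monotonic()
        if now - start >= total_timeout:
            # The reader never saw data: a worker is hung, so give up.
            raise TimeoutError
        if now - last_warn >= warn_interval:
            print(f"No available shared memory broadcast block found in "
                  f"{warn_interval:.0f} seconds.")
            last_warn = now
        time.sleep(0.01)  # back off briefly before polling again
```

If one TP worker stalls (here, likely during multimodal prefill under load), the other ranks' readers never see its response, the warnings repeat, and the eventual TimeoutError propagates up as `RPC call to execute_model timed out.`, matching the traceback.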

