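[Bug]: Data-system accuracy test scenario, 1E1P1D (proxyep-d cross-node), Qwen2.5-VL-7B-Instruct, concurrency 128, IPv6, dataset textvqa-subset, prefix caching enabled, random scheduling policy: accuracy score is 29.91, which is too low, and some requests timed out after 600s without worker response #190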

### Your current environment

<details>
vllm commit link: vllm-ascend: v0.11.0rc4-EPD-post1

</details>


### 🐛 Describe the bug

```
[ENCODE_0] : INFO 12-19 11:04:37 [loggers.py:127] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 2 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
[ENCODE_0] : INFO 12-19 11:04:47 [loggers.py:127] Engine 000: Avg prompt throughput: 103.8 tokens/s, Avg generation throughput: 0.1 tokens/s, Running: 0 reqs, Waiting: 2 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
[ENCODE_0] : INFO 12-19 11:04:57 [loggers.py:127] Engine 000: Avg prompt throughput: 92.6 tokens/s, Avg generation throughput: 0.1 tokens/s, Running: 0 reqs, Waiting: 2 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
[ENCODE_0] : INFO 12-19 11:05:07 [loggers.py:127] Engine 000: Avg prompt throughput: 270.8 tokens/s, Avg generation throughput: 0.3 tokens/s, Running: 0 reqs, Waiting: 2 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
[ENCODE_0] : INFO 12-19 11:05:17 [loggers.py:127] Engine 000: Avg prompt throughput: 92.7 tokens/s, Avg generation throughput: 0.1 tokens/s, Running: 0 reqs, Waiting: 2 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
[ENCODE_0] : INFO 12-19 11:05:27 [loggers.py:127] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 2 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
[PROXY] : ERROR 12-19 11:11:55 [proxy.py:439] Runtime error during generate: Request c3bb9d1a-3c75-423b-85a3-1f888da19a83 timed out after 600s without worker response.
[PROXY] : Error processing chat completion request: %s 500: No response from proxy
[PROXY] : INFO:     127.0.0.1:45092 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
[PROXY] : ERROR 12-19 11:11:56 [proxy.py:439] Runtime error during generate: Request b61f14bf-7406-4317-8760-5edea7dc5276 timed out after 600s without worker response.
[PROXY] : Error processing chat completion request: %s 500: No response from proxy
[PROXY] : INFO:     127.0.0.1:45098 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
[PROXY] : ERROR 12-19 11:11:56 [proxy.py:439] Runtime error during generate: Request 12979e06-0de5-415b-99d0-345547eec19a timed out after 600s without worker response.
[PROXY] : Error processing chat completion request: %s 500: No response from proxy
[PROXY] : INFO:     127.0.0.1:45432 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
[PROXY] : ERROR 12-19 11:11:57 [proxy.py:439] Runtime error during generate: Request 0e1f2120-5ee3-4ff7-abfe-860ef2d2a6d3 timed out after 600s without worker response.
[PROXY] : Error processing chat completion request: %s 500: No response from proxy
[PROXY] : INFO:     127.0.0.1:45022 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
[PROXY] : ERROR 12-19 11:11:57 [proxy.py:439] Runtime error during generate: Request 9b781388-4e3d-4d98-8eac-aab7f9b817c0 timed out after 600s without worker response.
[PROXY] : Error processing chat completion request: %s 500: No response from proxy
[PROXY] : INFO:     127.0.0.1:45114 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
[PROXY] : ERROR 12-19 11:11:58 [proxy.py:439] Runtime error during generate: Request 6d470295-d7a8-4d53-9880-54787942d259 timed out after 600s without worker response.
[PROXY] : Error processing chat completion request: %s 500: No response from proxy
[PROXY] : INFO:     127.0.0.1:45226 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
[PROXY] : ERROR 12-19 11:11:58 [proxy.py:439] Runtime error during generate: Request 0d6324c0-5d43-43ae-b5b1-a510981915e7 timed out after 600s without worker response.
[PROXY] : Error processing chat completion request: %s 500: No response from proxy
[PROXY] : INFO:     127.0.0.1:52332 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
[PROXY] : ERROR 12-19 11:11:58 [proxy.py:439] Runtime error during generate: Request 017dc567-b9ed-4f9f-96ce-b9bb63b8453a timed out after 600s without worker response.
[PROXY] : Error processing chat completion request: %s 500: No response from proxy
[PROXY] : INFO:     127.0.0.1:45358 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
[PROXY] : ERROR 12-19 11:11:59 [proxy.py:439] Runtime error during generate: Request d78658fa-5446-4e37-8b7e-5bfc7743ee5d timed out after 600s without worker response.
[PROXY] : Error processing chat completion request: %s 500: No response from proxy
```

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.
