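[Bug]: Data system accuracy test scenario: 1E1P1D (proxyep-d cross-machine), Qwen2.5-VL-7B-Instruct, concurrency 128, IPv6, dataset textvqa-subset, prefix caching enabled, random scheduling policy. Accuracy score is 29.91, which is too low, and some requests fail with "timed out after 600s without worker response" #190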

@zhumingjue138

Description

Your current environment

vllm commit link: vllm-ascend: v0.11.0rc4-EPD-post1

🐛 Describe the bug

[ENCODE_0] : INFO 12-19 11:04:37 [loggers.py:127] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 2 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
[ENCODE_0] : INFO 12-19 11:04:47 [loggers.py:127] Engine 000: Avg prompt throughput: 103.8 tokens/s, Avg generation throughput: 0.1 tokens/s, Running: 0 reqs, Waiting: 2 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
[ENCODE_0] : INFO 12-19 11:04:57 [loggers.py:127] Engine 000: Avg prompt throughput: 92.6 tokens/s, Avg generation throughput: 0.1 tokens/s, Running: 0 reqs, Waiting: 2 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
[ENCODE_0] : INFO 12-19 11:05:07 [loggers.py:127] Engine 000: Avg prompt throughput: 270.8 tokens/s, Avg generation throughput: 0.3 tokens/s, Running: 0 reqs, Waiting: 2 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
[ENCODE_0] : INFO 12-19 11:05:17 [loggers.py:127] Engine 000: Avg prompt throughput: 92.7 tokens/s, Avg generation throughput: 0.1 tokens/s, Running: 0 reqs, Waiting: 2 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
[ENCODE_0] : INFO 12-19 11:05:27 [loggers.py:127] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 2 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
[PROXY] : ERROR 12-19 11:11:55 [proxy.py:439] Runtime error during generate: Request c3bb9d1a-3c75-423b-85a3-1f888da19a83 timed out after 600s without worker response.
[PROXY] : Error processing chat completion request: %s 500: No response from proxy
[PROXY] : INFO: 127.0.0.1:45092 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
[PROXY] : ERROR 12-19 11:11:56 [proxy.py:439] Runtime error during generate: Request b61f14bf-7406-4317-8760-5edea7dc5276 timed out after 600s without worker response.
[PROXY] : Error processing chat completion request: %s 500: No response from proxy
[PROXY] : INFO: 127.0.0.1:45098 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
[PROXY] : ERROR 12-19 11:11:56 [proxy.py:439] Runtime error during generate: Request 12979e06-0de5-415b-99d0-345547eec19a timed out after 600s without worker response.
[PROXY] : Error processing chat completion request: %s 500: No response from proxy
[PROXY] : INFO: 127.0.0.1:45432 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
[PROXY] : ERROR 12-19 11:11:57 [proxy.py:439] Runtime error during generate: Request 0e1f2120-5ee3-4ff7-abfe-860ef2d2a6d3 timed out after 600s without worker response.
[PROXY] : Error processing chat completion request: %s 500: No response from proxy
[PROXY] : INFO: 127.0.0.1:45022 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
[PROXY] : ERROR 12-19 11:11:57 [proxy.py:439] Runtime error during generate: Request 9b781388-4e3d-4d98-8eac-aab7f9b817c0 timed out after 600s without worker response.
[PROXY] : Error processing chat completion request: %s 500: No response from proxy
[PROXY] : INFO: 127.0.0.1:45114 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
[PROXY] : ERROR 12-19 11:11:58 [proxy.py:439] Runtime error during generate: Request 6d470295-d7a8-4d53-9880-54787942d259 timed out after 600s without worker response.
[PROXY] : Error processing chat completion request: %s 500: No response from proxy
[PROXY] : INFO: 127.0.0.1:45226 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
[PROXY] : ERROR 12-19 11:11:58 [proxy.py:439] Runtime error during generate: Request 0d6324c0-5d43-43ae-b5b1-a510981915e7 timed out after 600s without worker response.
[PROXY] : Error processing chat completion request: %s 500: No response from proxy
[PROXY] : INFO: 127.0.0.1:52332 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
[PROXY] : ERROR 12-19 11:11:58 [proxy.py:439] Runtime error during generate: Request 017dc567-b9ed-4f9f-96ce-b9bb63b8453a timed out after 600s without worker response.
[PROXY] : Error processing chat completion request: %s 500: No response from proxy
[PROXY] : INFO: 127.0.0.1:45358 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
[PROXY] : ERROR 12-19 11:11:59 [proxy.py:439] Runtime error during generate: Request d78658fa-5446-4e37-8b7e-5bfc7743ee5d timed out after 600s without worker response.
[PROXY] : Error processing chat completion request: %s 500: No response from proxy
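
For triage, the proxy timeout lines above can be tallied to see how many requests failed and how tightly the failures cluster in time. The following is a minimal sketch, not part of vllm-ascend: the regular expression is inferred only from the log lines in this report, and `summarize_timeouts` and `SAMPLE` are hypothetical names introduced for illustration.

```python
import re
from collections import Counter

# Assumption: timeout lines follow the format seen in this report, e.g.
# "[PROXY] : ERROR 12-19 11:11:55 [proxy.py:439] Runtime error during generate:
#  Request <uuid> timed out after 600s without worker response."
TIMEOUT_RE = re.compile(
    r"\[PROXY\] : ERROR (?P<date>\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"\[proxy\.py:\d+\] Runtime error during generate: "
    r"Request (?P<req_id>[0-9a-f-]{36}) timed out after (?P<secs>\d+)s"
)


def summarize_timeouts(lines):
    """Return (request_ids, per_second_counts) for proxy timeout lines.

    Lines that do not match the timeout pattern are ignored.
    """
    request_ids = []
    per_second = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            request_ids.append(match.group("req_id"))
            per_second[match.group("time")] += 1
    return request_ids, per_second


# Three lines copied from the log above; only two are timeout errors.
SAMPLE = [
    "[PROXY] : ERROR 12-19 11:11:55 [proxy.py:439] Runtime error during generate: "
    "Request c3bb9d1a-3c75-423b-85a3-1f888da19a83 timed out after 600s without worker response.",
    "[PROXY] : Error processing chat completion request: %s 500: No response from proxy",
    "[PROXY] : ERROR 12-19 11:11:56 [proxy.py:439] Runtime error during generate: "
    "Request b61f14bf-7406-4317-8760-5edea7dc5276 timed out after 600s without worker response.",
]

ids, per_second = summarize_timeouts(SAMPLE)
print(f"{len(ids)} timed-out requests")
for timestamp, count in sorted(per_second.items()):
    print(timestamp, count)
```

Passing the lines of the full proxy log to the same function would give the total number of timed-out requests, which could then be compared with the number of dataset samples scored as wrong.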
