Commit 802c574
[Benchmark] Upgrade benchmark args for new vllm version (#3218)
### What this PR does / why we need it?

The newest vllm commit has deprecated the arg `--endpoint-type`, so we should use `--backend` instead.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Tested locally:

```shell
export VLLM_USE_MODELSCOPE=true
export DATASET_PATH=/root/.cache/datasets/ShareGPT_V3_unfiltered_cleaned_split.json

vllm serve Qwen/Qwen2.5-7B-Instruct --load-format dummy

wget -O ${DATASET_PATH} https://hf-mirror.com/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json

vllm bench serve --model Qwen/Qwen2.5-7B-Instruct --backend vllm --dataset-name sharegpt --dataset-path ${DATASET_PATH} --num-prompts 200
```

and the result looks good:

```shell
============ Serving Benchmark Result ============
Successful requests:                     200
Benchmark duration (s):                  20.36
Total input tokens:                      43560
Total generated tokens:                  44697
Request throughput (req/s):              9.82
Output token throughput (tok/s):         2194.88
Peak output token throughput (tok/s):    4676.00
Peak concurrent requests:                200.00
Total Token throughput (tok/s):          4333.93
---------------Time to First Token----------------
Mean TTFT (ms):                          2143.85
Median TTFT (ms):                        2486.17
P99 TTFT (ms):                           2530.36
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          43.50
Median TPOT (ms):                        30.75
P99 TPOT (ms):                           309.22
---------------Inter-token Latency----------------
Mean ITL (ms):                           28.15
Median ITL (ms):                         25.42
P99 ITL (ms):                            38.30
==================================================
```

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: wangli <[email protected]>
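As a quick sanity check before switching configs, the following one-liner (a hedged sketch, assuming `vllm bench serve --help` prints its option list to stdout) shows which of the two flag names the installed vllm build accepts:

```shell
# Prints whichever of the two flags the installed vllm build documents;
# newer builds list --backend, older ones --endpoint-type.
vllm bench serve --help | grep -E -e '--(backend|endpoint-type)'
```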
1 parent 1b270a6 commit 802c574

1 file changed: +3 -3 lines changed

benchmarks/tests/serving-tests.json

Lines changed: 3 additions & 3 deletions
```diff
@@ -18,7 +18,7 @@
         },
         "client_parameters": {
             "model": "Qwen/Qwen2.5-VL-7B-Instruct",
-            "endpoint_type": "openai-chat",
+            "backend": "openai-chat",
             "dataset_name": "hf",
             "hf_split": "train",
             "endpoint": "/v1/chat/completions",
@@ -45,7 +45,7 @@
         },
         "client_parameters": {
             "model": "Qwen/Qwen3-8B",
-            "endpoint_type": "vllm",
+            "backend": "vllm",
             "dataset_name": "sharegpt",
             "dataset_path": "/github/home/.cache/datasets/ShareGPT_V3_unfiltered_cleaned_split.json",
             "num_prompts": 200
@@ -69,7 +69,7 @@
         },
         "client_parameters": {
             "model": "Qwen/Qwen2.5-7B-Instruct",
-            "endpoint_type": "vllm",
+            "backend": "vllm",
             "dataset_name": "sharegpt",
             "dataset_path": "/github/home/.cache/datasets/ShareGPT_V3_unfiltered_cleaned_split.json",
             "num_prompts": 200
```

0 commit comments