[Feature] Add encode time #192
Conversation
Signed-off-by: Junhong <[email protected]>
Pull request overview
This PR adds encode time metrics capability to vLLM's chat completion API. It enables clients to request encoding time measurements through a new enable_metrics parameter and receive the metrics in the response.
- Adds `enable_metrics` parameter to chat completion requests to toggle metrics collection
- Introduces `capture_metrics_result` field in RequestOutput to store captured metrics
- Returns `encode_time_ms` metric in both streaming and non-streaming responses
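The summary above implies the client opts in per request. Below is a minimal sketch of such a request using the OpenAI Python SDK against a locally served vLLM instance; the `enable_metrics` flag comes from this PR's description, while the server URL, API key, and model name are placeholders:

```python
from openai import OpenAI

# Placeholder endpoint and key for a locally served vLLM instance.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# extra_body forwards fields the SDK does not declare itself, such as
# the enable_metrics parameter introduced by this PR.
response = client.chat.completions.create(
    model="my-model",  # placeholder model name
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={"enable_metrics": True},
)
```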
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| vllm/inputs/data.py | Adds enable_metrics field to TokensPrompt schema to pass metrics configuration through the pipeline |
| vllm/outputs.py | Adds capture_metrics_result field to RequestOutput to store metrics captured during request processing |
| vllm/entrypoints/openai/protocol.py | Adds enable_metrics request parameter and metrics response field to support metrics in API contracts |
| vllm/entrypoints/openai/serving_engine.py | Propagates enable_metrics from the request to the engine prompt for processing |
| vllm/entrypoints/openai/serving_chat.py | Implements metrics extraction logic and populates metrics in both streaming and non-streaming response paths |
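For reference, here is a minimal sketch of what the `protocol.py` additions could look like, assuming Pydantic models in the style vLLM already uses; the `RequestMetricsSketch` helper, class names, and field descriptions are illustrative, not the PR's exact code:

```python
from typing import Optional
from pydantic import BaseModel, Field


class RequestMetricsSketch(BaseModel):
    """Hypothetical container for per-request metrics."""
    encode_time_ms: Optional[float] = None


class ChatCompletionRequestSketch(BaseModel):
    """Request side: opt-in flag for metrics collection."""
    enable_metrics: bool = Field(
        default=False,
        description="If true, capture encode time and return it in the response.",
    )


class ChatCompletionResponseSketch(BaseModel):
    """Response side: metrics are present only when requested."""
    metrics: Optional[RequestMetricsSketch] = None
```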
Please test a pure text request with a
Co-authored-by: Copilot <[email protected]>
Signed-off-by: Junhong <[email protected]>
Done
Signed-off-by: Junhong <[email protected]>
Purpose
Resolve: JiusiServe/LM-service#28

1. Add an `enable_metrics` parameter to the chat template to capture the incoming setting.
2. Add a `metrics` field to the output template to return the encoder execution time.
Test Plan
- Non-streaming request and its response
- Streaming request and its response
- Pure text request and its response

Offline testing showed no issues.
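As a rough illustration of how a test could read the metric back, here is a hedged streaming sketch; it assumes `metrics` arrives as an extra top-level field on a chunk, mirroring the non-streaming response described above, and it reuses the placeholder client setup from earlier:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

stream = client.chat.completions.create(
    model="my-model",  # placeholder model name
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
    extra_body={"enable_metrics": True},
)
for chunk in stream:
    # model_extra exposes fields outside the SDK's declared schema;
    # where exactly the PR attaches metrics in the stream is an assumption.
    metrics = (chunk.model_extra or {}).get("metrics")
    if metrics:
        print("encode_time_ms:", metrics.get("encode_time_ms"))
```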
Test Result
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.