@LJH-LBJ
Collaborator

@LJH-LBJ LJH-LBJ commented Dec 22, 2025

Purpose

Resolve: JiusiServe/LM-service#28
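1. Add an enable_metrics parameter to the chat request template to receive the caller's metrics settings.
2. Add a metrics field to the response template to return the encoder's execution time.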

Test Plan

Non-streaming

curl -X POST  http://127.0.0.1:5580/v1/chat/completions     -H "Content-Type: application/json"     -d '{
    "model": "/workspace/models/Qwen2.5-VL-7B-Instruct",
    "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": "file:///workspace/l00807937/EPD_Timecount_v0.11.0/image/work.jpg"}},
        {"type": "text", "text": "What is the text in the illustrate?"}
    ]}
    ],
    "enable_metrics": {
      "encode": true
    }
    }'

Response

{"id":"chatcmpl-bb32e92d9eb045ea99a1870bba9665cd","object":"chat.completion","created":1766392413,"model":"/workspace/models/Qwen2.5-VL-7B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"The text in the image is in Chinese and reads: \"我的工作永远都做不完的\" which translates to \"My work will never be finished.\"","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning_content":null},"logprobs":null,"finish_reason":"stop","stop_reason":"None","token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":216,"total_tokens":248,"completion_tokens":32,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null,"metrics":{"encode_time_ms":56}}
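
For anyone scripting this check instead of using curl, here is a minimal Python sketch of the same request body and of reading the new field from a non-streaming response. The helper names build_chat_payload and extract_encode_time_ms are hypothetical and not part of this PR; only the enable_metrics request field and the metrics.encode_time_ms response field come from the test plan above. The sketch does not contact a server; it parses a trimmed sample body.

```python
import json
from typing import Any, Dict, Optional


def build_chat_payload(model: str, image_url: str, question: str,
                       stream: bool = False) -> Dict[str, Any]:
    """Build the request body used in the test plan (hypothetical helper)."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": question},
            ]},
        ],
        # Field added by this PR: ask the server to report encoder time.
        "enable_metrics": {"encode": True},
        "stream": stream,
    }


def extract_encode_time_ms(response_body: str) -> Optional[int]:
    """Return metrics.encode_time_ms from a non-streaming response, if present."""
    body = json.loads(response_body)
    metrics = body.get("metrics") or {}
    return metrics.get("encode_time_ms")


# Trimmed stand-in for the response shown above (only the fields we read).
sample = '{"id": "chatcmpl-example", "metrics": {"encode_time_ms": 56}}'
print(extract_encode_time_ms(sample))
```

Returning None when the metrics object is absent keeps the helper usable against servers or requests that do not enable metrics.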

Streaming

curl -X POST  http://127.0.0.1:5580/v1/chat/completions     -H "Content-Type: application/json"     -d '{
    "model": "/workspace/models/Qwen2.5-VL-7B-Instruct",
    "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": "file:///workspace/l00807937/EPD_Timecount_v0.11.0/image/work.jpg"}},
        {"type": "text", "text": "What is the text in the illustrate?"}
    ]}
    ],
    "enable_metrics": {
      "encode": true
    }, "stream": true
    }'

Response

data: {"id":"chatcmpl-dd73b1841fc145abb388e2f0ab9ec3d1","object":"chat.completion.chunk","created":1766392398,"model":"/workspace/models/Qwen2.5-VL-7B-Instruct","choices":[{"index":0,"delta":{"role":"assistant","content":""},"logprobs":null,"finish_reason":null}],"prompt_token_ids":null,"metrics":{"encode_time_ms":70}}

data: {"id":"chatcmpl-dd73b1841fc145abb388e2f0ab9ec3d1","object":"chat.completion.chunk","created":1766392398,"model":"/workspace/models/Qwen2.5-VL-7B-Instruct","choices":[{"index":0,"delta":{"content":"The"},"logprobs":null,"finish_reason":null,"token_ids":null}]}

....

data: {"id":"chatcmpl-dd73b1841fc145abb388e2f0ab9ec3d1","object":"chat.completion.chunk","created":1766392398,"model":"/workspace/models/Qwen2.5-VL-7B-Instruct","choices":[{"index":0,"delta":{"content":""},"logprobs":null,"finish_reason":"stop","stop_reason":"None","token_ids":null}]}

data: [DONE]
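
In the streaming output above, the metrics object appears on the first chunk only. A client could pick it up while assembling the text, roughly as in the sketch below. The function names iter_sse_chunks and collect_stream are hypothetical, and the sketch takes already-received "data:" lines rather than opening a connection, so it makes no claim about how the server schedules the chunks.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Optional, Tuple


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield decoded JSON chunks from 'data: ...' lines until [DONE]."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything else
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        yield json.loads(data)


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[Dict[str, Any]]]:
    """Return (full_text, metrics); metrics is taken from the first chunk that has it."""
    parts = []
    metrics = None
    for chunk in iter_sse_chunks(lines):
        if metrics is None and "metrics" in chunk:
            metrics = chunk["metrics"]
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts), metrics


# Shortened stand-ins for the chunks shown above.
demo_lines = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""}}],"metrics":{"encode_time_ms":70}}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"The"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":""},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]
text, metrics = collect_stream(demo_lines)
print(text, metrics)
```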

Text only

curl -X POST http://127.0.0.1:5580/v1/chat/completions   -H "Content-Type: application/json"   -d '{
    "model": "/workspace/models/Qwen2.5-VL-7B-Instruct",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the text in the illustrate?"}
    ],
    "enable_metrics": {
      "encode": true
    }
  }'

Response

{"id":"chatcmpl-13c0beac94a1494abaaafd304bd07fbf","object":"chat.completion","created":1766480757,"model":"/workspace/models/Qwen2.5-VL-7B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"I'm sorry, but you haven't provided an image or any text for me to describe. Could you please upload an image or provide the text directly so I can assist you better?","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning_content":null},"logprobs":null,"finish_reason":"stop","stop_reason":"None","token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":27,"total_tokens":65,"completion_tokens":38,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null,"metrics":{"encode_time_ms":0}}

Offline testing showed no problems.

Test Result



Signed-off-by: Junhong <[email protected]>

Copilot AI left a comment


Pull request overview

This PR adds encode time metrics capability to vLLM's chat completion API. It enables clients to request encoding time measurements through a new enable_metrics parameter and receive the metrics in the response.

  • Adds enable_metrics parameter to chat completion requests to toggle metrics collection
  • Introduces capture_metrics_result field in RequestOutput to store captured metrics
  • Returns encode_time_ms metric in both streaming and non-streaming responses

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 9 comments.

Summary per file:

| File | Description |
| --- | --- |
| vllm/inputs/data.py | Adds enable_metrics field to TokensPrompt schema to pass metrics configuration through the pipeline |
| vllm/outputs.py | Adds capture_metrics_result field to RequestOutput to store metrics captured during request processing |
| vllm/entrypoints/openai/protocol.py | Adds enable_metrics request parameter and metrics response field to support metrics in API contracts |
| vllm/entrypoints/openai/serving_engine.py | Propagates enable_metrics from the request to the engine prompt for processing |
| vllm/entrypoints/openai/serving_chat.py | Implements metrics extraction logic and populates metrics in both streaming and non-streaming response paths |
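
To make the flow in this summary concrete, here is a rough standalone sketch of how a request flag could gate what ends up in the response. It is an illustration only and not the code in this PR: SketchRequestOutput and build_response_metrics are hypothetical names, and the assumption that capture_metrics_result is a dict holding an "encode_time_ms" key is mine. Only the field names enable_metrics, capture_metrics_result, and encode_time_ms come from the review text.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class SketchRequestOutput:
    """Stand-in for an engine output carrying captured metrics (hypothetical)."""
    # Assumed shape: a plain dict keyed by metric name.
    capture_metrics_result: Dict[str, Any] = field(default_factory=dict)


def build_response_metrics(enable_metrics: Optional[Dict[str, bool]],
                           output: SketchRequestOutput) -> Optional[Dict[str, int]]:
    """Return the response 'metrics' object, or None when the caller did not ask."""
    if not enable_metrics or not enable_metrics.get("encode"):
        return None
    # Default to 0 when no encoder ran, matching the text-only test above.
    encode_ms = output.capture_metrics_result.get("encode_time_ms", 0)
    return {"encode_time_ms": int(encode_ms)}


with_image = SketchRequestOutput({"encode_time_ms": 56})
text_only = SketchRequestOutput()
print(build_response_metrics({"encode": True}, with_image))
print(build_response_metrics({"encode": True}, text_only))
print(build_response_metrics(None, with_image))
```

Gating on the request flag means clients that never send enable_metrics would see no new field in the response under this sketch.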


@wuhang2014
Collaborator

Please test pure text request with a true metrics flag.

@LJH-LBJ LJH-LBJ changed the title Add encode time [Feature] Add encode time Dec 23, 2025
@LJH-LBJ
Collaborator Author

LJH-LBJ commented Dec 23, 2025

Please test pure text request with a true metrics flag.

Done

Signed-off-by: Junhong <[email protected]>
@wuhang2014 wuhang2014 merged commit c8643fd into JiusiServe:v0.11.0 Dec 25, 2025
3 checks passed