@LJH-LBJ LJH-LBJ commented Dec 22, 2025

Description

Resolve: #28

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Test improvements
  • CI/CD improvements

Related Issues

Changes Made

  • Read the enable_metrics parameter from the request
  • Time the encoder execution and store the result in metrics
  • Return metrics in the response
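
The three changes above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `fake_encode` and `handle_request` are hypothetical names, and the plain-dict request and response shapes are assumptions chosen to match the curl examples in the Testing section.

```python
import asyncio
import time
from typing import Any


async def fake_encode(prompt: dict[str, Any]) -> None:
    """Hypothetical stand-in for the real encoder call."""
    await asyncio.sleep(0.01)


async def handle_request(prompt: dict[str, Any]) -> dict[str, Any]:
    # Step 1: read the optional enable_metrics mapping from the request.
    enable_metrics: dict[str, bool] = prompt.get("enable_metrics") or {}
    metrics: dict[str, int] = {}

    # Step 2: time the encoder with a monotonic clock and keep the
    # result only when the caller asked for the "encode" metric.
    start = time.perf_counter()
    await fake_encode(prompt)
    elapsed_ms = int((time.perf_counter() - start) * 1000)
    if enable_metrics.get("encode"):
        metrics["encode_time_ms"] = elapsed_ms

    # Step 3: attach metrics to the response only if something was captured.
    response: dict[str, Any] = {"object": "chat.completion"}
    if metrics:
        response["metrics"] = metrics
    return response


if __name__ == "__main__":
    print(asyncio.run(handle_request({"enable_metrics": {"encode": True}})))
```

`time.perf_counter()` is used here because it is monotonic, so the measured interval is not affected by wall-clock adjustments.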

Testing

Non-streaming

curl -X POST  http://127.0.0.1:5580/v1/chat/completions     -H "Content-Type: application/json"     -d '{
    "model": "/workspace/models/Qwen2.5-VL-7B-Instruct",
    "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": "file:///workspace/l00807937/EPD_Timecount_v0.11.0/image/work.jpg"}},
        {"type": "text", "text": "What is the text in the illustrate?"}
    ]}
    ],
    "enable_metrics": {
      "encode": true
    }
    }'

Response

{"id":"chatcmpl-bb32e92d9eb045ea99a1870bba9665cd","object":"chat.completion","created":1766392413,"model":"/workspace/models/Qwen2.5-VL-7B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"The text in the image is in Chinese and reads: \"我的工作永远都做不完的\" which translates to \"My work will never be finished.\"","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning_content":null},"logprobs":null,"finish_reason":"stop","stop_reason":"None","token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":216,"total_tokens":248,"completion_tokens":32,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null,"metrics":{"encode_time_ms":56}}
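
For callers that would rather script this check than use curl, a client-side sketch might look like the following. The helper names `build_payload` and `encode_time_ms` are hypothetical; only the `enable_metrics` request field and the `metrics.encode_time_ms` response field are taken from the output above.

```python
import json
from typing import Any, Optional


def build_payload(
    model: str, image_url: str, question: str, stream: bool = False
) -> dict[str, Any]:
    """Build a chat-completions body that asks for the encode metric."""
    payload: dict[str, Any] = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": question},
            ]},
        ],
        "enable_metrics": {"encode": True},
    }
    if stream:
        payload["stream"] = True
    return payload


def encode_time_ms(response_body: str) -> Optional[int]:
    """Return metrics.encode_time_ms from a non-streaming response, or None."""
    metrics = json.loads(response_body).get("metrics") or {}
    return metrics.get("encode_time_ms")


if __name__ == "__main__":
    # Abbreviated copy of the response shown above.
    sample = '{"object":"chat.completion","metrics":{"encode_time_ms":56}}'
    print(encode_time_ms(sample))
```

Returning `None` when the `metrics` key is absent lets the same helper be used against a server that does not have this change.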

Streaming

curl -X POST  http://127.0.0.1:5580/v1/chat/completions     -H "Content-Type: application/json"     -d '{
    "model": "/workspace/models/Qwen2.5-VL-7B-Instruct",
    "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": "file:///workspace/l00807937/EPD_Timecount_v0.11.0/image/work.jpg"}},
        {"type": "text", "text": "What is the text in the illustrate?"}
    ]}
    ],
    "enable_metrics": {
      "encode": true
    }, "stream": true
    }'

Response

data: {"id":"chatcmpl-dd73b1841fc145abb388e2f0ab9ec3d1","object":"chat.completion.chunk","created":1766392398,"model":"/workspace/models/Qwen2.5-VL-7B-Instruct","choices":[{"index":0,"delta":{"role":"assistant","content":""},"logprobs":null,"finish_reason":null}],"prompt_token_ids":null,"metrics":{"encode_time_ms":70}}

data: {"id":"chatcmpl-dd73b1841fc145abb388e2f0ab9ec3d1","object":"chat.completion.chunk","created":1766392398,"model":"/workspace/models/Qwen2.5-VL-7B-Instruct","choices":[{"index":0,"delta":{"content":"The"},"logprobs":null,"finish_reason":null,"token_ids":null}]}

....

data: {"id":"chatcmpl-dd73b1841fc145abb388e2f0ab9ec3d1","object":"chat.completion.chunk","created":1766392398,"model":"/workspace/models/Qwen2.5-VL-7B-Instruct","choices":[{"index":0,"delta":{"content":""},"logprobs":null,"finish_reason":"stop","stop_reason":"None","token_ids":null}]}

data: [DONE]
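
In the streaming output above, `metrics` appears on the first chunk only. A client could pick it up with something like the sketch below; `first_metrics` is a hypothetical helper, and it assumes the stream has already been split into text lines.

```python
import json
from typing import Any, Iterable, Optional


def first_metrics(sse_lines: Iterable[str]) -> Optional[dict[str, Any]]:
    """Return the metrics object from the first chunk that carries one."""
    for line in sse_lines:
        line = line.strip()
        # Skip blank separators and anything that is not an SSE data line.
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        if "metrics" in chunk:
            return chunk["metrics"]
    return None


if __name__ == "__main__":
    lines = [
        'data: {"object":"chat.completion.chunk","metrics":{"encode_time_ms":70}}',
        "",
        'data: {"object":"chat.completion.chunk","choices":[]}',
        "",
        "data: [DONE]",
    ]
    print(first_metrics(lines))
```

Scanning every chunk rather than only the first keeps the helper working if the server later moves the metrics to a different chunk.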

Text only

curl -X POST http://127.0.0.1:5580/v1/chat/completions   -H "Content-Type: application/json"   -d '{
    "model": "/workspace/models/Qwen2.5-VL-7B-Instruct",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the text in the illustrate?"}
    ],
    "enable_metrics": {
      "encode": true
    }
  }'

Response

{"id":"chatcmpl-13c0beac94a1494abaaafd304bd07fbf","object":"chat.completion","created":1766480757,"model":"/workspace/models/Qwen2.5-VL-7B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"I'm sorry, but you haven't provided an image or any text for me to describe. Could you please upload an image or provide the text directly so I can assist you better?","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning_content":null},"logprobs":null,"finish_reason":"stop","stop_reason":"None","token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":27,"total_tokens":65,"completion_tokens":38,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null,"metrics":{"encode_time_ms":0}}

Offline testing showed no problems.

  • Existing tests pass
  • New tests added (if applicable)
  • Manual testing performed

Test Coverage

Documentation

  • Documentation updated (if needed)
  • Code comments added/updated
  • API documentation updated (if applicable)

Checklist

  • I have read the CONTRIBUTING guidelines
  • My code follows the project's style guidelines
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published
  • I have signed off my commits (DCO)

Screenshots/Output

Additional Notes

Reviewer Checklist

  • Code quality and style
  • Test coverage adequate
  • Documentation updated
  • Performance considerations reviewed
  • Security implications considered
  • Breaking changes documented

Signed-off-by: Junhong <[email protected]>

Copilot AI left a comment

Pull request overview

This PR adds encoder execution timing functionality to track and report the time taken for encoding multimodal data in requests. When enable_metrics["encode"] is set to true in the request, the system now captures and returns the encoding time in milliseconds.

Key Changes:

  • Added enable_metrics parameter support to capture encoder timing metrics
  • Implemented encoder execution time tracking and reporting in both streaming and non-streaming responses
  • Extended protocol classes to support metrics collection and propagation

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| lm_service/protocol/protocol.py | Added capture_metrics_result field to GenerationResponse for storing metrics data and enable_metrics field to GenerationRequest to control which metrics to capture |
| lm_service/apis/vllm/proxy.py | Implemented encoder timing logic by extracting enable_metrics from prompt, calculating encode time, and adding helper functions metrics_enabled() and cal_exec_time() to support metrics functionality |
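
To make the file summary more concrete, here is a hedged sketch of what the two protocol fields and the two helpers could look like. Only the names `GenerationRequest`, `GenerationResponse`, `enable_metrics`, `capture_metrics_result`, `metrics_enabled()` and `cal_exec_time()` come from the summary. The use of dataclasses, the `request_id` field, the field types and both function signatures are assumptions made for illustration, and the merged code may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenerationRequest:
    request_id: str  # assumed field, present only to make the sketch usable
    # Which metrics the caller wants, e.g. {"encode": True}.
    enable_metrics: Optional[dict[str, bool]] = None


@dataclass
class GenerationResponse:
    request_id: str  # assumed field
    # Captured values, e.g. {"encode_time_ms": 56}.
    capture_metrics_result: Optional[dict[str, int]] = None


def metrics_enabled(request: GenerationRequest, key: str) -> bool:
    """True if the request asked for the metric named by key (assumed signature)."""
    return bool(request.enable_metrics and request.enable_metrics.get(key))


def cal_exec_time(start_s: float, end_s: float) -> int:
    """Elapsed time between two clock readings in whole milliseconds (assumed signature)."""
    return round((end_s - start_s) * 1000)


if __name__ == "__main__":
    req = GenerationRequest(request_id="r1", enable_metrics={"encode": True})
    resp = GenerationResponse(request_id=req.request_id)
    if metrics_enabled(req, "encode"):
        resp.capture_metrics_result = {"encode_time_ms": cal_exec_time(1.0, 1.056)}
    print(resp)
```

Keeping `enable_metrics` as a mapping rather than a single boolean leaves room for other metric names besides `encode` without changing the request schema.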


@LJH-LBJ LJH-LBJ changed the title Add encoder time [Feature] Add encoder time Dec 23, 2025
Signed-off-by: Junhong <[email protected]>
LJH-LBJ and others added 4 commits December 23, 2025 17:33
Co-authored-by: Copilot <[email protected]>
Signed-off-by: Junhong Liu <[email protected]>
Signed-off-by: Junhong <[email protected]>
Signed-off-by: Junhong <[email protected]>
@wuhang2014 wuhang2014 merged commit a295f06 into JiusiServe:main Dec 25, 2025
10 checks passed
Development

Successfully merging this pull request may close these issues.

[RFC] Encode Metrics
