[Feature] Add encode time #192
Conversation
Signed-off-by: Junhong <[email protected]>
Pull request overview
This PR adds encode time metrics capability to vLLM's chat completion API. It enables clients to request encoding time measurements through a new enable_metrics parameter and receive the metrics in the response.
- Adds `enable_metrics` parameter to chat completion requests to toggle metrics collection
- Introduces `capture_metrics_result` field in RequestOutput to store captured metrics
- Returns `encode_time_ms` metric in both streaming and non-streaming responses
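The summary above implies the client opts in per request. Below is a minimal sketch of such a request using the OpenAI Python SDK against a locally served vLLM instance; the `enable_metrics` flag comes from this PR's description, while the server URL, API key, and model name are placeholders:

```python
from openai import OpenAI

# Placeholder endpoint and key for a locally served vLLM instance.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# extra_body forwards fields the SDK does not declare itself, such as
# the enable_metrics parameter introduced by this PR.
response = client.chat.completions.create(
    model="my-model",  # placeholder model name
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={"enable_metrics": True},
)
```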
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| vllm/inputs/data.py | Adds enable_metrics field to TokensPrompt schema to pass metrics configuration through the pipeline |
| vllm/outputs.py | Adds capture_metrics_result field to RequestOutput to store metrics captured during request processing |
| vllm/entrypoints/openai/protocol.py | Adds enable_metrics request parameter and metrics response field to support metrics in API contracts |
| vllm/entrypoints/openai/serving_engine.py | Propagates enable_metrics from the request to the engine prompt for processing |
| vllm/entrypoints/openai/serving_chat.py | Implements metrics extraction logic and populates metrics in both streaming and non-streaming response paths |
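For reference, here is a minimal sketch of what the `protocol.py` additions could look like, assuming Pydantic models in the style vLLM already uses; the `RequestMetricsSketch` helper, class names, and field descriptions are illustrative, not the PR's exact code:

```python
from typing import Optional
from pydantic import BaseModel, Field


class RequestMetricsSketch(BaseModel):
    """Hypothetical container for per-request metrics."""
    encode_time_ms: Optional[float] = None


class ChatCompletionRequestSketch(BaseModel):
    """Request side: opt-in flag for metrics collection."""
    enable_metrics: bool = Field(
        default=False,
        description="If true, capture encode time and return it in the response.",
    )


class ChatCompletionResponseSketch(BaseModel):
    """Response side: metrics are present only when requested."""
    metrics: Optional[RequestMetricsSketch] = None
```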
Please test a pure text request with a
Co-authored-by: Copilot <[email protected]>
Signed-off-by: Junhong <[email protected]>
Done
Signed-off-by: Junhong <[email protected]>
Purpose
Resolve: JiusiServe/LM-service#28

1. Add an `enable_metrics` parameter to the chat template to capture the incoming setting.
2. Add a `metrics` field to the output template to return the encoder execution time.
Test Plan
- Non-streaming request and its response
- Streaming request and its response
- Pure text request and its response

Offline testing showed no issues.
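As a rough illustration of how a test could read the metric back, here is a hedged streaming sketch; it assumes `metrics` arrives as an extra top-level field on a chunk, mirroring the non-streaming response described above, and it reuses the placeholder client setup from earlier:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

stream = client.chat.completions.create(
    model="my-model",  # placeholder model name
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
    extra_body={"enable_metrics": True},
)
for chunk in stream:
    # model_extra exposes fields outside the SDK's declared schema;
    # where exactly the PR attaches metrics in the stream is an assumption.
    metrics = (chunk.model_extra or {}).get("metrics")
    if metrics:
        print("encode_time_ms:", metrics.get("encode_time_ms"))
```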
Test Result
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.