[Feature] Add encoder time #64
Signed-off-by: Junhong <[email protected]>
…vice into add_encoder_time
Pull request overview
This PR adds encoder execution timing functionality to track and report the time taken for encoding multimodal data in requests. When enable_metrics["encode"] is set to true in the request, the system now captures and returns the encoding time in milliseconds.
Key Changes:
- Added enable_metrics parameter support to capture encoder timing metrics
- Implemented encoder execution time tracking and reporting in both streaming and non-streaming responses
- Extended protocol classes to support metrics collection and propagation
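The behavior described in the overview can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the function name run_encode_with_timing, the encode_fn callable, and the encode_time_ms result key are all hypothetical. Only the enable_metrics["encode"] switch and the millisecond unit come from the description above.

```python
from __future__ import annotations

import time
from typing import Any, Callable, Optional


def run_encode_with_timing(
    encode_fn: Callable[[Any], Any],
    multimodal_data: Any,
    enable_metrics: Optional[dict[str, bool]] = None,
) -> tuple[Any, dict[str, float]]:
    """Run encode_fn and, if requested, report its duration in milliseconds.

    Hypothetical helper: the real proxy code may be structured differently.
    """
    metrics: dict[str, float] = {}
    # Timing is only reported when the caller sets enable_metrics["encode"].
    want_encode_time = bool(enable_metrics and enable_metrics.get("encode"))

    start = time.perf_counter()
    result = encode_fn(multimodal_data)
    if want_encode_time:
        # perf_counter() returns seconds; convert to milliseconds.
        metrics["encode_time_ms"] = (time.perf_counter() - start) * 1000.0
    return result, metrics
```

A monotonic clock such as time.perf_counter() is used here because it is unaffected by wall-clock adjustments. Whether the PR uses the same clock is not stated.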
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| lm_service/protocol/protocol.py | Added capture_metrics_result field to GenerationResponse for storing metrics data and enable_metrics field to GenerationRequest to control which metrics to capture |
| lm_service/apis/vllm/proxy.py | Implemented encoder timing logic by extracting enable_metrics from prompt, calculating encode time, and adding helper functions metrics_enabled() and cal_exec_time() to support metrics functionality |
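To make the table concrete, here is a hedged sketch of how the two protocol fields and the two helpers might fit together. The field names enable_metrics and capture_metrics_result and the helper names metrics_enabled() and cal_exec_time() come from the review summary. Everything else is an assumption: the use of dataclasses, the request_id and text fields, the helper signatures, and the encode_time_ms key.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GenerationRequest:
    request_id: str  # hypothetical field, for illustration only
    # Maps a metric name (e.g. "encode") to whether it should be captured.
    enable_metrics: Optional[dict[str, bool]] = None


@dataclass
class GenerationResponse:
    request_id: str  # hypothetical field, for illustration only
    text: str = ""   # hypothetical field, for illustration only
    # Collected metrics, e.g. {"encode_time_ms": 12.3} (key name assumed).
    capture_metrics_result: dict[str, float] = field(default_factory=dict)


def metrics_enabled(request: GenerationRequest, name: str) -> bool:
    """Return True if the request asked for the named metric (assumed signature)."""
    return bool(request.enable_metrics and request.enable_metrics.get(name))


def cal_exec_time(start_s: float, end_s: float) -> float:
    """Convert a start/end pair in seconds to elapsed milliseconds (assumed signature)."""
    return (end_s - start_s) * 1000.0
```

Under these assumptions, a handler would check metrics_enabled(request, "encode") before storing the cal_exec_time(...) result in capture_metrics_result on the response.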
Co-authored-by: Copilot <[email protected]> Signed-off-by: Junhong Liu <[email protected]>
Description
Resolve: #28

Type of Change
Related Issues
Changes Made
Testing
Non-streaming
Return value
Streaming
Return value
Plain text
Return value
Offline testing passed with no issues.
Test Coverage
Documentation
Checklist
Screenshots/Output
Additional Notes
Reviewer Checklist