fix: use server worker config for vllm serving #1052
Signed-off-by: pm-ju <pmdevops29@gmail.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED. It needs approval from an approver in each of the affected files.
Code Review
This pull request refactors the buildVllmModelServing function to explicitly identify and use the server worker configuration from the workers map, rather than assuming it is the first element in the slice. This ensures that the correct commands, resources, and images are applied when multiple worker types are present. A new test case was also added to verify this behavior. A review comment suggested adding a validation check for the pod count to prevent an invalid negative replica value from being generated if the server worker has zero pods.
if serverWorker == nil {
	return nil, fmt.Errorf("server worker not found in backend: %s", backend.Name)
}
The current implementation does not verify that serverWorker.Pods is at least 1. If Pods is 0, the WORKER_REPLICAS calculation at line 331 will result in -1, which is an invalid value for Kubernetes resource replicas. Adding a validation check here prevents the generation of invalid manifests.
if serverWorker == nil {
	return nil, fmt.Errorf("server worker not found in backend: %s", backend.Name)
}
if serverWorker.Pods < 1 {
	return nil, fmt.Errorf("server worker must have at least 1 pod in backend: %s", backend.Name)
}
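To make the arithmetic behind this suggestion concrete, here is a minimal standalone sketch. It assumes WORKER_REPLICAS is computed as Pods - 1, which is what the -1 result for Pods = 0 implies. The `worker` type and the `workerReplicas` helper are hypothetical stand-ins for illustration, not the controller's actual types or functions.

```go
package main

import (
	"errors"
	"fmt"
)

// worker is a hypothetical stand-in for the backend worker config;
// only the Pods field matters for this sketch.
type worker struct {
	Pods int32
}

// workerReplicas assumes WORKER_REPLICAS = Pods - 1 (an assumption
// inferred from the review comment). It rejects a missing worker and
// Pods < 1, so the returned value can never be negative.
func workerReplicas(w *worker) (int32, error) {
	if w == nil {
		return 0, errors.New("server worker not found")
	}
	if w.Pods < 1 {
		return 0, fmt.Errorf("server worker must have at least 1 pod, got %d", w.Pods)
	}
	return w.Pods - 1, nil
}

func main() {
	// Pods = 0 is rejected; Pods = 1 and Pods = 4 yield non-negative replicas.
	for _, pods := range []int32{0, 1, 4} {
		n, err := workerReplicas(&worker{Pods: pods})
		fmt.Println(pods, n, err)
	}
}
```

Without the `Pods < 1` guard, the Pods = 0 case would fall through to the subtraction and produce -1, which is the invalid replica count the comment describes.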
FAUST-BENCHOU
left a comment
Whether it's changed or not doesn't matter, but personally I think the previous version was clearer.
Signed-off-by: pm-ju <pmdevops29@gmail.com>
Fair point. I narrowed the change to keep the previous structure and only use the resolved server worker config for command generation, since that was the actual bug.
I don't understand. There is only one worker in this case, so taking the first worker's config for the commands is enough. You mean
My thinking was that since we already resolve and validate the server worker with If regular VLLM is guaranteed to always have exactly one worker, then
/kind bug
What this PR does / why we need it:
For regular VLLM backends, BuildModelServing checks that a server worker exists, but it built the engine command from backend.Workers[0]. If the server worker is not the first item in the workers list, the generated ModelServing can use another worker's config for the VLLM command.
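The difference between indexing and resolving by type can be illustrated with a small standalone sketch. Every name here (`workerType`, `worker`, `findServerWorker`, the type constants, and the command strings) is hypothetical and chosen for illustration only; the real backend types in the repository may look different.

```go
package main

import (
	"errors"
	"fmt"
)

// workerType and worker are hypothetical stand-ins for the backend types.
type workerType string

const (
	typeServer workerType = "server"
	typeAux    workerType = "aux" // any non-server worker type
)

type worker struct {
	Type    workerType
	Command string
}

// findServerWorker returns the first worker whose type is "server",
// regardless of where it sits in the slice.
func findServerWorker(workers []worker) (*worker, error) {
	for i := range workers {
		if workers[i].Type == typeServer {
			return &workers[i], nil
		}
	}
	return nil, errors.New("server worker not found")
}

func main() {
	// The server worker is deliberately not the first element.
	workers := []worker{
		{Type: typeAux, Command: "aux-cmd"},
		{Type: typeServer, Command: "vllm-serve-cmd"},
	}
	fmt.Println("by index:", workers[0].Command)
	if sw, err := findServerWorker(workers); err == nil {
		fmt.Println("by type: ", sw.Command)
	}
}
```

With the server worker in second position, the index-based lookup picks up the other worker's command, while the type-based lookup returns the server worker's command. When there is exactly one worker, both approaches give the same result.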
This change:
Verification:
go test ./pkg/model-booster-controller/convert
