fix: use server worker config for vllm serving by pm-ju · Pull Request #1052 · volcano-sh/kthena

pm-ju · 2026-05-14T15:20:25Z

/kind bug

What this PR does / why we need it:

For regular VLLM backends, BuildModelServing checks that a server worker exists, but it built the engine command from backend.Workers[0]. If the server worker is not the first item in the workers list, the generated ModelServing can use another worker's config for the VLLM command.

This change:

stores the resolved server worker once
uses that server worker for command generation
also uses it consistently for server replicas, image, resources, and worker replicas
adds regression coverage where a non-server worker appears before the server worker

Verification:

go test ./pkg/model-booster-controller/convert

Signed-off-by: pm-ju <pmdevops29@gmail.com>

volcano-sh-bot · 2026-05-14T15:20:30Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign yaozengzeng for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

pkg/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

gemini-code-assist

Code Review

This pull request refactors the buildVllmModelServing function to explicitly identify and use the server worker configuration from the workers map, rather than assuming it is the first element in the slice. This ensures that the correct commands, resources, and images are applied when multiple worker types are present. A new test case was also added to verify this behavior. A review comment suggested adding a validation check for the pod count to prevent an invalid negative replica value from being generated if the server worker has zero pods.

gemini-code-assist · 2026-05-14T15:22:47Z

+	if serverWorker == nil {
 		return nil, fmt.Errorf("server worker not found in backend: %s", backend.Name)
 	}


The current implementation does not verify that serverWorker.Pods is at least 1. If Pods is 0, the WORKER_REPLICAS calculation at line 331 will result in -1, which is an invalid value for Kubernetes resource replicas. Adding a validation check here prevents the generation of invalid manifests.

if serverWorker == nil { return nil, fmt.Errorf("server worker not found in backend: %s", backend.Name) } if serverWorker.Pods < 1 { return nil, fmt.Errorf("server worker must have at least 1 pod in backend: %s", backend.Name) }

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

FAUST-BENCHOU

Whether it's changed or not doesn't matter, but personally I think the previous version was clearer.

Signed-off-by: pm-ju <pmdevops29@gmail.com>

pm-ju · 2026-05-16T11:40:38Z

Whether it's changed or not doesn't matter, but personally I think the previous version was clearer.

Fair point. I narrowed the change to keep the previous structure and only use the resolved server worker config for command generation, since that was the actual bug.

FAUST-BENCHOU · 2026-05-16T11:47:43Z

dont understand.Only one worker in such circumstance so get the first worker's config as commands is enough.u mean workersMap[workload.ModelWorkerTypeServer].Config this kind of way is more safe or what?

pm-ju · 2026-05-16T12:35:37Z

dont understand.Only one worker in such circumstance so get the first worker's config as commands is enough.u mean workersMap[workload.ModelWorkerTypeServer].Config this kind of way is more safe or what?

My thinking was that since we already resolve and validate the server worker with workersMap[ModelWorkerTypeServer], using that same worker for command generation keeps the logic consistent and avoids relying on slice position.

If regular VLLM is guaranteed to always have exactly one worker, then backend.Workers[0] is fine and this change is mostly defensive. I’ll wait for maintainer input on whether that invariant should be relied on here.

fix: use server worker config for vllm serving

33cfcf2

Signed-off-by: pm-ju <pmdevops29@gmail.com>

Copilot AI review requested due to automatic review settings May 14, 2026 15:20

volcano-sh-bot requested review from YaoZengzeng and git-malu May 14, 2026 15:20

volcano-sh-bot added the kind/bug label May 14, 2026

volcano-sh-bot added the size/M label May 14, 2026

Copilot started reviewing on behalf of pm-ju May 14, 2026 15:20 View session

Copilot AI reviewed May 14, 2026

View reviewed changes

pm-ju requested a review from Copilot May 14, 2026 15:22

Copilot started reviewing on behalf of pm-ju May 14, 2026 15:22 View session

gemini-code-assist Bot reviewed May 14, 2026

View reviewed changes

Copilot AI reviewed May 14, 2026

View reviewed changes

FAUST-BENCHOU reviewed May 16, 2026

View reviewed changes

fix: narrow vllm server worker config selection

3d53338

Signed-off-by: pm-ju <pmdevops29@gmail.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: use server worker config for vllm serving#1052

fix: use server worker config for vllm serving#1052
pm-ju wants to merge 2 commits into
volcano-sh:mainfrom
pm-ju:fix-vllm-server-worker-config-selection

pm-ju commented May 14, 2026

Uh oh!

volcano-sh-bot commented May 14, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 14, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

FAUST-BENCHOU left a comment

Uh oh!

pm-ju commented May 16, 2026

Uh oh!

FAUST-BENCHOU commented May 16, 2026

Uh oh!

pm-ju commented May 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

pm-ju commented May 14, 2026

Uh oh!

volcano-sh-bot commented May 14, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 14, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

FAUST-BENCHOU left a comment

Choose a reason for hiding this comment

Uh oh!

pm-ju commented May 16, 2026

Uh oh!

FAUST-BENCHOU commented May 16, 2026

Uh oh!

pm-ju commented May 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants