Skip to content

fix: use server worker config for vllm serving#1052

Open
pm-ju wants to merge 2 commits into
volcano-sh:mainfrom
pm-ju:fix-vllm-server-worker-config-selection
Open

fix: use server worker config for vllm serving#1052
pm-ju wants to merge 2 commits into
volcano-sh:mainfrom
pm-ju:fix-vllm-server-worker-config-selection

Conversation

@pm-ju
Copy link
Copy Markdown
Contributor

@pm-ju pm-ju commented May 14, 2026

/kind bug

What this PR does / why we need it:

For regular VLLM backends, BuildModelServing checks that a server worker exists, but it built the engine command from backend.Workers[0]. If the server worker is not the first item in the workers list, the generated ModelServing can use another worker's config for the VLLM command.

This change:

  • stores the resolved server worker once
  • uses that server worker for command generation
  • also uses it consistently for server replicas, image, resources, and worker replicas
  • adds regression coverage where a non-server worker appears before the server worker

Verification:

go test ./pkg/model-booster-controller/convert
image

Signed-off-by: pm-ju <pmdevops29@gmail.com>
Copilot AI review requested due to automatic review settings May 14, 2026 15:20
@volcano-sh-bot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign yaozengzeng for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the buildVllmModelServing function to explicitly identify and use the server worker configuration from the workers map, rather than assuming it is the first element in the slice. This ensures that the correct commands, resources, and images are applied when multiple worker types are present. A new test case was also added to verify this behavior. A review comment suggested adding a validation check for the pod count to prevent an invalid negative replica value from being generated if the server worker has zero pods.

Comment on lines 218 to 220
if serverWorker == nil {
return nil, fmt.Errorf("server worker not found in backend: %s", backend.Name)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current implementation does not verify that serverWorker.Pods is at least 1. If Pods is 0, the WORKER_REPLICAS calculation at line 331 will result in -1, which is an invalid value for Kubernetes resource replicas. Adding a validation check here prevents the generation of invalid manifests.

if serverWorker == nil {
		return nil, fmt.Errorf("server worker not found in backend: %s", backend.Name)
	}
	if serverWorker.Pods < 1 {
		return nil, fmt.Errorf("server worker must have at least 1 pod in backend: %s", backend.Name)
	}

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copy link
Copy Markdown
Contributor

@FAUST-BENCHOU FAUST-BENCHOU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whether it's changed or not doesn't matter, but personally I think the previous version was clearer.

Signed-off-by: pm-ju <pmdevops29@gmail.com>
@pm-ju
Copy link
Copy Markdown
Contributor Author

pm-ju commented May 16, 2026

Whether it's changed or not doesn't matter, but personally I think the previous version was clearer.

Fair point. I narrowed the change to keep the previous structure and only use the resolved server worker config for command generation, since that was the actual bug.

@FAUST-BENCHOU
Copy link
Copy Markdown
Contributor

dont understand.Only one worker in such circumstance so get the first worker's config as commands is enough.u mean workersMap[workload.ModelWorkerTypeServer].Config this kind of way is more safe or what?

@pm-ju
Copy link
Copy Markdown
Contributor Author

pm-ju commented May 16, 2026

dont understand.Only one worker in such circumstance so get the first worker's config as commands is enough.u mean workersMap[workload.ModelWorkerTypeServer].Config this kind of way is more safe or what?

My thinking was that since we already resolve and validate the server worker with workersMap[ModelWorkerTypeServer], using that same worker for command generation keeps the logic consistent and avoids relying on slice position.

If regular VLLM is guaranteed to always have exactly one worker, then backend.Workers[0] is fine and this change is mostly defensive. I’ll wait for maintainer input on whether that invariant should be relied on here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants