[Doc] More neutral K8s deployment guide #14084
base: main
Conversation
Signed-off-by: Yuan Tang <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
The official Kubernetes documentation is probably too difficult for a new user to digest. I request changing this to more user-friendly documentation.
## Pre-requisite

- Ensure that you have a running Kubernetes environment with GPU (you can follow [this tutorial](https://github.com/vllm-project/production-stack/blob/main/tutorials/00-install-kubernetes-env.md) to install a Kubernetes environment on a bare-metal GPU machine).
+ Ensure that you have a running [Kubernetes cluster with GPUs](https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/).
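As a quick sanity check for that prerequisite, here is a minimal sketch (not part of this PR) of a pod that requests one GPU and runs `nvidia-smi`; if it schedules and completes, the cluster can run GPU workloads. The pod name and CUDA image tag are illustrative assumptions.

```yaml
# Minimal GPU smoke test (illustrative; name and image tag are assumptions).
# If this pod completes successfully, the cluster can schedule GPU workloads.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04  # any recent CUDA base image works
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1  # requires the GPU device plugin to be installed
```

Apply it with `kubectl apply -f gpu-smoke-test.yaml` and inspect the output with `kubectl logs gpu-smoke-test`.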
This guide seems to be a bit complicated for a new user to digest. For the sake of user-friendliness, could you please find another script that is easier for users to follow?
I took a quick look at the doc. It seems to assume that the user already has Kubernetes, and most of the installation scripts are pretty scattered; you need to click through multiple pages and navigate the doc for a long time to get everything set up.
Please see further discussions below with @Hanchenli.
Since users will (or should) visit this page when they are first trying to deploy their K8s vLLM stack, it is best that we keep some clean references instead of just passing the buck to another link.
The integrations link contains many repositories that are less dedicated to vLLM inference. For example, Llama Stack supports not only vLLM but also other API providers like TGI, Together AI, Groq, and more. I do not think that is something that should appear when people are solely looking for a K8s deployment of vLLM. On the other hand, I think we should move KubeAI to this page if it becomes dedicated to vLLM in the future.
I agree with you on the pre-requisite change; it is better to link to an official K8s page. However, the link you provided is not easy enough for beginners to understand. Can you change the link to a more beginner-friendly one?
I don't have better alternatives. The official K8s guide is the most vendor-neutral set of instructions (e.g. not just NVIDIA GPUs) and contains references to official vendor-specific instructions. The current link from production-stack only provides instructions for NVIDIA GPUs and has OS- and vendor-specific prerequisites. It also uses specific versions of Minikube and the GPU Operator that can easily become outdated (instead of being maintained by vendors), and it requires Helm as a dependency.
I think you have a point about the K8s installation. I totally agree that the current link on the documentation page needs an update. I just worry that users will not be able to install K8s if they follow the link in this PR, so they can't even get started. Could we get something more user-friendly for that part? I could search for a better official tutorial in the meantime as well. I am keeping my request for changes regarding not moving this section to the external integrations page. Some repos are indeed external integrations, like llama-stack (they are not even affiliated with vLLM! Suppose Meta decided to archive that repo one day, what could we do?)
We probably should not require GPUs as a prerequisite at all. I can help add a CPU-only K8s deployment guide as a follow-up PR if that's helpful.
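For reference, a rough sketch of what such a CPU-only deployment could look like, assuming an image you build yourself from vLLM's CPU Dockerfile and push to your own registry; the image tag, model, and resource sizes below are placeholders, not an official recipe.

```yaml
# Hypothetical CPU-only vLLM Deployment sketch.
# Assumptions: a CPU-capable vLLM image built and pushed by the user,
# and a small model that fits comfortably in CPU memory.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-cpu
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm-cpu
  template:
    metadata:
      labels:
        app: vllm-cpu
    spec:
      containers:
        - name: vllm
          image: registry.example.com/vllm-cpu:latest  # placeholder image reference
          command: ["vllm", "serve", "Qwen/Qwen2.5-0.5B-Instruct"]  # small example model
          ports:
            - containerPort: 8000  # OpenAI-compatible API server port
          resources:               # note: no GPU resources requested
            requests:
              cpu: "8"
              memory: 16Gi
            limits:
              cpu: "8"
              memory: 32Gi
```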
I think these two pages are currently a bit confusing, so there perhaps needs to be a separate discussion on this:
KServe, KubeAI, production-stack, llmaz, LWS, and Helm should probably belong in the same category since they are all dedicated to deploying on Kubernetes. That should probably be done separately. I think the integrations page is probably the best option so far, given that only Llama Stack is irrelevant to K8s. For this PR, I am happy to remove the link to the integrations page if that works for you.
We can simply remove the link if that becomes the case.
Apologies, but I am a little confused about what change you are proposing exactly.
Could you commit again with what you are proposing? (It should be several lines.) I am glad that we agree that at least Llama Stack has little to do with deploying vLLM on K8s. But I am not very sure about the rest. Would love to talk about how we should put KServe, KubeAI, etc. into the documentation.
docs/source/deployment/k8s.md
- * [vLLM production-stack](https://github.com/vllm-project/production-stack): Born out of a Berkeley-UChicago collaboration, vLLM production-stack is a project that contains the latest research and community effort, while still delivering production-level stability and performance. Check out the [documentation page](https://docs.vllm.ai/en/latest/deployment/integrations/production-stack.html) for more details and examples.
+ --------
+ Alternatively, you can deploy vLLM using other open source projects. Check out the [integrations page](https://docs.vllm.ai/en/latest/deployment/integrations) for more details and examples.
Alternatively, you can deploy vLLM using other open source projects. Check out the [integrations page](https://docs.vllm.ai/en/latest/deployment/integrations) for more details and examples.
@Hanchenli Here's my proposed change.
I see now. Why don't we just keep adding direct links on this page instead of moving them all to another page? By the way, production-stack is not an integration; it is part of the vLLM project.
That would duplicate the list and introduce a maintenance burden. Ideally we should just reorganize those two lists (in a separate PR, since redirects need to be handled properly) and then cross-reference the appropriate list here.
I see. So this PR is dependent on the reorganization of the two lists. I would suggest we first handle the separate PR that sorts out those two lists.
After that we will have a much better sense of what we want to do on this page and in this PR.
I don't think we should wait to merge this PR, since it just addresses the comments in #13841 that got merged unexpectedly. Reorganizing the existing docs should be a separate effort.
I think we should hold this PR to wait for that other PR, since the changes we are proposing are conditioned on another page. I suggest that we wait for more people's opinions first.
I have updated the PR to link to both the integrations page and the frameworks page, just to keep it neutral for all tools before any reorganization.
> I suggest that we wait for more people's opinions first.

To be fair, this PR focuses on reverting changes in #13841 that were not supposed to be merged, since there were very explicit comments that were left unaddressed. People's opinions were expressed in #13841, yet they were not addressed and the PR was merged without waiting for another review.
Signed-off-by: Yuan Tang <[email protected]>
This is a follow-up to the unaddressed comments in #13841, specifically: