
[Doc] More neutral K8s deployment guide #14084

Open
wants to merge 2 commits into base: main

Conversation

terrytangyuan (Contributor) commented Mar 2, 2025

This is a follow-up to the unaddressed comments in #13841. Specifically:

  1. Removed the reference to a specific project and switched to referencing the integrations page instead;
  2. Removed the phrase that other OSS projects can "make your deployment even smoother", since this is opinionated;
  3. Removed a guide from a specific project and switched to the official K8s guide instead.


github-actions bot commented Mar 2, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify bot added the documentation (Improvements or additions to documentation) label on Mar 2, 2025
@terrytangyuan changed the title from [DOC] More neutral K8s deployment guide to [Doc] More neutral K8s deployment guide on Mar 2, 2025
KuntaiDu (Collaborator) left a comment

The official Kubernetes documentation is probably too difficult for a new user to digest. I request changing this to more user-friendly documentation.


## Pre-requisite

- Ensure that you have a running Kubernetes environment with GPU (you can follow [this tutorial](https://github.com/vllm-project/production-stack/blob/main/tutorials/00-install-kubernetes-env.md) to install a Kubernetes environment on a bare-metal GPU machine).
+ Ensure that you have a running [Kubernetes cluster with GPUs](https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/).
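As an aside, a quick way to sanity-check this prerequisite under either wording is to ask the cluster which nodes actually advertise GPU resources. The commands below are a minimal sketch assuming NVIDIA GPUs exposed through the usual device plugin (the nvidia.com/gpu resource name, as used on the linked K8s scheduling-gpus page); other vendors use different resource names such as amd.com/gpu.

```bash
# Show how many GPUs each node reports as allocatable
# (an empty GPU column means nothing is schedulable on that node).
kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"

# Confirm the GPU device plugin pods are running (pod name varies by vendor/installer).
kubectl get pods -A | grep -i device-plugin
```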
Collaborator

This guide seems a bit complicated for a new user to digest. For the sake of user friendliness, could you please find another script that is easier for users to follow?

Collaborator

I took a quick look at the doc. It seems to assume that the user already has Kubernetes, and most of the installation scripts are pretty scattered; you need to click through multiple pages and navigate the doc for a long time to get everything necessary set up.

terrytangyuan (Contributor, Author) Mar 2, 2025

Please see further discussions below with @Hanchenli.

Hanchenli left a comment

Since users will (or should) visit this page when they are first trying to deploy their K8s vLLM stack, it is best that we keep some clean references here instead of just passing the buck to another link.

The integrations link contains many repositories that are less dedicated to vLLM inference. For example, Llama Stack supports not only vLLM but also other API providers like TGI, Together AI, Groq, and so on. I do not think that is something that should appear when people are solely looking for a K8s deployment of vLLM. On the other hand, I think we should move KubeAI to this page in the future if it becomes dedicated to vLLM.

I agree with you on the prerequisite change: it is better to link to an official K8s page. However, the link you provided is not easy enough for beginners to understand. Can you change it to a more beginner-friendly one?

terrytangyuan (Contributor, Author) commented Mar 2, 2025

I don't have better alternatives. The official K8s guide provides the most vendor-neutral instructions (e.g. not just for NVIDIA GPUs) and contains references to the official vendor-specific instructions.

The current link from production-stack only provides instructions for NVIDIA GPUs and has prerequisites that are OS- and vendor-specific. It also pins specific versions of Minikube and the GPU Operator that can easily become outdated (instead of being maintained by the vendors). It also requires Helm as a dependency.

Hanchenli commented Mar 2, 2025

I think you have a point about the K8s installation. I totally agree that the current link on the documentation page needs an update. I just worry that users who follow the link you put in this PR will not be able to install K8s, so they can't even get started.

Could we get something more user-friendly for that part? I can search for a better official tutorial in the meantime as well.

I am keeping my request for changes about not moving this section to the external integrations page. Some repos are indeed external integrations, like llama-stack (they are not even affiliated with vLLM! Suppose Meta decided to archive that repo one day, what could we do?).

terrytangyuan (Contributor, Author) commented

We probably should not require GPUs as a prerequisite at all. I can help add a CPU-only K8s deployment guide as a follow-up PR if that's helpful.

terrytangyuan (Contributor, Author) commented

> I keep my request for change for not moving this section to the external integration. Some repos are indeed external integrations like llama-stack

I think these two pages are currently a bit confusing, so there perhaps needs to be a separate discussion on this:

  1. https://docs.vllm.ai/en/latest/deployment/frameworks/index.html
  2. https://docs.vllm.ai/en/latest/deployment/integrations/index.html

KServe, KubeAI, production-stack, llmaz, LWS, and Helm should probably belong in the same category since they are all dedicated to deploying on Kubernetes, but that reorganization should be done separately. I think the integrations page is probably the best option so far, given that only Llama Stack is irrelevant to K8s. For this PR, I am happy to remove the link to the integrations page if that works for you.

> they are not even affiliated with vLLM! Suppose meta decided to archive that repo one day, what can we do?

We can simply remove the link if that becomes the case.

Hanchenli commented

Apologies, but I am a little confused about what change you are proposing exactly.

> For this PR, I am happy to remove the link to integrations page if that works for you.

Could you push another commit showing what you are proposing? (It should only be a few lines.) I am glad that we agree that at least Llama-Index has little to do with deploying vLLM on K8s, but I am not very sure about the rest. I would love to talk about how we should fit KServe, KubeAI, etc. into the documentation.

* [vLLM production-stack](https://github.com/vllm-project/production-stack): Born out of a Berkeley-UChicago collaboration, vLLM production-stack is a project that contains the latest research and community effort, while still delivering production-level stability and performance. Check out the [documentation page](https://docs.vllm.ai/en/latest/deployment/integrations/production-stack.html) for more details and examples.

--------
Alternatively, you can deploy vLLM using other open source projects. Check out the [integrations page](https://docs.vllm.ai/en/latest/deployment/integrations) for more details and examples.
terrytangyuan (Contributor, Author) commented

Suggested change:
- Alternatively, you can deploy vLLM using other open source projects. Check out the [integrations page](https://docs.vllm.ai/en/latest/deployment/integrations) for more details and examples.

terrytangyuan (Contributor, Author) commented
@Hanchenli Here's my proposed change.

Hanchenli replied:

I see now. Why don't we just keep adding direct links on this page instead of moving them all to another page? By the way, production-stack is not an integration; it is part of the vLLM project.

terrytangyuan (Contributor, Author) Mar 2, 2025

That would duplicate the list and introduce a maintenance burden. Ideally, we should just reorganize those two lists (in a separate PR, since redirects need to be properly handled) and then cross-reference the appropriate list here.

Hanchenli replied:

I see. So this PR depends on the reorganization of the two lists. I would suggest that we handle the separate PR that sorts out the two lists first.

After that, we will have a much better sense of what we want to do with this page and PR.

terrytangyuan (Contributor, Author) Mar 2, 2025

I don't think we should wait on merging this PR, since it just addresses the comments in #13841, which got merged unexpectedly. Reorganizing the existing docs should be a separate effort.

Hanchenli Mar 2, 2025

I think we should hold this PR and wait for the other PR, since the changes we are proposing are conditioned on another page. I suggest that we wait for more people's opinions first.

terrytangyuan (Contributor, Author) Mar 2, 2025

I have updated the PR to include links to both the integrations page and the frameworks page, just to make it more neutral toward all tools before any reorganization.

> I suggest that we should wait for more people's opinion first.

To be fair, this PR focuses on reverting changes in #13841 that were not supposed to be merged, since there were very explicit comments that were left unaddressed. People's opinions were expressed in #13841, yet they were not addressed, and the PR was merged without waiting for another review.

Signed-off-by: Yuan Tang <[email protected]>
Labels: documentation (Improvements or additions to documentation)
3 participants