[Doc] More neutral K8s deployment guide #14084
base: main
Conversation
Signed-off-by: Yuan Tang <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
The official Kubernetes documentation is probably too difficult for a new user to digest. I request changing this to more user-friendly documentation.
## Pre-requisite

- Ensure that you have a running Kubernetes environment with GPU (you can follow [this tutorial](https://github.com/vllm-project/production-stack/blob/main/tutorials/00-install-kubernetes-env.md) to install a Kubernetes environment on a bare-metal GPU machine).
+ Ensure that you have a running [Kubernetes cluster with GPUs](https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/).
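As a quick sanity check for that prerequisite, here is a minimal sketch (not part of this PR) of a pod that requests one GPU and runs `nvidia-smi`; if it schedules and completes, the cluster can run GPU workloads. The pod name and CUDA image tag are illustrative assumptions.

```yaml
# Minimal GPU smoke test (illustrative; name and image tag are assumptions).
# If this pod completes successfully, the cluster can schedule GPU workloads.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04  # any recent CUDA base image works
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1  # requires the GPU device plugin to be installed
```

Apply it with `kubectl apply -f gpu-smoke-test.yaml` and inspect the output with `kubectl logs gpu-smoke-test`.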
This guide seems to be a bit complicated for a new user to digest. For the sake of user-friendliness, could you please find another script that is easier for users to follow?
I took a quick look at the doc. It seems to assume that the user already has Kubernetes, and most of the installation scripts are pretty scattered; you need to click through multiple pages and navigate the doc for a long time to get everything set up.
Please see further discussions below with @Hanchenli.
Since users will (or should) visit this page when they are first trying to deploy their K8s vLLM stack, it is best that we keep some clean references instead of just passing the buck to another link.
The integrations link contains many repositories that are less dedicated to vLLM inference. For example, Llama Stack supports not only vLLM but also other API providers like TGI, Together AI, Groq, and more. I do not think that is something that should appear when people are solely looking for a K8s deployment of vLLM. On the other hand, I think we should move KubeAI to this page if it becomes dedicated to vLLM in the future.
I agree with you on the pre-requisite change; it is better to link to an official K8s page. However, the link you provided is not easy enough for beginners to understand. Can you change the link to a more beginner-friendly one?
I don't have better alternatives. The official K8s guide is the most vendor-neutral set of instructions (e.g. not just NVIDIA GPUs) and contains references to official vendor-specific instructions. The current link from production-stack only provides instructions for NVIDIA GPUs and has OS- and vendor-specific prerequisites. It also uses specific versions of Minikube and the GPU Operator that can easily become outdated (instead of being maintained by vendors), and it requires Helm as a dependency.
I think you have a point about the K8s installation. I totally agree that the current link on the documentation page needs an update. I just worry that users will not be able to install K8s if they follow the link in this PR, so they can't even get started. Could we get something more user-friendly for that part? I could search for a better official tutorial in the meantime as well. I am keeping my request for changes regarding not moving this section to the external integrations page. Some repos are indeed external integrations, like llama-stack (they are not even affiliated with vLLM! Suppose Meta decided to archive that repo one day, what could we do?)
We probably should not require GPUs as a prerequisite at all. I can help add a CPU-only K8s deployment guide as a follow-up PR if that's helpful.
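For reference, a rough sketch of what such a CPU-only deployment could look like, assuming an image you build yourself from vLLM's CPU Dockerfile and push to your own registry; the image tag, model, and resource sizes below are placeholders, not an official recipe.

```yaml
# Hypothetical CPU-only vLLM Deployment sketch.
# Assumptions: a CPU-capable vLLM image built and pushed by the user,
# and a small model that fits comfortably in CPU memory.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-cpu
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm-cpu
  template:
    metadata:
      labels:
        app: vllm-cpu
    spec:
      containers:
        - name: vllm
          image: registry.example.com/vllm-cpu:latest  # placeholder image reference
          command: ["vllm", "serve", "Qwen/Qwen2.5-0.5B-Instruct"]  # small example model
          ports:
            - containerPort: 8000  # OpenAI-compatible API server port
          resources:               # note: no GPU resources requested
            requests:
              cpu: "8"
              memory: 16Gi
            limits:
              cpu: "8"
              memory: 32Gi
```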
I think these two pages are currently a bit confusing, so there perhaps needs to be a separate discussion on this:
KServe, KubeAI, production-stack, llmaz, LWS, and Helm should probably belong in the same category since they are all dedicated to deploying on Kubernetes. That should probably be done separately. I think the integrations page is probably the best option so far, given that only Llama Stack is irrelevant to K8s. For this PR, I am happy to remove the link to the integrations page if that works for you.
We can simply remove the link if that becomes the case.
Apologies, but I am a little confused about what change you are proposing exactly.
Could you commit again with what you are proposing? (It should be several lines.) I am glad that we agree that at least Llama Stack has little to do with deploying vLLM on K8s. But I am not very sure about the rest. Would love to talk about how we should put KServe, KubeAI, etc. into the documentation.
docs/source/deployment/k8s.md
- * [vLLM production-stack](https://github.com/vllm-project/production-stack): Born out of a Berkeley-UChicago collaboration, vLLM production-stack is a project that contains the latest research and community effort, while still delivering production-level stability and performance. Check out the [documentation page](https://docs.vllm.ai/en/latest/deployment/integrations/production-stack.html) for more details and examples.
+ --------
+ Alternatively, you can deploy vLLM using other open source projects. Check out the [integrations page](https://docs.vllm.ai/en/latest/deployment/integrations) for more details and examples.
Alternatively, you can deploy vLLM using other open source projects. Check out the [integrations page](https://docs.vllm.ai/en/latest/deployment/integrations) for more details and examples.
@Hanchenli Here's my proposed change.
I see now. Why don't we just keep adding direct links on this page instead of moving them all to another page? By the way, production-stack is not an integration; it is part of the vLLM project.
That would duplicate the list and introduce a maintenance burden. Ideally we should just reorganize those two lists (in a separate PR, since redirects need to be handled properly) and then cross-reference the appropriate list here.
I see. So this PR is dependent on the reorganization of the two lists. I would suggest we first handle the separate PR that sorts out those two lists.
After that we will have a much better sense of what we want to do on this page and in this PR.
I don't think we should wait to merge this PR, since it just addresses the comments in #13841 that got merged unexpectedly. Reorganizing the existing docs should be a separate effort.
I think we should hold this PR to wait for that other PR, since the changes we are proposing are conditioned on another page. I suggest that we wait for more people's opinions first.
I have updated the PR to link to both the integrations page and the frameworks page, just to keep it neutral for all tools before any reorganization.
> I suggest that we wait for more people's opinions first.

To be fair, this PR focuses on reverting changes in #13841 that were not supposed to be merged, since there were very explicit comments that were left unaddressed. People's opinions were expressed in #13841, yet they were not addressed and the PR was merged without waiting for another review.
Signed-off-by: Yuan Tang <[email protected]>
This is a follow-up to the unaddressed comments in #13841, specifically: