-
Notifications
You must be signed in to change notification settings - Fork 207
Extending the guide with multiple LoRAs serving with BBR #1859
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
300fbbe
fe198fc
b2302c0
6bcb91e
dbb93bc
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||
|---|---|---|---|---|
| @@ -0,0 +1,91 @@ | ||||
| apiVersion: gateway.networking.k8s.io/v1 | ||||
| kind: HTTPRoute | ||||
| metadata: | ||||
| name: llm-llama-route | ||||
| spec: | ||||
| parentRefs: | ||||
| - group: gateway.networking.k8s.io | ||||
| kind: Gateway | ||||
| name: inference-gateway | ||||
| rules: | ||||
| - backendRefs: | ||||
| - group: inference.networking.k8s.io | ||||
| kind: InferencePool | ||||
| name: vllm-llama3-8b-instruct | ||||
| matches: | ||||
| - path: | ||||
| type: PathPrefix | ||||
| value: / | ||||
| headers: | ||||
| - type: Exact | ||||
| name: X-Gateway-Model-Name | ||||
| value: 'meta-llama/Llama-3.1-8B-Instruct' | ||||
| timeouts: | ||||
| request: 300s | ||||
| --- | ||||
| apiVersion: gateway.networking.k8s.io/v1 | ||||
| kind: HTTPRoute | ||||
| metadata: | ||||
| name: llm-deepseek-route #give this HTTPRoute any name that helps you to group and track the matchers | ||||
| spec: | ||||
| parentRefs: | ||||
| - group: gateway.networking.k8s.io | ||||
| kind: Gateway | ||||
| name: inference-gateway | ||||
| rules: | ||||
| - backendRefs: | ||||
| - group: inference.networking.k8s.io | ||||
| kind: InferencePool | ||||
| name: vllm-deepseek-r1 | ||||
| matches: | ||||
| - path: | ||||
| type: PathPrefix | ||||
| value: / | ||||
| headers: | ||||
| - type: Exact | ||||
| name: X-Gateway-Model-Name | ||||
| value: 'deepseek/vllm-deepseek-r1' | ||||
| - path: | ||||
| type: PathPrefix | ||||
| value: / | ||||
| headers: | ||||
| - type: Exact | ||||
| #Body-Based routing(https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/pkg/bbr/README.md) is being used to copy the model name from the request body to the header. | ||||
| name: X-Gateway-Model-Name | ||||
| value: 'food-review' | ||||
| - path: | ||||
| type: PathPrefix | ||||
| value: / | ||||
| headers: | ||||
| - type: Exact | ||||
| #Body-Based routing(https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/pkg/bbr/README.md) is being used to copy the model name from the request body to the header. | ||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ditto
Suggested change
|
||||
| name: X-Gateway-Model-Name | ||||
| value: 'movie-critique' | ||||
| timeouts: | ||||
| request: 300s | ||||
| --- | ||||
| apiVersion: gateway.networking.k8s.io/v1 | ||||
| kind: HTTPRoute | ||||
| metadata: | ||||
| name: vllm-llama3-8b-instruct-lora-food-review-1 #give this HTTPRoute any name that helps you to group and track the routes | ||||
| spec: | ||||
| parentRefs: | ||||
| - group: gateway.networking.k8s.io | ||||
| kind: Gateway | ||||
| name: inference-gateway | ||||
| rules: | ||||
| - backendRefs: | ||||
| - group: inference.networking.k8s.io | ||||
| kind: InferencePool | ||||
| name: vllm-llama3-8b-instruct | ||||
| matches: | ||||
| - path: | ||||
| type: PathPrefix | ||||
| value: / | ||||
| headers: | ||||
| - type: Exact | ||||
| #Body-Based routing(https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/pkg/bbr/README.md) is being used to copy the model name from the request body to the header. | ||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ditto
Suggested change
|
||||
| name: X-Gateway-Model-Name | ||||
| value: 'food-review-1' #this is the name of LoRA as defined in vLLM deployment | ||||
| timeouts: | ||||
| request: 300s | ||||
|
Comment on lines
+67
to
+91
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why this is not part of the first HttpRoute
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This could have been part of it, but, as explained in the guide, I want to show two different ways of defining routes: using matchers on a route and using independently defined routes.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I understand the motivation here for trying to show that this functionality can be configured in more than one way. we need to keep in mind that these quickstart guides serve as guiding principles for newcomers, how they should deploy and this should reflect implicitly our "recommendation".
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @nirrozenbaum , my understanding is that one can use multiple HTTPRoutes to deal with the limitation on the number of matchers per single HTTPRoute.
Admittedly, the intention here can and should be made more clear. I can elaborate on this issue and suggest that until the number of LoRAs for the base model is below the limit on the matchers per HTTPRoute, the preferred routing configuration is via a single HTTPRoute per EPP. If the number of LoRAs exceeds this limit, then one can define additional HTTPRoutes. I do not want to leave a user with the impression that she is more restricted than she actually is. Hope this makes sense. Please let me know. |
||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,44 @@ | ||
| apiVersion: apps/v1 | ||
| kind: Deployment | ||
| metadata: | ||
| name: vllm-deepseek-r1 | ||
| spec: | ||
| replicas: 1 | ||
| selector: | ||
| matchLabels: | ||
| app: vllm-deepseek-r1 | ||
| template: | ||
| metadata: | ||
| labels: | ||
| app: vllm-deepseek-r1 | ||
| spec: | ||
| containers: | ||
| - name: vllm-sim | ||
| image: ghcr.io/llm-d/llm-d-inference-sim:v0.4.0 | ||
| imagePullPolicy: Always | ||
| args: | ||
| - --model | ||
| - deepseek/vllm-deepseek-r1 | ||
| - --port | ||
| - "8000" | ||
| - --max-loras | ||
| - "2" | ||
| - --lora-modules | ||
| - '{"name": "food-review"}' | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. maybe we can call the lora adapters with completely different names to avoid confusion?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. we still have |
||
| - '{"name": "movie-critique"}' | ||
| env: | ||
| - name: POD_NAME | ||
| valueFrom: | ||
| fieldRef: | ||
| fieldPath: metadata.name | ||
| - name: NAMESPACE | ||
| valueFrom: | ||
| fieldRef: | ||
| fieldPath: metadata.namespace | ||
| ports: | ||
| - containerPort: 8000 | ||
| name: http | ||
| protocol: TCP | ||
| resources: | ||
| requests: | ||
| cpu: 10m | ||
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we remove these documentation comments?
looking at the HttpRoute, we don't really care where the header comes from (e.g., for testing purpose, a user may add this header manually).