feat: Streamline GAIE recipe #3829
Open
atchernych wants to merge 17 commits into main from dep-508-streamline-v2
+529 −13
17 commits
dbc1dd8 add VLLM GAIE Recipe (atchernych)
9c33b4b Make the blackbox deployment a backup option (atchernych)
1e0817d make epp.useDynamo=true by default (atchernych)
5714c6c Merge branch 'DEP-423-recipe' into here (atchernych)
6e1684c streamline GAIE recipe (atchernych)
33979ee account for ns and cleanup (atchernych)
7f3b4ff add GAIE checks (atchernych)
b7c1724 cleanup (atchernych)
018e11c Fix issues (atchernych)
fed4e92 nerge main (atchernych)
e02873a cleanup (atchernych)
0b6f8ca cleanup (atchernych)
4416b38 cleanup (atchernych)
b4673fa cleanup (atchernych)
7e0203b Merge branch 'main' into DEP-508-streamline (atchernych)
691bfcb move manifests to dirs per function (atchernych)
9d7eae5 fld rename (atchernych)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,87 @@ | ||
| #!/usr/bin/env bash | ||
| # SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
| set -Eeuo pipefail | ||
| | ||
| # ===== Config (env overridable) ===== | ||
| # Resolved before first use so that references are safe under `set -u`. | ||
| : "${NAMESPACE:=dynamo}" | ||
| KGW_NS="${KGW_NS:-kgateway-system}" | ||
| | ||
| ok() { printf "✅ %s\n" "$*"; } | ||
| fail(){ printf "❌ %s\n" "$*" >&2; exit 1; } | ||
| info(){ printf "ℹ️ %s\n" "$*"; } | ||
| | ||
| need() { command -v "$1" >/dev/null 2>&1 || fail "'$1' is required"; } | ||
| | ||
| # ===== Pre-flight checks ===== | ||
| need helm | ||
| need kubectl | ||
| | ||
| # ===== Namespace ensure ===== | ||
| if ! kubectl get ns "$NAMESPACE" >/dev/null 2>&1; then | ||
|   kubectl create namespace "$NAMESPACE" | ||
| fi | ||
| | ||
| GATEWAY_CRDS=( | ||
|   gateways.gateway.networking.k8s.io | ||
|   gatewayclasses.gateway.networking.k8s.io | ||
|   httproutes.gateway.networking.k8s.io | ||
|   referencegrants.gateway.networking.k8s.io | ||
| ) | ||
| info "Checking Gateway API CRDs…" | ||
| for c in "${GATEWAY_CRDS[@]}"; do | ||
|   kubectl get crd "$c" >/dev/null 2>&1 || fail "Missing CRD: $c (run step a)" | ||
|   kubectl wait --for=condition=Established "crd/$c" --timeout=60s >/dev/null || fail "CRD not Established: $c" | ||
| done | ||
| ok "Gateway API CRDs present & Established" | ||
| | ||
| GAIE_CRDS=( | ||
|   inferencemodels.inference.networking.x-k8s.io | ||
|   inferencepools.inference.networking.x-k8s.io | ||
| ) | ||
| | ||
| info "Checking GAIE (Inference Extension) CRDs…" | ||
| for c in "${GAIE_CRDS[@]}"; do | ||
|   kubectl get crd "$c" >/dev/null 2>&1 || fail "Missing CRD: $c (run step b install of inference extension)" | ||
|   kubectl wait --for=condition=Established "crd/$c" --timeout=60s >/dev/null || fail "CRD not Established: $c" | ||
| done | ||
| ok "GAIE CRDs present & Established" | ||
| | ||
| info "Checking Kgateway controller in namespace '$KGW_NS'…" | ||
| # namespace must exist | ||
| kubectl get ns "$KGW_NS" >/dev/null 2>&1 || fail "Namespace '$KGW_NS' not found (run step c Helm installs)" | ||
| | ||
| # pods should be running | ||
| # A label selector that matches nothing still exits 0, so test the output, not the exit code. | ||
| PODS=$(kubectl get pods -n "$KGW_NS" -l app.kubernetes.io/name=kgateway -o name 2>/dev/null || true) | ||
| if [[ -z "$PODS" ]]; then | ||
|   # fallback by pod name (charts sometimes label differently) | ||
|   PODS=$(kubectl get pods -n "$KGW_NS" -o name | grep -E 'kgateway|gateway' || true) | ||
| fi | ||
| [[ -n "$PODS" ]] || fail "Kgateway pods not found in '$KGW_NS'" | ||
| for p in $PODS; do | ||
|   kubectl wait -n "$KGW_NS" --for=condition=Ready "$p" --timeout=180s >/dev/null || fail "Pod not Ready: $p" | ||
| done | ||
| ok "Kgateway controller pods Ready" | ||
| | ||
| kubectl get gateway.gateway.networking.k8s.io inference-gateway -n "$NAMESPACE" >/dev/null 2>&1 || fail "Gateway 'inference-gateway' not found in $NAMESPACE (apply step d manifest)" | ||
| | ||
| ok "GAIE is installed and the gateway is up in namespace '$NAMESPACE'." | ||
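
Review note on the pod lookup in this script: `kubectl get pods` with a label selector exits 0 even when no pod matches, so the choice between labelled pods and a name-based fallback is safer when it is made on the command output. The sketch below isolates that decision in a small helper. The name `select_pods` and its two-argument shape are hypothetical and not part of the PR, and the fallback pattern is simplified to `gateway`.

```shell
# select_pods PRIMARY ALL   (hypothetical helper, not part of the PR)
#   PRIMARY: newline-separated pod names returned by the label selector (may be empty)
#   ALL:     newline-separated names of every pod in the namespace
# Prints PRIMARY when it is non-empty; otherwise prints the names in ALL
# that contain "gateway". Prints nothing when neither source has a match.
select_pods() {
  if [ -n "$1" ]; then
    printf '%s\n' "$1"
  else
    printf '%s\n' "$2" | grep -E 'gateway' || true
  fi
}

# Intended use inside the check script (requires a reachable cluster):
#   PRIMARY=$(kubectl get pods -n "$KGW_NS" -l app.kubernetes.io/name=kgateway -o name)
#   ALL=$(kubectl get pods -n "$KGW_NS" -o name)
#   PODS=$(select_pods "$PRIMARY" "$ALL")
#   [ -n "$PODS" ] || fail "Kgateway pods not found in '$KGW_NS'"
```

Keeping the decision in a pure function makes it possible to exercise the fallback without a cluster.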
48 changes: 48 additions & 0 deletions
recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/epp/configmap.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,48 @@ | ||
| # SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
| | ||
| # NOTE: No metadata.namespace is set here; pass the target namespace with kubectl apply -n | ||
| apiVersion: v1 | ||
| kind: ConfigMap | ||
| metadata: | ||
|   name: epp-config | ||
|   labels: | ||
|     app.kubernetes.io/name: dynamo-gaie | ||
|     app.kubernetes.io/instance: llama3-70b-agg | ||
| data: | ||
|   epp-config-dynamo.yaml: | | ||
|     apiVersion: inference.networking.x-k8s.io/v1alpha1 | ||
|     kind: EndpointPickerConfig | ||
|     plugins: | ||
|       # Required: tells EPP which profile to use (even if you only have one) | ||
|       - type: single-profile-handler | ||
| | ||
|       # Picker: chooses the final endpoint after scoring | ||
|       - name: picker | ||
|         type: max-score-picker | ||
|       - name: dyn-pre | ||
|         type: dynamo-inject-workerid | ||
|         parameters: {} | ||
|       - name: dyn-kv | ||
|         type: kv-aware-scorer | ||
|         parameters: | ||
|           frontendURL: http://127.0.0.1:8000/v1/chat/completions | ||
|           timeoutMS: 10000 | ||
|     schedulingProfiles: | ||
|       - name: default | ||
|         plugins: | ||
|           - pluginRef: dyn-kv | ||
|             weight: 1 | ||
|           - pluginRef: picker | ||
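
Review note: each `pluginRef` in `schedulingProfiles` has to name an entry declared under `plugins`. A rough pre-apply check could look like the sketch below. `unresolved_plugin_refs` is a hypothetical helper, not part of the PR, and it matches lines with `sed` instead of parsing YAML. As a consequence it also counts profile names such as `default` as declared names, and it would be confused by trailing comments.

```shell
# unresolved_plugin_refs   (hypothetical helper, not part of the PR)
# Reads an EndpointPickerConfig on stdin and prints every pluginRef value
# that has no matching "- name:" entry anywhere in the document.
# Line-based text matching only; this is not a YAML parser.
unresolved_plugin_refs() {
  cfg=$(cat)
  # Collect every "- name: X" value (plugin names and, as a side effect, profile names).
  names=$(printf '%s\n' "$cfg" | sed -n 's/^[[:space:]]*- name:[[:space:]]*//p')
  # Walk every "- pluginRef: X" value and report those missing from the name list.
  printf '%s\n' "$cfg" | sed -n 's/^[[:space:]]*- pluginRef:[[:space:]]*//p' |
  while IFS= read -r ref; do
    printf '%s\n' "$names" | grep -qxF -- "$ref" || printf '%s\n' "$ref"
  done
}

# Example (the file name is an assumption):
#   unresolved_plugin_refs < epp-config-dynamo.yaml
```

An empty result means every reference resolved under these simplifying assumptions.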
109 changes: 109 additions & 0 deletions
recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/epp/deployment.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,109 @@ | ||
| # SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
| | ||
| # NOTE: No metadata.namespace is set here; pass your deployment namespace with kubectl apply -n | ||
| apiVersion: apps/v1 | ||
| kind: Deployment | ||
| metadata: | ||
|   name: llama3-70b-agg-epp | ||
|   labels: | ||
|     app: llama3-70b-agg-epp | ||
| spec: | ||
|   replicas: 1 | ||
|   selector: | ||
|     matchLabels: | ||
|       app: llama3-70b-agg-epp | ||
|   template: | ||
|     metadata: | ||
|       labels: | ||
|         app: llama3-70b-agg-epp | ||
|     spec: | ||
|       serviceAccountName: epp-sa | ||
|       terminationGracePeriodSeconds: 130 | ||
| | ||
|       imagePullSecrets: | ||
|         - name: docker-imagepullsecret | ||
| | ||
|       containers: | ||
|         - name: epp | ||
|           image: "gitlab-master.nvidia.com:5005/dl/ai-dynamo/dynamo/epp-inference-extension-dynamo:etcdless-2" | ||
|           imagePullPolicy: IfNotPresent | ||
|           resources: | ||
|             requests: | ||
|               memory: "1Gi" | ||
|               cpu: "1" | ||
|             limits: | ||
|               memory: "2Gi" | ||
|               cpu: "2" | ||
|           command: ["/bin/sh", "-c"] | ||
|           args: | ||
|             - > | ||
|               exec /epp | ||
|               -poolName "llama3-70b-agg-pool" | ||
|               -poolNamespace "$POD_NAMESPACE" | ||
|               -v 4 --zap-encoder json | ||
|               -grpcPort 9002 -grpcHealthPort 9003 | ||
|               -configFile /etc/epp/epp-config-dynamo.yaml | ||
| | ||
|           volumeMounts: | ||
|             - name: epp-config | ||
|               mountPath: /etc/epp | ||
|               readOnly: true | ||
| | ||
|           env: | ||
|             - name: POD_NAMESPACE | ||
|               valueFrom: | ||
|                 fieldRef: | ||
|                   fieldPath: metadata.namespace | ||
|             - name: PLATFORM_NAMESPACE | ||
|               value: "$(POD_NAMESPACE)" # set to your dynamo platform namespace if different | ||
|             - name: ETCD_ENDPOINTS | ||
|               value: "dynamo-platform-etcd.$(PLATFORM_NAMESPACE):2379" | ||
|             - name: NATS_SERVER | ||
|               value: "nats://dynamo-platform-nats.$(PLATFORM_NAMESPACE):4222" | ||
|             - name: DYN_NAMESPACE | ||
|               value: "llama3-70b-agg" | ||
|             - name: DYNAMO_COMPONENT | ||
|               value: "backend" | ||
|             - name: DYNAMO_KV_BLOCK_SIZE | ||
|               value: "128" # UPDATE to match the --block-size in your deploy.yaml engine command | ||
|             - name: USE_STREAMING | ||
|               value: "true" | ||
| | ||
|           ports: | ||
|             - containerPort: 9002 | ||
|             - containerPort: 9003 | ||
|             - name: metrics | ||
|               containerPort: 9090 | ||
|           livenessProbe: | ||
|             grpc: | ||
|               port: 9003 | ||
|               service: inference-extension | ||
|             initialDelaySeconds: 5 | ||
|             periodSeconds: 10 | ||
|           readinessProbe: | ||
|             grpc: | ||
|               port: 9003 | ||
|               service: inference-extension | ||
|             initialDelaySeconds: 5 | ||
|             periodSeconds: 10 | ||
| | ||
|       volumes: | ||
|         - name: epp-config | ||
|           configMap: | ||
|             name: epp-config | ||
|             items: | ||
|               - key: epp-config-dynamo.yaml | ||
|                 path: epp-config-dynamo.yaml | ||
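
Review note: the manifest asks for `DYNAMO_KV_BLOCK_SIZE` to match the `--block-size` passed to the engine in deploy.yaml, and nothing enforces that. A minimal sketch of a comparison is shown below. Both function names are hypothetical, and the engine command is assumed to be available as separate arguments.

```shell
# block_size_from_args ARGS...   (hypothetical helper, not part of the PR)
# Prints the value of --block-size from an argument list, accepting both
# "--block-size N" and "--block-size=N". Returns 1 if the flag is absent.
block_size_from_args() {
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --block-size)   [ "$#" -ge 2 ] && printf '%s\n' "$2"; return 0 ;;
      --block-size=*) printf '%s\n' "${1#--block-size=}"; return 0 ;;
    esac
    shift
  done
  return 1
}

# check_block_size EPP_VALUE ARGS...   (hypothetical helper)
# Succeeds only when the engine arguments carry --block-size and its value
# equals EPP_VALUE (the DYNAMO_KV_BLOCK_SIZE set on the EPP container).
check_block_size() {
  expected=$1
  shift
  actual=$(block_size_from_args "$@") || return 1
  [ "$actual" = "$expected" ]
}

# Example with an illustrative engine command line:
#   check_block_size 128 engine --model some-model --block-size 128 \
#     || echo "DYNAMO_KV_BLOCK_SIZE does not match --block-size" >&2
```

A missing flag is treated as a mismatch here, on the assumption that relying on the engine's default block size is not intended.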
39 changes: 39 additions & 0 deletions
recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/epp/http-route.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,39 @@ | ||
| # SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
| | ||
| # NOTE: No metadata.namespace is set here; pass the target namespace with kubectl apply -n | ||
| # backendRefs has no namespace field, so the InferencePool is resolved in the route's own namespace | ||
| apiVersion: gateway.networking.k8s.io/v1 | ||
| kind: HTTPRoute | ||
| metadata: | ||
|   name: llama3-70b-agg-route | ||
| spec: | ||
|   parentRefs: | ||
|     - group: gateway.networking.k8s.io | ||
|       kind: Gateway | ||
|       name: inference-gateway | ||
|   rules: | ||
|     - backendRefs: | ||
|         - group: inference.networking.x-k8s.io | ||
|           kind: InferencePool | ||
|           name: llama3-70b-agg-pool | ||
|           port: 8000 | ||
|           weight: 1 | ||
|       matches: | ||
|         - path: | ||
|             type: PathPrefix | ||
|             value: / | ||
|       timeouts: | ||
|         request: 300s | ||
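
Review note: after applying this route it may be worth confirming that the `inference-gateway` parent accepted it, since a route that is not accepted carries no traffic. The sketch below reads the `Accepted` condition from the route status. `route_accepted` is a hypothetical helper, not part of the PR, and it assumes `kubectl` is on `PATH` and pointed at the right cluster.

```shell
# route_accepted NAMESPACE ROUTE   (hypothetical helper, not part of the PR)
# Succeeds only if the HTTPRoute status lists at least one parent and every
# listed parent reports the condition Accepted=True.
route_accepted() {
  statuses=$(kubectl get httproute "$2" -n "$1" \
    -o jsonpath='{.status.parents[*].conditions[?(@.type=="Accepted")].status}') || return 1
  # An empty result means no controller has reported on this route yet.
  [ -n "$statuses" ] || return 1
  for s in $statuses; do
    [ "$s" = "True" ] || return 1
  done
}

# Example:
#   route_accepted "$NAMESPACE" llama3-70b-agg-route \
#     || echo "HTTPRoute not accepted by its parent Gateway" >&2
```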
30 changes: 30 additions & 0 deletions
recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/epp/service.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,30 @@ | ||
| # SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
| | ||
| # NOTE: No metadata.namespace is set here; pass your deployment namespace with kubectl apply -n | ||
| apiVersion: v1 | ||
| kind: Service | ||
| metadata: | ||
|   name: llama3-70b-agg-epp | ||
| spec: | ||
|   selector: | ||
|     app: llama3-70b-agg-epp  # must match the pod label set in epp/deployment.yaml | ||
|   ports: | ||
|     - protocol: TCP | ||
|       port: 9002 | ||
|       targetPort: 9002 | ||
|       appProtocol: http2 | ||
|   type: ClusterIP | ||
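
Review note: a Service whose selector matches no pod label still applies cleanly but ends up with no endpoints, so the EPP would be unreachable on port 9002. A quick post-apply check is sketched below. `service_has_endpoints` is a hypothetical helper, not part of the PR, and it assumes `kubectl` is on `PATH` and the cluster still serves the core `Endpoints` resource.

```shell
# service_has_endpoints NAMESPACE SERVICE   (hypothetical helper, not part of the PR)
# Succeeds when the Endpoints object of the Service lists at least one
# ready address, i.e. the selector matched a Ready pod.
service_has_endpoints() {
  ips=$(kubectl get endpoints "$2" -n "$1" \
    -o jsonpath='{.subsets[*].addresses[*].ip}') || return 1
  [ -n "$ips" ]
}

# Example:
#   service_has_endpoints "$NAMESPACE" llama3-70b-agg-epp \
#     || echo "Service llama3-70b-agg-epp has no ready endpoints; check its selector" >&2
```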