feat: Streamline GAIE recipe #3829

atchernych · 2025-10-22T20:09:53Z

Overview:

Dep 508 streamline v2

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

closes GitHub issue: #xxx

Summary by CodeRabbit

Release Notes

New Features
- Added optional Gateway API Inference Extension (GAIE) deployment support via a new --gaie flag.
- New pre-flight validation checks to ensure deployment prerequisites are met.

Documentation
- Updated deployment guides with GAIE workflow and prerequisites.
- Restructured reference documentation to highlight deployment options and integration status.

Signed-off-by: Anna Tchernych <[email protected]>

coderabbitai · 2025-10-22T20:15:45Z

Walkthrough

This pull request introduces Gateway API Inference Extension (GAIE) integration across documentation, validation infrastructure, and deployment automation. Changes include README restructuring to reflect GAIE status, a new pre-flight validation script, Kubernetes manifests for RBAC and model deployment, and updated run.sh logic to conditionally apply GAIE-specific configurations.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation Updates `deploy/inference-gateway/README.md`, `recipes/README.md` | Removed namespace specification from kubectl apply command for Inference Extension CRDs; restructured recipe table by removing GPU/Benchmark columns and adding GAIE-integration column; added new GAIE deployment guidance section with prerequisite clarifications and workflow steps. |
| GAIE Validation `recipes/gaie_checks.sh` | New Bash script for pre-flight GAIE gateway validation, verifying namespace existence, required CRDs (Gateway API and GAIE), Kgateway controller pods, and inference-gateway presence with colored status output. |
| RBAC Manifests `recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/01-rbac/*` | Three new RBAC resources: ServiceAccount (epp-sa), ClusterRole (pod-read with get/watch/list permissions on inference resources and authentication/authorization APIs), and ClusterRoleBinding binding epp-sa to pod-read. |
| Model Configuration Manifests `recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/02-model/*` | Two new CRD manifests: InferencePool (llama3-70b-agg-pool targeting port 8000 with Frontend/llama3-70b-agg labels) and InferenceModel (llama3-70b-agg-model, Critical priority, referencing RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic). |
| EPP Deployment Manifests `recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/03-epp/*` | Four new manifests: ConfigMap (epp-config with EndpointPickerConfig and scheduling profiles), Deployment (llama3-70b-agg-epp with 1vCPU/1Gi requests, gRPC ports 9002/9003, metrics port 9090, health probes), Service (ClusterIP exposing port 9002 with http2 protocol), and HTTPRoute (routing from inference-gateway to InferencePool with 300s timeout). |
| Deployment Script Updates `recipes/run.sh` | Added GAIE integration flag (default false) with validation; introduced conditional GAIE execution path (runs gaie_checks.sh, applies gaie/* manifests, exits); changed model-download job reference; updated PVC creation logic (only when not downloading); added global bash safety (set -euo pipefail, IFS configuration). |
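
The flag-plus-branch behavior summarized for `recipes/run.sh` can be sketched roughly as follows. This is only an illustration under assumptions: `parse_flags`, `plan_steps`, and the `GAIE` variable are hypothetical names, and the steps are printed rather than executed so the sketch runs without a cluster.

```bash
#!/usr/bin/env bash
# Sketch of an optional integration flag with a default and a separate branch.
# Names here are hypothetical; the real logic lives in recipes/run.sh.
set -euo pipefail

GAIE=false   # default: GAIE integration is off

parse_flags() {
    GAIE=false
    while [[ $# -gt 0 ]]; do
        case "$1" in
            --gaie)
                GAIE=true
                shift
                ;;
            *)
                echo "ERROR: unknown option $1" >&2
                return 1
                ;;
        esac
    done
}

plan_steps() {
    # Print the steps that would run, one per line, instead of executing them.
    if [[ "$GAIE" == true ]]; then
        echo "run gaie_checks.sh"
        echo "apply gaie manifests"
        return 0   # the GAIE path stops here
    fi
    echo "continue with the regular recipe steps"
}
```

Calling `parse_flags --gaie` followed by `plan_steps` prints the two GAIE steps; without the flag, only the regular path is reported.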

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Changes span multiple file types (documentation, bash scripts, YAML manifests) with new conditional logic in run.sh. YAML manifests are largely homogeneous configuration boilerplate requiring minimal logic review, while run.sh additions introduce moderate complexity through GAIE flag parsing and branching paths. Documentation restructuring is straightforward but requires verification against intended content flow.

Poem

🐰 Hops with joy through GAIE's gates—
Manifests multiply, validation waits,
RBAC roles and pools aligned,
EPP endpoints beautifully designed,
The gateway now extends with grace,
Inference finds its perfect place! 🚀

Pre-merge checks

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description Check | ⚠️ Warning | The pull request description uses the correct template headings but lacks substantive content in the Overview, Details, and Where should the reviewer start? sections, providing only placeholders without any actual description of the changes, impacted files, or review guidance. | Please fill in the Overview with a concise summary of the feature, populate the Details section with a description of key changes and added files, and specify in Where should the reviewer start which files or components the reviewer should focus on. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title Check | ✅ Passed | The pull request title succinctly describes the main feature being introduced, which is the streamlining of the GAIE recipe, and it is concise and directly related to the primary change of integrating GAIE workflows into the Dynamo recipe. |


coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

recipes/run.sh (1)

83-105: set -u + direct $2 deref can abort parsing; guard option args

Under set -u, referencing $2 when missing exits before you call missing_requirement. Guard with argc checks before using $2.

-        --model)
-            if [ "$2" ]; then
-                MODEL=$2
-                shift 2
-            else
-                missing_requirement "$1"
-            fi
-            ;;
+        --model)
+            if [[ $# -lt 2 ]] || [[ "$2" =~ ^- ]]; then missing_requirement "$1"; fi
+            MODEL="$2"; shift 2;;
         --framework)
-            if [ "$2" ]; then
-                FRAMEWORK=$2
-                shift 2
-            else
-                missing_requirement "$1"
-            fi
-            ;;
+            if [[ $# -lt 2 ]] || [[ "$2" =~ ^- ]]; then missing_requirement "$1"; fi
+            FRAMEWORK="$2"; shift 2;;
         --namespace)
-            if [ "$2" ]; then
-                NAMESPACE=$2
-                shift 2
-            else
-                missing_requirement "$1"
-            fi
-            ;;
+            if [[ $# -lt 2 ]] || [[ "$2" =~ ^- ]]; then missing_requirement "$1"; fi
+            NAMESPACE="$2"; shift 2;;
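
To make the failure mode concrete, here is a minimal, self-contained sketch of the guarded pattern. `parse_args` and this `missing_requirement` are illustrative stand-ins, not the functions from `recipes/run.sh`; the point is only that the argument count is tested before `$2` is ever expanded.

```bash
#!/usr/bin/env bash
# Sketch: option parsing that stays safe under `set -u`.
# `parse_args` and `missing_requirement` are stand-ins for illustration only.
set -euo pipefail

missing_requirement() {
    echo "ERROR: $1 requires an argument" >&2
    return 1
}

parse_args() {
    MODEL=""
    NAMESPACE=""
    while [[ $# -gt 0 ]]; do
        case "$1" in
            --model)
                # Test the count first so "$2" is never expanded when unset.
                if [[ $# -lt 2 ]] || [[ "$2" == -* ]]; then
                    missing_requirement "$1" || return 1
                fi
                MODEL="$2"
                shift 2
                ;;
            --namespace)
                if [[ $# -lt 2 ]] || [[ "$2" == -* ]]; then
                    missing_requirement "$1" || return 1
                fi
                NAMESPACE="$2"
                shift 2
                ;;
            *)
                echo "ERROR: unknown option $1" >&2
                return 1
                ;;
        esac
    done
}
```

With this shape, `parse_args --model` reports the missing value through `missing_requirement` instead of aborting with an "unbound variable" error, and `parse_args --model --namespace` is rejected because the next token looks like another option.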

🧹 Nitpick comments (3)

recipes/run.sh (2)

221-224: Quote kubectl arguments to avoid word-splitting; minor hardening

Safer with quotes around -n/-f args.

-    $DRY_RUN kubectl apply -n $NAMESPACE -f $MODEL_CACHE_DIR/model-cache.yaml
-    $DRY_RUN kubectl apply -n $NAMESPACE -f $MODEL_CACHE_DIR/model-download.yaml
+    $DRY_RUN kubectl apply -n "$NAMESPACE" -f "$MODEL_CACHE_DIR/model-cache.yaml"
+    $DRY_RUN kubectl apply -n "$NAMESPACE" -f "$MODEL_CACHE_DIR/model-download.yaml"
@@
-$DRY_RUN kubectl apply -n $NAMESPACE -f $DEPLOY_FILE
+$DRY_RUN kubectl apply -n "$NAMESPACE" -f "$DEPLOY_FILE"
@@
-$DRY_RUN kubectl wait --for=condition=Complete job/model-download -n $NAMESPACE --timeout=6000s
+$DRY_RUN kubectl wait --for=condition=Complete "job/model-download" -n "$NAMESPACE" --timeout=6000s

Also applies to: 245-246, 227-227
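
A small sketch of what the quoting protects against, assuming the default `IFS` (the script's own IFS configuration may change how unquoted values split). `count_args` and the path with a space are hypothetical and exist only for the demonstration.

```bash
#!/usr/bin/env bash
# Sketch: how an unquoted expansion changes the argument count.
# `count_args` is a hypothetical helper used only for this demonstration.
set -euo pipefail

count_args() {
    echo "$#"
}

DEPLOY_FILE="my recipes/deploy.yaml"   # hypothetical path containing a space

unquoted=$(count_args -f $DEPLOY_FILE)     # splits into: -f, my, recipes/deploy.yaml
quoted=$(count_args -f "$DEPLOY_FILE")     # stays as:    -f, "my recipes/deploy.yaml"

echo "unquoted: $unquoted arguments"
echo "quoted:   $quoted arguments"
```

The unquoted call passes three arguments and the quoted call passes two, which is the difference between kubectl receiving one file path and receiving a stray extra argument.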

247-254: GAIE path: check script existence and exit cleanly

Ensure gaie_checks.sh exists and is executable; exit 0 to make intent explicit.

 if [[ "$INTEGRATION" == "gaie" ]]; then
   # run gaie checks.
   SCRIPT_DIR="$(cd -- "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
-  "${SCRIPT_DIR}/gaie_checks.sh"
+  if [[ ! -x "${SCRIPT_DIR}/gaie_checks.sh" ]]; then
+    echo "ERROR: ${SCRIPT_DIR}/gaie_checks.sh not found or not executable"; exit 1
+  fi
+  "${SCRIPT_DIR}/gaie_checks.sh"
   kubectl apply -f "$DEPLOY_PATH/gaie/k8s-manifests" -n "$NAMESPACE"
   # For now do not run the benchmark
-  exit
+  exit 0
 fi
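
The existence check in the suggestion can also be factored into a small helper. This is a sketch under assumptions: `run_if_executable` is a hypothetical name and is not part of `recipes/run.sh`.

```bash
#!/usr/bin/env bash
# Sketch: refuse to run a helper script unless it exists and is executable.
# `run_if_executable` is a hypothetical helper, not part of recipes/run.sh.
set -euo pipefail

run_if_executable() {
    local script="$1"
    shift
    # -x is false both for a missing path and for a file without execute permission.
    if [[ ! -x "$script" ]]; then
        echo "ERROR: $script not found or not executable" >&2
        return 1
    fi
    "$script" "$@"
}
```

A caller could then write `run_if_executable "${SCRIPT_DIR}/gaie_checks.sh" || exit 1`, which produces a clear error message when the script is missing or lacks the execute bit.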

recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/03-epp/02-deployment.yaml (1)

33-38: Harden container: runAsNonRoot, no privilege escalation, seccomp; satisfy CKV_K8S_20/23

Add pod and container security contexts to reduce risk and quiet static analysis.

 spec:
   template:
     metadata:
       labels:
         app: llama3-70b-agg-epp
     spec:
+      securityContext:
+        runAsNonRoot: true
+        seccompProfile:
+          type: RuntimeDefault
       serviceAccountName: epp-sa
@@
       containers:
         - name: epp
@@
           resources:
             requests:
               memory: "1Gi"
               cpu: "1"
             limits:
               memory: "2Gi"
               cpu: "2"
+          securityContext:
+            allowPrivilegeEscalation: false
+            readOnlyRootFilesystem: true
+            capabilities:
+              drop: ["ALL"]
@@
           livenessProbe:
             grpc:
               port: 9003
               service: inference-extension
@@
           readinessProbe:
             grpc:
               port: 9003
               service: inference-extension

Please confirm the gRPC probe service name “inference-extension” matches the server’s registered service.

Also applies to: 39-64, 85-102, 103-109

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8642c4b and 691bfcb.

📒 Files selected for processing (13)

deploy/inference-gateway/README.md (1 hunks)
recipes/README.md (3 hunks)
recipes/gaie_checks.sh (1 hunks)
recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/01-rbac/01-service-account.yaml (1 hunks)
recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/01-rbac/02-cluster-role.yaml (1 hunks)
recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/01-rbac/03-role-binding.yaml (1 hunks)
recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/02-model/01-inference-pool.yaml (1 hunks)
recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/02-model/02-inference-model.yaml (1 hunks)
recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/03-epp/01-configmap.yaml (1 hunks)
recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/03-epp/02-deployment.yaml (1 hunks)
recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/03-epp/03-service.yaml (1 hunks)
recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/03-epp/04-http-route.yaml (1 hunks)
recipes/run.sh (6 hunks)

🧰 Additional context used

🪛 Checkov (3.2.334)

recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/03-epp/02-deployment.yaml

[medium] 17-109: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)

[medium] 17-109: Minimize the admission of root containers

(CKV_K8S_23)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)

GitHub Check: sglang
GitHub Check: trtllm (arm64)
GitHub Check: trtllm (amd64)
GitHub Check: vllm (amd64)
GitHub Check: operator (amd64)
GitHub Check: operator (arm64)
GitHub Check: vllm (arm64)
GitHub Check: Build and Test - dynamo

🔇 Additional comments (2)

recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/03-epp/04-http-route.yaml (1)

16-39: Verify Gateway API and InferencePool support in your cluster

Ensure your Gateway implementation supports non-Service backendRefs (InferencePool) and add backendRefs.namespace if the pool is deployed in a different namespace. Provide a ReferenceGrant for cross-namespace access.

Confirm your cluster has the Gateway API (gateway.networking.k8s.io/v1), the InferencePool CRD (inferencepools.inference.networking.x-k8s.io), and that HTTPRoute rule timeouts are supported.
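
One way to run the CRD part of this confirmation from a shell is sketched below. The `missing_crds` helper and the `KUBECTL` override are assumptions made for this example, and `recipes/gaie_checks.sh` may be structured differently. The sketch relies only on `kubectl get crd <name>` exiting non-zero when the CRD is absent.

```bash
#!/usr/bin/env bash
# Sketch: report which required CRDs are missing from the current cluster.
# KUBECTL and `missing_crds` are assumptions for this example only.
set -euo pipefail

KUBECTL="${KUBECTL:-kubectl}"

missing_crds() {
    local crd
    local missing=()
    for crd in "$@"; do
        # `kubectl get crd <name>` exits non-zero when the CRD is not installed.
        if ! "$KUBECTL" get crd "$crd" >/dev/null 2>&1; then
            missing+=("$crd")
        fi
    done
    # Print one missing CRD per line; print nothing when all are present.
    if [[ ${#missing[@]} -gt 0 ]]; then
        printf '%s\n' "${missing[@]}"
    fi
}
```

For example, `missing_crds httproutes.gateway.networking.k8s.io inferencepools.inference.networking.x-k8s.io` prints nothing when both CRDs are installed and lists the absent ones otherwise. It does not check whether HTTPRoute rule timeouts are supported, which depends on the Gateway implementation.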

recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/02-model/02-inference-model.yaml (1)

16-28: Manual verification needed for CRD version and cross-namespace binding

Cannot verify the installed InferenceModel CRD version without kubectl. Confirm the CRD’s version matches v1alpha2 and, if the referenced InferencePool resides in a different namespace, add a poolRef.namespace field under spec.poolRef.

recipes/gaie_checks.sh

recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/rbac/role-binding.yaml

Signed-off-by: Anna Tchernych <[email protected]>

atchernych added 15 commits October 8, 2025 16:02

add VLLM GAIE Recipe

dbc1dd8

Signed-off-by: Anna Tchernych <[email protected]>

Make the blackbox deployment a backup option

9c33b4b

Signed-off-by: Anna Tchernych <[email protected]>

make epp.useDynamo=true by default

1e0817d

Signed-off-by: Anna Tchernych <[email protected]>

Merge branch 'DEP-423-recipe' into here

5714c6c

Signed-off-by: Anna Tchernych <[email protected]>

streamline GAIE recipe

6e1684c

Signed-off-by: Anna Tchernych <[email protected]>

account for ns and cleanup

33979ee

Signed-off-by: Anna Tchernych <[email protected]>

add GAIE checks

7f3b4ff

Signed-off-by: Anna Tchernych <[email protected]>

cleanup

b7c1724

Signed-off-by: Anna Tchernych <[email protected]>

Fix issues

018e11c

Signed-off-by: Anna Tchernych <[email protected]>

nerge main

fed4e92

Signed-off-by: Anna Tchernych <[email protected]>

cleanup

e02873a

Signed-off-by: Anna Tchernych <[email protected]>

cleanup

0b6f8ca

Signed-off-by: Anna Tchernych <[email protected]>

cleanup

4416b38

Signed-off-by: Anna Tchernych <[email protected]>

cleanup

b4673fa

Signed-off-by: Anna Tchernych <[email protected]>

Merge branch 'main' into DEP-508-streamline

7e0203b

atchernych requested review from a team as code owners October 22, 2025 20:09

pull-request-size bot added the size/XL label Oct 22, 2025

github-actions bot added the feat label Oct 22, 2025

move manifests to dirs per function

691bfcb

Signed-off-by: Anna Tchernych <[email protected]>

copy-pr-bot bot temporarily deployed to GITLAB October 22, 2025 20:13 Inactive

coderabbitai bot reviewed Oct 22, 2025

recipes/gaie_checks.sh (resolved)

recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/rbac/role-binding.yaml (resolved)

fld rename

9d7eae5

Signed-off-by: Anna Tchernych <[email protected]>

copy-pr-bot bot temporarily deployed to GITLAB October 22, 2025 20:37 Inactive

atchernych enabled auto-merge (squash) October 24, 2025 00:21
