@atchernych atchernych commented Oct 22, 2025

Overview:

Dep 508 streamline v2

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

Release Notes

  • New Features

    • Added optional Gateway API Inference Extension (GAIE) deployment support via a new --gaie flag.
    • New pre-flight validation checks to ensure deployment prerequisites are met.
  • Documentation

    • Updated deployment guides with GAIE workflow and prerequisites.
    • Restructured reference documentation to highlight deployment options and integration status.

Signed-off-by: Anna Tchernych <[email protected]>
@atchernych atchernych requested review from a team as code owners October 22, 2025 20:09
@github-actions github-actions bot added the feat label Oct 22, 2025
coderabbitai bot commented Oct 22, 2025

Walkthrough

This pull request introduces Gateway API Inference Extension (GAIE) integration across documentation, validation infrastructure, and deployment automation. Changes include README restructuring to reflect GAIE status, a new pre-flight validation script, Kubernetes manifests for RBAC and model deployment, and updated run.sh logic to conditionally apply GAIE-specific configurations.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation Updates: deploy/inference-gateway/README.md, recipes/README.md | Removed namespace specification from kubectl apply command for Inference Extension CRDs; restructured recipe table by removing GPU/Benchmark columns and adding GAIE-integration column; added new GAIE deployment guidance section with prerequisite clarifications and workflow steps. |
| GAIE Validation: recipes/gaie_checks.sh | New Bash script for pre-flight GAIE gateway validation, verifying namespace existence, required CRDs (Gateway API and GAIE), Kgateway controller pods, and inference-gateway presence with colored status output. |
| RBAC Manifests: recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/01-rbac/* | Three new RBAC resources: ServiceAccount (epp-sa), ClusterRole (pod-read with get/watch/list permissions on inference resources and authentication/authorization APIs), and ClusterRoleBinding binding epp-sa to pod-read. |
| Model Configuration Manifests: recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/02-model/* | Two new CRD manifests: InferencePool (llama3-70b-agg-pool targeting port 8000 with Frontend/llama3-70b-agg labels) and InferenceModel (llama3-70b-agg-model, Critical priority, referencing RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic). |
| EPP Deployment Manifests: recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/03-epp/* | Four new manifests: ConfigMap (epp-config with EndpointPickerConfig and scheduling profiles), Deployment (llama3-70b-agg-epp with 1vCPU/1Gi requests, gRPC ports 9002/9003, metrics port 9090, health probes), Service (ClusterIP exposing port 9002 with http2 protocol), and HTTPRoute (routing from inference-gateway to InferencePool with 300s timeout). |
| Deployment Script Updates: recipes/run.sh | Added GAIE integration flag (default false) with validation; introduced conditional GAIE execution path (runs gaie_checks.sh, applies gaie/* manifests, exits); changed model-download job reference; updated PVC creation logic (only when not downloading); added global bash safety (set -euo pipefail, IFS configuration). |
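
For orientation, the kind of pre-flight validation summarized for recipes/gaie_checks.sh could look roughly like the sketch below. This is not the script from the PR: the `check` and `preflight` helpers are hypothetical, the CRD names are assumed from the API groups mentioned in this review, and the Kgateway controller pod check is left out because its namespace and labels are not stated here.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight sketch; NOT the contents of recipes/gaie_checks.sh.

GREEN=$'\033[0;32m'; RED=$'\033[0;31m'; RESET=$'\033[0m'
FAILURES=0

# Run a command quietly and print a colored PASS/FAIL line for it.
check() {
  local description="$1"; shift
  if "$@" >/dev/null 2>&1; then
    printf '%s[PASS]%s %s\n' "$GREEN" "$RESET" "$description"
  else
    printf '%s[FAIL]%s %s\n' "$RED" "$RESET" "$description"
    FAILURES=$((FAILURES + 1))
  fi
}

# Check the namespace, the CRDs, and the gateway; return non-zero on any failure.
preflight() {
  local namespace="$1" crd
  FAILURES=0
  check "namespace ${namespace} exists" kubectl get namespace "$namespace"
  # Assumed CRD names (Gateway API and GAIE); adjust to what the cluster installs.
  for crd in gateways.gateway.networking.k8s.io \
             httproutes.gateway.networking.k8s.io \
             inferencepools.inference.networking.x-k8s.io \
             inferencemodels.inference.networking.x-k8s.io; do
    check "CRD ${crd} installed" kubectl get crd "$crd"
  done
  check "gateway inference-gateway present" \
    kubectl get gateways.gateway.networking.k8s.io inference-gateway -n "$namespace"
  [ "$FAILURES" -eq 0 ]
}
```

Calling `preflight my-namespace` (the namespace name is a placeholder) prints one status line per check and returns non-zero if anything failed, so a caller running under set -e stops before applying manifests.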

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Changes span multiple file types (documentation, bash scripts, YAML manifests) with new conditional logic in run.sh. YAML manifests are largely homogeneous configuration boilerplate requiring minimal logic review, while run.sh additions introduce moderate complexity through GAIE flag parsing and branching paths. Documentation restructuring is straightforward but requires verification against intended content flow.

Poem

🐰 Hops with joy through GAIE's gates—
Manifests multiply, validation waits,
RBAC roles and pools aligned,
EPP endpoints beautifully designed,
The gateway now extends with grace,
Inference finds its perfect place! 🚀

Pre-merge checks

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description Check | ⚠️ Warning | The pull request description uses the correct template headings but lacks substantive content in the Overview, Details, and Where should the reviewer start? sections, providing only placeholders without any actual description of the changes, impacted files, or review guidance. | Please fill in the Overview with a concise summary of the feature, populate the Details section with a description of key changes and added files, and specify in Where should the reviewer start which files or components the reviewer should focus on. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title Check | ✅ Passed | The pull request title succinctly describes the main feature being introduced, which is the streamlining of the GAIE recipe, and it is concise and directly related to the primary change of integrating GAIE workflows into the Dynamo recipe. |

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
recipes/run.sh (1)

83-105: set -u + direct $2 deref can abort parsing; guard option args

Under set -u, referencing $2 when missing exits before you call missing_requirement. Guard with argc checks before using $2.

-        --model)
-            if [ "$2" ]; then
-                MODEL=$2
-                shift 2
-            else
-                missing_requirement "$1"
-            fi
-            ;;
+        --model)
+            if [[ $# -lt 2 ]] || [[ "$2" =~ ^- ]]; then missing_requirement "$1"; fi
+            MODEL="$2"; shift 2;;
         --framework)
-            if [ "$2" ]; then
-                FRAMEWORK=$2
-                shift 2
-            else
-                missing_requirement "$1"
-            fi
-            ;;
+            if [[ $# -lt 2 ]] || [[ "$2" =~ ^- ]]; then missing_requirement "$1"; fi
+            FRAMEWORK="$2"; shift 2;;
         --namespace)
-            if [ "$2" ]; then
-                NAMESPACE=$2
-                shift 2
-            else
-                missing_requirement "$1"
-            fi
-            ;;
+            if [[ $# -lt 2 ]] || [[ "$2" =~ ^- ]]; then missing_requirement "$1"; fi
+            NAMESPACE="$2"; shift 2;;
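
The guard suggested in the diff above can be tried in isolation. The following is a minimal sketch, assuming a stand-in `missing_requirement` that reports the error and a hypothetical `parse_args` wrapper; the real helper in recipes/run.sh may behave differently (for example, it may exit rather than return).

```bash
#!/usr/bin/env bash
set -u  # unset variables are errors, as in the updated run.sh

# Hypothetical stand-in for the script's missing_requirement helper.
missing_requirement() {
  echo "ERROR: $1 requires an argument" >&2
}

# Parse --model and --namespace, checking the argument count BEFORE touching $2.
parse_args() {
  MODEL=""; NAMESPACE=""
  while [[ $# -gt 0 ]]; do
    case "$1" in
      --model)
        # The $# test short-circuits, so "$2" is never expanded when it is unset.
        if [[ $# -lt 2 ]] || [[ "$2" == -* ]]; then missing_requirement "$1"; return 1; fi
        MODEL="$2"; shift 2 ;;
      --namespace)
        if [[ $# -lt 2 ]] || [[ "$2" == -* ]]; then missing_requirement "$1"; return 1; fi
        NAMESPACE="$2"; shift 2 ;;
      *)
        echo "ERROR: unknown option $1" >&2; return 1 ;;
    esac
  done
}
```

With this shape, `parse_args --model` reports the missing value through the helper instead of aborting with an "unbound variable" error from the shell.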
🧹 Nitpick comments (3)
recipes/run.sh (2)

221-224: Quote kubectl arguments to avoid word-splitting; minor hardening

Safer with quotes around -n/-f args.

-    $DRY_RUN kubectl apply -n $NAMESPACE -f $MODEL_CACHE_DIR/model-cache.yaml
-    $DRY_RUN kubectl apply -n $NAMESPACE -f $MODEL_CACHE_DIR/model-download.yaml
+    $DRY_RUN kubectl apply -n "$NAMESPACE" -f "$MODEL_CACHE_DIR/model-cache.yaml"
+    $DRY_RUN kubectl apply -n "$NAMESPACE" -f "$MODEL_CACHE_DIR/model-download.yaml"
@@
-$DRY_RUN kubectl apply -n $NAMESPACE -f $DEPLOY_FILE
+$DRY_RUN kubectl apply -n "$NAMESPACE" -f "$DEPLOY_FILE"
@@
-$DRY_RUN kubectl wait --for=condition=Complete job/model-download -n $NAMESPACE --timeout=6000s
+$DRY_RUN kubectl wait --for=condition=Complete "job/model-download" -n "$NAMESPACE" --timeout=6000s

Also applies to: 245-246, 227-227
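
To illustrate why the quoting matters, here is a small self-contained sketch. `count_args` and the path containing a space are hypothetical and only stand in for kubectl and `$MODEL_CACHE_DIR`; the demo sets the default IFS explicitly so the result does not depend on the caller's settings.

```bash
#!/usr/bin/env bash
# Illustrative only: count_args stands in for a command such as kubectl.
count_args() { echo "$#"; }

demo_word_splitting() {
  local IFS=$' \t\n'   # default IFS, set explicitly for the demo
  local path="model cache/model-cache.yaml"
  count_args -f $path     # unquoted: -f, model, cache/model-cache.yaml
  count_args -f "$path"   # quoted: -f plus a single path argument
}
```

Running `demo_word_splitting` prints 3 and then 2. If run.sh narrows IFS (the walkthrough mentions an IFS configuration whose value is not shown here), splitting on spaces may not occur, but quoting still prevents glob expansion of the arguments.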


247-254: GAIE path: check script existence and exit cleanly

Ensure gaie_checks.sh exists and is executable; exit 0 to make intent explicit.

 if [[ "$INTEGRATION" == "gaie" ]]; then
   # run gaie checks.
   SCRIPT_DIR="$(cd -- "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
-  "${SCRIPT_DIR}/gaie_checks.sh"
+  if [[ ! -x "${SCRIPT_DIR}/gaie_checks.sh" ]]; then
+    echo "ERROR: ${SCRIPT_DIR}/gaie_checks.sh not found or not executable"; exit 1
+  fi
+  "${SCRIPT_DIR}/gaie_checks.sh"
   kubectl apply -f "$DEPLOY_PATH/gaie/k8s-manifests" -n "$NAMESPACE"
   # For now do not run the benchmark
-  exit
+  exit 0
 fi
recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/03-epp/02-deployment.yaml (1)

33-38: Harden container: runAsNonRoot, no privilege escalation, seccomp; satisfy CKV_K8S_20/23

Add pod and container security contexts to reduce risk and quiet static analysis.

 spec:
   template:
     metadata:
       labels:
         app: llama3-70b-agg-epp
     spec:
+      securityContext:
+        runAsNonRoot: true
+        seccompProfile:
+          type: RuntimeDefault
       serviceAccountName: epp-sa
@@
       containers:
         - name: epp
@@
           resources:
             requests:
               memory: "1Gi"
               cpu: "1"
             limits:
               memory: "2Gi"
               cpu: "2"
+          securityContext:
+            allowPrivilegeEscalation: false
+            readOnlyRootFilesystem: true
+            capabilities:
+              drop: ["ALL"]
@@
           livenessProbe:
             grpc:
               port: 9003
               service: inference-extension
@@
           readinessProbe:
             grpc:
               port: 9003
               service: inference-extension

Please confirm the gRPC probe service name “inference-extension” matches the server’s registered service.

Also applies to: 39-64, 85-102, 103-109

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8642c4b and 691bfcb.

📒 Files selected for processing (13)
  • deploy/inference-gateway/README.md (1 hunks)
  • recipes/README.md (3 hunks)
  • recipes/gaie_checks.sh (1 hunks)
  • recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/01-rbac/01-service-account.yaml (1 hunks)
  • recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/01-rbac/02-cluster-role.yaml (1 hunks)
  • recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/01-rbac/03-role-binding.yaml (1 hunks)
  • recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/02-model/01-inference-pool.yaml (1 hunks)
  • recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/02-model/02-inference-model.yaml (1 hunks)
  • recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/03-epp/01-configmap.yaml (1 hunks)
  • recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/03-epp/02-deployment.yaml (1 hunks)
  • recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/03-epp/03-service.yaml (1 hunks)
  • recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/03-epp/04-http-route.yaml (1 hunks)
  • recipes/run.sh (6 hunks)
🧰 Additional context used
🪛 Checkov (3.2.334)
recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/03-epp/02-deployment.yaml

[medium] 17-109: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 17-109: Minimize the admission of root containers

(CKV_K8S_23)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: sglang
  • GitHub Check: trtllm (arm64)
  • GitHub Check: trtllm (amd64)
  • GitHub Check: vllm (amd64)
  • GitHub Check: operator (amd64)
  • GitHub Check: operator (arm64)
  • GitHub Check: vllm (arm64)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (2)
recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/03-epp/04-http-route.yaml (1)

16-39: Verify Gateway API and InferencePool support in your cluster

  • Ensure your Gateway implementation supports non-Service backendRefs (InferencePool) and add backendRefs.namespace if the pool is deployed in a different namespace. Provide a ReferenceGrant for cross-namespace access.
  • Confirm your cluster has the Gateway API (gateway.networking.k8s.io/v1), the InferencePool CRD (inferencepools.inference.networking.x-k8s.io), and that HTTPRoute rule timeouts are supported.
recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/02-model/02-inference-model.yaml (1)

16-28: Manual verification needed for CRD version and cross-namespace binding

Cannot verify the installed InferenceModel CRD version without kubectl. Confirm the CRD’s version matches v1alpha2 and, if the referenced InferencePool resides in a different namespace, add a poolRef.namespace field under spec.poolRef.
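
One way to perform that manual check from a shell with cluster access is sketched below. The `crd_serves_version` helper is hypothetical, and the CRD name inferencemodels.inference.networking.x-k8s.io is an assumption based on the InferencePool CRD name quoted in this review.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the named CRD lists the wanted API version.
# Returns 1 if the version is absent and 2 if the CRD lookup itself fails.
crd_serves_version() {
  local crd="$1" want="$2" versions v
  local IFS=$' \t\n'   # jsonpath prints the version names separated by spaces
  versions="$(kubectl get crd "$crd" -o jsonpath='{.spec.versions[*].name}')" || return 2
  for v in $versions; do
    [[ "$v" == "$want" ]] && return 0
  done
  return 1
}
```

For example, `crd_serves_version inferencemodels.inference.networking.x-k8s.io v1alpha2` returns 0 only when the installed CRD lists v1alpha2.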

Signed-off-by: Anna Tchernych <[email protected]>