-
Notifications
You must be signed in to change notification settings - Fork 655
feat: Streamline GAIE recipe #3829
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Anna Tchernych <[email protected]>
Signed-off-by: Anna Tchernych <[email protected]>
Signed-off-by: Anna Tchernych <[email protected]>
Signed-off-by: Anna Tchernych <[email protected]>
Signed-off-by: Anna Tchernych <[email protected]>
Signed-off-by: Anna Tchernych <[email protected]>
Signed-off-by: Anna Tchernych <[email protected]>
Signed-off-by: Anna Tchernych <[email protected]>
Signed-off-by: Anna Tchernych <[email protected]>
Signed-off-by: Anna Tchernych <[email protected]>
Signed-off-by: Anna Tchernych <[email protected]>
Signed-off-by: Anna Tchernych <[email protected]>
Signed-off-by: Anna Tchernych <[email protected]>
Signed-off-by: Anna Tchernych <[email protected]>
Signed-off-by: Anna Tchernych <[email protected]>
WalkthroughThis pull request introduces Gateway API Inference Extension (GAIE) integration across documentation, validation infrastructure, and deployment automation. Changes include README restructuring to reflect GAIE status, a new pre-flight validation script, Kubernetes manifests for RBAC and model deployment, and updated run.sh logic to conditionally apply GAIE-specific configurations. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Changes span multiple file types (documentation, bash scripts, YAML manifests) with new conditional logic in run.sh. YAML manifests are largely homogeneous configuration boilerplate requiring minimal logic review, while run.sh additions introduce moderate complexity through GAIE flag parsing and branching paths. Documentation restructuring is straightforward but requires verification against intended content flow. Poem
Pre-merge checks❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
recipes/run.sh (1)
83-105: set -u + direct $2 deref can abort parsing; guard option argsUnder set -u, referencing $2 when missing exits before you call missing_requirement. Guard with argc checks before using $2.
- --model) - if [ "$2" ]; then - MODEL=$2 - shift 2 - else - missing_requirement "$1" - fi - ;; + --model) + if [[ $# -lt 2 ]] || [[ "$2" =~ ^- ]]; then missing_requirement "$1"; fi + MODEL="$2"; shift 2;; --framework) - if [ "$2" ]; then - FRAMEWORK=$2 - shift 2 - else - missing_requirement "$1" - fi - ;; + if [[ $# -lt 2 ]] || [[ "$2" =~ ^- ]]; then missing_requirement "$1"; fi + FRAMEWORK="$2"; shift 2;; --namespace) - if [ "$2" ]; then - NAMESPACE=$2 - shift 2 - else - missing_requirement "$1" - fi - ;; + if [[ $# -lt 2 ]] || [[ "$2" =~ ^- ]]; then missing_requirement "$1"; fi + NAMESPACE="$2"; shift 2;;
🧹 Nitpick comments (3)
recipes/run.sh (2)
221-224: Quote kubectl arguments to avoid word-splitting; minor hardeningSafer with quotes around -n/-f args.
- $DRY_RUN kubectl apply -n $NAMESPACE -f $MODEL_CACHE_DIR/model-cache.yaml - $DRY_RUN kubectl apply -n $NAMESPACE -f $MODEL_CACHE_DIR/model-download.yaml + $DRY_RUN kubectl apply -n "$NAMESPACE" -f "$MODEL_CACHE_DIR/model-cache.yaml" + $DRY_RUN kubectl apply -n "$NAMESPACE" -f "$MODEL_CACHE_DIR/model-download.yaml" @@ -$DRY_RUN kubectl apply -n $NAMESPACE -f $DEPLOY_FILE +$DRY_RUN kubectl apply -n "$NAMESPACE" -f "$DEPLOY_FILE" @@ -$DRY_RUN kubectl wait --for=condition=Complete job/model-download -n $NAMESPACE --timeout=6000s +$DRY_RUN kubectl wait --for=condition=Complete "job/model-download" -n "$NAMESPACE" --timeout=6000sAlso applies to: 245-246, 227-227
247-254: GAIE path: check script existence and exit cleanlyEnsure gaie_checks.sh exists and is executable; exit 0 to make intent explicit.
if [[ "$INTEGRATION" == "gaie" ]]; then # run gaie checks. SCRIPT_DIR="$(cd -- "$(dirname "${BASH_SOURCE[0]}")" && pwd)" - "${SCRIPT_DIR}/gaie_checks.sh" + if [[ ! -x "${SCRIPT_DIR}/gaie_checks.sh" ]]; then + echo "ERROR: ${SCRIPT_DIR}/gaie_checks.sh not found or not executable"; exit 1 + fi + "${SCRIPT_DIR}/gaie_checks.sh" kubectl apply -f "$DEPLOY_PATH/gaie/k8s-manifests" -n "$NAMESPACE" # For now do not run the benchmark - exit + exit 0 firecipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/03-epp/02-deployment.yaml (1)
33-38: Harden container: runAsNonRoot, no privilege escalation, seccomp; satisfy CKV_K8S_20/23Add pod and container security contexts to reduce risk and quiet static analysis.
spec: template: metadata: labels: app: llama3-70b-agg-epp spec: + securityContext: + runAsNonRoot: true + seccompProfile: + type: RuntimeDefault serviceAccountName: epp-sa @@ containers: - name: epp @@ resources: requests: memory: "1Gi" cpu: "1" limits: memory: "2Gi" cpu: "2" + securityContext: + allowPrivilegeEscalation: false + readOnlyRootFilesystem: true + capabilities: + drop: ["ALL"] @@ livenessProbe: grpc: port: 9003 service: inference-extension @@ readinessProbe: grpc: port: 9003 service: inference-extensionPlease confirm the gRPC probe service name “inference-extension” matches the server’s registered service.
Also applies to: 39-64, 85-102, 103-109
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
deploy/inference-gateway/README.md(1 hunks)recipes/README.md(3 hunks)recipes/gaie_checks.sh(1 hunks)recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/01-rbac/01-service-account.yaml(1 hunks)recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/01-rbac/02-cluster-role.yaml(1 hunks)recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/01-rbac/03-role-binding.yaml(1 hunks)recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/02-model/01-inference-pool.yaml(1 hunks)recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/02-model/02-inference-model.yaml(1 hunks)recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/03-epp/01-configmap.yaml(1 hunks)recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/03-epp/02-deployment.yaml(1 hunks)recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/03-epp/03-service.yaml(1 hunks)recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/03-epp/04-http-route.yaml(1 hunks)recipes/run.sh(6 hunks)
🧰 Additional context used
🪛 Checkov (3.2.334)
recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/03-epp/02-deployment.yaml
[medium] 17-109: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 17-109: Minimize the admission of root containers
(CKV_K8S_23)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: sglang
- GitHub Check: trtllm (arm64)
- GitHub Check: trtllm (amd64)
- GitHub Check: vllm (amd64)
- GitHub Check: operator (amd64)
- GitHub Check: operator (arm64)
- GitHub Check: vllm (arm64)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (2)
recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/03-epp/04-http-route.yaml (1)
16-39: Verify Gateway API and InferencePool support in your cluster
- Ensure your Gateway implementation supports non-Service backendRefs (InferencePool) and add
backendRefs.namespaceif the pool is deployed in a different namespace. Provide a ReferenceGrant for cross-namespace access.- Confirm your cluster has the Gateway API (
gateway.networking.k8s.io/v1), the InferencePool CRD (inferencepools.inference.networking.x-k8s.io), and that HTTPRoute rule timeouts are supported.recipes/llama-3-70b/vllm/agg/gaie/k8s-manifests/02-model/02-inference-model.yaml (1)
16-28: Manual verification needed for CRD version and cross-namespace binding
Cannot verify the installedInferenceModelCRD version withoutkubectl. Confirm the CRD’s version matchesv1alpha2and, if the referencedInferencePoolresides in a different namespace, add apoolRef.namespacefield underspec.poolRef.
Signed-off-by: Anna Tchernych <[email protected]>
Overview:
Dep 508 streamline v2
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
Release Notes
New Features
--gaieflag.Documentation