diff --git a/benchmarks/profiler/deploy/profile_sla_moe_job.yaml b/benchmarks/profiler/deploy/profile_sla_moe_job.yaml
index 8b5db97d3f..d99c86133c 100644
--- a/benchmarks/profiler/deploy/profile_sla_moe_job.yaml
+++ b/benchmarks/profiler/deploy/profile_sla_moe_job.yaml
@@ -31,7 +31,7 @@ spec:
       command: ["python", "-m", "benchmarks.profiler.profile_sla"]
       args:
         - --config
-        - /sgl-workspace/dynamo/recipes/deepseek-r1/sglang-wideep/tep16p-dep16d-disagg.yaml
+        - /sgl-workspace/dynamo/recipes/deepseek-r1/sglang/disagg-16gpu/deploy.yaml
         - --output-dir
         - /data/profiling_results
         - --namespace
diff --git a/recipes/CONTRIBUTING.md b/recipes/CONTRIBUTING.md
new file mode 100644
index 0000000000..90e5da54f9
--- /dev/null
+++ b/recipes/CONTRIBUTING.md
@@ -0,0 +1,23 @@
+# Recipes Contributing Guide
+
+When adding new model recipes, ensure they follow the standard structure:
+```text
+<model>/
+├── model-cache/
+│   ├── model-cache.yaml
+│   └── model-download.yaml
+├── <framework>/
+│   └── <deployment>/
+│       ├── deploy.yaml
+│       └── perf.yaml (optional)
+└── README.md (optional)
+```
+
+## Validation
+The `run.sh` script expects this exact directory structure and validates that the directories and files exist before deployment:
+- Model directory exists in `recipes/<model>/`
+- Framework is one of the supported frameworks (vllm, sglang, trtllm)
+- Framework directory exists in `recipes/<model>/<framework>/`
+- Deployment directory exists in `recipes/<model>/<framework>/<deployment>/`
+- The required file (`deploy.yaml`) exists in the deployment directory
+- If present, the performance benchmark (`perf.yaml`) is executed automatically
\ No newline at end of file
diff --git a/recipes/README.md b/recipes/README.md
index 636df5484b..27a43b59c4 100644
--- a/recipes/README.md
+++ b/recipes/README.md
@@ -1,88 +1,285 @@
-# Dynamo model serving recipes
+# Dynamo Model Serving Recipes
 
-| Model family | Backend | Mode | GPU | Deployment | Benchmark |
-|---------------|---------|---------------------|-------|------------|-----------|
-| llama-3-70b | vllm | agg | H100, H200 | ✓ | ✓ |
-| llama-3-70b | vllm | disagg-multi-node | H100, H200 | ✓ | ✓ |
-| llama-3-70b | vllm | disagg-single-node | H100, H200 | ✓ | ✓ |
-| DeepSeek-R1 | sglang | disaggregated | H200 | ✓ | 🚧 |
-| oss-gpt | trtllm | aggregated | GB200 | ✓ | ✓ |
+This repository contains production-ready recipes for deploying large language models on the Dynamo platform. Each recipe includes deployment configurations, performance benchmarking, and model caching setup.
+
+## Contents
+- [Available Models](#available-models)
+- [Quick Start](#quick-start)
+- [Prerequisites](#prerequisites)
+- Deployment Methods
+  - [Option 1: Automated Deployment](#option-1-automated-deployment)
+  - [Option 2: Manual Deployment](#option-2-manual-deployment)
+
+## Available Models
+
+| Model Family | Framework | Deployment Mode              | GPU Requirements | Status | Benchmark |
+|--------------|-----------|------------------------------|------------------|--------|-----------|
+| llama-3-70b  | vllm      | agg                          | 4x H100/H200     | ✅     | ✅        |
+| llama-3-70b  | vllm      | disagg (1 node)              | 8x H100/H200     | ✅     | ✅        |
+| llama-3-70b  | vllm      | disagg (multi-node)          | 16x H100/H200    | ✅     | ✅        |
+| deepseek-r1  | sglang    | disagg (1 node, wide-ep)     | 8x H200          | ✅     | 🚧        |
+| deepseek-r1  | sglang    | disagg (multi-node, wide-ep) | 16x H200         | ✅     | 🚧        |
+| gpt-oss-120b | trtllm    | agg                          | 4x GB200         | ✅     | ✅        |
+
+**Legend:**
+- ✅ Functional
+- 🚧 Under development
+
+**Recipe Directory Structure:**
+Recipes are organized in a directory structure that follows this pattern:
+```text
+<model>/
+├── model-cache/
+│   ├── model-cache.yaml      # PVC for the model cache
+│   └── model-download.yaml   # Job that downloads the model
+├── <framework>/
+│   └── <deployment>/
+│       ├── deploy.yaml           # DynamoGraphDeployment CRD and optional ConfigMap for custom configuration
+│       └── perf.yaml (optional)  # Performance benchmark job
+└── README.md (optional)          # Model documentation
+```
+
+## Quick Start
+
+Follow the instructions in the [Prerequisites](#prerequisites) section to set up your environment.
+
+Then choose a deployment method: the automated `run.sh` script ([Option 1](#option-1-automated-deployment)) or the manual steps ([Option 2](#option-2-manual-deployment)).
 
 ## Prerequisites
 
-1. Create a namespace and populate NAMESPACE environment variable
-This environment variable is used in later steps to deploy and perf-test the model.
+### 1. Environment Setup
+
+Create a Kubernetes namespace and set the environment variable used in later steps:
 
 ```bash
 export NAMESPACE=your-namespace
 kubectl create namespace ${NAMESPACE}
 ```
 
-2. **Dynamo Cloud Platform installed** - Follow [Quickstart Guide](../docs/kubernetes/README.md)
+### 2. Deploy Dynamo Platform
+
+Install the Dynamo Cloud Platform following the [Quickstart Guide](../docs/kubernetes/README.md).
+
+### 3. GPU Cluster
+
+Ensure your Kubernetes cluster has:
+- GPU nodes with appropriate GPU types (see model requirements above)
+- GPU operator installed
+- Sufficient GPU memory and compute resources
+
+### 4. Container Registry Access
 
-3. **Kubernetes cluster with GPU support**
+Ensure access to the NVIDIA container registry for the runtime images:
+- `nvcr.io/nvidia/ai-dynamo/vllm-runtime:x.y.z`
+- `nvcr.io/nvidia/ai-dynamo/trtllm-runtime:x.y.z`
+- `nvcr.io/nvidia/ai-dynamo/sglang-runtime:x.y.z`
 
-4. **Container registry access** for vLLM runtime images
+### 5. HuggingFace Access and Kubernetes Secret Creation
 
-5. **HuggingFace token secret** (referenced as `envFromSecret: hf-token-secret`)
-Update the `hf-token-secret.yaml` file with your HuggingFace token.
+Set up a Kubernetes secret with the HuggingFace token for model download:
 
 ```bash
+# Update the token in the secret file
+vim hf_hub_secret/hf_hub_secret.yaml
+
+# Apply the secret
 kubectl apply -f hf_hub_secret/hf_hub_secret.yaml -n ${NAMESPACE}
 ```
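+
+To verify the secret before continuing (`hf-token-secret` is the name the deployment manifests reference via `envFromSecret`):
+
+```bash
+# Confirm the secret exists in the target namespace
+kubectl get secret hf-token-secret -n ${NAMESPACE}
+```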
 
-6. (Optional) Create a shared model cache pvc to store the model weights.
-Choose a storage class to create the model cache pvc. You'll need to use this storage class name to update the `storageClass` field in the model-cache/model-cache.yaml file.
+### 6. Configure Storage Class
+
+Configure persistent storage for model caching:
 
 ```bash
+# Check available storage classes
 kubectl get storageclass
 ```
 
-## Running the recipes
+Replace "your-storage-class-name" with your actual storage class in `<model>/model-cache/model-cache.yaml`:
+
+```yaml
+# In <model>/model-cache/model-cache.yaml
+spec:
+  storageClassName: "your-actual-storage-class"  # Replace this
+```
+
+## Option 1: Automated Deployment
 
-Run the recipe to deploy a model:
+Use the `run.sh` script for fully automated deployment.
+
+**Note:** The script automatically:
+- Creates the model cache PVC and downloads the model
+- Deploys the model service
+- Runs the performance benchmark if a `perf.yaml` file is present in the deployment directory
+
+#### Script Usage
 
 ```bash
-./run.sh --model <model-name> --framework <framework> <deployment-type>
+./run.sh [OPTIONS] --model <model> --framework <framework> --deployment <deployment>
 ```
 
-Arguments:
-  <deployment-type>    Deployment type (e.g., agg, disagg-single-node, disagg-multi-node)
+**Required Options:**
+- `--model <model>`: Model name matching a directory in the recipes directory (e.g., llama-3-70b, gpt-oss-120b, deepseek-r1)
+- `--framework <framework>`: Backend framework (`vllm`, `trtllm`, `sglang`)
+- `--deployment <deployment>`: Deployment mode (e.g., agg, disagg-single-node, disagg-multi-node)
+
+**Optional:**
+- `--namespace <namespace>`: Kubernetes namespace (default: dynamo)
+- `--dry-run`: Show commands without executing them
+- `-h, --help`: Show help message
+
+**Environment Variables:**
+- `NAMESPACE`: Kubernetes namespace (default: dynamo)
+
+#### Example Usage
+```bash
+# Set up environment
+export NAMESPACE=your-namespace
+kubectl create namespace ${NAMESPACE}
+
+# Configure HuggingFace token
+kubectl apply -f hf_hub_secret/hf_hub_secret.yaml -n ${NAMESPACE}
+
+# Deploy Llama-3-70B with vLLM (aggregated mode)
+./run.sh --model llama-3-70b --framework vllm --deployment agg
+
+# Deploy GPT-OSS-120B with TensorRT-LLM
+./run.sh --model gpt-oss-120b --framework trtllm --deployment agg
+
+# Deploy DeepSeek-R1 with SGLang (disaggregated wide-EP, 8 GPUs)
+./run.sh --model deepseek-r1 --framework sglang --deployment disagg-8gpu
+
+# Deploy with custom namespace
+./run.sh --namespace my-namespace --model llama-3-70b --framework vllm --deployment agg
+
+# Dry run to see what would be executed
+./run.sh --dry-run --model llama-3-70b --framework vllm --deployment agg
+```
 
+## Option 2: Manual Deployment
 
-Required Options:
-  --model <model-name>     Model name (e.g., llama-3-70b)
-  --framework <framework>  Framework one of VLLM TRTLLM SGLANG (default: VLLM)
-Optional:
-  --skip-model-cache       Skip model downloading (assumes model cache already exists)
-  -h, --help               Show this help message
+For step-by-step manual deployment, follow these steps:
 
-Environment Variables:
-  NAMESPACE                Kubernetes namespace (default: dynamo)
-Examples:
-  ./run.sh --model llama-3-70b --framework vllm agg
-  ./run.sh --skip-model-cache --model llama-3-70b --framework vllm agg
-  ./run.sh --model llama-3-70b --framework trtllm disagg-single-node
-Example:
 ```bash
-./run.sh --model llama-3-70b --framework vllm --deployment-type agg
+# 0. Set up environment (see Prerequisites section)
+export NAMESPACE=your-namespace
+kubectl create namespace ${NAMESPACE}
+kubectl apply -f hf_hub_secret/hf_hub_secret.yaml -n ${NAMESPACE}
+
+# 1. Download model (see Step 1: Download Model)
+kubectl apply -n $NAMESPACE -f <model>/model-cache/
+
+# 2. Deploy model (see Step 2: Deploy Model Service)
+kubectl apply -n $NAMESPACE -f <model>/<framework>/<deployment>/deploy.yaml
+
+# 3. Run benchmarks (optional, if perf.yaml exists)
+kubectl apply -n $NAMESPACE -f <model>/<framework>/<deployment>/perf.yaml
+```
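+
+For example, with the llama-3-70b vLLM aggregated recipe (an illustration; substitute your own model, framework, and deployment), the placeholder paths expand to:
+
+```bash
+kubectl apply -n $NAMESPACE -f llama-3-70b/model-cache/
+kubectl apply -n $NAMESPACE -f llama-3-70b/vllm/agg/deploy.yaml
+```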
+
+### Step 1: Download Model
+
+```bash
+# Start the download job
+kubectl apply -n $NAMESPACE -f <model>/model-cache
+
+# Verify job creation
+kubectl get jobs -n $NAMESPACE | grep model-download
+```
+
+Monitor and wait for the model download to complete:
+
+```bash
+# Wait for job completion (timeout after 100 minutes)
+kubectl wait --for=condition=Complete job/model-download -n $NAMESPACE --timeout=6000s
+
+# Check job status
+kubectl get job model-download -n $NAMESPACE
+
+# View download logs
+kubectl logs job/model-download -n $NAMESPACE
+```
+
+### Step 2: Deploy Model Service
+
+```bash
+# Navigate to the specific deployment configuration
+cd <model>/<framework>/<deployment>/
+
+# Deploy the model service
+kubectl apply -n $NAMESPACE -f deploy.yaml
+
+# Verify deployment creation
+kubectl get deployments -n $NAMESPACE
 ```
 
+#### Wait for Deployment Ready
 
-## Dry run mode
 
 ```bash
+# Get deployment name from the deploy.yaml file
+DEPLOYMENT_NAME=$(grep "name:" deploy.yaml | head -1 | awk '{print $2}')
+
+# Wait for deployment to be ready (timeout after 20 minutes)
+kubectl wait --for=condition=available deployment/$DEPLOYMENT_NAME -n $NAMESPACE --timeout=1200s
+
+# Check deployment status
+kubectl get deployment $DEPLOYMENT_NAME -n $NAMESPACE
+
+# Check pod status
+kubectl get pods -n $NAMESPACE -l app=$DEPLOYMENT_NAME
+```
+
+#### Verify Model Service
+
+```bash
+# Check if service is running
+kubectl get services -n $NAMESPACE
+
+# Test model endpoint (port-forward to test locally)
+kubectl port-forward service/${DEPLOYMENT_NAME}-frontend 8000:8000 -n $NAMESPACE
+
+# Test the model API (in another terminal)
+curl http://localhost:8000/v1/models
 
-To dry run the recipe, add the `--dry-run` flag.
+# Stop port-forward when done
+pkill -f "kubectl port-forward"
+```
+
+### Step 3: Performance Benchmarking (Optional)
+
+Run performance benchmarks to evaluate model performance. Benchmarking is only available for deployments that include a `perf.yaml` file:
+
+#### Launch Benchmark Job
 
 ```bash
-./run.sh --dry-run --model llama-3-70b --framework vllm agg
+# From the deployment directory
+kubectl apply -n $NAMESPACE -f perf.yaml
+
+# Verify benchmark job creation
+kubectl get jobs -n $NAMESPACE
 ```
-
-## (Optional) Running the recipes with model cache
-You may need to cache the model weights on a PVC to avoid repeated downloads of the model weights.
-See the [Prerequisites](#prerequisites) section for more details.
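+
+If the benchmark job's pods fail to start, the pod label from `perf.yaml` can help locate them. A quick sketch (the `app` label below is the one set by the gpt-oss-120b recipe's `perf.yaml`; substitute your recipe's label):
+
+```bash
+# List benchmark pods via the label set in perf.yaml's pod template
+kubectl get pods -n $NAMESPACE -l app=gpt-oss-120b-bench
+
+# Surface scheduling or image-pull problems
+kubectl describe pod -n $NAMESPACE -l app=gpt-oss-120b-bench
+```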
+
+#### Monitor Benchmark Progress
+
+```bash
+# Get benchmark job name
+PERF_JOB_NAME=$(grep "name:" perf.yaml | head -1 | awk '{print $2}')
+
+# Monitor benchmark logs in real-time
+kubectl logs -f job/$PERF_JOB_NAME -n $NAMESPACE
+
+# Wait for benchmark completion (timeout after 100 minutes)
+kubectl wait --for=condition=Complete job/$PERF_JOB_NAME -n $NAMESPACE --timeout=6000s
+```
+
+#### View Benchmark Results
+
+```bash
+# Check final benchmark results
+kubectl logs job/$PERF_JOB_NAME -n $NAMESPACE | tail -50
+```
\ No newline at end of file
diff --git a/recipes/deepseek-r1/model_cache/model-cache.yaml b/recipes/deepseek-r1/model-cache/model-cache.yaml
similarity index 100%
rename from recipes/deepseek-r1/model_cache/model-cache.yaml
rename to recipes/deepseek-r1/model-cache/model-cache.yaml
diff --git a/recipes/deepseek-r1/model-cache/model-download.yaml b/recipes/deepseek-r1/model-cache/model-download.yaml
new file mode 100644
index 0000000000..0f65b6b58d
--- /dev/null
+++ b/recipes/deepseek-r1/model-cache/model-download.yaml
@@ -0,0 +1,44 @@
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: model-download
+spec:
+  backoffLimit: 3
+  completions: 1
+  parallelism: 1
+  template:
+    metadata:
+      labels:
+        app: model-download
+    spec:
+      restartPolicy: Never
+      containers:
+        - name: model-download
+          image: python:3.10-slim
+          command: ["sh", "-c"]
+          envFrom:
+            - secretRef:
+                name: hf-token-secret
+          env:
+            - name: MODEL_NAME
+              value: deepseek-ai/DeepSeek-R1
+            - name: HF_HOME
+              value: /model-store
+            - name: HF_HUB_ENABLE_HF_TRANSFER
+              value: "1"
+            - name: MODEL_REVISION
+              value: 56d4cbbb4d29f4355bab4b9a39ccb717a14ad5ad
+          args:
+            - |
+              set -eux
+              pip install --no-cache-dir huggingface_hub hf_transfer
+              hf download $MODEL_NAME --revision $MODEL_REVISION
+          volumeMounts:
+            - name: model-cache
+              mountPath: /model-store
+      volumes:
+        - name: model-cache
+          persistentVolumeClaim:
+            claimName: model-cache
\ No newline at end of file
diff --git a/recipes/deepseek-r1/model_cache/model-download.yaml b/recipes/deepseek-r1/model_cache/model-download.yaml
deleted file mode 100644
index e69de29bb2..0000000000
diff --git a/recipes/deepseek-r1/sglang-wideep/README.md b/recipes/deepseek-r1/sglang/README.md
similarity index 75%
rename from recipes/deepseek-r1/sglang-wideep/README.md
rename to recipes/deepseek-r1/sglang/README.md
index 5bb1581a89..4d93cd849a 100644
--- a/recipes/deepseek-r1/sglang-wideep/README.md
+++ b/recipes/deepseek-r1/sglang/README.md
@@ -1,4 +1,8 @@
-# Container
+# DeepSeek R1 SGLang Recipe
+
+This recipe runs DeepSeek R1 with SGLang in disaggregated mode. It is based on the WideEP recipe from the SGLang team.
+
+## Container
 
 Use the Dockerfile in `container/Dockerfile.sglang-wideep` to build the container, or
 
@@ -8,7 +12,7 @@ Use the Dockerfile in `container/Dockerfile.sglang-wideep` to build the containe
 
 Dynamo commits after `1b3eed4b6a0e735d4ecec6681f4c0b89f2112167` (Sep 18, 2025) are required.
 
-# Hardware
+## Hardware
 
 The two deployment recipes are for 8xH200 and 16xH200. They should also work for other GPU SKUs; change the TEP and DEP sizes accordingly to match the GPU capacity.
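+
+## Deployment
+
+One way to launch these recipes is via the top-level `run.sh`; the deployment names below simply mirror this recipe's directory layout:
+
+```bash
+# 8-GPU variant
+./run.sh --model deepseek-r1 --framework sglang --deployment disagg-8gpu
+
+# 16-GPU variant
+./run.sh --model deepseek-r1 --framework sglang --deployment disagg-16gpu
+```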
diff --git a/recipes/deepseek-r1/sglang-wideep/deepep.json b/recipes/deepseek-r1/sglang/deepep.json
similarity index 100%
rename from recipes/deepseek-r1/sglang-wideep/deepep.json
rename to recipes/deepseek-r1/sglang/deepep.json
diff --git a/recipes/deepseek-r1/sglang-wideep/tep16p-dep16d-disagg.yaml b/recipes/deepseek-r1/sglang/disagg-16gpu/deploy.yaml
similarity index 100%
rename from recipes/deepseek-r1/sglang-wideep/tep16p-dep16d-disagg.yaml
rename to recipes/deepseek-r1/sglang/disagg-16gpu/deploy.yaml
diff --git a/recipes/deepseek-r1/sglang-wideep/tep8p-dep8d-disagg.yaml b/recipes/deepseek-r1/sglang/disagg-8gpu/deploy.yaml
similarity index 100%
rename from recipes/deepseek-r1/sglang-wideep/tep8p-dep8d-disagg.yaml
rename to recipes/deepseek-r1/sglang/disagg-8gpu/deploy.yaml
diff --git a/recipes/gpt-oss-120b/trtllm/agg/config.yaml b/recipes/gpt-oss-120b/trtllm/agg/config.yaml
deleted file mode 100644
index 2d1701bc3b..0000000000
--- a/recipes/gpt-oss-120b/trtllm/agg/config.yaml
+++ /dev/null
@@ -1,17 +0,0 @@
-# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
-# SPDX-License-Identifier: Apache-2.0
-apiVersion: v1
-kind: ConfigMap
-metadata:
-  name: llm-config
-data:
-  config.yaml: |
-    enable_attention_dp: true
-    cuda_graph_config:
-      max_batch_size: 800
-      enable_padding: true
-    kv_cache_config:
-      enable_block_reuse: false
-    stream_interval: 20
-    moe_config:
-      backend: CUTLASS
\ No newline at end of file
diff --git a/recipes/gpt-oss-120b/trtllm/agg/deploy.yaml b/recipes/gpt-oss-120b/trtllm/agg/deploy.yaml
index 16be6ffae0..a3f0b5c2e6 100644
--- a/recipes/gpt-oss-120b/trtllm/agg/deploy.yaml
+++ b/recipes/gpt-oss-120b/trtllm/agg/deploy.yaml
@@ -1,5 +1,21 @@
 # SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: llm-config
+data:
+  config.yaml: |
+    enable_attention_dp: true
+    cuda_graph_config:
+      max_batch_size: 800
+      enable_padding: true
+    kv_cache_config:
+      enable_block_reuse: false
+    stream_interval: 20
+    moe_config:
+      backend: CUTLASS
+---
 apiVersion: nvidia.com/v1alpha1
 kind: DynamoGraphDeployment
 metadata:
@@ -7,7 +23,7 @@ metadata:
 spec:
   backendFramework: trtllm
   pvcs:
-    - name: model-cache-oss-gpt120b
+    - name: model-cache
       create: false
   services:
     Frontend:
@@ -31,17 +47,13 @@ spec:
         - /bin/sh
         - -c
       image: my-registry/trtllm-runtime:my-tag
-      pvc:
-        create: false
-        mountPoint: /model-store
-        name: model-cache
       replicas: 1
     TrtllmWorker:
       componentType: main
      dynamoNamespace: gpt-oss-agg
       envFromSecret: hf-token-secret
       volumeMounts:
-        - name: model-cache-oss-gpt120b
+        - name: model-cache
           mountPoint: /root/.cache/huggingface
       sharedMemory:
         size: 80Gi
@@ -90,10 +102,6 @@ spec:
         - configMap:
             name: llm-config
           name: llm-config
-      pvc:
-        create: false
-        mountPoint: /model-store
-        name: model-cache
       replicas: 1
       resources:
         limits:
diff --git a/recipes/gpt-oss-120b/trtllm/agg/perf.yaml b/recipes/gpt-oss-120b/trtllm/agg/perf.yaml
index a1dbbd696a..42dc37f21e 100644
--- a/recipes/gpt-oss-120b/trtllm/agg/perf.yaml
+++ b/recipes/gpt-oss-120b/trtllm/agg/perf.yaml
@@ -3,7 +3,7 @@
 apiVersion: batch/v1
 kind: Job
 metadata:
-  name: oss-gpt120b-bench
+  name: gpt-oss-120b-bench
 spec:
   backoffLimit: 1
   completions: 1
@@ -11,7 +11,7 @@ spec:
   template:
     metadata:
       labels:
-        app: oss-gpt120b-bench
+        app: gpt-oss-120b-bench
     spec:
       affinity:
         podAntiAffinity:
diff --git a/recipes/run.sh b/recipes/run.sh
index e611d39711..af83c65b2e 100755
--- a/recipes/run.sh
+++ b/recipes/run.sh
@@ -17,8 +17,7 @@ RECIPES_DIR="$( cd "$( dirname "$0" )" && pwd )"
 
 # Default values
 NAMESPACE="${NAMESPACE:-dynamo}"
-DOWNLOAD_MODEL=true
-DEPLOY_TYPE=""
+DEPLOYMENT=""
 MODEL=""
 FRAMEWORK=""
 DRY_RUN=""
@@ -29,28 +28,25 @@ DEFAULT_FRAMEWORK=VLLM
 # Function to show usage
 usage() {
-    echo "Usage: $0 [OPTIONS] --model <model> --framework <framework> <deployment-type>"
-    echo ""
-    echo "Arguments:"
-    echo "  <deployment-type>        Deployment type (e.g., agg, disagg-single-node, disagg-multi-node)"
+    echo "Usage: $0 [OPTIONS] --model <model> --framework <framework> --deployment <deployment>"
     echo ""
     echo "Required Options:"
-    echo "  --model <model>          Model name (e.g., llama-3-70b)"
-    echo "  --framework <framework>  Framework one of ${!FRAMEWORKS[*]} (default: ${DEFAULT_FRAMEWORK})"
+    echo "  --model <model>            Model name (e.g., llama-3-70b)"
+    echo "  --framework <framework>    Framework one of ${!FRAMEWORKS[*]} (default: ${DEFAULT_FRAMEWORK})"
+    echo "  --deployment <deployment>  Deployment type (e.g., agg, disagg; see README.md for available types)"
     echo ""
     echo "Optional:"
-    echo "  --namespace <namespace>  Kubernetes namespace (default: dynamo)"
-    echo "  --skip-model-cache       Skip model downloading (assumes model cache already exists)"
-    echo "  --dry-run                Print commands without executing them"
-    echo "  -h, --help               Show this help message"
+    echo "  --namespace <namespace>    Kubernetes namespace (default: dynamo)"
+    echo "  --dry-run                  Print commands without executing them"
+    echo "  -h, --help                 Show this help message"
     echo ""
     echo "Environment Variables:"
-    echo "  NAMESPACE                Kubernetes namespace (default: dynamo)"
+    echo "  NAMESPACE                  Kubernetes namespace (default: dynamo)"
     echo ""
     echo "Examples:"
-    echo "  $0 --model llama-3-70b --framework vllm agg"
-    echo "  $0 --skip-model-cache --model llama-3-70b --framework vllm agg"
-    echo "  $0 --namespace my-ns --model llama-3-70b --framework trtllm disagg-single-node"
+    echo "  $0 --model llama-3-70b --framework vllm --deployment agg"
+    echo "  $0 --model llama-3-70b --framework trtllm --deployment disagg-single-node"
+    echo "  $0 --namespace my-ns --model llama-3-70b --framework vllm --deployment disagg-multi-node"
     exit 1
 }
 
@@ -66,10 +62,6 @@ error() {
 
 while [[ $# -gt 0 ]]; do
     case $1 in
-        --skip-model-cache)
-            DOWNLOAD_MODEL=false
-            shift
-            ;;
         --dry-run)
             DRY_RUN="echo"
             shift
            ;;
@@ -90,6 +82,14 @@ while [[ $# -gt 0 ]]; do
                 missing_requirement "$1"
             fi
             ;;
+        --deployment)
+            if [ "$2" ]; then
+                DEPLOYMENT=$2
+                shift 2
+            else
+                missing_requirement "$1"
+            fi
+            ;;
         --namespace)
             if [ "$2" ]; then
                 NAMESPACE=$2
@@ -105,12 +105,7 @@ while [[ $# -gt 0 ]]; do
             error 'ERROR: Unknown option: ' "$1"
             ;;
         *)
-            if [[ -z "$DEPLOY_TYPE" ]]; then
-                DEPLOY_TYPE="$1"
-            else
-                error "ERROR: Multiple deployment type arguments provided: " "$1"
-            fi
-            shift
+            error "ERROR: Unknown argument: " "$1"
             ;;
     esac
 done
@@ -127,12 +122,12 @@ if [ -n "$FRAMEWORK" ]; then
 fi
 
 # Validate required arguments
-if [[ -z "$MODEL" ]] || [[ -z "$DEPLOY_TYPE" ]]; then
+if [[ -z "$MODEL" ]] || [[ -z "$DEPLOYMENT" ]]; then
     if [[ -z "$MODEL" ]]; then
         echo "ERROR: --model argument is required"
     fi
-    if [[ -z "$DEPLOY_TYPE" ]]; then
-        echo "ERROR: deployment-type argument is required"
+    if [[ -z "$DEPLOYMENT" ]]; then
+        echo "ERROR: --deployment argument is required"
     fi
     echo ""
     usage
@@ -141,7 +136,7 @@ fi
 # Construct paths based on new structure: recipes/<model>/<framework>/<deployment>/
 MODEL_DIR="$RECIPES_DIR/$MODEL"
 FRAMEWORK_DIR="$MODEL_DIR/${FRAMEWORK,,}"
-DEPLOY_PATH="$FRAMEWORK_DIR/$DEPLOY_TYPE"
+DEPLOY_PATH="$FRAMEWORK_DIR/$DEPLOYMENT"
 
 # Check if model directory exists
 if [[ ! -d "$MODEL_DIR" ]]; then
@@ -161,7 +156,7 @@ fi
 # Check if deployment directory exists
 if [[ ! -d "$DEPLOY_PATH" ]]; then
-    echo "Error: Deployment type '$DEPLOY_TYPE' does not exist in $FRAMEWORK_DIR"
+    echo "Error: Deployment type '$DEPLOYMENT' does not exist in $FRAMEWORK_DIR"
     echo "Available deployment types for $MODEL/${FRAMEWORK,,}:"
     ls -1 "$FRAMEWORK_DIR" | grep -v "\.sh$\|\.md$" | sed 's/^/  /'
     exit 1
@@ -176,9 +171,13 @@ if [[ ! -f "$DEPLOY_FILE" ]]; then
     exit 1
 fi
 
-if [[ ! -f "$PERF_FILE" ]]; then
-    echo "Error: Performance file '$PERF_FILE' not found"
-    exit 1
+# Check if perf file exists (optional)
+PERF_AVAILABLE=false
+if [[ -f "$PERF_FILE" ]]; then
+    PERF_AVAILABLE=true
+    echo "Performance benchmark file found: $PERF_FILE"
+else
+    echo "Performance benchmark file not found: $PERF_FILE (skipping benchmarks)"
 fi
 
 # Show deployment information
@@ -187,42 +186,43 @@ echo "Dynamo Recipe Deployment"
 echo "======================================"
 echo "Model: $MODEL"
 echo "Framework: ${FRAMEWORK,,}"
-echo "Deployment Type: $DEPLOY_TYPE"
+echo "Deployment Type: $DEPLOYMENT"
 echo "Namespace: $NAMESPACE"
-echo "Model Download: $DOWNLOAD_MODEL"
 echo "======================================"
 
 # Handle model downloading
 MODEL_CACHE_DIR="$MODEL_DIR/model-cache"
-if [[ "$DOWNLOAD_MODEL" == "true" ]]; then
-    echo "Creating PVC for model cache and downloading model..."
-    $DRY_RUN kubectl apply -n $NAMESPACE -f $MODEL_CACHE_DIR/model-cache.yaml
-    $DRY_RUN kubectl apply -n $NAMESPACE -f $MODEL_CACHE_DIR/model-download.yaml
-
-    # Wait for the model download to complete
-    echo "Waiting for the model download to complete..."
-    $DRY_RUN kubectl wait --for=condition=Complete job/model-download-${MODEL} -n $NAMESPACE --timeout=6000s
-else
-    echo "Skipping model download (using existing model cache)..."
-    # Still create the PVC in case it doesn't exist
-    $DRY_RUN kubectl apply -n $NAMESPACE -f $MODEL_CACHE_DIR/model-cache.yaml
-fi
+echo "Creating PVC for model cache and downloading model..."
+$DRY_RUN kubectl apply -n $NAMESPACE -f $MODEL_CACHE_DIR/model-cache.yaml
+$DRY_RUN kubectl apply -n $NAMESPACE -f $MODEL_CACHE_DIR/model-download.yaml
+
+# Wait for the model download to complete
+MODEL_DOWNLOAD_JOB_NAME=$(grep "name:" $MODEL_CACHE_DIR/model-download.yaml | head -1 | awk '{print $2}')
+echo "Waiting for job '$MODEL_DOWNLOAD_JOB_NAME' to complete..."
+$DRY_RUN kubectl wait --for=condition=Complete job/$MODEL_DOWNLOAD_JOB_NAME -n $NAMESPACE --timeout=6000s
 
 # Deploy the specified configuration
-echo "Deploying $MODEL ${FRAMEWORK,,} $DEPLOY_TYPE configuration..."
+echo "Deploying $MODEL ${FRAMEWORK,,} $DEPLOYMENT configuration..."
 $DRY_RUN kubectl apply -n $NAMESPACE -f $DEPLOY_FILE
 
-# Launch the benchmark job
-echo "Launching benchmark job..."
-$DRY_RUN kubectl apply -n $NAMESPACE -f $PERF_FILE
-
-# Construct job name from the perf file
-JOB_NAME=$(grep "name:" $PERF_FILE | head -1 | awk '{print $2}')
-echo "Waiting for job '$JOB_NAME' to complete..."
-$DRY_RUN kubectl wait --for=condition=Complete job/$JOB_NAME -n $NAMESPACE --timeout=6000s
-
-# Print logs from the benchmark job
-echo "======================================"
-echo "Benchmark completed. Logs:"
-echo "======================================"
-$DRY_RUN kubectl logs job/$JOB_NAME -n $NAMESPACE
\ No newline at end of file
+# Launch the benchmark job (if available)
+if [[ "$PERF_AVAILABLE" == "true" ]]; then
+    echo "Launching benchmark job..."
+    $DRY_RUN kubectl apply -n $NAMESPACE -f $PERF_FILE
+
+    # Construct job name from the perf file
+    JOB_NAME=$(grep "name:" $PERF_FILE | head -1 | awk '{print $2}')
+    echo "Waiting for job '$JOB_NAME' to complete..."
+    $DRY_RUN kubectl wait --for=condition=Complete job/$JOB_NAME -n $NAMESPACE --timeout=6000s
+
+    # Print logs from the benchmark job
+    echo "======================================"
+    echo "Benchmark completed. Logs:"
+    echo "======================================"
+    $DRY_RUN kubectl logs job/$JOB_NAME -n $NAMESPACE
+else
+    echo "======================================"
+    echo "Deployment completed successfully!"
+    echo "No performance benchmark available for this configuration."
+    echo "======================================"
+fi
\ No newline at end of file
diff --git a/tests/profiler/test_profile_sla_dryrun.py b/tests/profiler/test_profile_sla_dryrun.py
index e9afd51cca..af0fb716e5 100644
--- a/tests/profiler/test_profile_sla_dryrun.py
+++ b/tests/profiler/test_profile_sla_dryrun.py
@@ -169,9 +169,7 @@ def sglang_moe_args(self):
     class Args:
         def __init__(self):
             self.backend = "sglang"
-            self.config = (
-                "recipes/deepseek-r1/sglang-wideep/tep16p-dep16d-disagg.yaml"
-            )
+            self.config = "recipes/deepseek-r1/sglang/disagg-16gpu/deploy.yaml"
             self.output_dir = "/tmp/test_profiling_results"
             self.namespace = "test-namespace"
             self.min_num_gpus_per_engine = 8