2 changes: 1 addition & 1 deletion benchmarks/profiler/deploy/profile_sla_moe_job.yaml
@@ -31,7 +31,7 @@ spec:
command: ["python", "-m", "benchmarks.profiler.profile_sla"]
args:
- --config
-  - /sgl-workspace/dynamo/recipes/deepseek-r1/sglang-wideep/tep16p-dep16d-disagg.yaml
+  - /sgl-workspace/dynamo/recipes/deepseek-r1/sglang/disagg-16gpu/deploy.yaml
- --output-dir
- /data/profiling_results
- --namespace
23 changes: 23 additions & 0 deletions recipes/CONTRIBUTING.md
@@ -0,0 +1,23 @@
# Recipes Contributing Guide

When adding new model recipes, ensure they follow the standard structure:
```text
<model-name>/
├── model-cache/
│   ├── model-cache.yaml
│   └── model-download.yaml
├── <framework>/
│   └── <deployment-mode>/
│       ├── deploy.yaml
│       └── perf.yaml (optional)
└── README.md (optional)
```

## Validation
The `run.sh` script expects this exact directory structure and will validate that the directories and files exist before deployment (a sketch of these checks follows the list):
- Model directory exists in `recipes/<model>/`
- Framework is one of the supported frameworks (vllm, sglang, trtllm)
- Framework directory exists in `recipes/<model>/<framework>/`
- Deployment directory exists in `recipes/<model>/<framework>/<deployment>/`
- Required files (`deploy.yaml`) exist in the deployment directory
- If present, performance benchmarks (`perf.yaml`) will be automatically executed
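
For reference, a minimal sketch of those checks (argument handling and messages are illustrative, not the actual `run.sh` internals):

```bash
#!/usr/bin/env bash
# Hypothetical validation sketch; the real run.sh may differ in detail.
MODEL="$1" FRAMEWORK="$2" DEPLOYMENT="$3"

[ -d "recipes/${MODEL}" ] || { echo "unknown model: ${MODEL}" >&2; exit 1; }

case "${FRAMEWORK}" in
  vllm|sglang|trtllm) ;;
  *) echo "unsupported framework: ${FRAMEWORK}" >&2; exit 1 ;;
esac

DIR="recipes/${MODEL}/${FRAMEWORK}/${DEPLOYMENT}"
[ -d "${DIR}" ] || { echo "missing deployment dir: ${DIR}" >&2; exit 1; }
[ -f "${DIR}/deploy.yaml" ] || { echo "missing ${DIR}/deploy.yaml" >&2; exit 1; }

# perf.yaml is optional; when present, the benchmark runs after deployment
[ -f "${DIR}/perf.yaml" ] && echo "perf.yaml found; benchmark will be executed"
```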
283 changes: 240 additions & 43 deletions recipes/README.md
@@ -1,88 +1,285 @@
# Dynamo Model Serving Recipes

This repository contains production-ready recipes for deploying large language models using the Dynamo platform. Each recipe includes deployment configurations, performance benchmarking, and model caching setup.

## Contents
- [Available Models](#available-models)
- [Quick Start](#quick-start)
- [Prerequisites](#prerequisites)
- Deployment Methods
  - [Option 1: Automated Deployment](#option-1-automated-deployment)
  - [Option 2: Manual Deployment](#option-2-manual-deployment)


## Available Models

| Model Family | Framework | Deployment Mode              | GPU Requirements | Status | Benchmark |
|--------------|-----------|------------------------------|------------------|--------|-----------|
| llama-3-70b  | vllm      | agg                          | 4x H100/H200     | ✅     | ✅        |
| llama-3-70b  | vllm      | disagg (1 node)              | 8x H100/H200     | ✅     | ✅        |
| llama-3-70b  | vllm      | disagg (multi-node)          | 16x H100/H200    | ✅     | ✅        |
| deepseek-r1  | sglang    | disagg (1 node, wide-ep)     | 8x H200          | ✅     | 🚧        |
| deepseek-r1  | sglang    | disagg (multi-node, wide-ep) | 16x H200         | ✅     | 🚧        |
| gpt-oss-120b | trtllm    | agg                          | 4x GB200         | ✅     | ✅        |

**Legend:**
- ✅ Functional
- 🚧 Under development


**Recipe Directory Structure:**

Recipes are organized in a directory structure that follows this pattern:
```text
<model-name>/
├── model-cache/
│   ├── model-cache.yaml           # PVC for model cache
│   └── model-download.yaml        # Job for model download
├── <framework>/
│   └── <deployment-mode>/
│       ├── deploy.yaml            # DynamoGraphDeployment CRD and optional configmap for custom configuration
│       └── perf.yaml (optional)   # Performance benchmark
└── README.md (optional)           # Model documentation
```
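
As a concrete instance, the DeepSeek-R1 16-GPU SGLang recipe referenced elsewhere in this repository (`recipes/deepseek-r1/sglang/disagg-16gpu/deploy.yaml`) maps onto this pattern as follows (assuming the standard model-cache files):

```text
deepseek-r1/
├── model-cache/
│   ├── model-cache.yaml
│   └── model-download.yaml
└── sglang/
    └── disagg-16gpu/
        └── deploy.yaml
```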

## Quick Start

Follow the instructions in the [Prerequisites](#prerequisites) section to set up your environment.

Then choose your preferred deployment method: the automated `run.sh` script (Option 1) or the manual deployment steps (Option 2).


## Prerequisites

### 1. Environment Setup

Create a Kubernetes namespace and set the `NAMESPACE` environment variable; it is used by the later deployment and benchmarking steps:

```bash
export NAMESPACE=your-namespace
kubectl create namespace ${NAMESPACE}
```

### 2. Deploy Dynamo Platform

Install the Dynamo Cloud Platform following the [Quickstart Guide](../docs/kubernetes/README.md).

### 3. GPU Cluster

Ensure your Kubernetes cluster has:
- GPU nodes with appropriate GPU types (see model requirements above)
- GPU operator installed
- Sufficient GPU memory and compute resources
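
One quick way to confirm GPU capacity (assuming the device plugin exposes the `nvidia.com/gpu` resource, as the GPU operator does):

```bash
# Show allocatable GPUs per node
kubectl get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
```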

### 4. Container Registry Access

Ensure access to the NVIDIA container registry for the runtime images:
- `nvcr.io/nvidia/ai-dynamo/vllm-runtime:x.y.z`
- `nvcr.io/nvidia/ai-dynamo/trtllm-runtime:x.y.z`
- `nvcr.io/nvidia/ai-dynamo/sglang-runtime:x.y.z`
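
To verify registry access ahead of time, a quick pull check (log in with `docker login nvcr.io` first if required; substitute a released tag for the `x.y.z` placeholder):

```bash
# Confirm the runtime image for your chosen framework can be pulled
docker pull nvcr.io/nvidia/ai-dynamo/vllm-runtime:x.y.z
```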

### 5. HuggingFace Access and Kubernetes Secret Creation

Set up a Kubernetes secret with your HuggingFace token for model download; the deployments reference it as `envFromSecret: hf-token-secret`:

```bash
# Update the token in the secret file
vim hf_hub_secret/hf_hub_secret.yaml

# Apply the secret
kubectl apply -f hf_hub_secret/hf_hub_secret.yaml -n ${NAMESPACE}
```
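
Alternatively, the same secret can be created directly from the CLI; a sketch assuming the key name `HF_TOKEN` (verify the key your `deploy.yaml` actually expects):

```bash
# Creates the hf-token-secret referenced by the deployments via envFromSecret;
# the HF_TOKEN key name is an assumption -- match it to your manifests
kubectl create secret generic hf-token-secret \
  --from-literal=HF_TOKEN=<your-huggingface-token> \
  -n ${NAMESPACE}
```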

### 6. Configure Storage Class

Configure persistent storage for model caching:

```bash
# Check available storage classes
kubectl get storageclass
```

Replace the placeholder storage class name with your actual storage class in `<model>/model-cache/model-cache.yaml`:

```yaml
# In <model>/model-cache/model-cache.yaml
spec:
  storageClassName: "your-actual-storage-class"  # Replace this
```
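
For orientation, a minimal sketch of the shape such a model-cache PVC takes (names, sizes, and access modes here are illustrative; the recipe's actual `model-cache.yaml` is authoritative):

```yaml
# Illustrative PVC only; consult the recipe's model-cache.yaml for real values
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-cache
spec:
  accessModes:
    - ReadWriteMany                # shared by all pods that mount the cache
  storageClassName: "your-actual-storage-class"
  resources:
    requests:
      storage: 300Gi               # size depends on the model weights
```
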
## Option 1: Automated Deployment

Use the `run.sh` script for fully automated deployment:

**Note:** The script automatically:
- Creates the model cache PVC and downloads the model
- Deploys the model service
- Runs the performance benchmark if a `perf.yaml` file is present in the deployment directory


#### Script Usage

```bash
./run.sh [OPTIONS] --model <model> --framework <framework> --deployment <deployment-type>
```

**Required Options:**
- `--model <model>`: Model name matching the directory name in the recipes directory (e.g., llama-3-70b, gpt-oss-120b, deepseek-r1)
- `--framework <framework>`: Backend framework (`vllm`, `trtllm`, `sglang`)
- `--deployment <deployment-type>`: Deployment mode (e.g., agg, disagg, disagg-single-node, disagg-multi-node)

**Optional:**
- `--namespace <namespace>`: Kubernetes namespace (default: dynamo)
- `--dry-run`: Show commands without executing them
- `-h, --help`: Show help message

**Environment Variables:**
- `NAMESPACE`: Kubernetes namespace (default: dynamo)

#### Example Usage
```bash
# Set up environment
export NAMESPACE=your-namespace
kubectl create namespace ${NAMESPACE}

# Configure HuggingFace token
kubectl apply -f hf_hub_secret/hf_hub_secret.yaml -n ${NAMESPACE}

# Deploy Llama-3-70B with vLLM (aggregated mode)
./run.sh --model llama-3-70b --framework vllm --deployment agg

# Deploy GPT-OSS-120B with TensorRT-LLM
./run.sh --model gpt-oss-120b --framework trtllm --deployment agg

# Deploy DeepSeek-R1 with SGLang (disaggregated mode)
./run.sh --model deepseek-r1 --framework sglang --deployment disagg

# Deploy with custom namespace
./run.sh --namespace my-namespace --model llama-3-70b --framework vllm --deployment agg

# Dry run to see what would be executed
./run.sh --dry-run --model llama-3-70b --framework vllm --deployment agg
```

## Option 2: Manual Deployment

For step-by-step manual deployment, follow these steps:

Example:

```bash
# 0. Set up environment (see Prerequisites section)
export NAMESPACE=your-namespace
kubectl create namespace ${NAMESPACE}
kubectl apply -f hf_hub_secret/hf_hub_secret.yaml -n ${NAMESPACE}
# 1. Download model (see Model Download section)
kubectl apply -n $NAMESPACE -f <model>/model-cache/
# 2. Deploy model (see Deployment section)
kubectl apply -n $NAMESPACE -f <model>/<framework>/<mode>/deploy.yaml
# 3. Run benchmarks (optional, if perf.yaml exists)
kubectl apply -n $NAMESPACE -f <model>/<framework>/<mode>/perf.yaml
```

### Step 1: Download Model

```bash
# Start the download job
kubectl apply -n $NAMESPACE -f <model>/model-cache
# Verify job creation
kubectl get jobs -n $NAMESPACE | grep model-download
```

Monitor and wait for the model download to complete:

```bash
# Wait for job completion (timeout after 100 minutes)
kubectl wait --for=condition=Complete job/model-download -n $NAMESPACE --timeout=6000s
# Check job status
kubectl get job model-download -n $NAMESPACE
# View download logs
kubectl logs job/model-download -n $NAMESPACE
```

### Step 2: Deploy Model Service

```bash
# Navigate to the specific deployment configuration
cd <model>/<framework>/<deployment-mode>/
# Deploy the model service
kubectl apply -n $NAMESPACE -f deploy.yaml
# Verify deployment creation
kubectl get deployments -n $NAMESPACE
```

#### Wait for Deployment Ready

```bash
# Get deployment name from the deploy.yaml file
DEPLOYMENT_NAME=$(grep "name:" deploy.yaml | head -1 | awk '{print $2}')
# Wait for deployment to be ready (timeout after 20 minutes)
kubectl wait --for=condition=available deployment/$DEPLOYMENT_NAME -n $NAMESPACE --timeout=1200s
# Check deployment status
kubectl get deployment $DEPLOYMENT_NAME -n $NAMESPACE
# Check pod status
kubectl get pods -n $NAMESPACE -l app=$DEPLOYMENT_NAME
```

#### Verify Model Service

```bash
# Check if service is running
kubectl get services -n $NAMESPACE
# Test model endpoint (port-forward to test locally)
kubectl port-forward service/${DEPLOYMENT_NAME}-frontend 8000:8000 -n $NAMESPACE
# Test the model API (in another terminal)
curl http://localhost:8000/v1/models
# Stop port-forward when done
pkill -f "kubectl port-forward"
```
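
While the port-forward above is still running, you can also send a small inference request; a sketch assuming the frontend exposes the OpenAI-compatible chat completions route (use a model id returned by `/v1/models`):

```bash
# Minimal smoke test; replace the model id with one listed by /v1/models
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "<model-id-from-/v1/models>",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 32
      }'
```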

### Step 3: Performance Benchmarking (Optional)

Run performance benchmarks to evaluate the deployment. Note that benchmarking is only available for recipes that include a `perf.yaml` file:

#### Launch Benchmark Job

```bash
# From the deployment directory
kubectl apply -n $NAMESPACE -f perf.yaml
# Verify benchmark job creation
kubectl get jobs -n $NAMESPACE
```

#### Monitor Benchmark Progress

```bash
# Get benchmark job name
PERF_JOB_NAME=$(grep "name:" perf.yaml | head -1 | awk '{print $2}')
# Monitor benchmark logs in real-time
kubectl logs -f job/$PERF_JOB_NAME -n $NAMESPACE
# Wait for benchmark completion (timeout after 100 minutes)
kubectl wait --for=condition=Complete job/$PERF_JOB_NAME -n $NAMESPACE --timeout=6000s
```

#### View Benchmark Results

```bash
# Check final benchmark results
kubectl logs job/$PERF_JOB_NAME -n $NAMESPACE | tail -50
```
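
When you are finished, resources can be removed with the `kubectl delete` counterparts of the apply commands above (a sketch; substitute the paths of the recipe you deployed, and keep the model cache if you plan to redeploy):

```bash
# Remove the benchmark job, the model deployment, and (optionally) the model cache
kubectl delete -n $NAMESPACE -f <model>/<framework>/<mode>/perf.yaml
kubectl delete -n $NAMESPACE -f <model>/<framework>/<mode>/deploy.yaml
kubectl delete -n $NAMESPACE -f <model>/model-cache/   # also deletes the cache PVC
```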