-
Notifications
You must be signed in to change notification settings - Fork 655
docs: Restructure profiling documentation around DGDR workflow #3867
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
docs: Restructure profiling documentation around DGDR workflow #3867
Conversation
Signed-off-by: Hannah Zhang <[email protected]>
Signed-off-by: Hannah Zhang <[email protected]>
Signed-off-by: Hannah Zhang <[email protected]>
WalkthroughThis PR updates documentation across the project to transition from "Pre-Deployment Profiling" guidance to "SLA-Driven Profiling" documentation, replacing a legacy benchmarking guide with new SLA-driven profiling documentation. A Python constant referencing profiling documentation is also updated. The quickstart guide is restructured to emphasize DynamoGraphDeploymentRequest (DGDR)-based workflows. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes The review requires attention to link consistency across multiple files and verification of the new SLA-driven profiling documentation accuracy and completeness. Most changes follow a consistent replacement pattern (low cognitive load), but the substantial new documentation file and major rewrite of the quickstart guide warrant careful review for clarity, technical accuracy, and alignment with intended workflows. Poem
Pre-merge checks❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (6)
deploy/utils/README.md (1)
122-122: Consider updating link text to match the new documentation name.The link text still says "Pre-Deployment Profiling" but points to
sla_driven_profiling.md. For consistency with the broader terminology shift in this PR, consider updating the link text to "SLA-Driven Profiling".Apply this diff:
-- **Pre-Deployment Profiling**: See [docs/benchmarks/sla_driven_profiling.md](../../docs/benchmarks/sla_driven_profiling.md) for optimizing configurations before deployment +- **SLA-Driven Profiling**: See [docs/benchmarks/sla_driven_profiling.md](../../docs/benchmarks/sla_driven_profiling.md) for optimizing configurations before deploymenttests/planner/README.md (1)
26-26: Consider updating terminology for consistency.The text says "pre-deployment profiling" but links to
sla_driven_profiling.md. To maintain consistency with the documentation restructuring in this PR, consider updating the terminology.Apply this diff:
-1. Run pre-deployment profiling for your specific setup. See the [pre-deployment profiling documentation](../../docs/benchmarks/sla_driven_profiling.md) for detailed instructions. +1. Run SLA-driven profiling for your specific setup. See the [SLA-driven profiling documentation](../../docs/benchmarks/sla_driven_profiling.md) for detailed instructions.components/backends/vllm/deploy/README.md (1)
100-102: Consider updating section header and link text to match new terminology.The section is titled "Pre-Deployment Profiling (SLA Planner Only)" and the link text says "pre-deployment profiling guide", but both now point to
sla_driven_profiling.md. For consistency with the documentation restructuring, consider updating the section header and link text to use "SLA-Driven Profiling" terminology.Apply this diff:
-### Pre-Deployment Profiling (SLA Planner Only) +### SLA-Driven Profiling (SLA Planner Only) -If using the SLA Planner deployment (`disagg_planner.yaml`), follow the [pre-deployment profiling guide](../../../../docs/benchmarks/sla_driven_profiling.md) to run pre-deployment profiling. The results will be saved to the `dynamo-pvc` PVC and queried by the SLA Planner. +If using the SLA Planner deployment (`disagg_planner.yaml`), follow the [SLA-driven profiling guide](../../../../docs/benchmarks/sla_driven_profiling.md) to run SLA-driven profiling. The results will be saved to the `dynamo-pvc` PVC and queried by the SLA Planner.components/backends/trtllm/deploy/README.md (1)
55-56: Consider updating link text to match the new documentation name.The link text says "Pre-Deployment Profiling" but points to
sla_driven_profiling.md. For consistency with the documentation restructuring, consider updating to "SLA-Driven Profiling".Apply this diff:
-> This deployment requires pre-deployment profiling to be completed first. See [Pre-Deployment Profiling](../../../../docs/benchmarks/sla_driven_profiling.md) for detailed instructions. +> This deployment requires SLA-driven profiling to be completed first. See [SLA-Driven Profiling](../../../../docs/benchmarks/sla_driven_profiling.md) for detailed instructions.docs/planner/sla_planner.md (1)
50-54: Update link text to match new terminology.Line 54's link text still references "Pre-Deployment Profiling," which conflicts with the document rebranding. Update it to align with the section header and new file name.
-See [Pre-Deployment Profiling](../benchmarks/sla_driven_profiling.md) for detailed instructions on running the profiling process. +See [SLA-Driven Profiling](../benchmarks/sla_driven_profiling.md) for detailed instructions on running the profiling process.docs/planner/sla_planner_quickstart.md (1)
444-444: Update link text to reflect new SLA-driven terminology.Line 444 uses outdated terminology ("Pre-Deployment Profiling Details") for a link now pointing to the rebranded SLA-Driven Profiling documentation.
- - [Pre-Deployment Profiling Details](/docs/benchmarks/sla_driven_profiling.md) + - [SLA-Driven Profiling Details](/docs/benchmarks/sla_driven_profiling.md)
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
README.md(1 hunks)components/backends/trtllm/deploy/README.md(1 hunks)components/backends/vllm/deploy/README.md(1 hunks)components/src/dynamo/planner/utils/perf_interpolation.py(1 hunks)deploy/utils/README.md(1 hunks)docs/benchmarks/pre_deployment_profiling.md(0 hunks)docs/benchmarks/sla_driven_profiling.md(1 hunks)docs/planner/load_planner.md(1 hunks)docs/planner/planner_intro.rst(1 hunks)docs/planner/sla_planner.md(2 hunks)docs/planner/sla_planner_quickstart.md(1 hunks)tests/planner/README.md(1 hunks)
💤 Files with no reviewable changes (1)
- docs/benchmarks/pre_deployment_profiling.md
🧰 Additional context used
🪛 GitHub Actions: Generate Documentation
docs/benchmarks/sla_driven_profiling.md
[warning] 172-172: image file not readable: docs/images/h100_prefill_performance.png
[warning] 172-172: image file not readable: docs/images/h100_decode_performance.png
[warning] 192-192: image file not readable: docs/images/itl_interpolation.png
🪛 markdownlint-cli2 (0.18.1)
docs/benchmarks/sla_driven_profiling.md
152-152: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
473-473: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: vllm (arm64)
- GitHub Check: operator (amd64)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (13)
components/src/dynamo/planner/utils/perf_interpolation.py (1)
29-33: LGTM! Error message updated correctly.The documentation reference in the error message has been properly updated to point to the new SLA-driven profiling guide, ensuring users are directed to the correct documentation when profiling data is missing.
docs/planner/load_planner.md (1)
27-27: LGTM! Link updated correctly with consistent terminology.The reference has been properly updated to point to the SLA-Driven Profiling documentation with matching link text.
docs/planner/planner_intro.rst (1)
81-81: LGTM! Toctree entry updated correctly.The toctree entry has been properly updated to reference the new SLA-Driven Profiling documentation with matching entry text.
README.md (1)
182-182: LGTM! Benchmarking section updated appropriately.The link has been updated to point to the SLA Planner Quickstart guide, which aligns with the PR's objective to emphasize the DGDR-based workflow. The text change to "SLA-Driven Dynamo Deployments" appropriately reflects the broader scope of the quickstart guide compared to just profiling documentation.
docs/benchmarks/sla_driven_profiling.md (4)
1-37: Excellent introduction and overview.The opening clearly establishes DGDR as the recommended approach with compelling "Key Benefits" and a well-structured TIP callout. The comprehensive coverage of online vs. offline profiling methods sets clear expectations.
194-351: Well-structured configuration reference.The DGDR Configuration Reference section is thorough, with clear subsection headings, detailed explanations, and practical examples (online, offline with AIC, and MoE). The inline comments in YAML examples help clarify intent.
442-530: Comprehensive troubleshooting and next steps.The Troubleshooting section covers realistic failure modes (profiling duration, SLA constraints, AIC limitations, image pull errors, OOM) with actionable solutions. The "AI Configurator: Attention Head Constraint Error" subsection is particularly helpful with specific model examples and the formula for calculating max TP.
164-164: All referenced image files exist and markdown references are correct. No action needed.The image files are present in the repository:
docs/images/h100_prefill_performance.png✓docs/images/h100_decode_performance.png✓docs/images/itl_interpolation.png✓Line 164 is a text list item, not an image reference. Lines 174 and 192 contain valid markdown image references that correctly point to existing files.
Likely an incorrect or invalid review comment.
docs/planner/sla_planner_quickstart.md (5)
1-110: Excellent DGDR-centric restructuring with clear workflows.The rewrite successfully shifts focus from manual profiling scripts to the declarative DGDR model. The Overview section (lines 8–32) with the mermaid diagram effectively communicates the three-step lifecycle. The "What is a DGDR?" section (lines 34–53) provides necessary conceptual grounding for users new to the operator model. Highly improved over the previous version.
55-109: Prerequisites and operator setup instructions are clear and actionable.The "Prerequisites" section correctly identifies kube-prometheus-stack and image pull secrets. The "Configure Profiler Image" subsection (lines 64–108) provides three scenarios (development, production 0.6.1+, and Helm upgrade) with concrete examples. The verification command at line 107 helps users confirm configuration.
123-149: DGDR Quick Start example is well-commented and easy to adapt.The inline YAML example (lines 123–145) includes helpful comments ("Change the name," "Update to your model," "Adjust to your workload") that guide users through customization. The TIP at line 147–148 correctly directs advanced users to the comprehensive reference.
201-307: Advanced configuration section balances accessibility with depth.The "DGDR Configuration Details" section logically progresses from required fields (lines 203–209) through profiling methods (lines 229–245), GPU discovery (lines 250–260), and advanced patterns. The "Using Existing DGD Configs" subsection (lines 264–308) with ConfigMap instructions correctly emphasizes this as the recommended pattern for custom setups. The inline comment at line 308 clarifies the profiler's role as a "base template optimizer"—this is important conceptual guidance.
416-427: Troubleshooting table and reference to comprehensive guide are helpful.The Quick Diagnostics section (lines 405–414) provides kubectl commands for immediate investigation. The issue table (lines 418–424) covers common failure modes. The NOTE at lines 426–427 appropriately points to the full troubleshooting guide for deeper issues.
| 4. **Generated DGD**: Complete deployment manifest | ||
|
|
||
| **Example Recommendations:** | ||
| ``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add language specifiers to fenced code blocks.
Markdown linting requires all fenced code blocks to specify a language. This improves rendering and enables syntax highlighting.
-```
+```yaml
profilingConfig:Also at line 473:
-```
+```
AssertionError: num_heads <N> should be divisible by tp_size <M> and the division result should be >= 4The error output block at line 473 should either specify a language (e.g., ```text) or be formatted as inline code if it's meant as a literal error message example.
Also applies to: 473-473
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
152-152: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
In docs/benchmarks/sla_driven_profiling.md around line 152 and line 473, the
fenced code blocks lack language specifiers; update the opening fences so the
YAML block becomes ```yaml (so the profilingConfig block is highlighted as YAML)
and change the error output block to a text-specifier like ```text (or convert
that single-line error to inline code using backticks) to satisfy markdown
linting and enable proper rendering.
| aic: | ||
| system: h200_sxm | ||
| model_name: QWEN3_32B | ||
| backend_version: "0.20.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Verify YAML field naming consistency for AI Configurator configuration.
The DGDR configuration examples show inconsistent field names for AI Configurator settings:
-
Lines 241-244 use
aic:with nested fields:aic: system: h200_sxm model_name: QWEN3_32B backend_version: "0.20.0"
-
Lines 300-303 in the referenced sla_driven_profiling.md show the same structure
However, earlier in sla_driven_profiling.md around lines 84-88, the structure appears identical. Cross-check that all DGDR example configurations in both files use consistent field names and nesting to avoid deployment failures.
Also applies to: 300-303, 405-408
🌐 Web query:
Verify the DynamoGraphDeploymentRequest (DGDR) API schema for AI Configurator field naming: Should AIC settings use nested `aic: { system: ..., model_name: ... }` or flat `aic_system: ..., aic_model_name: ...`?
💡 Result:
Short answer: use the flat fields (aic_system, aic_model_name) — that matches existing Dynamo tooling and CLI flags.
Why: the Dynamo pre‑deployment / profiler tooling and examples use explicit flat flags --aic-system and --aic-model-name (not a nested aic object), so the deployed/consumed configuration is expected in that flat form. (docs.nvidia.com)
If you need, I can: (a) inspect your DGDR CRD source/spec and show exactly which field names it accepts, or (b) produce a one‑line migration mapping (aic:{system,model_name} → aic_system, aic_model_name) for your config tooling.
Citations:
I'll verify the actual documentation files to confirm the format issues at the specified lines.
🏁 Script executed:
#!/bin/bash
# Check the actual content at the specified lines in both documentation files
echo "=== sla_planner_quickstart.md lines 241-244 ==="
sed -n '241,244p' docs/planner/sla_planner_quickstart.md 2>/dev/null || echo "File not found"
echo -e "\n=== sla_planner_quickstart.md lines 300-303 ==="
sed -n '300,303p' docs/planner/sla_planner_quickstart.md 2>/dev/null || echo "File not found"
echo -e "\n=== sla_planner_quickstart.md lines 405-408 ==="
sed -n '405,408p' docs/planner/sla_planner_quickstart.md 2>/dev/null || echo "File not found"
echo -e "\n=== sla_driven_profiling.md lines 84-88 (referenced earlier section) ==="
sed -n '84,88p' docs/planner/sla_driven_profiling.md 2>/dev/null || echo "File not found"
echo -e "\n=== sla_driven_profiling.md lines 300-303 ==="
sed -n '300,303p' docs/planner/sla_driven_profiling.md 2>/dev/null || echo "File not found"
# Search for all instances of nested aic: pattern
echo -e "\n=== Searching for nested 'aic:' pattern in documentation ==="
rg -n "aic:\s*$" docs/planner/ --type markdown -A 5 || echo "No nested aic: patterns found"
# Search for flat format patterns
echo -e "\n=== Searching for flat format 'aic_' in documentation ==="
rg -n "aic_system|aic_model_name" docs/planner/ --type markdown || echo "No flat format found"Length of output: 1421
🏁 Script executed:
#!/bin/bash
# Check the actual content at the specified lines in both documentation files
echo "=== sla_planner_quickstart.md lines 241-244 ==="
sed -n '241,244p' docs/planner/sla_planner_quickstart.md 2>/dev/null || echo "File not found"
echo -e "\n=== sla_planner_quickstart.md lines 300-303 ==="
sed -n '300,303p' docs/planner/sla_planner_quickstart.md 2>/dev/null || echo "File not found"
echo -e "\n=== sla_planner_quickstart.md lines 405-408 ==="
sed -n '405,408p' docs/planner/sla_planner_quickstart.md 2>/dev/null || echo "File not found"
echo -e "\n=== sla_driven_profiling.md lines 84-88 (referenced earlier section) ==="
sed -n '84,88p' docs/planner/sla_driven_profiling.md 2>/dev/null || echo "File not found"
echo -e "\n=== sla_driven_profiling.md lines 300-303 ==="
sed -n '300,303p' docs/planner/sla_driven_profiling.md 2>/dev/null || echo "File not found"
# Search for all instances of nested aic: pattern
echo -e "\n=== Searching for nested 'aic:' pattern in documentation ==="
rg -n "aic:\s*$" docs/planner/ --type markdown -A 5 || echo "No nested aic: patterns found"
# Search for flat format patterns
echo -e "\n=== Searching for flat format 'aic_' in documentation ==="
rg -n "aic_system|aic_model_name" docs/planner/ --type markdown || echo "No flat format found"Length of output: 1421
Correct AIC field format in sla_planner_quickstart.md — use flat fields instead of nested.
The documentation contains incorrect nested AIC configuration. The DGDR API schema requires flat fields (aic_system, aic_model_name), not the nested structure shown in the examples.
Lines 241-244 and 300-303 in sla_planner_quickstart.md currently show:
aic:
system: h200_sxm
model_name: QWEN3_32B
backend_version: "0.20.0"Should be converted to flat fields:
aic_system: h200_sxm
aic_model_name: QWEN3_32B
aic_backend_version: "0.20.0"(Note: The reference to lines 405-408 and sla_driven_profiling.md in the original comment appears incorrect and should be disregarded.)
🤖 Prompt for AI Agents
In docs/planner/sla_planner_quickstart.md around lines 241-244 (and also update
the similar occurrence at lines 300-303), the AIC YAML is using a nested "aic:"
block; replace that nested structure with flat fields by removing the "aic:" key
and adding aic_system, aic_model_name, and aic_backend_version keys with the
same values (e.g., aic_system: h200_sxm, aic_model_name: QWEN3_32B,
aic_backend_version: "0.20.0") so the file conforms to the DGDR API schema.
Overview:
Renames and restructures profiling documentation to emphasize DynamoGraphDeploymentRequest (DGDR) as the recommended approach for SLA-driven profiling. Updates quickstart guide with comprehensive DGDR examples and workflows, and corrects all cross-references throughout the codebase.
Details:
pre_deployment_profiling.mdtosla_driven_profiling.mdwith DGDR-focused contentsla_planner_quickstart.mdwith DGDR usage patternsWhere should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit