Skip to content

Conversation

@hhzhang16
Copy link
Contributor

@hhzhang16 hhzhang16 commented Oct 24, 2025

Overview:

Renames and restructures profiling documentation to emphasize DynamoGraphDeploymentRequest (DGDR) as the recommended approach for SLA-driven profiling. Updates quickstart guide with comprehensive DGDR examples and workflows, and corrects all cross-references throughout the codebase.

Details:

  • Renamed pre_deployment_profiling.md to sla_driven_profiling.md with DGDR-focused content
  • expanded sla_planner_quickstart.md with DGDR usage patterns
  • Updated doc references to point to new profiling doc name

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • Documentation
    • Updated profiling guidance across documentation to emphasize SLA-driven approaches
    • Revamped SLA Planner Quick Start guide with new deployment workflow, updated configuration patterns, and practical examples
    • Introduced comprehensive SLA-driven profiling documentation covering configuration structures, recommended workflows, troubleshooting, and related resources
    • Updated references and links throughout to reflect new documentation organization

@hhzhang16 hhzhang16 requested review from a team as code owners October 24, 2025 02:22
@github-actions github-actions bot added the docs label Oct 24, 2025
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 24, 2025

Walkthrough

This PR updates documentation across the project to transition from "Pre-Deployment Profiling" guidance to "SLA-Driven Profiling" documentation, replacing a legacy benchmarking guide with new SLA-driven profiling documentation. A Python constant referencing profiling documentation is also updated. The quickstart guide is restructured to emphasize DynamoGraphDeploymentRequest (DGDR)-based workflows.

Changes

Cohort / File(s) Summary
README link updates
README.md, components/backends/trtllm/deploy/README.md, components/backends/vllm/deploy/README.md, deploy/utils/README.md, tests/planner/README.md
Links updated from pre-deployment profiling documentation to SLA-driven profiling documentation across multiple README files.
Planner documentation links
docs/planner/load_planner.md, docs/planner/planner_intro.rst, docs/planner/sla_planner.md
Navigation and reference links updated to point to SLA-driven profiling docs; toctree entry replaced in planner_intro.rst.
Python constant update
components/src/dynamo/planner/utils/perf_interpolation.py
Error message constant updated to reference SLA-driven profiling documentation path instead of pre-deployment profiling.
Benchmarks documentation replacement
docs/benchmarks/pre_deployment_profiling.md (deleted), docs/benchmarks/sla_driven_profiling.md (new)
Legacy pre-deployment profiling guide removed; comprehensive new SLA-driven profiling documentation with DGDR workflows, configuration examples, and troubleshooting added.
Quickstart guide restructuring
docs/planner/sla_planner_quickstart.md
Major content restructuring: shifted from SLA Planner autoscaling focus to DynamoGraphDeploymentRequest (DGDR)-centric workflow; updated introduction, flow diagram, sections, DGDR configuration guidance, and troubleshooting.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

The review requires attention to link consistency across multiple files and verification of the new SLA-driven profiling documentation accuracy and completeness. Most changes follow a consistent replacement pattern (low cognitive load), but the substantial new documentation file and major rewrite of the quickstart guide warrant careful review for clarity, technical accuracy, and alignment with intended workflows.

Poem

🐰 A hop through the docs, we dance with delight,
Old paths fade to history, new guidance shines bright,
DGDR leads onward where profiling meets SLA,
From Pre-Deploy to SLA-Driven—a fresher new way! ✨

Pre-merge checks

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The title "docs: Restructure profiling documentation around DGDR workflow" clearly and accurately summarizes the main changes in this pull request. The changeset demonstrates a comprehensive restructuring of profiling documentation to emphasize the DynamoGraphDeploymentRequest (DGDR) workflow, including renaming core documentation files, expanding quick-start guides, and updating cross-references throughout the codebase. The title is concise, specific, and directly reflects the primary objective of the PR without unnecessary verbosity or vagueness.
Description Check ✅ Passed The pull request description includes complete Overview and Details sections that clearly explain the purpose and scope of changes. The Overview effectively describes the goal of restructuring documentation around DGDR, and the Details section lists the specific modifications made including file renaming, content expansion, and reference updates. However, the "Where should the reviewer start?" section remains empty with only a comment placeholder, and the "Related Issues" section contains an unfilled placeholder (#xxx) rather than an actual issue reference, which weakens the guidance for reviewers and issue tracking.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (6)
deploy/utils/README.md (1)

122-122: Consider updating link text to match the new documentation name.

The link text still says "Pre-Deployment Profiling" but points to sla_driven_profiling.md. For consistency with the broader terminology shift in this PR, consider updating the link text to "SLA-Driven Profiling".

Apply this diff:

-- **Pre-Deployment Profiling**: See [docs/benchmarks/sla_driven_profiling.md](../../docs/benchmarks/sla_driven_profiling.md) for optimizing configurations before deployment
+- **SLA-Driven Profiling**: See [docs/benchmarks/sla_driven_profiling.md](../../docs/benchmarks/sla_driven_profiling.md) for optimizing configurations before deployment
tests/planner/README.md (1)

26-26: Consider updating terminology for consistency.

The text says "pre-deployment profiling" but links to sla_driven_profiling.md. To maintain consistency with the documentation restructuring in this PR, consider updating the terminology.

Apply this diff:

-1. Run pre-deployment profiling for your specific setup. See the [pre-deployment profiling documentation](../../docs/benchmarks/sla_driven_profiling.md) for detailed instructions.
+1. Run SLA-driven profiling for your specific setup. See the [SLA-driven profiling documentation](../../docs/benchmarks/sla_driven_profiling.md) for detailed instructions.
components/backends/vllm/deploy/README.md (1)

100-102: Consider updating section header and link text to match new terminology.

The section is titled "Pre-Deployment Profiling (SLA Planner Only)" and the link text says "pre-deployment profiling guide", but both now point to sla_driven_profiling.md. For consistency with the documentation restructuring, consider updating the section header and link text to use "SLA-Driven Profiling" terminology.

Apply this diff:

-### Pre-Deployment Profiling (SLA Planner Only)
+### SLA-Driven Profiling (SLA Planner Only)

-If using the SLA Planner deployment (`disagg_planner.yaml`), follow the [pre-deployment profiling guide](../../../../docs/benchmarks/sla_driven_profiling.md) to run pre-deployment profiling. The results will be saved to the `dynamo-pvc` PVC and queried by the SLA Planner.
+If using the SLA Planner deployment (`disagg_planner.yaml`), follow the [SLA-driven profiling guide](../../../../docs/benchmarks/sla_driven_profiling.md) to run SLA-driven profiling. The results will be saved to the `dynamo-pvc` PVC and queried by the SLA Planner.
components/backends/trtllm/deploy/README.md (1)

55-56: Consider updating link text to match the new documentation name.

The link text says "Pre-Deployment Profiling" but points to sla_driven_profiling.md. For consistency with the documentation restructuring, consider updating to "SLA-Driven Profiling".

Apply this diff:

-> This deployment requires pre-deployment profiling to be completed first. See [Pre-Deployment Profiling](../../../../docs/benchmarks/sla_driven_profiling.md) for detailed instructions.
+> This deployment requires SLA-driven profiling to be completed first. See [SLA-Driven Profiling](../../../../docs/benchmarks/sla_driven_profiling.md) for detailed instructions.
docs/planner/sla_planner.md (1)

50-54: Update link text to match new terminology.

Line 54's link text still references "Pre-Deployment Profiling," which conflicts with the document rebranding. Update it to align with the section header and new file name.

-See [Pre-Deployment Profiling](../benchmarks/sla_driven_profiling.md) for detailed instructions on running the profiling process.
+See [SLA-Driven Profiling](../benchmarks/sla_driven_profiling.md) for detailed instructions on running the profiling process.
docs/planner/sla_planner_quickstart.md (1)

444-444: Update link text to reflect new SLA-driven terminology.

Line 444 uses outdated terminology ("Pre-Deployment Profiling Details") for a link now pointing to the rebranded SLA-Driven Profiling documentation.

-    - [Pre-Deployment Profiling Details](/docs/benchmarks/sla_driven_profiling.md)
+    - [SLA-Driven Profiling Details](/docs/benchmarks/sla_driven_profiling.md)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5f98e91 and e83dcc4.

📒 Files selected for processing (12)
  • README.md (1 hunks)
  • components/backends/trtllm/deploy/README.md (1 hunks)
  • components/backends/vllm/deploy/README.md (1 hunks)
  • components/src/dynamo/planner/utils/perf_interpolation.py (1 hunks)
  • deploy/utils/README.md (1 hunks)
  • docs/benchmarks/pre_deployment_profiling.md (0 hunks)
  • docs/benchmarks/sla_driven_profiling.md (1 hunks)
  • docs/planner/load_planner.md (1 hunks)
  • docs/planner/planner_intro.rst (1 hunks)
  • docs/planner/sla_planner.md (2 hunks)
  • docs/planner/sla_planner_quickstart.md (1 hunks)
  • tests/planner/README.md (1 hunks)
💤 Files with no reviewable changes (1)
  • docs/benchmarks/pre_deployment_profiling.md
🧰 Additional context used
🪛 GitHub Actions: Generate Documentation
docs/benchmarks/sla_driven_profiling.md

[warning] 172-172: image file not readable: docs/images/h100_prefill_performance.png


[warning] 172-172: image file not readable: docs/images/h100_decode_performance.png


[warning] 192-192: image file not readable: docs/images/itl_interpolation.png

🪛 markdownlint-cli2 (0.18.1)
docs/benchmarks/sla_driven_profiling.md

152-152: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


473-473: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: vllm (arm64)
  • GitHub Check: operator (amd64)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (13)
components/src/dynamo/planner/utils/perf_interpolation.py (1)

29-33: LGTM! Error message updated correctly.

The documentation reference in the error message has been properly updated to point to the new SLA-driven profiling guide, ensuring users are directed to the correct documentation when profiling data is missing.

docs/planner/load_planner.md (1)

27-27: LGTM! Link updated correctly with consistent terminology.

The reference has been properly updated to point to the SLA-Driven Profiling documentation with matching link text.

docs/planner/planner_intro.rst (1)

81-81: LGTM! Toctree entry updated correctly.

The toctree entry has been properly updated to reference the new SLA-Driven Profiling documentation with matching entry text.

README.md (1)

182-182: LGTM! Benchmarking section updated appropriately.

The link has been updated to point to the SLA Planner Quickstart guide, which aligns with the PR's objective to emphasize the DGDR-based workflow. The text change to "SLA-Driven Dynamo Deployments" appropriately reflects the broader scope of the quickstart guide compared to just profiling documentation.

docs/benchmarks/sla_driven_profiling.md (4)

1-37: Excellent introduction and overview.

The opening clearly establishes DGDR as the recommended approach with compelling "Key Benefits" and a well-structured TIP callout. The comprehensive coverage of online vs. offline profiling methods sets clear expectations.


194-351: Well-structured configuration reference.

The DGDR Configuration Reference section is thorough, with clear subsection headings, detailed explanations, and practical examples (online, offline with AIC, and MoE). The inline comments in YAML examples help clarify intent.


442-530: Comprehensive troubleshooting and next steps.

The Troubleshooting section covers realistic failure modes (profiling duration, SLA constraints, AIC limitations, image pull errors, OOM) with actionable solutions. The "AI Configurator: Attention Head Constraint Error" subsection is particularly helpful with specific model examples and the formula for calculating max TP.


164-164: All referenced image files exist and markdown references are correct. No action needed.

The image files are present in the repository:

  • docs/images/h100_prefill_performance.png
  • docs/images/h100_decode_performance.png
  • docs/images/itl_interpolation.png

Line 164 is a text list item, not an image reference. Lines 174 and 192 contain valid markdown image references that correctly point to existing files.

Likely an incorrect or invalid review comment.

docs/planner/sla_planner_quickstart.md (5)

1-110: Excellent DGDR-centric restructuring with clear workflows.

The rewrite successfully shifts focus from manual profiling scripts to the declarative DGDR model. The Overview section (lines 8–32) with the mermaid diagram effectively communicates the three-step lifecycle. The "What is a DGDR?" section (lines 34–53) provides necessary conceptual grounding for users new to the operator model. Highly improved over the previous version.


55-109: Prerequisites and operator setup instructions are clear and actionable.

The "Prerequisites" section correctly identifies kube-prometheus-stack and image pull secrets. The "Configure Profiler Image" subsection (lines 64–108) provides three scenarios (development, production 0.6.1+, and Helm upgrade) with concrete examples. The verification command at line 107 helps users confirm configuration.


123-149: DGDR Quick Start example is well-commented and easy to adapt.

The inline YAML example (lines 123–145) includes helpful comments ("Change the name," "Update to your model," "Adjust to your workload") that guide users through customization. The TIP at line 147–148 correctly directs advanced users to the comprehensive reference.


201-307: Advanced configuration section balances accessibility with depth.

The "DGDR Configuration Details" section logically progresses from required fields (lines 203–209) through profiling methods (lines 229–245), GPU discovery (lines 250–260), and advanced patterns. The "Using Existing DGD Configs" subsection (lines 264–308) with ConfigMap instructions correctly emphasizes this as the recommended pattern for custom setups. The inline comment at line 308 clarifies the profiler's role as a "base template optimizer"—this is important conceptual guidance.


416-427: Troubleshooting table and reference to comprehensive guide are helpful.

The Quick Diagnostics section (lines 405–414) provides kubectl commands for immediate investigation. The issue table (lines 418–424) covers common failure modes. The NOTE at lines 426–427 appropriately points to the full troubleshooting guide for deeper issues.

4. **Generated DGD**: Complete deployment manifest

**Example Recommendations:**
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add language specifiers to fenced code blocks.

Markdown linting requires all fenced code blocks to specify a language. This improves rendering and enables syntax highlighting.

-```
+```yaml
 profilingConfig:

Also at line 473:

-```
+```
 AssertionError: num_heads <N> should be divisible by tp_size <M> and the division result should be >= 4

The error output block at line 473 should either specify a language (e.g., ```text) or be formatted as inline code if it's meant as a literal error message example.

Also applies to: 473-473

🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

152-152: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In docs/benchmarks/sla_driven_profiling.md around line 152 and line 473, the
fenced code blocks lack language specifiers; update the opening fences so the
YAML block becomes ```yaml (so the profilingConfig block is highlighted as YAML)
and change the error output block to a text-specifier like ```text (or convert
that single-line error to inline code using backticks) to satisfy markdown
linting and enable proper rendering.

Comment on lines +241 to +244
aic:
system: h200_sxm
model_name: QWEN3_32B
backend_version: "0.20.0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify YAML field naming consistency for AI Configurator configuration.

The DGDR configuration examples show inconsistent field names for AI Configurator settings:

  • Lines 241-244 use aic: with nested fields:

    aic:
      system: h200_sxm
      model_name: QWEN3_32B
      backend_version: "0.20.0"
  • Lines 300-303 in the referenced sla_driven_profiling.md show the same structure

However, earlier in sla_driven_profiling.md around lines 84-88, the structure appears identical. Cross-check that all DGDR example configurations in both files use consistent field names and nesting to avoid deployment failures.

Also applies to: 300-303, 405-408


🌐 Web query:

Verify the DynamoGraphDeploymentRequest (DGDR) API schema for AI Configurator field naming: Should AIC settings use nested `aic: { system: ..., model_name: ... }` or flat `aic_system: ..., aic_model_name: ...`?

💡 Result:

Short answer: use the flat fields (aic_system, aic_model_name) — that matches existing Dynamo tooling and CLI flags.

Why: the Dynamo pre‑deployment / profiler tooling and examples use explicit flat flags --aic-system and --aic-model-name (not a nested aic object), so the deployed/consumed configuration is expected in that flat form. (docs.nvidia.com)

If you need, I can: (a) inspect your DGDR CRD source/spec and show exactly which field names it accepts, or (b) produce a one‑line migration mapping (aic:{system,model_name} → aic_system, aic_model_name) for your config tooling.

Citations:


I'll verify the actual documentation files to confirm the format issues at the specified lines.


🏁 Script executed:

#!/bin/bash
# Check the actual content at the specified lines in both documentation files

echo "=== sla_planner_quickstart.md lines 241-244 ==="
sed -n '241,244p' docs/planner/sla_planner_quickstart.md 2>/dev/null || echo "File not found"

echo -e "\n=== sla_planner_quickstart.md lines 300-303 ==="
sed -n '300,303p' docs/planner/sla_planner_quickstart.md 2>/dev/null || echo "File not found"

echo -e "\n=== sla_planner_quickstart.md lines 405-408 ==="
sed -n '405,408p' docs/planner/sla_planner_quickstart.md 2>/dev/null || echo "File not found"

echo -e "\n=== sla_driven_profiling.md lines 84-88 (referenced earlier section) ==="
sed -n '84,88p' docs/planner/sla_driven_profiling.md 2>/dev/null || echo "File not found"

echo -e "\n=== sla_driven_profiling.md lines 300-303 ==="
sed -n '300,303p' docs/planner/sla_driven_profiling.md 2>/dev/null || echo "File not found"

# Search for all instances of nested aic: pattern
echo -e "\n=== Searching for nested 'aic:' pattern in documentation ==="
rg -n "aic:\s*$" docs/planner/ --type markdown -A 5 || echo "No nested aic: patterns found"

# Search for flat format patterns
echo -e "\n=== Searching for flat format 'aic_' in documentation ==="
rg -n "aic_system|aic_model_name" docs/planner/ --type markdown || echo "No flat format found"

Length of output: 1421


🏁 Script executed:

#!/bin/bash
# Check the actual content at the specified lines in both documentation files

echo "=== sla_planner_quickstart.md lines 241-244 ==="
sed -n '241,244p' docs/planner/sla_planner_quickstart.md 2>/dev/null || echo "File not found"

echo -e "\n=== sla_planner_quickstart.md lines 300-303 ==="
sed -n '300,303p' docs/planner/sla_planner_quickstart.md 2>/dev/null || echo "File not found"

echo -e "\n=== sla_planner_quickstart.md lines 405-408 ==="
sed -n '405,408p' docs/planner/sla_planner_quickstart.md 2>/dev/null || echo "File not found"

echo -e "\n=== sla_driven_profiling.md lines 84-88 (referenced earlier section) ==="
sed -n '84,88p' docs/planner/sla_driven_profiling.md 2>/dev/null || echo "File not found"

echo -e "\n=== sla_driven_profiling.md lines 300-303 ==="
sed -n '300,303p' docs/planner/sla_driven_profiling.md 2>/dev/null || echo "File not found"

# Search for all instances of nested aic: pattern
echo -e "\n=== Searching for nested 'aic:' pattern in documentation ==="
rg -n "aic:\s*$" docs/planner/ --type markdown -A 5 || echo "No nested aic: patterns found"

# Search for flat format patterns
echo -e "\n=== Searching for flat format 'aic_' in documentation ==="
rg -n "aic_system|aic_model_name" docs/planner/ --type markdown || echo "No flat format found"

Length of output: 1421


Correct AIC field format in sla_planner_quickstart.md — use flat fields instead of nested.

The documentation contains incorrect nested AIC configuration. The DGDR API schema requires flat fields (aic_system, aic_model_name), not the nested structure shown in the examples.

Lines 241-244 and 300-303 in sla_planner_quickstart.md currently show:

aic:
  system: h200_sxm
  model_name: QWEN3_32B
  backend_version: "0.20.0"

Should be converted to flat fields:

aic_system: h200_sxm
aic_model_name: QWEN3_32B
aic_backend_version: "0.20.0"

(Note: The reference to lines 405-408 and sla_driven_profiling.md in the original comment appears incorrect and should be disregarded.)

🤖 Prompt for AI Agents
In docs/planner/sla_planner_quickstart.md around lines 241-244 (and also update
the similar occurrence at lines 300-303), the AIC YAML is using a nested "aic:"
block; replace that nested structure with flat fields by removing the "aic:" key
and adding aic_system, aic_model_name, and aic_backend_version keys with the
same values (e.g., aic_system: h200_sxm, aic_model_name: QWEN3_32B,
aic_backend_version: "0.20.0") so the file conforms to the DGDR API schema.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant