@ishandhanani ishandhanani commented Oct 24, 2025

Summary by CodeRabbit

  • New Features

    • Added a new GPU job configuration script supporting distributed workload processing with prefill and decode modes.
  • Chores

    • Updated job orchestration scripts with refined resource allocation parameters and environment configurations.
    • Enhanced cache and memory management for improved system stability.

coderabbitai bot commented Oct 25, 2025

Walkthrough

This PR introduces a new Bash script (gb200-fp4-alt.sh) for orchestrating prefill and decode workflows in disaggregated inference, and modifies the existing gb200-fp4.sh script to adjust cache configuration, environment variables, and performance tuning parameters for GB200 GPU clusters running SLURM jobs.

Changes

Cohort: GB200 FP4 SLURM scripts
File(s): components/backends/sglang/slurm_jobs/scripts/gb200-fp4-alt.sh, components/backends/sglang/slurm_jobs/scripts/gb200-fp4.sh
Summary: New script added with two-mode workflow (prefill/decode) for disaggregated inference. Existing script modified: environment variables restructured, cache/workspace directories configured, TORCH_DISTRIBUTED_DEFAULT_TIMEOUT set consistently, token dispatch limits adjusted (1408→384), EP-size parameters added, and nvidia-cutlass-dsl pre-release installed for both modes.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Parameter value changes (token limits, dispatch thresholds) and their performance impact
  • Environment variable configuration consistency across prefill/decode modes
  • Conditional logic handling for USE_INIT_LOCATIONS and command suffix construction
  • nvidia-cutlass-dsl installation version correctness and relevance
  • Verification that mode-specific flag configurations are appropriate for each workflow

Poem

🐰 Hop! Hop! New scripts script-hop in place,
GB200s racing through their scheduled race,
Prefill, decode—each mode tuned just right,
Cache paths set and timeouts aligned tight!
Disaggregated flows now burning so bright,
A well-coordinated cluster takes flight! ✨

Pre-merge checks

❌ Failed checks (1 warning, 1 inconclusive)

Description Check
Status: ⚠️ Warning
Explanation: The pull request has no description provided by the author. The repository template requires several key sections: an Overview describing the pull request, Details about the changes made, a section identifying where reviewers should start, and Related Issues information. Since the description is completely absent, none of these required sections are present, leaving reviewers without context about the purpose, scope, or impact of the changes, particularly important given the medium-to-high complexity of the modifications to two shell scripts.
Resolution: Add a complete pull request description following the provided template. Include an overview explaining the purpose of adding the new gb200-fp4-alt.sh script and the rationale for modifying gb200-fp4.sh, provide details on the key changes (such as cache configuration, parameter adjustments, and cutlass-dsl updates), specify which files reviewers should focus on, and note any related issues or GitHub issue references.

Title Check
Status: ❓ Inconclusive
Explanation: The title "feat: more gb200 fp4 work" refers to the actual subject matter of the changes (GB200 FP4 configurations), making it related to the changeset. However, the phrasing is vague and generic: "more ... work" does not convey what specifically was accomplished. A reviewer scanning commit history would not understand whether this adds new scripts, modifies existing ones, or implements specific optimizations without examining the details. The title lacks sufficient specificity to clearly summarize the primary change from the developer's perspective.
Resolution: Consider revising the title to be more descriptive and specific. For example, something like "feat: add gb200-fp4-alt script and optimize gb200-fp4 configuration" would better convey that a new script was added and existing configurations were improved, making the intent clearer to reviewers scanning the history.

✅ Passed checks (1 passed)

Docstring Coverage
Status: ✅ Passed
Explanation: Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
components/backends/sglang/slurm_jobs/scripts/gb200-fp4-alt.sh (1)

1-208: High degree of code duplication with gb200-fp4.sh.

Both scripts implement nearly identical argument validation, environment setup, and command construction logic. While the different configuration parameters (offload settings, GPU allocation strategy, context lengths) justify separate scripts, consider extracting common functions (e.g., validate_env_vars(), setup_environment()) into a shared utility to reduce maintenance burden and ensure consistency across future updates.
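One way the suggested extraction could look is sketched below. The function names validate_env_vars and setup_environment come from the comment above; the file name common.sh, the choice of which variables to validate, and the assumption that both scripts export identical values (copied here from the gb200-fp4-alt.sh excerpt) are illustrative guesses, not taken from the PR.

```shell
#!/bin/bash
# common.sh (hypothetical file name): helpers that both GB200 FP4 scripts could source.

# Return non-zero and list every named variable that is unset or empty.
validate_env_vars() {
    local missing=()
    local name
    for name in "$@"; do
        # ${!name:-} is indirect expansion: the value of the variable whose name is in $name.
        if [[ -z "${!name:-}" ]]; then
            missing+=("$name")
        fi
    done
    if (( ${#missing[@]} > 0 )); then
        echo "Error: missing required environment variables: ${missing[*]}" >&2
        return 1
    fi
}

# Export the cache and timeout settings shown in the gb200-fp4-alt.sh excerpt.
# Assumption: both scripts want the same values.
setup_environment() {
    export TORCH_DISTRIBUTED_DEFAULT_TIMEOUT=1800
    export SGLANG_DG_CACHE_DIR="/configs/deepgemm-kernels-10212025-ddcba74b"
    export FLASHINFER_WORKSPACE_BASE="/configs/flashinfer-cache"
}
```

Each script would then source the helper file, call validate_env_vars with the names it actually requires (for example TOTAL_GPUS), and call setup_environment before building its launch command, keeping only the mode-specific flags local.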

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d870d4b and 47d6661.

📒 Files selected for processing (2)
  • components/backends/sglang/slurm_jobs/scripts/gb200-fp4-alt.sh (1 hunks)
  • components/backends/sglang/slurm_jobs/scripts/gb200-fp4.sh (5 hunks)
🧰 Additional context used
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3886/merge) by ishandhanani.
components/backends/sglang/slurm_jobs/scripts/gb200-fp4-alt.sh

[error] 1-1: Pre-commit hook 'check-shebang-scripts-are-executable' failed: file has a shebang but is not executable. Run 'chmod +x components/backends/sglang/slurm_jobs/scripts/gb200-fp4-alt.sh' to fix.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: trtllm (amd64)
  • GitHub Check: operator (arm64)
  • GitHub Check: sglang
  • GitHub Check: vllm (amd64)
  • GitHub Check: vllm (arm64)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (3)
components/backends/sglang/slurm_jobs/scripts/gb200-fp4.sh (2)

71-80: Environment variable setup correctly positioned before pip install and main command.

The placement of TORCH_DISTRIBUTED_DEFAULT_TIMEOUT, SGLANG_DG_CACHE_DIR, and FLASHINFER_WORKSPACE_BASE exports before the nvidia-cutlass-dsl install is appropriate for ensuring these are available early. The command_suffix logic for USE_INIT_LOCATIONS is correctly initialized to empty string and conditionally populated.


127-133: Resource parameter adjustments for FP4 cache constraints are well-documented.

The reduction of SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK from 1408 to 384 (line 164) and --cuda-graph-bs from 1408 to 384 (line 191), paired with the explanatory comment at lines 135–140, demonstrates clear intent to address the integer overflow issue referenced in the FlashInfer GitHub issue. The addition of --ep-size "$TOTAL_GPUS" in both prefill (line 127) and decode (line 206) modes is consistent.

Also applies to: 164-211

components/backends/sglang/slurm_jobs/scripts/gb200-fp4-alt.sh (1)

130-132: Clarify why prefill uses TOTAL_GPUS - 2 while decode uses TOTAL_GPUS.

The prefill mode scales --ep-size, --tp-size, and --dp-size by $((TOTAL_GPUS - 2)), whereas decode mode uses the full $TOTAL_GPUS (lines 201–203). Given the presence of offload flags (lines 115–118) in prefill, this appears intentional (reserving 2 GPUs for CPU offload), but the reason is not documented.

Add a comment explaining this allocation strategy to improve clarity for future maintainers:

+    # Reserve 2 GPUs for CPU offload (offload-mode cpu); use remainder for prefill computation
 	--ep-size "$((TOTAL_GPUS - 2))" \
 	--tp-size "$((TOTAL_GPUS - 2))" \
 	--dp-size "$((TOTAL_GPUS - 2))" \
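The sizing described in this comment can be made explicit with a small helper. This is only a sketch: the function name parallel_size_for_mode and the lower-bound guard are hypothetical additions, and the "minus 2" rule is taken from the review text without knowing the author's actual reason for it.

```shell
#!/bin/bash
# Print the parallel size for a mode, following the sizing described in the review:
# prefill uses TOTAL_GPUS - 2, decode uses TOTAL_GPUS.
parallel_size_for_mode() {
    local mode="$1" total_gpus="$2"
    local size
    case "$mode" in
        prefill) size=$((total_gpus - 2)) ;;
        decode)  size=$total_gpus ;;
        *) echo "Error: unknown mode '$mode'" >&2; return 1 ;;
    esac
    # Guard (not in the original script): a small allocation would give a size below 1.
    if (( size < 1 )); then
        echo "Error: $mode needs a positive size, got $size from TOTAL_GPUS=$total_gpus" >&2
        return 1
    fi
    echo "$size"
}
```

Computing the value once and passing it to --ep-size, --tp-size, and --dp-size keeps the three flags from drifting apart, and gives the explanatory comment a single place to live.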

@@ -0,0 +1,208 @@
#!/bin/bash

⚠️ Potential issue | 🔴 Critical

Fix: New script missing executable permission (pipeline failure).

The pre-commit hook detected that the file has a shebang but is not executable. Run chmod +x components/backends/sglang/slurm_jobs/scripts/gb200-fp4-alt.sh to fix this before merge.

🤖 Prompt for AI Agents
components/backends/sglang/slurm_jobs/scripts/gb200-fp4-alt.sh lines 1-1: the
script has a shebang but lacks executable permission causing the pre-commit
hook/pipeline to fail; fix by setting the executable bit and committing the
change (run chmod +x
components/backends/sglang/slurm_jobs/scripts/gb200-fp4-alt.sh locally, verify
the permission, then add and commit the file so the repo contains the executable
bit).
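The fix described above can be sketched as a small helper. The function name mark_executable and the throwaway demo file are hypothetical; chmod +x and git update-index --chmod=+x are standard commands, with the latter recording the executable bit in the index so that it is committed even on filesystems that do not track it.

```shell
#!/bin/bash
# Set the executable bit on disk and, if the file is tracked by git,
# record the mode change in the index as well.
mark_executable() {
    local path="$1"
    local dir base
    dir="$(dirname "$path")"
    base="$(basename "$path")"
    chmod +x "$path"
    if git -C "$dir" rev-parse --is-inside-work-tree >/dev/null 2>&1; then
        # Fails harmlessly for files that are not yet tracked.
        git -C "$dir" update-index --chmod=+x "$base" 2>/dev/null || true
    fi
}

# Demonstration on a throwaway file standing in for gb200-fp4-alt.sh.
demo_dir="$(mktemp -d)"
demo_script="$demo_dir/demo.sh"
printf '#!/bin/bash\necho ok\n' > "$demo_script"
mark_executable "$demo_script"
```

For the real file, running git ls-files -s on the script path after staging should show mode 100755 rather than 100644 before the change is committed.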

Comment on lines +68 to +79
if [ "$mode" = "prefill" ]; then
    set -x
    export TORCH_DISTRIBUTED_DEFAULT_TIMEOUT=1800
    export SGLANG_DG_CACHE_DIR="/configs/deepgemm-kernels-10212025-ddcba74b"
    export FLASHINFER_WORKSPACE_BASE="/configs/flashinfer-cache"

    # temp we need to install newest cutedsl
    python3 -m pip install --no-cache-dir --upgrade --pre nvidia-cutlass-dsl

    # no expert locations collected for fp4 yet
    if [[ "${USE_INIT_LOCATIONS,,}" == "true" ]]; then command_suffix=" "; fi

⚠️ Potential issue | 🔴 Critical

Initialize command_suffix before conditional in prefill mode.

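On line 78, command_suffix is assigned only when USE_INIT_LOCATIONS is "true". Otherwise the variable is still unset when ${command_suffix} is expanded on line 136. Bash expands an unset variable to an empty string by default, so this becomes an "unbound variable" error only if the script runs under set -u; without it, the expansion can still pick up a stale command_suffix value inherited from the calling environment. gb200-fp4.sh (lines 71–72) avoids both cases by explicitly initializing command_suffix="" before the conditional.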

Apply this diff to initialize the variable:

 if [ "$mode" = "prefill" ]; then
     set -x
     export TORCH_DISTRIBUTED_DEFAULT_TIMEOUT=1800
     export SGLANG_DG_CACHE_DIR="/configs/deepgemm-kernels-10212025-ddcba74b"
     export FLASHINFER_WORKSPACE_BASE="/configs/flashinfer-cache"
 
     # temp we need to install newest cutedsl
     python3 -m pip install --no-cache-dir --upgrade --pre nvidia-cutlass-dsl
 
     # no expert locations collected for fp4 yet
+    command_suffix=""
     if [[ "${USE_INIT_LOCATIONS,,}" == "true" ]]; then command_suffix=" "; fi
🤖 Prompt for AI Agents
components/backends/sglang/slurm_jobs/scripts/gb200-fp4-alt.sh around lines 68
to 79: command_suffix is only assigned inside the USE_INIT_LOCATIONS
conditional, leaving it unset when that branch is false and causing expansion
errors later; initialize command_suffix="" immediately before the if [[
"${USE_INIT_LOCATIONS,,}" == "true" ]] check (matching the pattern in
gb200-fp4.sh) so it always exists, then keep the conditional to overwrite it
when USE_INIT_LOCATIONS is true.
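The difference between the two patterns can be checked with a short, self-contained sketch. The helper name resolve_suffix and the "launch" placeholder are hypothetical; whether the real script enables set -u is not shown in this review, so the first case is an assumption about one way the missing initialization could fail.

```shell
#!/bin/bash
# Case 1: variable never assigned and nounset enabled, so the child shell aborts.
unset command_suffix
if bash -uc 'echo "launch${command_suffix}"' >/dev/null 2>&1; then
    unset_under_nounset="ok"
else
    unset_under_nounset="failed"
fi

# Case 2: same expansion without nounset, so the unset variable expands to nothing.
unset_without_nounset="$(bash -c 'echo "launch${command_suffix}"')"

# Case 3: the initialized pattern, so the variable is always defined.
resolve_suffix() {
    local use_init_locations="${1:-false}"
    local command_suffix=""
    # ${var,,} lowercases the value (bash 4 or newer), as in the original script.
    if [[ "${use_init_locations,,}" == "true" ]]; then command_suffix=" "; fi
    printf '%s' "$command_suffix"
}
```

Initializing the variable costs one line and makes the script behave the same way regardless of shell options or whatever the caller's environment happens to export.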
