feat: more gb200 fp4 work #3886

ishandhanani · 2025-10-24T21:06:18Z

Summary by CodeRabbit

New Features
- Added a new GPU job configuration script supporting distributed workload processing with prefill and decode modes.
Chores
- Updated job orchestration scripts with refined resource allocation parameters and environment configurations.
- Enhanced cache and memory management for improved system stability.

coderabbitai · 2025-10-25T00:31:04Z

Walkthrough

This PR introduces a new Bash script (gb200-fp4-alt.sh) for orchestrating prefill and decode workflows in disaggregated inference, and modifies the existing gb200-fp4.sh script to adjust cache configuration, environment variables, and performance tuning parameters for GB200 GPU clusters running SLURM jobs.

Changes

Cohort / File(s)	Summary
GB200 FP4 SLURM scripts `components/backends/sglang/slurm_jobs/scripts/gb200-fp4-alt.sh`, `components/backends/sglang/slurm_jobs/scripts/gb200-fp4.sh`	New script added with two-mode workflow (prefill/decode) for disaggregated inference. Existing script modified: environment variables restructured, cache/workspace directories configured, TORCH_DISTRIBUTED_DEFAULT_TIMEOUT set consistently, token dispatch limits adjusted (1408→384), EP-size parameters added, and nvidia-cutlass-dsl pre-release installed for both modes.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

- Parameter value changes (token limits, dispatch thresholds) and their performance impact
- Environment variable configuration consistency across prefill/decode modes
- Conditional logic handling for USE_INIT_LOCATIONS and command suffix construction
- nvidia-cutlass-dsl installation version correctness and relevance
- Verification that mode-specific flag configurations are appropriate for each workflow

Poem

🐰 Hop! Hop! New scripts script-hop in place,
GB200s racing through their scheduled race,
Prefill, decode—each mode tuned just right,
Cache paths set and timeouts aligned tight!
Disaggregated flows now burning so bright,
A well-coordinated cluster takes flight! ✨

Pre-merge checks

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Description Check	⚠️ Warning	The pull request has no description provided by the author. The repository template requires several key sections: an Overview describing the pull request, Details about the changes made, a section identifying where reviewers should start, and Related Issues information. Since the description is completely absent, none of these required sections are present, leaving reviewers without context about the purpose, scope, or impact of the changes, particularly important given the medium-to-high complexity of the modifications to two shell scripts.	Add a complete pull request description following the provided template. Include an overview explaining the purpose of adding the new gb200-fp4-alt.sh script and the rationale for modifying gb200-fp4.sh, provide details on the key changes (such as cache configuration, parameter adjustments, and cutlass-dsl updates), specify which files reviewers should focus on, and note any related issues or GitHub issue references.
Title Check	❓ Inconclusive	The title "feat: more gb200 fp4 work" refers to the actual subject matter of the changes (GB200 FP4 configurations), making it related to the changeset. However, the phrasing is vague and generic—"more ... work" does not convey what specifically was accomplished. A reviewer scanning commit history would not understand whether this adds new scripts, modifies existing ones, or implements specific optimizations without examining the details. The title lacks sufficient specificity to clearly summarize the primary change from the developer's perspective.	Consider revising the title to be more descriptive and specific. For example, something like "feat: add gb200-fp4-alt script and optimize gb200-fp4 configuration" would better convey that a new script was added and existing configurations were improved, making the intent clearer to reviewers scanning the history.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.


coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

components/backends/sglang/slurm_jobs/scripts/gb200-fp4-alt.sh (1)

1-208: High degree of code duplication with gb200-fp4.sh.

Both scripts implement nearly identical argument validation, environment setup, and command construction logic. While the different configuration parameters (offload settings, GPU allocation strategy, context lengths) justify separate scripts, consider extracting common functions (e.g., validate_env_vars(), setup_environment()) into a shared utility to reduce maintenance burden and ensure consistency across future updates.
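One possible shape for that shared utility, as a minimal sketch. The file name common.sh is hypothetical, the two function names are the ones suggested above, and the exported values are copied from the prefill hunk of gb200-fp4-alt.sh quoted in this review. Which variables each script actually requires is not shown in this PR page, so the list is left to the caller.

```shell
# common.sh (hypothetical file name): helpers that both gb200-fp4.sh and
# gb200-fp4-alt.sh could source instead of duplicating the logic.

# Fail if any of the named environment variables is unset or empty.
# Usage: validate_env_vars VAR_A VAR_B ...
validate_env_vars() {
    local name missing=0
    for name in "$@"; do
        # ${!name:-} is bash indirect expansion with an empty default,
        # so this is also safe under `set -u`.
        if [[ -z "${!name:-}" ]]; then
            echo "Error: required environment variable $name is not set" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Export the cache and timeout settings (values taken from the hunk in this PR).
setup_environment() {
    export TORCH_DISTRIBUTED_DEFAULT_TIMEOUT=1800
    export SGLANG_DG_CACHE_DIR="/configs/deepgemm-kernels-10212025-ddcba74b"
    export FLASHINFER_WORKSPACE_BASE="/configs/flashinfer-cache"
}
```

Each script could then source the file, call validate_env_vars with whichever variables it needs (for example TOTAL_GPUS), and call setup_environment before building its command.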

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d870d4b and 47d6661.

📒 Files selected for processing (2)

components/backends/sglang/slurm_jobs/scripts/gb200-fp4-alt.sh (1 hunks)
components/backends/sglang/slurm_jobs/scripts/gb200-fp4.sh (5 hunks)

🧰 Additional context used

🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3886/merge) by ishandhanani.

components/backends/sglang/slurm_jobs/scripts/gb200-fp4-alt.sh

[error] 1-1: Pre-commit hook 'check-shebang-scripts-are-executable' failed: file has a shebang but is not executable. Run 'chmod +x components/backends/sglang/slurm_jobs/scripts/gb200-fp4-alt.sh' to fix.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)

GitHub Check: trtllm (amd64)
GitHub Check: operator (arm64)
GitHub Check: sglang
GitHub Check: vllm (amd64)
GitHub Check: vllm (arm64)
GitHub Check: Build and Test - dynamo

🔇 Additional comments (3)

components/backends/sglang/slurm_jobs/scripts/gb200-fp4.sh (2)

71-80: Environment variable setup correctly positioned before pip install and main command.

The placement of TORCH_DISTRIBUTED_DEFAULT_TIMEOUT, SGLANG_DG_CACHE_DIR, and FLASHINFER_WORKSPACE_BASE exports before the nvidia-cutlass-dsl install is appropriate for ensuring these are available early. The command_suffix logic for USE_INIT_LOCATIONS is correctly initialized to empty string and conditionally populated.

127-133: Resource parameter adjustments for FP4 cache constraints are well-documented.

The reduction of SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK from 1408 to 384 (line 164) and --cuda-graph-bs from 1408 to 384 (line 191), paired with the explanatory comment at lines 135–140, demonstrates clear intent to address the integer overflow issue referenced in the FlashInfer GitHub issue. The addition of --ep-size "$TOTAL_GPUS" in both prefill (line 127) and decode (line 206) modes is consistent.

Also applies to: 164-211
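If the dispatch-token limit and the CUDA graph batch size are meant to stay equal (an assumption; this review only shows that both moved from 1408 to 384), driving them from one variable would keep them from drifting apart. A minimal sketch, where MAX_DISPATCH_TOKENS and decode_flags are hypothetical names that do not appear in the script:

```shell
# Single source of truth for the per-rank token limit (hypothetical variable).
MAX_DISPATCH_TOKENS=384

# Environment variable named in this review.
export SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK="$MAX_DISPATCH_TOKENS"

# Flag named in this review, collected in an array so it can be
# appended to the launch command without word-splitting issues.
decode_flags=(--cuda-graph-bs "$MAX_DISPATCH_TOKENS")

echo "dispatch limit: $SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK, flags: ${decode_flags[*]}"
```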
components/backends/sglang/slurm_jobs/scripts/gb200-fp4-alt.sh (1)

130-132: Clarify why prefill uses TOTAL_GPUS - 2 while decode uses TOTAL_GPUS.

The prefill mode scales --ep-size, --tp-size, and --dp-size by $((TOTAL_GPUS - 2)), whereas decode mode uses the full $TOTAL_GPUS (lines 201–203). Given the presence of offload flags (lines 115–118) in prefill, this appears intentional (reserving 2 GPUs for CPU offload), but the reason is not documented.

Add a comment explaining this allocation strategy to improve clarity for future maintainers:
+    # Reserve 2 GPUs for CPU offload (offload-mode cpu); use remainder for prefill computation
 	--ep-size "$((TOTAL_GPUS - 2))" \
 	--tp-size "$((TOTAL_GPUS - 2))" \
 	--dp-size "$((TOTAL_GPUS - 2))" \
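Because the same arithmetic appears three times, and a TOTAL_GPUS of 2 or less would yield a zero or negative size, the subtraction could also be computed once and validated. This is a sketch only: prefill_parallel_size is a hypothetical helper, and the default of 2 reserved GPUs simply mirrors the constant in the hunk above.

```shell
# Print TOTAL_GPUS minus the reserved count, or fail if the result
# would not be a positive integer.
# Usage: prefill_parallel_size TOTAL_GPUS [RESERVED]
prefill_parallel_size() {
    local total_gpus="$1" reserved="${2:-2}"
    if ! [[ "$total_gpus" =~ ^[0-9]+$ ]] || (( total_gpus <= reserved )); then
        echo "Error: TOTAL_GPUS must be an integer greater than $reserved, got '$total_gpus'" >&2
        return 1
    fi
    echo "$(( total_gpus - reserved ))"
}
```

The script could then assign the result to one variable, for example `prefill_size="$(prefill_parallel_size "$TOTAL_GPUS")" || exit 1`, and pass "$prefill_size" to --ep-size, --tp-size, and --dp-size.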

coderabbitai · 2025-10-25T00:31:07Z

components/backends/sglang/slurm_jobs/scripts/gb200-fp4-alt.sh

@@ -0,0 +1,208 @@
+#!/bin/bash


⚠️ Potential issue | 🔴 Critical

Fix: New script missing executable permission (pipeline failure).

The pre-commit hook detected that the file has a shebang but is not executable. Run chmod +x components/backends/sglang/slurm_jobs/scripts/gb200-fp4-alt.sh to fix this before merge.


🤖 Prompt for AI Agents

components/backends/sglang/slurm_jobs/scripts/gb200-fp4-alt.sh lines 1-1: the script has a shebang but lacks executable permission causing the pre-commit hook/pipeline to fail; fix by setting the executable bit and committing the change (run chmod +x components/backends/sglang/slurm_jobs/scripts/gb200-fp4-alt.sh locally, verify the permission, then add and commit the file so the repo contains the executable bit).
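To catch this locally before pushing, a rough approximation of what the hook checks can be scripted. This is a sketch under assumptions: find_non_executable_shebang is a hypothetical name, it only looks at *.sh files, and it is not the pre-commit hook's actual implementation.

```shell
# List *.sh files under a directory that start with a shebang ("#!")
# but do not have the executable bit set for the current user.
# Usage: find_non_executable_shebang DIR
find_non_executable_shebang() {
    local dir="$1" f
    while IFS= read -r -d '' f; do
        if [[ "$(head -c 2 "$f")" == "#!" && ! -x "$f" ]]; then
            printf '%s\n' "$f"
        fi
    done < <(find "$dir" -type f -name '*.sh' -print0)
}
```

Running `find_non_executable_shebang components/backends/sglang/slurm_jobs/scripts` should print nothing once the fix is in place. After `chmod +x`, `git add` records the new mode; on filesystems that do not track the executable bit, `git update-index --chmod=+x <file>` sets it in the index directly.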

coderabbitai · 2025-10-25T00:31:07Z

components/backends/sglang/slurm_jobs/scripts/gb200-fp4-alt.sh

+if [ "$mode" = "prefill" ]; then
+    set -x
+    export TORCH_DISTRIBUTED_DEFAULT_TIMEOUT=1800
+    export SGLANG_DG_CACHE_DIR="/configs/deepgemm-kernels-10212025-ddcba74b"
+    export FLASHINFER_WORKSPACE_BASE="/configs/flashinfer-cache"
+
+    # temp we need to install newest cutedsl
+    python3 -m pip install --no-cache-dir --upgrade --pre nvidia-cutlass-dsl
+
+    # no expert locations collected for fp4 yet
+    if [[ "${USE_INIT_LOCATIONS,,}" == "true" ]]; then command_suffix=" "; fi
+


⚠️ Potential issue | 🔴 Critical

Initialize command_suffix before conditional in prefill mode.

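On line 78, command_suffix is only set if USE_INIT_LOCATIONS is "true". Otherwise the variable is never assigned by this script. If the script runs under set -u, expanding ${command_suffix} on line 136 fails with an "unbound variable" error; without set -u it expands to an empty string, or to whatever value happens to be inherited from the calling environment. Compare this to the pattern in gb200-fp4.sh (lines 71–72), which explicitly initializes command_suffix="" before the conditional.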

Apply this diff to initialize the variable:

 if [ "$mode" = "prefill" ]; then
     set -x
     export TORCH_DISTRIBUTED_DEFAULT_TIMEOUT=1800
     export SGLANG_DG_CACHE_DIR="/configs/deepgemm-kernels-10212025-ddcba74b"
     export FLASHINFER_WORKSPACE_BASE="/configs/flashinfer-cache"
     # temp we need to install newest cutedsl
     python3 -m pip install --no-cache-dir --upgrade --pre nvidia-cutlass-dsl
     # no expert locations collected for fp4 yet
+    command_suffix=""
     if [[ "${USE_INIT_LOCATIONS,,}" == "true" ]]; then command_suffix=" "; fi

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-if [ "$mode" = "prefill" ]; then
-    set -x
-    export TORCH_DISTRIBUTED_DEFAULT_TIMEOUT=1800
-    export SGLANG_DG_CACHE_DIR="/configs/deepgemm-kernels-10212025-ddcba74b"
-    export FLASHINFER_WORKSPACE_BASE="/configs/flashinfer-cache"
-    # temp we need to install newest cutedsl
-    python3 -m pip install --no-cache-dir --upgrade --pre nvidia-cutlass-dsl
-    # no expert locations collected for fp4 yet
-    if [[ "${USE_INIT_LOCATIONS,,}" == "true" ]]; then command_suffix=" "; fi
+if [ "$mode" = "prefill" ]; then
+    set -x
+    export TORCH_DISTRIBUTED_DEFAULT_TIMEOUT=1800
+    export SGLANG_DG_CACHE_DIR="/configs/deepgemm-kernels-10212025-ddcba74b"
+    export FLASHINFER_WORKSPACE_BASE="/configs/flashinfer-cache"
+    # temp we need to install newest cutedsl
+    python3 -m pip install --no-cache-dir --upgrade --pre nvidia-cutlass-dsl
+    # no expert locations collected for fp4 yet
+    command_suffix=""
+    if [[ "${USE_INIT_LOCATIONS,,}" == "true" ]]; then command_suffix=" "; fi

🤖 Prompt for AI Agents

components/backends/sglang/slurm_jobs/scripts/gb200-fp4-alt.sh around lines 68 to 79: command_suffix is only assigned inside the USE_INIT_LOCATIONS conditional, leaving it unset when that branch is false; a later expansion then fails under set -u, or silently picks up an empty or inherited value otherwise. Initialize command_suffix="" immediately before the if [[ "${USE_INIT_LOCATIONS,,}" == "true" ]] check (matching the pattern in gb200-fp4.sh) so it always exists, then keep the conditional to overwrite it when USE_INIT_LOCATIONS is true.
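The initialize-then-override pattern can be exercised in isolation. In this sketch resolve_command_suffix is a hypothetical wrapper used only to make the behavior testable; the script itself assigns command_suffix inline. The " " value is the placeholder from the hunk above, and `${var,,}` requires bash 4 or newer.

```shell
set -u  # treat expansion of unset variables as an error

# Print the suffix for a given USE_INIT_LOCATIONS value.
# Usage: resolve_command_suffix VALUE
resolve_command_suffix() {
    local use_init="${1:-false}"
    local command_suffix=""        # always defined, so later expansion is safe
    if [[ "${use_init,,}" == "true" ]]; then
        command_suffix=" "         # placeholder value, as in the PR
    fi
    printf '%s' "$command_suffix"
}

# Both branches expand cleanly under `set -u`.
echo "suffix when false: [$(resolve_command_suffix false)]"
echo "suffix when TRUE:  [$(resolve_command_suffix TRUE)]"
```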

bump

44884ef

pull-request-size bot added the size/M label Oct 24, 2025

github-actions bot added the feat label Oct 24, 2025

go

47d6661

pull-request-size bot added size/L and removed size/M labels Oct 25, 2025

copy-pr-bot bot temporarily deployed to GITLAB October 25, 2025 00:28 Inactive

copy-pr-bot bot temporarily deployed to GITLAB October 25, 2025 00:29 Inactive

coderabbitai bot reviewed Oct 25, 2025
