-
Notifications
You must be signed in to change notification settings - Fork 659
feat: more gb200 fp4 work #3886
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughThis PR introduces a new Bash script ( Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Poem
Pre-merge checks❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
components/backends/sglang/slurm_jobs/scripts/gb200-fp4-alt.sh (1)
1-208: High degree of code duplication withgb200-fp4.sh.Both scripts implement nearly identical argument validation, environment setup, and command construction logic. While the different configuration parameters (offload settings, GPU allocation strategy, context lengths) justify separate scripts, consider extracting common functions (e.g.,
validate_env_vars(),setup_environment()) into a shared utility to reduce maintenance burden and ensure consistency across future updates.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
components/backends/sglang/slurm_jobs/scripts/gb200-fp4-alt.sh(1 hunks)components/backends/sglang/slurm_jobs/scripts/gb200-fp4.sh(5 hunks)
🧰 Additional context used
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3886/merge) by ishandhanani.
components/backends/sglang/slurm_jobs/scripts/gb200-fp4-alt.sh
[error] 1-1: Pre-commit hook 'check-shebang-scripts-are-executable' failed: file has a shebang but is not executable. Run 'chmod +x components/backends/sglang/slurm_jobs/scripts/gb200-fp4-alt.sh' to fix.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: trtllm (amd64)
- GitHub Check: operator (arm64)
- GitHub Check: sglang
- GitHub Check: vllm (amd64)
- GitHub Check: vllm (arm64)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (3)
components/backends/sglang/slurm_jobs/scripts/gb200-fp4.sh (2)
71-80: Environment variable setup correctly positioned before pip install and main command.The placement of
TORCH_DISTRIBUTED_DEFAULT_TIMEOUT,SGLANG_DG_CACHE_DIR, andFLASHINFER_WORKSPACE_BASEexports before the nvidia-cutlass-dsl install is appropriate for ensuring these are available early. Thecommand_suffixlogic forUSE_INIT_LOCATIONSis correctly initialized to empty string and conditionally populated.
127-133: Resource parameter adjustments for FP4 cache constraints are well-documented.The reduction of
SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANKfrom 1408 to 384 (line 164) and--cuda-graph-bsfrom 1408 to 384 (line 191), paired with the explanatory comment at lines 135–140, demonstrates clear intent to address the integer overflow issue referenced in the FlashInfer GitHub issue. The addition of--ep-size "$TOTAL_GPUS"in both prefill (line 127) and decode (line 206) modes is consistent.Also applies to: 164-211
components/backends/sglang/slurm_jobs/scripts/gb200-fp4-alt.sh (1)
130-132: Clarify why prefill usesTOTAL_GPUS - 2while decode usesTOTAL_GPUS.The prefill mode scales
--ep-size,--tp-size, and--dp-sizeby$((TOTAL_GPUS - 2)), whereas decode mode uses the full$TOTAL_GPUS(lines 201–203). Given the presence of offload flags (lines 115–118) in prefill, this appears intentional (reserving 2 GPUs for CPU offload), but the reason is not documented.Add a comment explaining this allocation strategy to improve clarity for future maintainers:
+ # Reserve 2 GPUs for CPU offload (offload-mode cpu); use remainder for prefill computation --ep-size "$((TOTAL_GPUS - 2))" \ --tp-size "$((TOTAL_GPUS - 2))" \ --dp-size "$((TOTAL_GPUS - 2))" \
| @@ -0,0 +1,208 @@ | |||
| #!/bin/bash | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix: New script missing executable permission (pipeline failure).
The pre-commit hook detected that the file has a shebang but is not executable. Run chmod +x components/backends/sglang/slurm_jobs/scripts/gb200-fp4-alt.sh to fix this before merge.
🧰 Tools
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3886/merge) by ishandhanani.
[error] 1-1: Pre-commit hook 'check-shebang-scripts-are-executable' failed: file has a shebang but is not executable. Run 'chmod +x components/backends/sglang/slurm_jobs/scripts/gb200-fp4-alt.sh' to fix.
🤖 Prompt for AI Agents
components/backends/sglang/slurm_jobs/scripts/gb200-fp4-alt.sh lines 1-1: the
script has a shebang but lacks executable permission causing the pre-commit
hook/pipeline to fail; fix by setting the executable bit and committing the
change (run chmod +x
components/backends/sglang/slurm_jobs/scripts/gb200-fp4-alt.sh locally, verify
the permission, then add and commit the file so the repo contains the executable
bit).
| if [ "$mode" = "prefill" ]; then | ||
| set -x | ||
| export TORCH_DISTRIBUTED_DEFAULT_TIMEOUT=1800 | ||
| export SGLANG_DG_CACHE_DIR="/configs/deepgemm-kernels-10212025-ddcba74b" | ||
| export FLASHINFER_WORKSPACE_BASE="/configs/flashinfer-cache" | ||
|
|
||
| # temp we need to install newest cutedsl | ||
| python3 -m pip install --no-cache-dir --upgrade --pre nvidia-cutlass-dsl | ||
|
|
||
| # no expert locations collected for fp4 yet | ||
| if [[ "${USE_INIT_LOCATIONS,,}" == "true" ]]; then command_suffix=" "; fi | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initialize command_suffix before conditional in prefill mode.
On line 78, command_suffix is only set if USE_INIT_LOCATIONS is "true". If it is not, the variable remains unset, which will cause an expansion error when ${command_suffix} is used on line 136. Compare this to the pattern in gb200-fp4.sh (lines 71–72), which explicitly initializes command_suffix="" before the conditional.
Apply this diff to initialize the variable:
if [ "$mode" = "prefill" ]; then
set -x
export TORCH_DISTRIBUTED_DEFAULT_TIMEOUT=1800
export SGLANG_DG_CACHE_DIR="/configs/deepgemm-kernels-10212025-ddcba74b"
export FLASHINFER_WORKSPACE_BASE="/configs/flashinfer-cache"
# temp we need to install newest cutedsl
python3 -m pip install --no-cache-dir --upgrade --pre nvidia-cutlass-dsl
# no expert locations collected for fp4 yet
+ command_suffix=""
if [[ "${USE_INIT_LOCATIONS,,}" == "true" ]]; then command_suffix=" "; fi📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if [ "$mode" = "prefill" ]; then | |
| set -x | |
| export TORCH_DISTRIBUTED_DEFAULT_TIMEOUT=1800 | |
| export SGLANG_DG_CACHE_DIR="/configs/deepgemm-kernels-10212025-ddcba74b" | |
| export FLASHINFER_WORKSPACE_BASE="/configs/flashinfer-cache" | |
| # temp we need to install newest cutedsl | |
| python3 -m pip install --no-cache-dir --upgrade --pre nvidia-cutlass-dsl | |
| # no expert locations collected for fp4 yet | |
| if [[ "${USE_INIT_LOCATIONS,,}" == "true" ]]; then command_suffix=" "; fi | |
| if [ "$mode" = "prefill" ]; then | |
| set -x | |
| export TORCH_DISTRIBUTED_DEFAULT_TIMEOUT=1800 | |
| export SGLANG_DG_CACHE_DIR="/configs/deepgemm-kernels-10212025-ddcba74b" | |
| export FLASHINFER_WORKSPACE_BASE="/configs/flashinfer-cache" | |
| # temp we need to install newest cutedsl | |
| python3 -m pip install --no-cache-dir --upgrade --pre nvidia-cutlass-dsl | |
| # no expert locations collected for fp4 yet | |
| command_suffix="" | |
| if [[ "${USE_INIT_LOCATIONS,,}" == "true" ]]; then command_suffix=" "; fi |
🤖 Prompt for AI Agents
components/backends/sglang/slurm_jobs/scripts/gb200-fp4-alt.sh around lines 68
to 79: command_suffix is only assigned inside the USE_INIT_LOCATIONS
conditional, leaving it unset when that branch is false and causing expansion
errors later; initialize command_suffix="" immediately before the if [[
"${USE_INIT_LOCATIONS,,}" == "true" ]] check (matching the pattern in
gb200-fp4.sh) so it always exists, then keep the conditional to overwrite it
when USE_INIT_LOCATIONS is true.
Summary by CodeRabbit
New Features
Chores