[DYNAMO] add SLURM smoke config for Dynamo #2504

Draft
AmeenP wants to merge 1 commit into feat/dynamo-deployment-example from codex/dynamo-slurm-smoke

AmeenP (Contributor) commented May 15, 2026

Summary

Adds a single-node SLURM smoke example for running prime-rl against NVIDIA Dynamo as the inference backend.

This is stacked on #2394 so it lives next to the existing Dynamo K8s/local smoke examples.

Files added:

  • tools/dynamo/slurm_smoke/smoke_rl.toml: RL config for a 2-GPU smoke run with no prime-rl-managed inference process.
  • tools/dynamo/slurm_smoke/dynamo_single_node_rl.sbatch.j2: custom sbatch template that starts Dynamo on GPU 0 with --discovery-backend file, waits for /health and /v1/models, then runs prime-rl on GPU 1.
  • tools/dynamo/slurm_smoke/README.md: dry-run, submit, environment override, and log instructions.
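The readiness gate described for the sbatch template (wait for /health and /v1/models before launching prime-rl) can be sketched in Python as below. This is an illustration only, not the logic in dynamo_single_node_rl.sbatch.j2: the function names, the base URL, and the timeout values are assumptions, and the sketch treats any HTTP 200 response as "ready".

```python
import http.client
import time
import urllib.error
import urllib.request


def endpoint_ready(url: str, timeout_s: float = 2.0) -> bool:
    """Return True if a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, non-2xx status, or a malformed reply.
        return False


def wait_for_endpoints(
    base_url: str,
    paths=("/health", "/v1/models"),
    deadline_s: float = 600.0,   # hypothetical overall budget
    interval_s: float = 5.0,     # hypothetical poll interval
) -> bool:
    """Poll until every path answers 200, or give up after `deadline_s`."""
    stop_at = time.monotonic() + deadline_s
    while True:
        if all(endpoint_ready(base_url + path) for path in paths):
            return True
        if time.monotonic() >= stop_at:
            return False
        time.sleep(interval_s)
```

A caller would gate the training launch on the result, for example `if not wait_for_endpoints("http://127.0.0.1:8000"): raise SystemExit(1)`, where the host and port are placeholders rather than values taken from this PR.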

The Dynamo side uses file discovery and disables KV event publishing, so the sample does not require etcd or NATS for the single-node smoke path.

Validation

Local:

  • Parsed tools/dynamo/slurm_smoke/smoke_rl.toml through RLConfig using the config package and temporary validation dependencies.
  • Rendered dynamo_single_node_rl.sbatch.j2 with Jinja2 and checked the rendered script with bash -n.
  • Ran git diff --cached --check to confirm the staged changes introduce no whitespace errors.
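The bash -n step above can be reproduced with a small helper like the following. It is a sketch under assumptions: it takes the already-rendered script as a string (the Jinja2 rendering itself is not shown), the helper name and temporary file name are hypothetical, and it requires bash on PATH. Note that bash -n only parses the script; it does not execute it or validate the #SBATCH directives.

```python
import subprocess
import tempfile
from pathlib import Path


def bash_syntax_ok(script_text: str) -> tuple[bool, str]:
    """Run `bash -n` on the script text; return (ok, stderr)."""
    with tempfile.TemporaryDirectory() as tmp:
        script_path = Path(tmp) / "rendered.sbatch"  # hypothetical file name
        script_path.write_text(script_text)
        result = subprocess.run(
            ["bash", "-n", str(script_path)],  # -n: parse only, do not execute
            capture_output=True,
            text=True,
        )
    return result.returncode == 0, result.stderr
```

A non-empty stderr with a False result points at the line bash could not parse, which is usually enough to find an unbalanced if/fi or an unclosed quote left behind by a template variable.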

GitHub Actions:

  • Ruff: pass
  • Slim install (prime-rl-configs only): pass
  • Unit tests: pass

Note: full uv run rl ... --dry-run cannot run in this macOS workspace because prime-rl's lockfile supports Linux platforms only, and the local environment does not have the project console scripts synced.
