Status: Closed
Labels: `enhancement` (New feature or request), `hpc` (HPC cluster management support)
Parent mapping issue: #615
Backlog milestone: https://github.com/adolago/rustible/milestone/2
Epic: HPC expected PR backlog (20 slices)
This epic tracks the PR-sized implementation sequence for HPC gap closure.
Execution order follows the roadmap milestones (M1 through M6) and links every child issue.
Ordered checklist
M1 — Core Quick Wins
- [HPC PR 01] Implement `lustre_mount` module (FS-01) #616
- [HPC PR 02] Implement `ipmi_power` and `ipmi_boot` modules (BM-01) #617
- [HPC PR 03] Implement `slurm_node` module (SCH-01) #618
- [HPC PR 04] Implement `slurm_partition` module (SCH-02) #619
M2 — Extended Control
- [HPC PR 05] Implement `nvidia_driver` module (GPU-01) #620
- [HPC PR 06] Implement Redfish power/info modules (BM-02) #621
- [HPC PR 07] Implement `ipoib` module (IB-03) #622
- [HPC PR 08] Implement BeeGFS client module (FS-03) #623
M3 — Identity Stack
- [HPC PR 09] Implement Kerberos client module(s) (ID-02) #624
- [HPC PR 10] Implement `sssd_config`/`sssd_domain` modules (ID-01) #625
M4 — Fabric and Scheduler Deepening
- [HPC PR 11] Implement `opensm_config` module (IB-01) #626
- [HPC PR 12] Implement `ib_partition` module (IB-02) #627
- [HPC PR 13] Implement `cuda_toolkit` module (GPU-02) #628
- [HPC PR 14] Implement Slurm accounting modules (SCH-03) #629
M5 — Advanced Storage and Provisioning
- [HPC PR 15] Implement Lustre OST lifecycle module (FS-02) #630
- [HPC PR 16] Implement PXE host/profile module(s) (BM-03) #631
- [HPC PR 17] Implement Warewulf integration module(s) (BM-04) #632
M6 — Scale Validation and Operations
- [HPC PR 18] Build 10k+ scale validation suite (SC-01) #633
- [HPC PR 19] Implement Lmod module management (SW-01) #634
- [HPC PR 20] Implement fabric diagnostics module (IB-05) #635
Completion criteria
- All child issues closed with merged PRs.
- Evidence artifacts updated for reliability and performance where applicable.
- Master gap map (#615) updated to reflect closed gaps.