feat(hpc): add InfiniBand diagnostics module (#635) #656
Add IbDiagnosticsModule for running ibdiagnet/iblinkinfo/ibstat checks and parsing results into structured data.
Closes #635
Co-Authored-By: Claude Opus 4.6 <[email protected]>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 258f87040d
            format!("IB diagnostics completed with {} error(s)", errors.len())
        };

        Ok(ModuleOutput::ok(summary)
Mark module failed when diagnostic commands fail
The loop records command failures in `errors`, but the function always returns `ModuleOutput::ok(...)`, so runs where tools are missing or commands fail (e.g., ibdiagnet not installed or non-zero exit) are reported as successful to callers. In automation workflows this can let playbooks continue despite incomplete diagnostics, which undermines result reliability; the status should switch to failed (or return an error) when `errors` is non-empty.
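A minimal sketch of the suggested change, under stated assumptions: the `Status` enum, the `ModuleOutput` struct shown here, the `ModuleOutput::failed` constructor, the `finish` helper, and the success message are all hypothetical stand-ins for illustration. The real `ModuleOutput` in this repository may expose a different API.

```rust
// Hypothetical stand-ins for illustration only; the project's real
// ModuleOutput type may look different.
#[derive(Debug, PartialEq)]
enum Status {
    Ok,
    Failed,
}

#[derive(Debug)]
struct ModuleOutput {
    status: Status,
    msg: String,
}

impl ModuleOutput {
    fn ok(msg: impl Into<String>) -> Self {
        ModuleOutput { status: Status::Ok, msg: msg.into() }
    }

    // Assumed constructor; not confirmed to exist in the real codebase.
    fn failed(msg: impl Into<String>) -> Self {
        ModuleOutput { status: Status::Failed, msg: msg.into() }
    }
}

/// Choose the module status from the errors collected in the command loop.
fn finish(errors: &[String]) -> ModuleOutput {
    if errors.is_empty() {
        // Assumed success message; the original one is not shown in the diff.
        ModuleOutput::ok("IB diagnostics completed successfully")
    } else {
        // Any recorded command failure marks the whole run as failed.
        ModuleOutput::failed(format!(
            "IB diagnostics completed with {} error(s): {}",
            errors.len(),
            errors.join("; ")
        ))
    }
}
```

With this shape, a playbook that checks the module status would stop (or branch) when ibdiagnet is missing, instead of treating an incomplete run as a success.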
    // Save to file
    let output_file = format!("{}/{}.log", output_dir, cmd);
    let escaped = stdout.replace('\'', "'\\''");
    let _ = run_cmd(
Propagate failures when writing diagnostic log files
The result of the log-write command is discarded with `let _ = run_cmd(...)`, so permission issues, missing directories, or disk errors silently drop report files even though `output_dir` is returned as if the artifacts were saved. This makes diagnostics output misleading in environments where report persistence is required; log-write failures should be added to `errors` or fail the module.
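One way to surface the write failure could look like the sketch below. It is illustrative only: the `save_log` helper, the closure-based `run_cmd` parameter with its `Result<String, String>` return type, and the `printf` redirect used as the write command are assumptions, since the actual `run_cmd` signature and the command it is given are not visible in the diff.

```rust
/// Write captured stdout to `<output_dir>/<cmd>.log` through a command runner,
/// recording any failure in `errors` instead of discarding it.
///
/// `run_cmd` is a hypothetical stand-in for the module's real command runner.
fn save_log<F>(
    run_cmd: F,
    output_dir: &str,
    cmd: &str,
    stdout: &str,
    errors: &mut Vec<String>,
) where
    F: FnOnce(&str) -> Result<String, String>,
{
    let output_file = format!("{}/{}.log", output_dir, cmd);
    // Escape single quotes so the text can sit inside a single-quoted shell string.
    let escaped = stdout.replace('\'', "'\\''");
    // Assumed write command; the original one is truncated in the diff.
    // Note: output_file is not escaped here, which a real version should handle.
    let write_cmd = format!("printf '%s' '{}' > '{}'", escaped, output_file);

    // Keep the failure visible to the caller rather than using `let _ = ...`.
    if let Err(e) = run_cmd(&write_cmd) {
        errors.push(format!("failed to write {}: {}", output_file, e));
    }
}
```

Because the failure lands in the same `errors` list as the command failures, a status check on `errors` at the end of the run would also cover missing report files.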
Summary
- `IbDiagnosticsModule` for running ibdiagnet/iblinkinfo/ibstat fabric diagnostics
- `ofed` feature flag

Closes #635
Test plan
- `cargo clippy --features full-hpc` passes
- `cargo test --features full-hpc --lib -- ib_diagnostics`: 5 tests pass

🤖 Generated with Claude Code