fix: non-zero exit code in RDMA naming script #109
Closed
Problem
The azure_persistent_rdma_naming.service consistently fails with status=1/FAILURE on startup.
Root Cause:
The script /usr/sbin/azure_persistent_rdma_naming.sh uses set -e (exit on error). In the final iteration of the for loop, the last command executed is an increment or an assignment that returns a non-zero status (specifically when the loop finishes or if a sub-command within the loop logic returns 1). Because this is the last line of the script, the entire process exits with 1, causing systemd to report a service failure even if the naming logic was successful.
Solution
Added an explicit exit 0 at the end of the script. This ensures that if the script reaches the end of its logic without encountering a genuine fatal error, it returns a success code to systemd.
Validation
Manual Execution: Running sudo bash -x /usr/sbin/azure_persistent_rdma_naming.sh now completes with an exit code of 0.
Systemd Status: systemctl status azure_persistent_rdma_naming.service now reports active (exited) instead of failed.
Functionality: Verified that mlx5_ib devices are still correctly identified and processed by the script.
Signed-off-by: Gaurav Goklani <ggoklani@redhat.com>
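The failure mode described under Root Cause can be reproduced in isolation. The sketch below does not use the real azure_persistent_rdma_naming.sh, whose contents are not shown in this PR. It writes three small hypothetical scripts to temporary files and compares their exit codes. The device names, the loop body, and the variable names are assumptions made for illustration only.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the real script; not its actual contents.

# Case 1: the last command of the loop is a short-circuit test that fails.
# set -e does not abort on the left side of "&&", but the loop's status
# (1) becomes the script's exit status because nothing runs after it.
short_circuit=$(mktemp)
cat > "$short_circuit" <<'EOF'
set -e
for dev in mlx5_ib0 eth0; do
    [ "${dev#mlx5_ib}" != "$dev" ] && echo "rdma device: $dev"
done
EOF

# Case 2: the same script with the explicit "exit 0" appended.
with_exit0=$(mktemp)
cat "$short_circuit" > "$with_exit0"
echo 'exit 0' >> "$with_exit0"

# Case 3: an arithmetic increment that evaluates to 0 returns status 1,
# so set -e aborts immediately and the trailing "exit 0" is never reached.
aborted=$(mktemp)
cat > "$aborted" <<'EOF'
set -e
count=0
((count++))
exit 0
EOF

rc_short_circuit=0; bash "$short_circuit" >/dev/null || rc_short_circuit=$?
rc_with_exit0=0;    bash "$with_exit0"    >/dev/null || rc_with_exit0=$?
rc_aborted=0;       bash "$aborted"       >/dev/null || rc_aborted=$?

echo "short-circuit test last, no exit 0: $rc_short_circuit"
echo "short-circuit test last, exit 0:    $rc_with_exit0"
echo "set -e abort before exit 0:         $rc_aborted"

rm -f "$short_circuit" "$with_exit0" "$aborted"
```

The first two cases should report 1 and 0, which matches the behavior this PR describes. The third case should still report 1: a trailing exit 0 only helps when the script actually reaches its last line, so a command that trips set -e earlier would need its own handling (for example `|| true`). Which of these situations the real script hits is not established here.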
Reviewer's Guide
Ensures the RDMA persistent naming script always exits successfully when its logic completes without a real error, preventing spurious systemd service failures, by explicitly returning exit code 0 at the end of the script.
Sequence diagram for RDMA naming script exit code handling
sequenceDiagram
participant systemd as systemd
participant service as azure_persistent_rdma_naming.service
participant script as azure_persistent_rdma_naming.sh
systemd->>service: Start service
service->>script: Execute script
script->>script: Perform RDMA device detection loop
alt fatal_error_encountered
script-->>service: exit 1 (or non-zero)
service-->>systemd: status=failed
else no_fatal_error
script->>script: reach end of script
script-->>service: exit 0 (explicit exit 0)
service-->>systemd: status=active (exited)
end
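The two outcomes in the diagram can be checked from a shell after the service has run. The following is a minimal sketch, assuming a host where systemd is running and the unit is installed. The helper name report_unit_result is hypothetical. ExecMainStatus, Result, ActiveState and SubState are standard systemctl show properties.

```shell
#!/usr/bin/env bash
# Sketch: report how systemd recorded the last run of the naming service.
unit="azure_persistent_rdma_naming.service"

report_unit_result() {
    # ExecMainStatus is the exit code of the main process.
    # Result is "success" or a failure reason such as "exit-code".
    # ActiveState/SubState correspond to what systemctl status prints,
    # for example "active (exited)" or "failed".
    systemctl show "$1" \
        --property=ExecMainStatus \
        --property=Result \
        --property=ActiveState \
        --property=SubState
}

if command -v systemctl >/dev/null 2>&1; then
    report_unit_result "$unit" || echo "could not query $unit" >&2
else
    echo "systemctl not available; skipping" >&2
fi
```

For a service process, "active (exited)" after a zero exit code typically indicates a oneshot unit with RemainAfterExit=yes. The unit file is not part of this PR, so that is an assumption about how the service is defined.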
Summary by Sourcery
Bug Fixes: