fix: non-zero exit code in RDMA naming script #109

Closed
ggoklani wants to merge 1 commit into main from fix_rdma_naming_script

@ggoklani ggoklani commented Mar 24, 2026

Problem
The azure_persistent_rdma_naming.service consistently fails with status=1/FAILURE on startup.

Root Cause:
The script /usr/sbin/azure_persistent_rdma_naming.sh runs with set -e (exit on error). In the final iteration of the for loop, the last command executed (an increment or an assignment, or a sub-command within the loop logic) returns a non-zero status. Because the loop is the last statement in the script, and a script's exit status is that of the last command it ran, the process exits with 1. systemd then reports a service failure even though the naming logic succeeded.
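
The behaviour described above can be reproduced with a small sketch. This is not the real script: the loop body and the device names (mlx5_ib0, mlx5_an0) are hypothetical, and the AND-list is only one plausible pattern that leaves a non-zero status without triggering set -e.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the naming script, NOT the real template.
# The device names below are made up for illustration.

run_naming_sketch() {
    bash -c '
        set -e
        for dev in mlx5_ib0 mlx5_an0; do
            # A failing test on the left of && does not trigger set -e,
            # but the list still returns non-zero. In the last iteration
            # that status becomes the status of the loop and, since nothing
            # follows the loop, of the whole script.
            [ "$dev" = "mlx5_ib0" ] && echo "processing $dev"
        done
    '
}

rc=0
run_naming_sketch || rc=$?
echo "exit status: $rc"   # prints "exit status: 1"
```

The work in the first iteration completes normally; only the status left behind by the last iteration is non-zero.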

Solution
Added an explicit exit 0 at the end of the script. This ensures that if the script reaches the end of its logic without encountering a genuine fatal error, it returns a success code to systemd.
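
A minimal sketch of the effect of the trailing exit 0, again using a hypothetical loop body and made-up device names rather than the contents of the real script:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the fixed script, NOT the real template.

run_fixed_sketch() {
    bash -c '
        set -e
        for dev in mlx5_ib0 mlx5_an0; do
            # Same pattern as before: the last iteration leaves status 1.
            [ "$dev" = "mlx5_ib0" ] && echo "processing $dev"
        done
        # Reached only if set -e did not abort the script earlier.
        exit 0
    '
}

rc=0
run_fixed_sketch || rc=$?
echo "exit status: $rc"   # prints "exit status: 0"
```

Because set -e is still active, a command that fails outside an exempt context (for example a plain failing command on its own line) still aborts the script with a non-zero status before exit 0 is reached.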

Validation
Manual Execution: Running sudo bash -x /usr/sbin/azure_persistent_rdma_naming.sh now completes with an exit code of 0.

Systemd Status: systemctl status azure_persistent_rdma_naming.service now reports active (exited) instead of failed.

Functionality: Verified that mlx5_ib devices are still correctly identified and processed by the script.
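
The systemd check above could also be scripted. The helper below is a sketch, and its name check_unit_result is an assumption of this example. It reads the key=value output of systemctl show and succeeds only when the unit's Result is success and ExecMainStatus is 0.

```shell
#!/usr/bin/env bash
# Sketch: decide pass/fail from `systemctl show` key=value lines on stdin.
# check_unit_result is a hypothetical helper name, not part of the PR.

check_unit_result() {
    local result="" status="" key value
    while IFS='=' read -r key value; do
        case "$key" in
            Result)         result=$value ;;
            ExecMainStatus) status=$value ;;
        esac
    done
    [ "$result" = "success" ] && [ "$status" = "0" ]
}

# On a host where the unit is installed (not executed here):
#   systemctl show -p Result -p ExecMainStatus \
#       azure_persistent_rdma_naming.service | check_unit_result
```

The function only parses text, so it can be exercised with sample input on a machine without the service.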

Summary by Sourcery

Bug Fixes:

  • Prevent the azure_persistent_rdma_naming systemd service from reporting a failure due to a non-zero exit code at the end of the RDMA naming script.

Signed-off-by: Gaurav Goklani <ggoklani@redhat.com>
sourcery-ai bot commented Mar 24, 2026

Reviewer's Guide

Makes the RDMA persistent naming script return exit code 0 explicitly at the end, so that it exits successfully whenever its logic completes without a real error and systemd no longer reports spurious service failures.

Sequence diagram for RDMA naming script exit code handling

sequenceDiagram
    participant systemd as systemd
    participant service as azure_persistent_rdma_naming.service
    participant script as azure_persistent_rdma_naming.sh

    systemd->>service: Start service
    service->>script: Execute script
    script->>script: Perform RDMA device detection loop
    alt fatal_error_encountered
        script-->>service: exit 1 (or non-zero)
        service-->>systemd: status=failed
    else no_fatal_error
        script->>script: reach end of script
        script-->>service: exit 0 (explicit exit 0)
        service-->>systemd: status=active (exited)
    end

File-Level Changes

Change: Force the RDMA naming script to return a success exit code when it reaches the end of its logic without fatal errors.
Details:
  • Append an explicit exit 0 after the RDMA device processing loop to override non-zero exit codes from the last loop iteration or arithmetic/assignment operations.
  • Preserve existing RDMA device discovery and logging behavior while only altering the script’s final exit status semantics with systemd.
Files: templates/rdma/azure_persistent_rdma_naming.sh.j2

@ggoklani ggoklani closed this Mar 24, 2026