@sberss sberss commented Nov 25, 2025

Description

Fixed an issue in the AWS Batch backend where jobs that fail before writing all their expected outputs would cause the delocalization step to fail entirely. This prevented the upload of crucial debugging files (return code, stdout, stderr) to S3.

When a task command fails (e.g., exit 1) before creating all declared outputs, the subsequent delocalization step would attempt to upload these missing files. With set -e enabled, the first failed upload would abort the entire delocalization block, meaning rc/stdout/stderr were never uploaded if they came after a missing output file.

This PR:

  1. Removes set -e from the delocalization block - allows all upload attempts to proceed regardless of individual failures.
  2. Changes the exit code logic: if the task succeeded (rc=0) but delocalization failed, exit with code 1 (as before). If the task failed (rc≠0) and delocalization also failed, exit with the original task return code. This ensures task failures aren't masked by delocalization failures.

| Task rc | Delocalization | Exit code | Rationale |
|---------|----------------|-----------|-----------|
| 0 | Success | 0 | Everything worked |
| 0 | Failed | 1 | Task succeeded but outputs missing = real problem |
| ≠0 | Success | rc | Task failed, outputs captured |
| ≠0 | Failed | rc | Task failed, missing outputs expected |
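
The exit-code rules above can be sketched as a small shell function. This is an illustration only, not the script Cromwell generates: the function name `final_exit_code` and its two arguments (the task return code and a 0/1 flag for whether any upload failed) are hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper, not taken from the PR.
# Usage: final_exit_code TASK_RC DELOCALIZATION_FAILED
#   TASK_RC               return code of the task command
#   DELOCALIZATION_FAILED 0 if every upload succeeded, 1 if any upload failed
final_exit_code() {
  local task_rc="$1"
  local deloc_failed="$2"

  if [ "$task_rc" -ne 0 ]; then
    # Task failed: always report the task's own return code,
    # so a delocalization failure cannot mask it.
    echo "$task_rc"
  elif [ "$deloc_failed" -ne 0 ]; then
    # Task succeeded but outputs could not be uploaded: a real problem.
    echo 1
  else
    # Task and delocalization both succeeded.
    echo 0
  fi
}
```

For example, `final_exit_code 2 1` prints `2`, while `final_exit_code 0 1` prints `1`.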

Example task to reproduce the issue:

task EarlyFailure {

  command {
    echo "Hello, World" > hello.txt
    exit 1
    echo "Goodbye, World" > goodbye.txt
  }

  output {
    File hello = "hello.txt"
    File goodbye = "goodbye.txt"
  }
}

Prior to this PR, the generated script would attempt to delocalize goodbye.txt to S3, fail because the file does not exist, and exit before uploading the rc, stdout, and stderr files.
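
A minimal sketch of the intended behavior without `set -e`: every upload is attempted, and a flag records whether any of them failed. It makes no real S3 calls; `fake_upload` is a hypothetical stand-in that copies into a local directory, and the file names passed to `delocalize_all` are assumptions for illustration rather than the names used by the generated script.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for an S3 upload: copies SRC into DEST_DIR and
# returns non-zero when SRC does not exist.
fake_upload() {
  local src="$1"
  local dest_dir="$2"
  [ -f "$src" ] && cp "$src" "$dest_dir/"
}

# Attempt every upload and remember whether any failed.
# No `set -e` here: one missing file must not stop the remaining uploads.
# Usage: delocalize_all DEST_DIR FILE...
delocalize_all() {
  local dest_dir="$1"
  shift
  local failed=0
  local f
  for f in "$@"; do
    fake_upload "$f" "$dest_dir" || failed=1
  done
  return "$failed"
}
```

Called as `delocalize_all "$dest" hello.txt goodbye.txt rc stdout stderr` after the task above has run, the function returns 1 because goodbye.txt is missing, yet hello.txt and the three debugging files are still copied.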

Release Notes Confirmation

CHANGELOG.md

  • I updated CHANGELOG.md in this PR
  • I assert that this change shouldn't be included in CHANGELOG.md because it doesn't impact community users

Terra Release Notes

  • I added a suggested release notes entry in this Jira ticket
  • I assert that this change doesn't need Jira release notes because it doesn't impact Terra users

@sberss sberss requested a review from a team as a code owner November 25, 2025 14:16
@sberss sberss force-pushed the aws-script-upload-rc-on-failure branch from 8d00465 to 9eae5f5 Compare November 25, 2025 14:16
@sberss sberss force-pushed the aws-script-upload-rc-on-failure branch from 9eae5f5 to a6c1279 Compare November 25, 2025 14:21
@LizBaldo
Contributor

Thank you @sberss
Our team will be reviewing your PRs and approving the necessary steps to get them merged (same with your other PRs).
