Skip to content

fix(objectstore): Add retry logic and structured error handling to S3 readObject()#59198

Draft
joshtrichards wants to merge 3 commits intomasterfrom
jtr/feat-s3-readObject-retry-alignment
Draft

fix(objectstore): Add retry logic and structured error handling to S3 readObject()#59198
joshtrichards wants to merge 3 commits intomasterfrom
jtr/feat-s3-readObject-retry-alignment

Conversation

@joshtrichards
Copy link
Member

  • Resolves: #

Summary

readObject() bypasses the AWS SDK's HTTP transport to enable seekable reads via HTTP range requests (SeekableHttpStream). A side effect is that the SDK's retry middleware and error parsing -- configured via retriesMaxAttempts -- have no effect on read operations. This means transient S3 failures (503, 429,
connection timeouts) cause immediate failure, and all errors surface as the generic "Failed to read object" with no S3 error context.

Changes:

  • Retry with backoff and jitter for transient failures (5xx, 429) and connection-level errors, honoring the existing retriesMaxAttempts config
  • ignore_errors stream context option so fopen() returns a stream on 4xx/5xx instead of bare false, allowing inspection of the actual response
  • S3 error response parsing -- extracts error code, message, x-amz-request-id, and x-amz-id-2 from both the XML body and response headers
  • Per-attempt logging at warning for retryable failures (visible even when retries succeed) and error for terminal failures
  • Improved exception messages including attempt count and structured S3 error detail instead of "Failed to read object $urn"
  • Correct status code parsing -- scans wrapper_data in reverse to handle 100 Continue / redirect / proxy response chains

Context:

See PR #20033 for the original motivation behind the SeekableHttpStream approach. The SDK bypass remains necessary for seekable range-request streams; this change brings the error handling closer to parity with what the SDK provides for all other S3 operations.

Not in scope:

  • Refactoring logger access across S3ConnectionTrait / S3ObjectTrait into a shared abstract method (flagged with a TODO for a follow-up)
  • Response checksum validation and permanent redirect handling, which are also bypassed but require deeper architectural changes

TODO

  • ...

Checklist

AI (if applicable)

  • The content of this PR was partly or fully generated using AI

Since we bypass the SDK to support #20033, we need to emulate more of its behavior to keep readObject consistent with other transactions. This adds basic retry behavior for the main scenarios.

Signed-off-by: Josh <josh.t.richards@gmail.com>
Signed-off-by: Josh <josh.t.richards@gmail.com>
Signed-off-by: Josh <josh.t.richards@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant