feat: implement on-demand batch presigning for multipart uploads #4004
base: main
Pull Request Overview
This PR introduces on-demand batch presigning for multipart uploads to prevent failures from expired pre-signed URLs during long-running uploads. Previously, all part URLs were pre-signed upfront, so URLs for later parts could expire (after 15 to 30 minutes) before those parts were uploaded. The new implementation presigns URLs in batches as needed.
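To make the expiry arithmetic concrete, here is a small TypeScript timing model. It is a sketch under simplifying assumptions, not code from this PR: every part takes the same time to upload, parts go up one after another, and each batch is signed right before its first part starts. The function `firstExpiredPart` and its parameters are hypothetical; passing the total part count as `batchSize` models signing every URL upfront, and `expirySeconds` stands in for the 15- or 30-minute window.

```typescript
// Hypothetical timing model (not code from this PR):
// - every part takes `secondsPerPart` seconds and parts upload one at a time;
// - each batch of URLs is signed right before its first part starts;
// - a URL is unusable once (upload start - signing time) >= expirySeconds.

/**
 * Returns the 1-based number of the first part whose URL has already expired
 * when its upload starts, or null if every part starts in time.
 * Pass `batchSize = totalParts` to model signing every URL upfront.
 */
function firstExpiredPart(
  totalParts: number,
  secondsPerPart: number,
  expirySeconds: number,
  batchSize: number,
): number | null {
  if (batchSize < 1) throw new RangeError("batchSize must be at least 1");
  for (let part = 1; part <= totalParts; part++) {
    const uploadStart = (part - 1) * secondsPerPart;
    // First part of the batch this part belongs to; its start is the signing time.
    const firstInBatch = Math.floor((part - 1) / batchSize) * batchSize + 1;
    const signedAt = (firstInBatch - 1) * secondsPerPart;
    if (uploadStart - signedAt >= expirySeconds) {
      return part;
    }
  }
  return null;
}
```

Under this model, with 200 parts, 10 seconds per part, and a 15-minute (900 s) window, `firstExpiredPart(200, 10, 900, 200)` returns 91 (upfront signing fails midway), while `firstExpiredPart(200, 10, 900, 50)` returns null because each 50-part batch finishes in about 8 minutes. At 20 seconds per part, a 50-part batch takes longer than the window and part 46 fails, so the batch size still has to fit within the expiry time.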
Key Changes:
- Backend adds a `presignUploadParts` method using `S3Presigner` to sign specific part batches on demand
- API endpoint now supports `type=init` (first batch) and a new `type=sign` operation (subsequent batches)
- Frontend switches to the RxJS `expand` operator for recursive, stateless batch fetching with a configurable batch size (default: 100 parts)
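The stateless, recursive batch fetching described above can be sketched without RxJS. This is a hypothetical, network-free illustration, not the PR's `dataset.service.ts` code: `nextBatch` plays the role of one `expand` step (it derives the next request purely from the previous result and stops when nothing remains), and `planBatches` drives it to show the resulting batches. `BatchStep`, `nextBatch`, and `planBatches` are illustrative names; in the real service each step would also presign and upload its batch.

```typescript
// One step of a hypothetical expand-style loop: given the part numbers that
// still need URLs, take the next batch and pass the rest along. No mutable
// state is shared between steps; each step depends only on its input.
interface BatchStep {
  batch: number[]; // part numbers to presign in this request
  remaining: number[]; // part numbers left for later requests
}

function nextBatch(pending: number[], batchSize: number): BatchStep | null {
  if (batchSize < 1) throw new RangeError("batchSize must be at least 1");
  if (pending.length === 0) return null; // recursion ends once nothing is pending
  return { batch: pending.slice(0, batchSize), remaining: pending.slice(batchSize) };
}

// Drives nextBatch until no parts remain, collecting each batch in order.
function planBatches(partCount: number, batchSize: number): number[][] {
  const allParts = Array.from({ length: partCount }, (_, i) => i + 1);
  const batches: number[][] = [];
  let step = nextBatch(allParts, batchSize);
  while (step !== null) {
    batches.push(step.batch);
    step = nextBatch(step.remaining, batchSize);
  }
  return batches;
}
```

For example, `planBatches(250, 100)` yields batches of 100, 100, and 50 part numbers, the last one covering parts 201 to 250.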
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| frontend/src/app/dashboard/service/user/dataset/dataset.service.ts | Implements RxJS expand-based recursive batch fetching; adds signPendingParts method and urlBatchSize configuration |
| file-service/src/main/scala/org/apache/texera/service/util/S3StorageClient.scala | Adds S3Presigner client and presignUploadParts method with URI extraction helper |
| file-service/src/main/scala/org/apache/texera/service/resource/DatasetResource.scala | Adds "sign" operation handler; converts init response to Map format |
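The table mentions a URI extraction helper in `S3StorageClient.scala`; its actual name and behavior are not shown here. As a hedged TypeScript illustration only, the sketch below assumes the LakeFS physical address has the form `s3://<bucket>/<key>` and splits it into the bucket and object key that a presign request needs. `parseS3Address` and `S3Location` are hypothetical names.

```typescript
// Hypothetical helper: assumes physical addresses look like s3://<bucket>/<key>.
interface S3Location {
  bucket: string;
  key: string;
}

// Bucket is everything up to the first "/", key is the (non-empty) rest.
const S3_ADDRESS = /^s3:\/\/([^/]+)\/(.+)$/;

function parseS3Address(physicalAddress: string): S3Location {
  const match = S3_ADDRESS.exec(physicalAddress);
  if (match === null) {
    // Reject anything that is not an s3:// address with both a bucket and a key.
    throw new Error(`Not an s3://<bucket>/<key> address: ${physicalAddress}`);
  }
  return { bucket: match[1], key: match[2] };
}
```

A regular expression keeps the sketch dependency-free; a real implementation might instead use a URI parser, which also handles percent-encoded keys.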
…e.ts (Co-authored-by: Copilot; Signed-off-by: Xinyuan Lin)
…torageClient.scala (Co-authored-by: Copilot; Signed-off-by: Xinyuan Lin)
@aicam please review it before @aglinxinyuan does his review.
What changes were proposed in this PR?
This PR introduces on-demand (batch) presigning for multipart uploads to reduce failures from expired pre-signed URLs. Previously, all part URLs were pre-signed at the start using an experimental LakeFS API. For long uploads, URLs for later parts could expire (after 15 minutes locally or 30 minutes on the server), causing the upload to fail midway. The revised implementation still uses the LakeFS function for the initial setup, then presigns URL batches on demand directly with `S3Presigner`.
Changes (Backend)
- `S3StorageClient.presignUploadParts`: uses `s3Presigner` to sign a specific list of provided part numbers.
- `DatasetResource`: the new signing operation receives the `pendingParts` list and `physicalAddress` from the client, calls `S3StorageClient.presignUploadParts` to sign the requested batch, and returns the new URLs.
Changes (Frontend)
- Uses `concatMap` for sequential batch processing:
  - initializes the upload with `type=init` (no pre-signed URLs)
  - calls `type=presign` for each batch just before uploading it
- Adds a `urlBatchSize` variable (default: 50) to control how many URLs are requested in each init and sign call.
Changes (Config)
- Adds the `s3MultipartPresignExpiryMinutes` configuration variable to control the presigned URL expiration time (default: 30 minutes).
Presigned URL Comparison
Any related issues, documentation, discussions?
Fixes #3837
Resolves URL expiration for pending parts. Fully handling interruptions during part uploads requires resumable uploads.
How was this PR tested?
Tested with existing automated test cases and local manual tests.
Was this PR authored or co-authored using generative AI tooling?
No