feat: implement on-demand batch presigning for multipart uploads #4004

xuang7 · 2025-10-27T01:08:03Z

What changes were proposed in this PR?

This PR introduces on-demand (batch) presigning for multipart uploads to reduce failures from expired pre-signed URLs. Previously, all part URLs were pre-signed at the start using an experimental LakeFS API. For long uploads, URLs for later parts could expire (after 15 min locally or 30 min on the server), causing the upload to fail midway. The revised implementation uses the LakeFS function for initial setup, then presigns URL batches on-demand directly using S3Presigner.

Changes (Backend)

Add a new method, presignUploadParts, to S3StorageClient. It uses S3Presigner to sign a specific list of provided partNumbers.
The /multipart-upload endpoint coordinates the new signing flow:
- type=init: Initiates upload with LakeFS (numParts=0), returns only uploadId and physicalAddress.
- type=presign (new operation): Receives a pendingParts list and physicalAddress from the client, calls the new S3StorageClient.presignUploadParts to sign the requested batch, and returns the new URLs.
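
As a rough client-side illustration of the two operations above, the sketch below splits an upload into batches of part numbers and builds one type=presign request per batch. The interface names, the helper functions, and the presence of uploadId in the presign request are assumptions made for illustration; only uploadId, physicalAddress, and pendingParts appear in this PR description.

```typescript
// Hypothetical shape of the type=init response (uploadId and physicalAddress only).
interface InitResponse {
  uploadId: string;
  physicalAddress: string;
}

// Hypothetical shape of a type=presign request body.
// Including uploadId here is an assumption, not confirmed by the PR text.
interface PresignRequest {
  type: "presign";
  uploadId: string;
  physicalAddress: string;
  pendingParts: number[]; // S3 part numbers are 1-based
}

// Split part numbers 1..totalParts into consecutive batches of at most batchSize.
function partBatches(totalParts: number, batchSize: number): number[][] {
  if (!Number.isInteger(totalParts) || totalParts < 0) {
    throw new RangeError("totalParts must be a non-negative integer");
  }
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: number[][] = [];
  for (let start = 1; start <= totalParts; start += batchSize) {
    const end = Math.min(start + batchSize - 1, totalParts);
    const batch: number[] = [];
    for (let part = start; part <= end; part++) {
      batch.push(part);
    }
    batches.push(batch);
  }
  return batches;
}

// Build one presign request per batch, reusing the values returned by type=init.
function buildPresignRequests(
  init: InitResponse,
  totalParts: number,
  batchSize: number = 50
): PresignRequest[] {
  return partBatches(totalParts, batchSize).map(pendingParts => ({
    type: "presign" as const,
    uploadId: init.uploadId,
    physicalAddress: init.physicalAddress,
    pendingParts,
  }));
}
```

For a 120-part upload with a batch size of 50, this yields three requests covering parts 1-50, 51-100, and 101-120.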

Changes (Frontend)

Refactored multipart upload to use RxJS concatMap for sequential batch processing:
- Initiates with type=init (no pre-signed URLs)
- Processes uploads in batches, calling type=presign for each batch just before uploading
Introduce a urlBatchSize variable (default: 50) to control how many URLs are requested in each type=presign call.
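
The sequential batching described above can be sketched without RxJS. The PR itself uses concatMap; the version below swaps in plain async/await to show the same ordering, where each batch is presigned only just before its parts are uploaded. The function names, the callback signatures, and the choice to upload parts one at a time within a batch are assumptions for illustration, not the code in dataset.service.ts.

```typescript
// Hypothetical callbacks standing in for the type=presign call and the part upload.
type PresignFn = (parts: number[]) => Promise<Record<number, string>>;
type UploadPartFn = (partNumber: number, url: string) => Promise<string>; // resolves to an ETag

// Upload parts 1..totalParts, requesting presigned URLs one batch at a time.
async function uploadInBatches(
  totalParts: number,
  urlBatchSize: number,
  presign: PresignFn,
  uploadPart: UploadPartFn
): Promise<string[]> {
  if (!Number.isInteger(urlBatchSize) || urlBatchSize < 1) {
    throw new RangeError("urlBatchSize must be a positive integer");
  }
  const etags: string[] = [];
  for (let start = 1; start <= totalParts; start += urlBatchSize) {
    const end = Math.min(start + urlBatchSize - 1, totalParts);
    const batch: number[] = [];
    for (let part = start; part <= end; part++) {
      batch.push(part);
    }
    // Sign this batch only now, so its URLs are fresh when the uploads begin.
    const urls = await presign(batch);
    for (const part of batch) {
      const url = urls[part];
      if (url === undefined) {
        throw new Error("no presigned URL returned for part " + part);
      }
      etags.push(await uploadPart(part, url));
    }
  }
  return etags;
}
```

Because the next presign call is not issued until the current batch finishes, no URL is older than the time needed to upload one batch.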

Changes (Config)

Add an s3MultipartPresignExpiryMinutes configuration variable to control the presigned URL expiration time (default: 30 minutes).
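
To see how the expiry setting interacts with the batch size, the following is a simplified model rather than anything from the PR. It assumes parts upload one at a time at a constant rate, that every URL in a batch is signed at the moment the batch starts, and that the presign round trip takes no time. The function name and parameters are hypothetical.

```typescript
// Return the 1-based part numbers whose URL would already be expired when
// their upload starts, under the simplified model described above.
// Setting batchSize equal to totalParts models presigning everything upfront.
function expiredParts(
  totalParts: number,
  secondsPerPart: number,
  expiryMinutes: number,
  batchSize: number
): number[] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const expirySeconds = expiryMinutes * 60;
  const expired: number[] = [];
  for (let part = 1; part <= totalParts; part++) {
    // Number of parts uploaded before this one since its batch was signed.
    const indexInBatch = (part - 1) % batchSize;
    const urlAgeAtStart = indexInBatch * secondsPerPart;
    if (urlAgeAtStart >= expirySeconds) {
      expired.push(part);
    }
  }
  return expired;
}

// 400 parts at 10 s each with a 30-minute expiry:
const upfront = expiredParts(400, 10, 30, 400); // parts 181..400 start too late
const batched = expiredParts(400, 10, 30, 50);  // empty: the oldest URL in a batch is 490 s old
```

In this model a batch stays safe as long as (batchSize - 1) × secondsPerPart is below the expiry window, which is the trade-off the urlBatchSize and s3MultipartPresignExpiryMinutes settings control.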

Presigned URL Comparison

LakeFS initiatePresignedMultipartUploads	S3 presignUploadParts

Any related issues, documentation, discussions?

Fixes #3837
Resolves URL expiration for pending parts. Fully handling interruptions during part uploads requires resumable uploads.

How was this PR tested?

Tested with existing automated test cases and local manual tests.

Was this PR authored or co-authored using generative AI tooling?

No

Copilot

Pull Request Overview

This PR introduces on-demand batch presigning for multipart uploads to prevent failures from expired pre-signed URLs during long-running uploads. Previously, all part URLs were pre-signed upfront, so URLs for later parts could expire (after 15-30 minutes) before they were used. The new implementation presigns URLs in batches as needed.

Key Changes:

Backend adds presignUploadParts method using S3Presigner to sign specific part batches on-demand
API endpoint now supports type=init (first batch) and new type=sign operation (subsequent batches)
Frontend switches to RxJS expand operator for recursive, stateless batch fetching with configurable batch size (default: 100 parts)

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File	Description
frontend/src/app/dashboard/service/user/dataset/dataset.service.ts	Implements RxJS expand-based recursive batch fetching; adds signPendingParts method and urlBatchSize configuration
file-service/src/main/scala/org/apache/texera/service/util/S3StorageClient.scala	Adds S3Presigner client and presignUploadParts method with URI extraction helper
file-service/src/main/scala/org/apache/texera/service/resource/DatasetResource.scala	Adds "sign" operation handler; converts init response to Map format

chenlica · 2025-11-15T07:33:47Z

@aicam please review it before @aglinxinyuan does his review.

xuang7 added 2 commits October 26, 2025 17:31

update.

dd3b156

Merge branch 'main' into feat/get-presigned-url-on-demand

5fe0148

github-actions bot added feature frontend Changes related to the frontend GUI service labels Oct 27, 2025

xuang7 marked this pull request as ready for review October 27, 2025 01:25

xuang7 marked this pull request as draft October 27, 2025 05:28

aglinxinyuan requested a review from Copilot October 27, 2025 05:33

Copilot AI reviewed Oct 27, 2025

aglinxinyuan and others added 2 commits October 29, 2025 20:41

Update frontend/src/app/dashboard/service/user/dataset/dataset.servic…

e80ef8e

…e.ts Co-authored-by: Copilot <[email protected]> Signed-off-by: Xinyuan Lin <[email protected]>

Update file-service/src/main/scala/org/apache/texera/service/util/S3S…

9175cb6

…torageClient.scala Co-authored-by: Copilot <[email protected]> Signed-off-by: Xinyuan Lin <[email protected]>

aglinxinyuan self-requested a review October 30, 2025 03:42

aglinxinyuan assigned xuang7 Oct 30, 2025

aglinxinyuan and others added 3 commits October 29, 2025 20:42

Merge branch 'main' into feat/get-presigned-url-on-demand

86c3d53

update.

af064a2

update..

540a411

github-actions bot added the common label Nov 1, 2025

update.

aee38ef

xuang7 marked this pull request as ready for review November 1, 2025 23:53

xuang7 added 2 commits November 3, 2025 13:42

Merge branch 'main' into feat/get-presigned-url-on-demand

4270314

Merge branch 'main' into feat/get-presigned-url-on-demand

93db24c

chenlica requested a review from aicam November 15, 2025 07:33
