feat: switch single-file dataset downloads to use browser native downloads #3995
Conversation
// Initialize S3-compatible presigner for LakeFS S3 Gateway
private lazy val s3Presigner: S3Presigner = {
  val fullUri = new URI(StorageConfig.lakefsEndpoint)
Can we avoid this override?
I don't believe so. If we don't override the endpoint that our s3Presigner should go through, it will use the default AWS endpoint. The generated presigned URL will then point to AWS instead of LakeFS and won't be able to access the target file.
I don't understand why AWS is involved here. AWS provides libraries that work with MinIO because MinIO uses the same protocol, so that should not cause any issue for us. Currently, the path to MinIO is defined in the LakeFS config (called blockstore), and the previous S3 signed-URL generation used it. Why is it not used anymore? Maybe a better question is: why does your implementation need to override the base URL when the previous one didn't?
The previous presigner used LakeFS's own API, so it already implicitly knew its own endpoint through the LakeFS config. However, the s3Presigner is a generic AWS SDK tool that MinIO happens to be compatible with. It isn't intended solely for LakeFS. As such, it doesn't know that we want to use it specifically for LakeFS, and instead defaults to the AWS S3 endpoint.
If LakeFS has the URL, why don't you get it from there instead of from an environment variable? That environment variable is not correct in the production environment.
Sorry, as an amendment to my previous explanation, the LakeFS presigner actually does have the LakeFS endpoint explicitly specified:
private lazy val apiClient: ApiClient = {
  val client = new ApiClient()
  client.setApiKey(StorageConfig.lakefsPassword)
  client.setUsername(StorageConfig.lakefsUsername)
  client.setPassword(StorageConfig.lakefsPassword)
  client.setServers(
    List(
      new ServerConfiguration(
        StorageConfig.lakefsEndpoint,
        "LakeFS API server endpoint",
        new java.util.HashMap[String, ServerVariable]()
      )
    ).asJava
  )
  client
}
This apiClient is used to generate the pre-signed URL, and it uses the same StorageConfig.lakefsEndpoint that I use to override the s3Presigner endpoint. The only difference is that I parse the endpoint to strip the "api/v1" path for the s3Presigner. Could this be causing the incorrect URL issue you mention?
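As a concrete illustration of that parsing step, here is a minimal sketch (not the actual PR code; the helper name and the example endpoint value are assumptions) of deriving the S3 Gateway base URL from the configured LakeFS API endpoint by dropping the path component:

```scala
import java.net.URI

// Hypothetical helper (illustration only): derive the S3 Gateway base URL
// from the LakeFS API endpoint by keeping scheme/host/port and dropping
// the "/api/v1" path that only the LakeFS API client needs.
def s3GatewayBase(lakefsEndpoint: String): String = {
  val full = new URI(lakefsEndpoint)
  // Rebuild the URI without path, query, or fragment.
  new URI(full.getScheme, null, full.getHost, full.getPort, null, null, null).toString
}

// Example endpoint value is an assumption for illustration.
println(s3GatewayBase("http://localhost:8000/api/v1")) // http://localhost:8000
```

The resulting base URL is what an S3-compatible presigner would be pointed at, while the full endpoint (including api/v1) stays with the LakeFS API client.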
The LakeFS presigner can be found under LakeFSStorageClient.scala:
https://github.com/apache/texera/blob/b7944f3129973fe8a73e92c09e7594b2bff4930b/common/workflow-core/src/main/scala/org/apache/amber/core/storage/util/LakeFSStorageClient.scala
@madisonmlin Please follow our PR template. Check #3960 as an example.
Resolved review thread on file-service/src/main/scala/org/apache/texera/service/util/S3StorageClient.scala
The new S3 presigner uses the same environment-variable endpoint as the original LakeFS presigner. The apiClient used by the LakeFS presigner, referenced above, sets this server endpoint. The original LakeFS presigner can be found under LakeFSStorageClient.scala.
@madisonmlin Unfortunately, after iterating over your implementation, I realized that it is wrong since it exposes LakeFS to the outside, and I could not find any suitable implementation in their library. I contacted the developers of LakeFS on Slack, and they asked me to create an issue and request the feature. The issue is now open, and I will get back to this PR after they add this feature.
@aicam Thanks. Does the new finding affect any PR?
What changes were proposed in this PR?
This pull request changes single-file dataset downloads to have the browser, rather than JavaScript, handle the download. Having the browser handle downloads enables the browser's file download UI, which includes a progress bar when downloading larger files.
Previously, we would use LakeFS API to generate a pre-signed URL for the file and request it as a Blob, then write the Blob to the local filesystem. Because the download is handled programmatically in JavaScript instead of by the browser, the browser’s download UI is not shown. Now, we instead create a hidden link element in the frontend that points to the pre-signed URL and programmatically click it. This lets the browser handle the file transfer and display its native download UI.
Additional Challenges
The previous pre-signed URL was generated through the LakeFS API, which doesn't support custom headers such as Content-Disposition. Without this header, the browser cannot correctly set the filename and file type, leading to a generic name (e.g. 'd3eo286eh7os77u8d0k0,Gw_-cWconI7VjtJviVRvW8nHb6NvkYRSIIx6cOy-RCs') and MIME type. To address this issue, we instead use the S3 Gateway API to generate the pre-signed URL, which allows headers to be customized. Use of headers like Content-Disposition in GET requests to specify the filename is documented in the official AWS documentation (https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObject.html).
Overview of Changes
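To make the Content-Disposition mechanism concrete, here is a hedged sketch (the helper name and quoting are illustrative, not the PR's code) of building the header value that makes the browser save the file under its real name:

```scala
// Illustrative helper: build a Content-Disposition header value to attach
// when presigning the GET request, so the browser saves the file with its
// original name instead of the opaque object key.
def contentDisposition(fileName: String): String = {
  // Escape backslashes and double quotes inside the quoted-string (RFC 6266).
  val escaped = fileName.replace("\\", "\\\\").replace("\"", "\\\"")
  "attachment; filename=\"" + escaped + "\""
}

println(contentDisposition("results.csv")) // attachment; filename="results.csv"
```

With an S3-compatible presigner such as the AWS SDK v2 one, a value like this would be supplied as the response content-disposition override on the GetObject request before presigning, so the header is baked into the signed URL.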
Any related issues, documentation, discussions?
Addresses issue #3842, which is a sub-issue of #3404.
How was this PR tested?
Dataset download functionality was manually tested locally across several browsers (Chrome, Edge, Firefox, Brave), and with larger file sizes that would trigger the browser download UI's progress bar. @aicam also tested the dataset download on a Kubernetes deployment.
Was this PR authored or co-authored using generative AI tooling?
No.