-
Notifications
You must be signed in to change notification settings - Fork 110
fix(dataset): allow file upload api not to use multipart-upload #3755
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
3b7a272 to
bab69e2
Compare
|
@aicam and @aglinxinyuan : please review this PR. |
c99aa9c to
3b93199
Compare
Add environment variable support for all configuration properties in default.conf, following the same pattern used in gui.conf. This allows deployment configurations to override defaults without modifying the configuration file directly. Environment variables added: - CONFIG_SERVICE_ALWAYS_RESET_CONFIGURATIONS_TO_DEFAULT_VALUES - GUI_LOGO_LOGO, GUI_LOGO_MINI_LOGO, GUI_LOGO_FAVICON - GUI_TABS_* for all tab configurations - DATASET_* for dataset upload configurations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we introduce another endpoint? What's the reason for when installing texera in single-node mode, the multipart upload is unable to reach to the host network. It's not a good practice that when A does not work with B, we introduce C. The codebase will become extremely large and unmanageable.
| config-service { | ||
| # Setting to true resets all site settings in the database to the defaults defined in this file. | ||
| always-reset-configurations-to-default-values = false | ||
| always-reset-configurations-to-default-values = ${?CONFIG_SERVICE_ALWAYS_RESET_CONFIGURATIONS_TO_DEFAULT_VALUES} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change should not belong to this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will revert this change later. This is for current release image building
I agree with the principle. In single node case, all containers are running in its own internal network and they cannot reach to host machine via Since current upload API forces the usage of multipart upload, I add a new parameter to toggle it. The reason why I didn't make it a separate endpoint is because at high level, either way (multipart or not) is logically doing file uploading. Therefore they should be handled by the same endpoint. |
I think we should manage functions based on their underlying logic, rather than just looking at the function name at a high level. I reviewed the code carefully, and with this flag in place, the function essentially skips 99% of its existing logic, making none of the code being reused. In effect, the function now contains two separate implementations: if the flag is true, it executes logic A; otherwise, it executes logic B. This check should happen on the frontend; for example, if we want to do A, call API A, and if we want to do B, call API B. By adding one more flag in the API parameter, we are pushing the logic to be handled at the frontend to the backend. That aside, I still don’t fully understand the issue: why does the service fail on PUT http://localhost:8080/presigned-url, while LakeFSStorageClient is able to access the service without problems? |
The key difference lies in where the code executes and which network it uses: LakeFS Storage Client works because:
Multipart upload fails because:
|
In this case, what's your suggestion on the API design for supporting non-multipart upload? |
We can introduce one more endpoint just for non-multipart upload. |
|
@bobbai00 @aglinxinyuan : It may be easier to have a meeting to discuss this topic, then include the discussion result here. |
For code review, it’s better to keep the discussion here. I don’t have any concerns with the design — the discussion is more about coding style. |
|
If a discussion can be done efficiently here, we can do so. If not, please feel free to do an offline discussion and summarize the decision. |
Overview
This PR let file upload API (POST
/dataset/{did}/upload) support both multipart and non-multipart uploads. This is needed because in some cases, e.g. loading example data when installing texera in single-node mode, the multipart upload is unable to reach to the host network.Changes