-
Notifications
You must be signed in to change notification settings - Fork 349
Add config to disable S3 trailing checksum validation #3357
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add config to disable S3 trailing checksum validation #3357
Conversation
|
|
||
| public static class Builder extends PolarisEntity.BaseBuilder<NamespaceEntity, Builder> { | ||
|
|
||
| private static final int MAX_NAMESPACE_DEPTH = 10; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated change?
dimas-b
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking into #3346 , @rishii-19-works !
| properties.getOrDefault("s3.enable-trailing-checksums", "true"); | ||
|
|
||
| if (!Boolean.parseBoolean(enableTrailingChecksums)) { | ||
| properties.put(S3FileIOProperties.CHECKSUM_VALIDATION_ENABLED, "false"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is probably preferable to handle FileIO-related config via StorageAccessConfig. This way it will apply both to Polaris and clients (engines).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change should not be necessary with the StorageAccessConfig-base solution.
|
Thanks for the review and feedback! I appreciate the suggestion about handling this config through Please let me know if you have pointers on where you’d prefer that configuration to be passed or stored. |
Given that the behaviour this config setting is trying to control is specific to the storage technology, I'd guess putting it into The next best choice is |
|
@rishii-19-works : please go ahead :) I'd approve a solution that I proposed 😉 |
|
I’ve updated the PR to move the setting into AwsStorageConfigurationInfo |
|
Thanks for the guidance! |
|
Thanks for the review! I’ve removed the unrelated namespace-depth change and refactored the S3 checksum configuration into AwsStorageConfigurationInfo as discussed. |
dimas-b
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please fix CI
| * | ||
| * <p><strong>WARNING:</strong> This property is intended for testing purposes only and should not | ||
| * be used in production environments. | ||
| */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I doubt all change in this file are related to the stated purpose of this PR. Could you keep only the minimally necessary changes?
| properties.getOrDefault("s3.enable-trailing-checksums", "true"); | ||
|
|
||
| if (!Boolean.parseBoolean(enableTrailingChecksums)) { | ||
| properties.put(S3FileIOProperties.CHECKSUM_VALIDATION_ENABLED, "false"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change should not be necessary with the StorageAccessConfig-base solution.
|
Thanks for the review! I’ve addressed the feedback by scoping the checksum behavior to AwsStorageConfigurationInfo, keeping DefaultFileIOFactory storage-agnostic, and reverting unrelated changes. All CI checks are now green. Please let me know if anything else is needed. |
adnanhemani
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I'm very confused here. Where are we actually using the config now? I can see in previous versions of the PR that the config was being utilized somewhere but now it's all gone...
| * when unset. | ||
| */ | ||
| @Nullable | ||
| public abstract Boolean getEnableTrailingChecksums(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be a "dangling" property 🤔 Is it possible for a user to set it? It also does not appear to be set in code 🤔 (basically what @adnanhemani commented)
| // Update with properties in case there are table-level overrides the credentials should | ||
| // always override table-level properties, since storage configuration will be found at | ||
| // whatever entity defines it | ||
| // Storage-level configuration should override table-level properties |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes LGTM, but please move them into a separate PR to avoid conflating "cleanup" with feature work.
|
Thanks for pointing this out — you’re right. The trailing checksum flag ended up being a dangling configuration in the current version, so I’ve removed it to keep the PR minimal and avoid introducing unused config. The remaining change keeps FileIO wiring storage-agnostic and scoped to StorageAccessConfig, as discussed. Happy to follow up with a separate PR if further cleanup is preferred. |
dimas-b
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rishii-19-works : sorry to be direct here, but in its current state the diff of this PR does not appear to be related to the purpose stated in the description 🤷 If you're making incremental progress, please put the PR in the "draft" state for the sake of clarity.
| /** Flag indicating whether path-style bucket access should be forced in S3 clients. */ | ||
| public abstract @Nullable Boolean getPathStyleAccess(); | ||
| @Nullable | ||
| public abstract Boolean getPathStyleAccess(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use separate PRs for cleanup / typo fixes. Mixing cleanup change with feature work really complicates reviews, IMHO.
|
Thanks for the clarification — I’ve converted this PR to draft while I finish wiring the checksum behavior and align the diff with the stated intent. I’ll mark it ready once complete. |
8fed697 to
c95f5f1
Compare
This PR adds a config-driven, S3-compatible way to disable AWS SDK trailing
checksum validation by wiring the setting through Iceberg’s S3FileIO properties.
Local note: Spotless passes locally. Full build fails on Windows during unrelated
JAR packaging tasks; CI on Linux should validate.
Fixes #3346