@rishii-19-works
Contributor

This PR adds a config-driven, S3-compatible way to disable AWS SDK trailing
checksum validation by wiring the setting through Iceberg’s S3FileIO properties.

  • Default behavior remains unchanged
  • No provider-specific logic
  • Applies to all S3-compatible object stores
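As a rough illustration of the wiring described above, here is a minimal sketch that uses plain maps instead of the real Iceberg classes. The input key `s3.enable-trailing-checksums` is the one that appears in this PR's diff; the output key is a hypothetical placeholder standing in for `S3FileIOProperties.CHECKSUM_VALIDATION_ENABLED`, and the class and method names are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

public class TrailingChecksumSketch {
  // Key as used in the diff of this PR.
  static final String ENABLE_TRAILING_CHECKSUMS = "s3.enable-trailing-checksums";

  // Hypothetical placeholder: the actual string behind
  // S3FileIOProperties.CHECKSUM_VALIDATION_ENABLED is not shown in this thread.
  static final String CHECKSUM_VALIDATION_ENABLED = "example.checksum-validation-enabled";

  /** Returns a copy of the input with checksum validation switched off only on explicit opt-out. */
  static Map<String, String> applyChecksumSetting(Map<String, String> input) {
    Map<String, String> properties = new HashMap<>(input);
    // A missing key falls back to "true", so the default behavior is unchanged.
    String enableTrailingChecksums = properties.getOrDefault(ENABLE_TRAILING_CHECKSUMS, "true");
    if (!Boolean.parseBoolean(enableTrailingChecksums)) {
      properties.put(CHECKSUM_VALIDATION_ENABLED, "false");
    }
    return properties;
  }

  public static void main(String[] args) {
    System.out.println(applyChecksumSetting(Map.of()));
    System.out.println(applyChecksumSetting(Map.of(ENABLE_TRAILING_CHECKSUMS, "false")));
  }
}
```

One caveat of this shape: `Boolean.parseBoolean` returns `true` only for the string "true" (ignoring case), so any other value, including a typo, would also disable validation in this sketch.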

Local note: Spotless passes locally. Full build fails on Windows during unrelated
JAR packaging tasks; CI on Linux should validate.

Fixes #3346


public static class Builder extends PolarisEntity.BaseBuilder<NamespaceEntity, Builder> {

private static final int MAX_NAMESPACE_DEPTH = 10;

Unrelated change?

Contributor

@dimas-b dimas-b left a comment

Thanks for looking into #3346, @rishii-19-works!

properties.getOrDefault("s3.enable-trailing-checksums", "true");

if (!Boolean.parseBoolean(enableTrailingChecksums)) {
properties.put(S3FileIOProperties.CHECKSUM_VALIDATION_ENABLED, "false");

It is probably preferable to handle FileIO-related config via StorageAccessConfig. This way it will apply both to Polaris and clients (engines).

This change should not be necessary with the StorageAccessConfig-based solution.

@rishii-19-works
Contributor Author

Thanks for the review and feedback!

I appreciate the suggestion about handling this config through StorageAccessConfig.
I’ll take a look at that approach and see how it can be incorporated so it applies more broadly to clients/engines as well.

Please let me know if you have pointers on where you’d prefer that configuration to be passed or stored.

@dimas-b
Contributor

dimas-b commented Jan 5, 2026

> Please let me know if you have pointers on where you’d prefer that configuration to be passed or stored.

Given that the behaviour this config setting is trying to control is specific to the storage technology, I'd guess putting it into AwsStorageConfigInfo would be appropriate (cf. pathStyleAccess). This way the same server can have catalogs using different storage without them affecting one another.

The next best choice is RealmConfig (via FeatureConfiguration).
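To make the per-catalog idea concrete, here is a minimal sketch with a hypothetical stand-in class (`StorageSettings`), not the actual `AwsStorageConfigInfo` model. It assumes a nullable flag where `null` means "unset, keep the default", and it uses a placeholder property key, since the real FileIO key is not confirmed in this thread.

```java
import java.util.HashMap;
import java.util.Map;

public class PerCatalogStorageSketch {
  // Hypothetical placeholder key, not a confirmed Iceberg property name.
  static final String CHECKSUM_VALIDATION_KEY = "example.checksum-validation-enabled";

  /** Hypothetical per-catalog storage settings; a null flag means "unset". */
  static final class StorageSettings {
    final Boolean enableTrailingChecksums;

    StorageSettings(Boolean enableTrailingChecksums) {
      this.enableTrailingChecksums = enableTrailingChecksums;
    }

    /** Emits an override only when the flag is explicitly false. */
    Map<String, String> toFileIoProperties() {
      Map<String, String> props = new HashMap<>();
      if (Boolean.FALSE.equals(enableTrailingChecksums)) {
        props.put(CHECKSUM_VALIDATION_KEY, "false");
      }
      return props;
    }
  }

  public static void main(String[] args) {
    // Two catalogs on the same server, each with its own storage settings.
    StorageSettings awsCatalog = new StorageSettings(null);
    StorageSettings ossCatalog = new StorageSettings(false);
    System.out.println("aws catalog: " + awsCatalog.toFileIoProperties());
    System.out.println("oss catalog: " + ossCatalog.toFileIoProperties());
  }
}
```

Because each catalog carries its own settings object, turning the flag off for one catalog leaves the properties produced for the other untouched.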

@dimas-b
Contributor

dimas-b commented Jan 5, 2026

@rishii-19-works : please go ahead :) I'd approve a solution that I proposed 😉

@rishii-19-works
Contributor Author

I’ve updated the PR to move the setting into AwsStorageConfigurationInfo
and apply the Iceberg-specific behavior during FileIO construction,
keeping it storage-scoped and per-catalog as discussed. Thanks!

@rishii-19-works
Contributor Author

Thanks for the guidance!
I’ve refactored the config into AwsStorageConfigurationInfo as discussed.
CI is currently failing due to some model/codegen alignment issues — I’ll dig into that and push a fix shortly.

@rishii-19-works
Contributor Author

Thanks for the review! I’ve removed the unrelated namespace-depth change and refactored the S3 checksum configuration into AwsStorageConfigurationInfo as discussed.
Local Windows builds still fail during JAR packaging due to reproducible-build permission handling, but this should pass on CI (Linux runners). Happy to make further changes if needed.

Contributor

@dimas-b dimas-b left a comment

Please fix CI

*
* <p><strong>WARNING:</strong> This property is intended for testing purposes only and should not
* be used in production environments.
*/

I doubt all changes in this file are related to the stated purpose of this PR. Could you keep only the minimally necessary changes?

properties.getOrDefault("s3.enable-trailing-checksums", "true");

if (!Boolean.parseBoolean(enableTrailingChecksums)) {
properties.put(S3FileIOProperties.CHECKSUM_VALIDATION_ENABLED, "false");

This change should not be necessary with the StorageAccessConfig-based solution.

@rishii-19-works
Contributor Author

Thanks for the review!

I’ve addressed the feedback by scoping the checksum behavior to AwsStorageConfigurationInfo, keeping DefaultFileIOFactory storage-agnostic, and reverting unrelated changes.

All CI checks are now green. Please let me know if anything else is needed.

Contributor

@adnanhemani adnanhemani left a comment

Sorry, I'm very confused here. Where are we actually using the config now? I can see in previous versions of the PR that the config was being utilized somewhere but now it's all gone...

* when unset.
*/
@Nullable
public abstract Boolean getEnableTrailingChecksums();

This seems to be a "dangling" property 🤔 Is it possible for a user to set it? It also does not appear to be set in code 🤔 (basically what @adnanhemani commented)

// Update with properties in case there are table-level overrides the credentials should
// always override table-level properties, since storage configuration will be found at
// whatever entity defines it
// Storage-level configuration should override table-level properties

These changes LGTM, but please move them into a separate PR to avoid conflating "cleanup" with feature work.
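For readers following along, the precedence rule stated in the code comment under review (storage-level configuration overrides table-level properties) can be sketched with plain maps. This is only an illustration under assumed names: `effectiveProperties` and the `example.*` keys are hypothetical and do not come from the Polaris code base.

```java
import java.util.HashMap;
import java.util.Map;

public class PropertyPrecedenceSketch {
  /** Merges the two property sets; on a key collision the storage-level value wins. */
  static Map<String, String> effectiveProperties(
      Map<String, String> tableProperties, Map<String, String> storageProperties) {
    Map<String, String> merged = new HashMap<>(tableProperties);
    // putAll overwrites existing keys, so applying storage properties last gives them precedence.
    merged.putAll(storageProperties);
    return merged;
  }

  public static void main(String[] args) {
    Map<String, String> table = Map.of("example.endpoint", "table-value", "example.other", "kept");
    Map<String, String> storage = Map.of("example.endpoint", "storage-value");
    Map<String, String> merged = effectiveProperties(table, storage);
    System.out.println(merged.get("example.endpoint"));
    System.out.println(merged.get("example.other"));
  }
}
```

The order of the two steps is the whole design choice here: copying the table properties first and applying the storage properties second is what lets the storage configuration take priority.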

@rishii-19-works
Contributor Author

Thanks for pointing this out — you’re right.

The trailing checksum flag ended up being a dangling configuration in the current version, so I’ve removed it to keep the PR minimal and avoid introducing unused config.

The remaining change keeps FileIO wiring storage-agnostic and scoped to StorageAccessConfig, as discussed.

Happy to follow up with a separate PR if further cleanup is preferred.

Contributor

@dimas-b dimas-b left a comment

@rishii-19-works : sorry to be direct here, but in its current state the diff of this PR does not appear to be related to the purpose stated in the description 🤷 If you're making incremental progress, please put the PR in the "draft" state for the sake of clarity.

/** Flag indicating whether path-style bucket access should be forced in S3 clients. */
public abstract @Nullable Boolean getPathStyleAccess();
@Nullable
public abstract Boolean getPathStyleAccess();

Please use separate PRs for cleanup / typo fixes. Mixing cleanup changes with feature work really complicates reviews, IMHO.

@rishii-19-works rishii-19-works marked this pull request as draft January 8, 2026 17:14
@rishii-19-works
Contributor Author

Thanks for the clarification — I’ve converted this PR to draft while I finish wiring the checksum behavior and align the diff with the stated intent. I’ll mark it ready once complete.

@rishii-19-works rishii-19-works force-pushed the disable-s3-trailing-checksums branch from 8fed697 to c95f5f1 Compare January 8, 2026 19:10

Development

Successfully merging this pull request may close these issues.

Alibaba Cloud OSS Compatibility - AWS Chunked Encoding Header Mismatch

3 participants