Skip to content

Fixed force_virtual_addressing problem #1923

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Apr 17, 2025

Conversation

helmiazizm
Copy link
Contributor

@helmiazizm helmiazizm commented Apr 16, 2025

Rationale for this change

This fix addressed issue #1922 and changed the behavior for both _oss_fs and _s3_fs to be able to parse s3.force-virtual-addressing correctly.

Are these changes tested?

No unit tests are given to this change yet

Are there any user-facing changes?

No

@@ -408,7 +408,7 @@ def _initialize_oss_fs(self) -> FileSystem:
"access_key": get_first_property_value(self.properties, S3_ACCESS_KEY_ID, AWS_ACCESS_KEY_ID),
"secret_key": get_first_property_value(self.properties, S3_SECRET_ACCESS_KEY, AWS_SECRET_ACCESS_KEY),
"session_token": get_first_property_value(self.properties, S3_SESSION_TOKEN, AWS_SESSION_TOKEN),
"region": get_first_property_value(self.properties, S3_REGION, AWS_REGION),
"force_virtual_addressing": property_as_bool(self.properties, S3_FORCE_VIRTUAL_ADDRESSING, True),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can remove region?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it breaks the tests:

                # For oss scheme, user provided region is used instead
>               assert pyarrow_file_io.new_input(f"oss://{bucket_region[0]}/path/to/file")._filesystem.region == user_provided_region
E               AssertionError: assert 'us-east-1' == 'ap-southeast-1'
E                 - ap-southeast-1
E                 + us-east-1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can remove region?

For OSS authentication, it's not really necessary since they only need endpoint, but perhaps you're right that it's nicer to keep it.

Copy link
Contributor

@Fokko Fokko Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@helmiazizm Thoughts on changing the default?

Also, below we have False the default:

        if self.properties.get(S3_FORCE_VIRTUAL_ADDRESSING) is not None:
            client_kwargs["force_virtual_addressing"] = property_as_bool(self.properties, S3_FORCE_VIRTUAL_ADDRESSING, False)

Copy link
Contributor Author

@helmiazizm helmiazizm Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlike S3 authentication, virtual addressing is mandatory in OSS. Not having it set to True will always throw access denied error, hence why setting it to True by default would make authentication easier for users in case if they ever forget to set s3.force-virtual-addressing parameter to True. I also don't think this setting for OSS will go away anytime soon as other S3-compatible object storages are also using the same method, so this change should be safe.

I still keep the property_as_bool function to make the authentication feel as graceful in case if users insist on setting it to False.

Copy link
Contributor

@Fokko Fokko Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@helmiazizm Thanks for the explanation, as I don't have any hands-on experience with Alibaba Cloud. Lets move this forward 👍

@helmiazizm helmiazizm requested a review from Fokko April 17, 2025 14:28
…ssing is set to True by default for OSS authentication
@Fokko Fokko merged commit 831170d into apache:main Apr 17, 2025
8 checks passed
@helmiazizm helmiazizm deleted the fix-virtual-addressing branch April 19, 2025 03:31
@Fokko Fokko added this to the PyIceberg 0.9.1 milestone Apr 20, 2025
@helmiazizm helmiazizm restored the fix-virtual-addressing branch April 22, 2025 02:36
@helmiazizm helmiazizm deleted the fix-virtual-addressing branch April 22, 2025 02:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants