Skip to content

Fixed force_virtual_addressing problem #1923

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Apr 17, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion mkdocs/docs/configuration.md
Original file line number Diff line number Diff line change
Expand Up @@ -189,7 +189,7 @@ PyIceberg uses [S3FileSystem](https://arrow.apache.org/docs/python/generated/pya
| s3.access-key-id | admin | Configure the static access key id used to access the FileIO. |
| s3.secret-access-key | password | Configure the static secret access key used to access the FileIO. |
| s3.session-token | AQoDYXdzEJr... | Configure the static session token used to access the FileIO. |
| s3.force-virtual-addressing | True | Whether to use virtual addressing of buckets. This must be set to True as OSS can only be accessed with virtual hosted style address. |
| s3.force-virtual-addressing | True | Whether to use virtual addressing of buckets. This is set to `True` by default as OSS can only be accessed with virtual hosted style address. |

<!-- markdown-link-check-enable-->

Expand Down
8 changes: 3 additions & 5 deletions pyiceberg/io/pyarrow.py
Original file line number Diff line number Diff line change
Expand Up @@ -409,6 +409,7 @@ def _initialize_oss_fs(self) -> FileSystem:
"secret_key": get_first_property_value(self.properties, S3_SECRET_ACCESS_KEY, AWS_SECRET_ACCESS_KEY),
"session_token": get_first_property_value(self.properties, S3_SESSION_TOKEN, AWS_SESSION_TOKEN),
"region": get_first_property_value(self.properties, S3_REGION, AWS_REGION),
"force_virtual_addressing": property_as_bool(self.properties, S3_FORCE_VIRTUAL_ADDRESSING, True),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can remove region?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it breaks the tests:

                # For oss scheme, user provided region is used instead
>               assert pyarrow_file_io.new_input(f"oss://{bucket_region[0]}/path/to/file")._filesystem.region == user_provided_region
E               AssertionError: assert 'us-east-1' == 'ap-southeast-1'
E                 - ap-southeast-1
E                 + us-east-1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can remove region?

For OSS authentication, it's not really necessary since they only need endpoint, but perhaps you're right that it's nicer to keep it.

Copy link
Contributor

@Fokko Fokko Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@helmiazizm Thoughts on changing the default?

Also, below we have False the default:

        if self.properties.get(S3_FORCE_VIRTUAL_ADDRESSING) is not None:
            client_kwargs["force_virtual_addressing"] = property_as_bool(self.properties, S3_FORCE_VIRTUAL_ADDRESSING, False)

Copy link
Contributor Author

@helmiazizm helmiazizm Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlike S3 authentication, virtual addressing is mandatory in OSS. Not having it set to True will always throw access denied error, hence why setting it to True by default would make authentication easier for users in case if they ever forget to set s3.force-virtual-addressing parameter to True. I also don't think this setting for OSS will go away anytime soon as other S3-compatible object storages are also using the same method, so this change should be safe.

I still keep the property_as_bool function to make the authentication feel as graceful in case if users insist on setting it to False.

Copy link
Contributor

@Fokko Fokko Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@helmiazizm Thanks for the explanation, as I don't have any hands-on experience with Alibaba Cloud. Lets move this forward 👍

}

if proxy_uri := self.properties.get(S3_PROXY_URI):
Expand All @@ -426,9 +427,6 @@ def _initialize_oss_fs(self) -> FileSystem:
if session_name := get_first_property_value(self.properties, S3_ROLE_SESSION_NAME, AWS_ROLE_SESSION_NAME):
client_kwargs["session_name"] = session_name

if force_virtual_addressing := self.properties.get(S3_FORCE_VIRTUAL_ADDRESSING):
client_kwargs["force_virtual_addressing"] = property_as_bool(self.properties, force_virtual_addressing, False)

return S3FileSystem(**client_kwargs)

def _initialize_s3_fs(self, netloc: Optional[str]) -> FileSystem:
Expand Down Expand Up @@ -472,8 +470,8 @@ def _initialize_s3_fs(self, netloc: Optional[str]) -> FileSystem:
if session_name := get_first_property_value(self.properties, S3_ROLE_SESSION_NAME, AWS_ROLE_SESSION_NAME):
client_kwargs["session_name"] = session_name

if force_virtual_addressing := self.properties.get(S3_FORCE_VIRTUAL_ADDRESSING):
client_kwargs["force_virtual_addressing"] = property_as_bool(self.properties, force_virtual_addressing, False)
if self.properties.get(S3_FORCE_VIRTUAL_ADDRESSING) is not None:
client_kwargs["force_virtual_addressing"] = property_as_bool(self.properties, S3_FORCE_VIRTUAL_ADDRESSING, False)

return S3FileSystem(**client_kwargs)

Expand Down