-
Notifications
You must be signed in to change notification settings - Fork 352
Support hierarchical namespace in Azure #3347
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
evindj
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good to me, I do think we need to clarify the behavior when the feature is enabled but the account on azure does not have hierarchical namespace enabled.
spec/polaris-management-service.yml
Outdated
| description: >- | ||
| If set to `true`, instructs Polaris Servers to scope SAS tokens down to the most specific path | ||
| in the storage container (in most cases the table's base location). This flag should be set only | ||
| if hierarchical namespace is enabled in the Azure storage account. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the behavior if the flag is set but the Azure account does not have the feature enabled?
I am wondering if in a future iteration there will be a way to enable this feature based on whether or not the feature is enabled in Azure.
| CatalogEntity catalogEntity = | ||
| new CatalogEntity.Builder() | ||
| .setName("testAwsConfigRoundTrip") | ||
| .setName("testStorageConfigRoundTrip") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for removing AWS specific here.
| "Allowed read locations must not have more that one entry"); | ||
| Preconditions.checkArgument( | ||
| allowedWriteLocations.size() <= 1, | ||
| "Allowed write locations must not have more that one entry"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
N00b question for my own understanding, what is the use case for several allowedReadLocations and allowedWriteLocations?
Also why does this apply only to hierarchical use case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the use case for several allowedReadLocations and allowedWriteLocations?
TBH, I do not really know 😅 Currently, only one location is passed in.
Azure can restrict SAS to only location (base directory or specific file).
|
@evindj :
Good point! Thanks for flagging it. I'll update the description in YAML. |
87e57ea to
35afeda
Compare
snazy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change looks good overall!
Please do not forget to update the CLI as well.
I was wondering whether we could implement this without introducing a configuration flag, but could not find a (cheap) way to figure out whether a file-system is hierarchical or not. So I guess there's no (performance neutral) way beside the introduced config flag.
.../src/main/java/org/apache/polaris/core/storage/azure/AzureCredentialsStorageIntegration.java
Outdated
Show resolved
Hide resolved
.../src/main/java/org/apache/polaris/core/storage/azure/AzureCredentialsStorageIntegration.java
Show resolved
Hide resolved
| return new DataLakePathClientBuilder() | ||
| .endpoint(endpoint) | ||
| .fileSystemName(fileSystemNameOrContainer) | ||
| .pathName(path) // TODO: drop authority part |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
jbonofre
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks !
35afeda to
2ea6475
Compare
|
|
||
| /** The flag indicating whether the storage account supports hierarchical namespaces. */ | ||
| @Nullable | ||
| public abstract Boolean isHierarchical(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| public abstract Boolean isHierarchical(); | |
| @JsonInclude(JsonInclude.Include.NON_NULL) | |
| public abstract Boolean isHierarchical(); |
Could actually also be (primitive):
| public abstract Boolean isHierarchical(); | |
| @JsonInclude(JsonInclude.Include.NON_DEFAULT) | |
| public abstract boolean isHierarchical(); |
Both help with serialization size and backwards compatiblity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a blocker tho
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An object Boolean is meant to allow distinguishing unset from false in the (generated) AzureStorageConfigInfo class. A simple boolean would be exposed to clients even it it was not explicitly set.
I'd rather not change AzureStorageConfigInfo serialization in this PR :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Include.NON_DEFAULT is implied:
Line 95 in 992a428
| JsonInclude.Include.NON_NULL, JsonInclude.Include.NON_NULL)) |
* Add `hierarchical` to `AzureStorageConfigInfo` (the default is unset translating to current behaviour). * Use `DataLakeDirectoryClient` instead of `DataLakeFileSystemClient` for generating SAS tokens when `hierarchical` is set to `true`. * Add `cloudTest` classes for testing with credential vending in ADLS.
2ea6475 to
c24fce3
Compare
|
rebased to resolve CHANGELOG.md conflicts |
Add
hierarchicaltoAzureStorageConfigInfo(the default is unset translating to current behaviour).Use
DataLakeDirectoryClientinstead ofDataLakeFileSystemClientfor generating SAS tokens whenhierarchicalis set totrue.Add
cloudTestclasses for testing with credential vending in ADLS.Checklist
CHANGELOG.md(if needed)site/content/in-dev/unreleased(if needed)