
@dimas-b (Contributor) commented Jan 2, 2026

  • Add a `hierarchical` flag to `AzureStorageConfigInfo` (unset by default, which preserves the current behaviour).

  • Use `DataLakeDirectoryClient` instead of `DataLakeFileSystemClient` to generate SAS tokens when `hierarchical` is set to `true`.

  • Add `cloudTest` classes for testing credential vending with ADLS.
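The scoping rule described above (and in the YAML description further down) can be sketched as a tri-state decision. This is a minimal, hypothetical model, not Polaris code: `SasTargetSketch`, `SasTarget` and `sasTarget` are illustrative names, and the SDK class names appear only as strings to show which client would be used in each case.

```java
class SasTargetSketch {
    /** Hypothetical description of what a vended SAS token is generated from; not a Polaris type. */
    record SasTarget(String sdkClient, String fileSystem, String directory) {}

    /**
     * hierarchical == TRUE           -> directory-scoped SAS (most specific path, e.g. the table location)
     * hierarchical == FALSE or null  -> container-wide SAS, i.e. the current behaviour
     */
    static SasTarget sasTarget(Boolean hierarchical, String fileSystem, String directory) {
        if (Boolean.TRUE.equals(hierarchical)) {
            return new SasTarget("DataLakeDirectoryClient", fileSystem, directory);
        }
        // Unset (null) is treated exactly like false, so existing catalogs keep working unchanged.
        return new SasTarget("DataLakeFileSystemClient", fileSystem, null);
    }

    public static void main(String[] args) {
        System.out.println(sasTarget(Boolean.TRUE, "warehouse", "db1/table1"));
        System.out.println(sasTarget(null, "warehouse", "db1/table1"));
    }
}
```

Using `Boolean.TRUE.equals(...)` keeps the check null-safe, so an unset flag never triggers the directory-scoped path.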

Checklist

  • 🛡️ Don't disclose security issues! (contact [email protected])
  • 🔗 Clearly explained why the changes are needed, or linked related issues: Fixes #
  • 🧪 Added/updated tests with good coverage, or manually tested (and explained how)
  • 💡 Added comments for complex logic
  • 🧾 Updated CHANGELOG.md (if needed)
  • 📚 Updated documentation in site/content/in-dev/unreleased (if needed)


@evindj evindj left a comment


Overall looks good to me. I do think we need to clarify the behavior when the feature is enabled but the Azure storage account does not have hierarchical namespace enabled.

description: >-
If set to `true`, instructs Polaris Servers to scope SAS tokens down to the most specific path
in the storage container (in most cases the table's base location). This flag should be set only
if hierarchical namespace is enabled in the Azure storage account.

What is the behavior if the flag is set but the Azure account does not have the feature enabled?
I am wondering whether a future iteration could enable this feature automatically, based on whether hierarchical namespace is enabled in the Azure account.

CatalogEntity catalogEntity =
new CatalogEntity.Builder()
.setName("testStorageConfigRoundTrip") // renamed from "testAwsConfigRoundTrip"

+1 for removing AWS specific here.

Preconditions.checkArgument(
allowedReadLocations.size() <= 1,
"Allowed read locations must not have more than one entry");
Preconditions.checkArgument(
allowedWriteLocations.size() <= 1,
"Allowed write locations must not have more than one entry");

N00b question, for my own understanding: what is the use case for multiple allowedReadLocations and allowedWriteLocations?
Also, why does this restriction apply only to the hierarchical use case?

Contributor Author

what is the use case for several allowedReadLocations and allowedWriteLocations?

TBH, I do not really know 😅 Currently, only one location is passed in.

Azure can restrict a SAS token to only one location (a base directory or a specific file).
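Because a directory-scoped SAS covers a single directory (or file), the check in the hierarchical code path can honour at most one read and one write location. The sketch below mirrors that precondition with plain Java; `SingleLocationSketch` and `singleLocation` are hypothetical names, and a plain `IllegalArgumentException` stands in for Guava's `Preconditions.checkArgument`, which throws the same exception type when its condition is false.

```java
import java.util.List;
import java.util.Optional;

class SingleLocationSketch {
    /**
     * Hypothetical helper: returns the single allowed location (if any) and rejects
     * lists with more than one entry, since one directory-scoped SAS cannot cover them.
     */
    static Optional<String> singleLocation(List<String> locations, String kind) {
        if (locations.size() > 1) {
            throw new IllegalArgumentException(
                "Allowed " + kind + " locations must not have more than one entry");
        }
        return locations.stream().findFirst();
    }

    public static void main(String[] args) {
        System.out.println(singleLocation(List.of("db1/table1"), "read"));
        try {
            singleLocation(List.of("db1/t1", "db1/t2"), "write");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```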

@dimas-b (Contributor Author) commented Jan 5, 2026

@evindj:

do think we need to clarify the behavior when the feature is enabled but the account on azure does not have hierarchical namespace enabled.

Good point! Thanks for flagging it. I'll update the description in YAML.

@snazy (Member) left a comment


The change looks good overall!

Please do not forget to update the CLI as well.

I was wondering whether we could implement this without introducing a configuration flag, but could not find a (cheap) way to figure out whether a file system is hierarchical or not. So I guess there is no performance-neutral alternative to the introduced config flag.

return new DataLakePathClientBuilder()
.endpoint(endpoint)
.fileSystemName(fileSystemNameOrContainer)
.pathName(path) // TODO: drop authority part
Member

TODO?

Contributor Author

removed
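The `// TODO: drop authority part` comment concerns what gets passed to `pathName(...)`: only the path inside the container, not the full location. A stdlib-only sketch of that split, assuming locations of the form `abfss://<container>@<account>.dfs.core.windows.net/<path>`; `AbfssLocationSketch` and `FileSystemPath` are hypothetical names, and the sketch throws if `java.net.URI` cannot find a container name in the authority.

```java
import java.net.URI;

class AbfssLocationSketch {
    /** Hypothetical holder for the two parts a DataLake path client is built from. */
    record FileSystemPath(String fileSystem, String path) {}

    /** Splits an ADLS Gen2 location into container name and container-relative path. */
    static FileSystemPath split(String location) {
        URI uri = URI.create(location);
        // In abfss://container@account.dfs.core.windows.net/..., the container is the user-info part.
        String fileSystem = uri.getUserInfo();
        if (fileSystem == null) {
            throw new IllegalArgumentException("No container in location: " + location);
        }
        String path = uri.getPath();
        // Drop leading/trailing slashes so the result is relative to the container root.
        while (path.startsWith("/")) {
            path = path.substring(1);
        }
        while (path.endsWith("/")) {
            path = path.substring(0, path.length() - 1);
        }
        return new FileSystemPath(fileSystem, path);
    }

    public static void main(String[] args) {
        System.out.println(split("abfss://warehouse@myaccount.dfs.core.windows.net/db1/table1/"));
    }
}
```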

@jbonofre previously approved these changes Jan 7, 2026

@jbonofre (Member) left a comment


LGTM, thanks !

@github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Jan 7, 2026
@dimas-b requested a review from snazy January 8, 2026 16:43

/** The flag indicating whether the storage account supports hierarchical namespaces. */
@Nullable
public abstract Boolean isHierarchical();
Member

Suggested change
@JsonInclude(JsonInclude.Include.NON_NULL)
public abstract Boolean isHierarchical();

Could actually also be a primitive:

Suggested change
@JsonInclude(JsonInclude.Include.NON_DEFAULT)
public abstract boolean isHierarchical();

Both help with serialization size and backwards compatibility.

Member

Not a blocker tho

Contributor Author

An object Boolean is meant to distinguish unset from false in the (generated) AzureStorageConfigInfo class. A primitive boolean would be exposed to clients even if it was not explicitly set.

I'd rather not change AzureStorageConfigInfo serialization in this PR :)
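The unset-versus-false argument can be illustrated without Jackson. The sketch below hand-rolls a NON_NULL-style rule (fields whose value is null are omitted); it is not Jackson itself, and `TriStateFlagSketch`, `boxed` and `primitive` are hypothetical names. With a `Boolean`, an unset flag simply disappears from the JSON while an explicit `false` is kept; with a primitive `boolean`, unset and `false` are the same value, so (absent a NON_DEFAULT rule) the field is always written.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class TriStateFlagSketch {
    /** Hand-rolled stand-in for a NON_NULL inclusion rule: null-valued fields are skipped. */
    static String toJson(Map<String, Object> fields) {
        StringJoiner json = new StringJoiner(",", "{", "}");
        fields.forEach((name, value) -> {
            if (value != null) {
                json.add("\"" + name + "\":" + value);
            }
        });
        return json.toString();
    }

    /** Object Boolean: null (unset), false and true are three distinct states. */
    static String boxed(Boolean hierarchical) {
        Map<String, Object> fields = new LinkedHashMap<>(); // keeps insertion order, allows nulls
        fields.put("storageType", "\"AZURE\"");
        fields.put("hierarchical", hierarchical);
        return toJson(fields);
    }

    /** Primitive boolean: "unset" collapses to false, which is never null and always written. */
    static String primitive(boolean hierarchical) {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("storageType", "\"AZURE\"");
        fields.put("hierarchical", hierarchical);
        return toJson(fields);
    }

    public static void main(String[] args) {
        System.out.println(boxed(null));
        System.out.println(boxed(false));
        System.out.println(primitive(false));
    }
}
```

The `NON_DEFAULT` alternative would drop `false` from the primitive output as well, but an explicit `false` and an unset flag would then serialize identically.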

Contributor Author

Include.NON_DEFAULT is implied:

JsonInclude.Include.NON_NULL, JsonInclude.Include.NON_NULL))

@dimas-b (Contributor Author) commented Jan 9, 2026

rebased to resolve CHANGELOG.md conflicts
