Skip to content

Conversation

@zheguang
Copy link
Contributor

@zheguang zheguang commented Nov 18, 2025

This PR address KAFKA-19782 by replacing hashset with trie for prefix search.

Previously the Authorizer uses character-by-character comparisons to search for prefix patterns to deny within a set of allowed literals. This prefix search is suboptimal when the set contains many strings that share the same prefix -- the complexity is linear in number of strings sharing a prefix. By constrast, a trie will reduce the complexity to constant.

In particular, this PR uses Apache Commons Collection's PATRICIA Trie, which is an efficient algorithm that offers worst-case O(k)-time operations, where k is the largest key bit-length.

The resulting code appears also clearer in logic.

@github-actions github-actions bot added triage PRs from the community build Gradle build or GitHub Actions clients labels Nov 18, 2025
@ekuvardin
Copy link
Contributor

ekuvardin commented Nov 18, 2025

Hi seems we make parallel changes :) on the same day
#20912

Maybe we should keep just one branch, and I'll merge the benchmarks into it and fix the bug with them?

@ekuvardin
Copy link
Contributor

ekuvardin commented Nov 18, 2025

Also take a look that this PR connected with big problem with benchmarks

  1. They are unstable and should be run and fixed also
  2. Error should be(oh.. it's the most dangerous zone) no more than 10 % percent in jmh.

I am stuck at this more than a month

@zheguang
Copy link
Contributor Author

Hi seems we make parallel changes :) on the same day #20912

Maybe we should keep just one branch, and I'll merge the benchmarks into it and fix the bug with them?

Small world :) Sure, happy to collaborate. I'll take a look at your benchmark changes. Once ready, are you able to push those benchmark changes to this branch? Otherwise I am also happy to merge your benchmark changes. Let me know!

@ekuvardin
Copy link
Contributor

@lucasbru
Hi Lucas, could you help us out?

We accidentally created two PRs for the same task, and we’ll actually be working together on it going forward.
Right now we have a question about which approach is better.

Could you coordinate us a bit and take a look at both PRs?

One PR introduces a PatriciaTrie-based Set on top of PatriciaTrie, - #20911 (this pr)
and the other one uses a PatriciaTrie Map. - #20912

In your opinion, which option is better to use? Both approaches work.

By the way, I’ve already written to a colleague from the library Apache common collections and suggested adding a PatriciaTrie Set, but he’s very busy and hasn’t responded yet - apache/commons-collections#589

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

build Gradle build or GitHub Actions clients triage PRs from the community

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants