Accelerate `HashMap` and `HashSet` lookup by using `if`-based modulo in loops #106569
`HashMap` uses modulo to map hashes and positions into the range `0` to `capacity - 1`. Modulo is slow, so it was replaced with `fastmod` in #62327. However, `fastmod` is still a bottleneck for `HashMap` lookup. I was able to speed up these calls by using an `if`-based pseudo modulo instead, which is faster than `fastmod`. This change can accelerate `HashMap` lookup by up to 40%.
Explanation
On my machine, `fastmod` uses 2 multiplications (3-10 clock cycles each[^1]) and a bitshift (1 clock cycle[^2]). An `if` is an integer comparison (1 clock cycle[^2]) plus a mostly predictable branch (0-2 clock cycles normally[^3], 12-25 clock cycles when the wrap-around case causes a misprediction). It stands to reason that in most cases, an `if`-based modulo will be faster than `fastmod`. It can be used whenever we're sure the number lies between `0` and `capacity * 2 - 1`.
Benchmarks
Theory is well and good, but benchmarks are better.
I measured an improvement of up to 40% for `HashMap` `get`, 11% for `insert`, and 2.5% for `erase`. These gains apply mainly when the data is in cache; when it is not, RAM access is likely to be the bottleneck.
Test Code
On master:
On this PR:
Alternatives
@Nazarwadim has proposed in #90082 to switch to `&` for the modulus. This would make the operation nearly instantaneous. However, we currently use prime-based capacities because they reduce collision counts when a mediocre hash function is used. There is definitely a trade-off to be measured at some point, but it should be measured with both strategies implemented optimally.
Since we know our hash functions are high quality, this may be the better solution in the end. Power-of-two growth seems to be the better trade-off in most modern `HashMap` implementations anyway, and I will likely test this soon.
Java apparently applies a secondary hash function to the given hash to increase its quality (https://stackoverflow.com/a/15437377/730797). We could use that for unknown hash functions and skip it for known, high-quality ones.
Footnotes
[^1]: https://agner.org/optimize/optimizing_cpp.pdf, p. 149
[^2]: https://agner.org/optimize/optimizing_cpp.pdf, p. 30
[^3]: https://agner.org/optimize/optimizing_cpp.pdf, p. 43