Optimize lookup: 5x faster than std.HashMap at 1M elements #1

joshuaisaact wants to merge 85 commits into main
Conversation
This reverts commit 740bc74.
This reverts commit bfd53c3.
This reverts commit 37c7627.
This reverts commit bb13814.
This reverts commit 53ffd0a.
This reverts commit e22f93b.
This reverts commit 6600b40.
This reverts commit 4e2f019.
- results.md: full writeup of all 62 experiments
- results.tsv: machine-readable experiment log
- insights.md: optimization patterns and architecture notes
- bench.sh: benchmark runner script
- program.md: autoresearch specification
- src/autobench.zig: benchmark harness (fixed for Zig 0.16 nightly)
- .gitignore + build.zig: minor updates for autobench support
📝 Walkthrough

This pull request optimizes an elastic hash table implementation through architectural changes and adds comprehensive benchmarking infrastructure. Changes include reduced probe counts, cached tier-0 bucket metadata, updated hash and fingerprint functions, a new benchmark harness, build integration, documentation of experimental results, and scripts for running performance evaluations.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~28 minutes
🚥 Pre-merge checks: ✅ 3 passed
The previous abseil numbers used operator[] (find-or-insert) instead of emplace()/find(). The corrected comparison shows we roughly match abseil at 99% load on lookup/insert, and beat it 3.9x on delete. At lower load factors and larger sizes, abseil wins.
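The operator[]-vs-find distinction has a direct analogue in most map APIs. As a hedged illustration (not part of the PR), the Python sketch below uses `dict.get` and `dict.setdefault` as stand-ins for `find()` and `operator[]` to show why timing a find-or-insert call as a "lookup" skews miss-heavy benchmarks:

```python
# Python stand-ins for the two C++ APIs discussed above:
#   dict.get        ~ find()       (pure lookup, table untouched on a miss)
#   dict.setdefault ~ operator[]   (find-or-insert, a miss allocates an entry)
table = {}

_ = table.get("missing_key")            # pure lookup: nothing inserted
assert len(table) == 0

_ = table.setdefault("missing_key", 0)  # find-or-insert: the miss mutates
assert len(table) == 1                  # so its cost includes insertion
```

Benchmarking the second form as "lookup" charges every miss with an insertion (and eventual growth/rehash costs), which is why the corrected comparison uses emplace()/find().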
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/hybrid.zig (3)
339-362: ⚠️ Potential issue | 🟠 Major

Add cleanup guards for partial allocation failures in init().

If any allocation after tier_starts fails, init() returns with the earlier buffers leaked. This constructor should free partial state on the error path.

Possible fix

```diff
 const tier_starts = try allocator.alloc(usize, num_tiers);
+errdefer allocator.free(tier_starts);
 const tier_bucket_counts = try allocator.alloc(usize, num_tiers);
+errdefer allocator.free(tier_bucket_counts);
 const tier_slot_counts = try allocator.alloc(usize, num_tiers);
+errdefer allocator.free(tier_slot_counts);
@@
 const fingerprints = try allocator.alloc([BUCKET_SIZE]u8, total_buckets);
+errdefer allocator.free(fingerprints);
 const keys = try allocator.alloc([BUCKET_SIZE]u64, total_buckets);
+errdefer allocator.free(keys);
 const values = try allocator.alloc([BUCKET_SIZE]u64, total_buckets);
+errdefer allocator.free(values);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/hybrid.zig` around lines 339 - 362, In init(), add cleanup for partial allocations so earlier buffers aren't leaked when a later alloc fails: after allocating tier_starts, tier_bucket_counts, tier_slot_counts, fingerprints, keys, and values (in that order) ensure any subsequent allocation failure frees all previously allocated buffers (preferably in reverse order) before returning the error; implement this by using defer or explicit frees tied to the same allocator inside pub fn init to release tier_starts, tier_bucket_counts, tier_slot_counts, fingerprints, keys, and values as appropriate so the function leaves no leaked memory on error.
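For readers less familiar with Zig's errdefer, the pattern behaves like unwinding cleanups in other languages. Here is a hedged Python sketch using contextlib.ExitStack; the `init`/`sizes`/`freed` names are illustrative, not from the PR:

```python
from contextlib import ExitStack

freed = []  # records which "buffers" were cleaned up on the error path

def init(sizes):
    with ExitStack() as stack:
        buffers = []
        for name, n in sizes:
            if n < 0:                        # stand-in for a failed allocation
                raise MemoryError(name)
            buffers.append([0] * n)
            # like `errdefer allocator.free(...)`: runs only if a later step fails
            stack.callback(freed.append, name)
        stack.pop_all()                      # success: cancel all pending cleanups
        return buffers

assert len(init([("tier_starts", 4), ("keys", 8)])) == 2
assert freed == []                           # success path frees nothing

try:
    init([("tier_starts", 4), ("keys", -1)])
except MemoryError:
    pass
assert freed == ["tier_starts"]              # earlier buffer was "freed"
```

The key property matches the suggested fix: cleanups accumulate as allocations succeed and fire automatically on the first failure, so the constructor never leaks earlier buffers.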
553-574: ⚠️ Potential issue | 🔴 Critical

Keep a multi-tier fallback in get() and remove().

Insertion still spills into lower tiers via insertWithBatch() and insertAnyTier(), but the public lookup/delete path now only probes tier 0. Once a key lands in tier 1+, get()/remove() return null/false even though getWithProbes() can still find it.

Also applies to: 576-591
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/hybrid.zig` around lines 553 - 574, get() (and similarly remove()) currently only probes tier 0 which causes lookups/deletes to miss keys spilled into lower tiers; change both methods to loop over all tiers instead of only using tier0_bucket_mask: for each tier index use that tier's bucket mask (e.g., self.tierN_bucket_mask or equivalent per-tier mask), compute bucket0 = h & mask and run the same probe loop (using findKeyInBucket and returning self.values[...] or false/null) up to MAX_PROBES for each tier; mirror the probe logic used in getWithProbes()/insertWithBatch()/insertAnyTier() so keys in tier1+ are discovered by get() and removed by remove().
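To make the suggested fallback concrete, here is a hedged Python model of a tiered table whose get() probes every tier. TieredTable, put(), and the probe loop are illustrative stand-ins for the Zig structures, not the PR's actual code:

```python
MAX_PROBES = 8

class TieredTable:
    """Toy model: one list of buckets per tier; each bucket maps key -> value."""

    def __init__(self, tier_bucket_counts):   # counts must be powers of two
        self.tiers = [[{} for _ in range(n)] for n in tier_bucket_counts]

    def put(self, key, value, tier):
        # test helper: drop a key straight into a tier's home bucket,
        # simulating an insert that spilled past tier 0
        mask = len(self.tiers[tier]) - 1
        self.tiers[tier][hash(key) & mask][key] = value

    def get(self, key):
        # probe every tier, mirroring getWithProbes(), instead of tier 0 only
        for buckets in self.tiers:
            mask = len(buckets) - 1
            bucket0 = hash(key) & mask
            for probe in range(MAX_PROBES):
                bucket = buckets[(bucket0 + probe) & mask]
                if key in bucket:
                    return bucket[key]
        return None

t = TieredTable([8, 4])
t.put("a", 1, tier=0)
t.put("b", 2, tier=1)        # a tier-0-only get() would miss this key
assert t.get("a") == 1
assert t.get("b") == 2
assert t.get("zzz") is None
```

The outer loop is the whole fix: without it, the "b" lookup above returns None even though the key is present.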
496-507: ⚠️ Potential issue | 🟠 Major

Don't let tombstone reuse inflate tier_slot_counts.

remove() leaves tier_slot_counts unchanged, and all three insert paths increment it again when they overwrite a tombstone. After enough delete/reinsert churn, getEmptyFraction() drifts toward "full" and batch selection stops reflecting actual reusable capacity.

Possible fix

```diff
 if (self.findEmptyOrTombstoneInBucket(abs_bucket_idx)) |slot| {
+    const was_tombstone = self.fingerprints[abs_bucket_idx][slot] == TOMBSTONE;
     self.insertAt(abs_bucket_idx, slot, key, value, fp);
-    self.tier_slot_counts[tier] += 1;
+    if (!was_tombstone) self.tier_slot_counts[tier] += 1;
     self.count += 1;
     return;
 }
```

Apply the same guard in insertIntoTier(), tryInsertWithLimit(), and insertAnyTier().

Also applies to: 514-525, 532-546, 576-587
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/hybrid.zig` around lines 496 - 507, The bug is that overwriting a tombstone currently increments tier_slot_counts and count, causing counts to drift after delete/reinsert churn; in insertIntoTier (and likewise in tryInsertWithLimit and insertAnyTier) only increment self.tier_slot_counts[tier] and self.count when the chosen slot was truly empty, not when it was a tombstone—so after findEmptyOrTombstoneInBucket(abs_bucket_idx) returns |slot|, detect whether that slot is a tombstone (use the existing tombstone check used by remove()/find functions or compare the slot marker used in the codebase) and only call self.insertAt(...) plus the increments when the slot was empty; if it was a tombstone, call insertAt without increasing tier_slot_counts/tier count, and apply the same guard in the other functions named above.
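The counting guard can be modelled in a few lines of Python. The EMPTY/TOMBSTONE sentinels and the Bucket/insert names below are illustrative, but the logic follows the suggested was_tombstone fix:

```python
EMPTY, TOMBSTONE = 0, 1      # sentinel fingerprints; real fps start above these

class Bucket:
    def __init__(self, size):
        self.fingerprints = [EMPTY] * size
        self.keys = [None] * size

def insert(bucket, key, fp, slot_counts, tier):
    for slot, state in enumerate(bucket.fingerprints):
        if state in (EMPTY, TOMBSTONE):
            was_tombstone = state == TOMBSTONE
            bucket.fingerprints[slot] = fp
            bucket.keys[slot] = key
            if not was_tombstone:        # the guard: tombstone reuse is not
                slot_counts[tier] += 1   # new occupancy, so don't re-count it
            return True
    return False                         # bucket full

counts = [0]
b = Bucket(4)
insert(b, "k1", fp=7, slot_counts=counts, tier=0)   # fresh slot: count -> 1
b.fingerprints[0] = TOMBSTONE                        # simulate remove("k1")
insert(b, "k2", fp=9, slot_counts=counts, tier=0)   # tombstone reuse
assert counts == [1]                                 # no drift after churn
```

Without the guard, the final count would be 2 after one delete/reinsert cycle, and repeated churn would make the tier look full while most slots are reusable.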
🧹 Nitpick comments (5)
insights.md (1)
10-27: Add language identifier to fenced code block.

The pseudocode block lacks a language specifier. While it's custom notation, adding text or plaintext satisfies markdown linting and improves rendering consistency.

📝 Suggested fix

````diff
-```
+```text
 get(key):
````

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@insights.md` around lines 10 - 27, the fenced pseudocode block containing get(key) and findKeyInBucket should include a language identifier to satisfy markdown linting and improve rendering; open the code fence with ```text (or ```plaintext) instead of just ``` so the block starting with "get(key):" is labeled, keeping the existing content (including symbols like get(key), findKeyInBucket, tier0_bucket_mask, and MAX_PROBES) unchanged.

src/autobench.zig (1)
88-91: Minor inconsistency: fetchRemove result not wrapped in doNotOptimizeAway.

The hybrid remove call (line 61) uses doNotOptimizeAway, but fetchRemove here doesn't. While this is unlikely to affect results (the removal still happens), wrapping it would be consistent.

♻️ Suggested fix for consistency

```diff
 for (0..fill / 2) |i| {
-    _ = map.fetchRemove(i);
+    std.mem.doNotOptimizeAway(map.fetchRemove(i));
 }
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/autobench.zig` around lines 88 - 91, the timing loop calls map.fetchRemove(i) but doesn't protect the return value from compiler optimizations like the earlier remove call does; wrap the result in the same helper used elsewhere (doNotOptimizeAway) to ensure the value isn't optimized away. Locate the loop that iterates over fetchRemove (the map.fetchRemove call) and pass its result into doNotOptimizeAway just like the earlier remove path so behavior and measurement consistency are preserved.

bench.sh (1)
14-14: Consider portability of -P flag.

The grep -oP uses Perl-compatible regex, which requires GNU grep. This may fail on macOS (BSD grep) or minimal Linux environments. Since this is an internal benchmark script, this is likely acceptable, but worth noting.

♻️ Portable alternative using sed

```diff
-grep "RESULT" bench.log | grep "n=1048576" | grep -oP 'lookup_ratio=\K[0-9.]+'
+grep "RESULT" bench.log | grep "n=1048576" | sed -n 's/.*lookup_ratio=\([0-9.]*\).*/\1/p'
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@bench.sh` at line 14, the grep invocation uses GNU-only Perl regex via "grep -oP" which breaks on BSD/macOS; update the bench log parsing to a portable form (for example use sed, awk, or grep -E with a capturing pattern) instead of "grep -oP 'lookup_ratio=\K[0-9.]+'". Locate the pipeline that begins with grep "RESULT" bench.log | grep "n=1048576" and replace the final "grep -oP" stage with a POSIX-compatible command (e.g., use sed to extract the value after "lookup_ratio=" or awk to print the matched field) so the script works on systems without GNU grep.

program.md (1)
75-77: Add language identifier to fenced code block.

The TSV format example lacks a language specifier. Adding tsv or text satisfies markdown linting.

📝 Suggested fix

````diff
-```
+```tsv
 <commit_hash>\t<lookup_ratio_1M>\t<insert_ratio_1M>\t<delete_ratio_1M>\t<keep|revert|crash>\t<description>
 ```
````

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@program.md` around lines 75 - 77, add a language identifier to the fenced code block for the TSV example so markdown linting passes: change the opening fence from ``` to include a language like `tsv` or `text` (the block containing "<commit_hash>\t<lookup_ratio_1M>\t<insert_ratio_1M>\t<delete_ratio_1M>\t<keep|revert|crash>\t<description>") so the block becomes ```tsv (or ```text).

results.md (1)
292-312: Add language identifier to fenced code block.

The before/after architecture comparison lacks a language specifier. Consider using text or diff for better rendering.

📝 Suggested fix

````diff
-```
+```text
 Before:
 hash: wyhash (128-bit multiply)
````

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@results.md` around lines 292 - 312, add a language identifier to the fenced code block containing the "Before/After" architecture comparison so the renderer highlights it correctly; locate the triple-backtick that opens the block around the listed lines (the block starting with "Before:" and ending after "insert probe cap: MAX_PROBES") and change it from ``` to ```text (or ```diff) so the entire block is tagged, leaving the closing ``` unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c78c7460-d30a-4d28-9c07-8beabb15f222
⛔ Files ignored due to path filters (1)

results.tsv is excluded by !**/*.tsv
📒 Files selected for processing (9)

.gitignore, README.md, bench.sh, build.zig, insights.md, program.md, results.md, src/autobench.zig, src/hybrid.zig
```diff
@@ -0,0 +1,56 @@
+# Autoresearch Insights (58 experiments)
```

Experiment count mismatch.

The title says "58 experiments" but results.md documents 62 experiments. Consider updating the title to match.

📝 Suggested fix

```diff
-# Autoresearch Insights (58 experiments)
+# Autoresearch Insights (62 experiments)
```

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@insights.md` at line 1, Update the header in insights.md that currently reads
"Autoresearch Insights (58 experiments)" to reflect the actual experiment count
documented in results.md by changing the number to 62 so the title becomes
"Autoresearch Insights (62 experiments)"; ensure the updated header string
exactly matches the format used ("# Autoresearch Insights (62 experiments)").
```diff
 pub const BUCKET_SIZE = 16;
-const MAX_PROBES = 100;
+const MAX_PROBES = 8;
```
Don't silently lose inserts after the new 8-probe cap.
With MAX_PROBES now 8, insertIntoTier() and insertAnyTier() can exhaust their probe windows and just return. That drops the write without any signal, even when unprobed buckets still have capacity.
Also applies to: 496-512, 532-551
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/hybrid.zig` at line 8, insertions are being silently dropped when the new
MAX_PROBES (const MAX_PROBES = 8) cap is hit because insertIntoTier() and
insertAnyTier() simply return after exhausting the fixed probe count; change
them to not silently bail: remove the early return on probe exhaustion and
instead continue scanning the remainder of the target bucket region (or perform
a full linear probe within that tier) until you either find a free slot or
confirm the tier is full, and then return a clear error/flag indicating insert
failure. Update both insertIntoTier and insertAnyTier to use MAX_PROBES as the
initial probe window but fall back to scanning the rest of the tier (or return
Result/Err) so no inserts are dropped silently. Ensure callers handle the
explicit error/flag instead of assuming success.
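A minimal Python sketch of the requested behavior: keep MAX_PROBES as the fast window, but scan the rest of the tier before giving up, and return an explicit failure instead of dropping the write. insert_into_tier and the small BUCKET_SIZE here are illustrative stand-ins for the Zig paths:

```python
MAX_PROBES = 8      # fast probe window
BUCKET_SIZE = 4     # slots per bucket in this toy model (the real table uses 16)

def insert_into_tier(buckets, key, value):
    """Return True on success, False only when the whole tier is full."""
    mask = len(buckets) - 1                  # bucket count is a power of two
    bucket0 = hash(key) & mask
    # the first MAX_PROBES iterations are the fast path; the remainder is the
    # fallback scan, so no insert is dropped while free slots still exist
    for probe in range(len(buckets)):
        bucket = buckets[(bucket0 + probe) & mask]
        if len(bucket) < BUCKET_SIZE:
            bucket[key] = value
            return True
    return False                             # explicit signal, never silent

buckets = [dict() for _ in range(8)]
for i in range(8 * BUCKET_SIZE):
    assert insert_into_tier(buckets, i, i)   # fills every slot in the tier
assert insert_into_tier(buckets, "overflow", 0) is False
```

The early-return-on-probe-exhaustion bug corresponds to replacing `range(len(buckets))` with `range(MAX_PROBES)` above: writes past the window would vanish while the tier still has capacity.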
With the matchEmpty miss optimization, elastic hash now ranks #1 on hit lookup, miss lookup, insert, and delete at 50% load against abseil, Rust hashbrown+ahash, Go swiss.Map, and Go builtin map. Abseil only wins on misses at 75%+ load where tier 0 fills up.
Why
Elastic hashing (based on the paper) was 4-8x faster than std.HashMap on insert at 99% load, but ~25% slower on lookup at 1M+ elements due to tier-jumping cache misses.
What
62 experiments optimizing HybridElasticHash.get() and supporting functions. Key changes to src/hybrid.zig:

vs Zig std.HashMap (1M elements, 99% load)
vs Google abseil flat_hash_map (1M elements, 99% load)
Rigorous comparison using correct APIs (emplace/find/erase), -O3 -march=native -DNDEBUG, 10 measured runs:
At lower load factors (10-75%), abseil wins on lookup and insert. The elastic hash advantage is specific to high load factors and delete-heavy workloads.
Full experiment log: results.md

References