
Optimize lookup: 5x faster than std.HashMap at 1M elements#1

Open
joshuaisaact wants to merge 85 commits into main from autoresearch/lookup-optimization

Conversation

@joshuaisaact
Owner

@joshuaisaact joshuaisaact commented Mar 23, 2026

Why

Elastic hashing (based on the paper) was 4-8x faster than std.HashMap on insert at 99% load, but ~25% slower on lookup at 1M+ elements due to tier-jumping cache misses.

What

62 experiments optimizing HybridElasticHash.get() and supporting functions. Key changes to src/hybrid.zig:

  • MAX_PROBES 100 -> 8, single-tier lookup, removed all prefetching
  • Linear probing, single-multiply hash, separated probe 0 from loop
  • Fingerprint bit selection to reduce FP/bucket correlation
  • Tuned batch threshold and tier sizing
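As a rough illustration of these changes, here is a Python sketch of the lookup path. Everything here is hypothetical: the real implementation is Zig in src/hybrid.zig, and the mixing constant, fingerprint bit range, and flat table layout are stand-ins rather than the PR's actual choices.

```python
MAX_PROBES = 8
EMPTY = 0            # fingerprint value reserved for empty slots
M64 = (1 << 64) - 1

def mix(key: int) -> int:
    # Single-multiply hash: one 64-bit multiply by an odd constant,
    # then a shift. The constant is illustrative, not the PR's.
    return ((key * 0x9E3779B97F4A7C15) & M64) >> 13

def fingerprint(h: int) -> int:
    # Take fingerprint bits disjoint from the bucket-index bits, so
    # fingerprint and bucket are decorrelated.
    fp = (h >> 40) & 0xFF
    return fp or 1   # never collide with the EMPTY marker

def insert(fps, keys, mask, key):
    h = mix(key)
    i = h & mask
    for p in range(MAX_PROBES):
        j = (i + p) & mask
        if fps[j] == EMPTY:
            fps[j], keys[j] = fingerprint(h), key
            return j
    raise RuntimeError("probe window exhausted")  # the failure mode the review flags

def lookup(fps, keys, mask, key):
    h = mix(key)
    fp = fingerprint(h)
    i = h & mask
    # Probe 0 peeled out of the loop: the common hit case pays no loop setup.
    if fps[i] == fp and keys[i] == key:
        return i
    for p in range(1, MAX_PROBES):
        j = (i + p) & mask
        if fps[j] == EMPTY:          # an empty slot terminates the probe chain
            return None
        if fps[j] == fp and keys[j] == key:
            return j
    return None
```

The peeled probe 0 and the empty-slot early exit are the two structural changes; the probe-exhaustion branch in `insert` is the silent-drop case the review's critical comment is about.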

vs Zig std.HashMap (1M elements, 99% load)

| Operation | Ratio (std / elastic) |
| --- | --- |
| Lookup | 5.2x faster |
| Insert | 3.1x faster |
| Delete | 1.8x faster |

vs Google abseil flat_hash_map (1M elements, 99% load)

Rigorous comparison using correct APIs (emplace/find/erase), `-O3 -march=native -DNDEBUG`, 10 measured runs:

| Operation | Elastic Hash (µs) | Abseil (µs) | Notes |
| --- | --- | --- | --- |
| Lookup | 8,550-9,000 | 8,200-8,850 | Tied (overlapping stddevs) |
| Insert | 16,800 | 15,685 | Abseil ~7% faster |
| Delete | 2,100-2,200 | 8,200-8,400 | Elastic 3.9x faster |

At lower load factors (10-75%), abseil wins on lookup and insert. The elastic hash advantage is specific to high load factors and delete-heavy workloads.

Full experiment log: results.md

References

Summary by CodeRabbit

  • New Features

    • Added automated benchmarking infrastructure for consistent performance evaluation across different table sizes
  • Performance Improvements

    • Optimized hash table lookup at extreme load factors (99%); improved delete performance versus alternatives
  • Documentation

    • Updated README with refreshed benchmark methodology and results; added architectural insights and research documentation

- results.md: full writeup of all 62 experiments
- results.tsv: machine-readable experiment log
- insights.md: optimization patterns and architecture notes
- bench.sh: benchmark runner script
- program.md: autoresearch specification
- src/autobench.zig: benchmark harness (fixed for Zig 0.16 nightly)
- .gitignore + build.zig: minor updates for autobench support
@coderabbitai

coderabbitai bot commented Mar 23, 2026

📝 Walkthrough

This pull request optimizes an elastic hash table implementation through architectural changes and adds comprehensive benchmarking infrastructure. Changes include reduced probe counts, cached tier-0 bucket metadata, updated hash and fingerprint functions, new benchmark harness, build integration, documentation of experimental results, and scripts for running performance evaluations.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Build & Configuration: `.gitignore`, `build.zig` | Added ignored build artifacts (zig-out, bench.log). Integrated new autobench executable and build step into build graph. |
| Documentation: `README.md`, `insights.md`, `program.md`, `results.md` | Updated README with refreshed benchmark methodology and results versus absl::flat_hash_map. Added new docs: insights on lookup architecture (58 experiments), autonomous research agent workflow, and comprehensive experiment log with performance data (62 experiments). |
| Benchmarking Infrastructure: `bench.sh`, `src/autobench.zig` | Added bash script to build and run benchmarks, logging results to bench.log. Added harness comparing HybridElasticHash against std.HashMap across multiple sizes and fill factors, measuring insert/lookup/delete latency and reporting ratios. |
| Core Hash Table Optimization: `src/hybrid.zig` | Reduced MAX_PROBES from 100 to 8. Reworked tier-0 indexing with cached tier0_bucket_mask and tier0_bucket_count fields. Updated hash mixing and fingerprint derivation. Modified lookup and removal to probe only tier 0. Added SIMD fast path in findKeyInBucket. Adjusted insert threshold for tier transitions and probe capping. |
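The "SIMD fast path in findKeyInBucket" can be approximated in scalar code with the classic SWAR byte-compare trick. This Python sketch uses a hypothetical 8-slot bucket packed into one 64-bit word (the PR's buckets hold 16 slots, and the real code is Zig vector ops):

```python
BUCKET_SIZE = 8                 # slots per word in this sketch
LOW = 0x0101010101010101
HIGH = 0x8080808080808080
M64 = (1 << 64) - 1

def pack(fps):
    """Pack one fingerprint byte per slot into a word (slot 0 = low byte)."""
    w = 0
    for i, fp in enumerate(fps):
        w |= fp << (8 * i)
    return w

def match_fingerprint(word, fp):
    """Set bit 7 of every byte of `word` equal to `fp`, in one pass.
    Borrow propagation can over-approximate, so candidates must still be
    confirmed against the stored keys -- which the lookup does anyway."""
    x = word ^ (fp * LOW)
    return ((x - LOW) & ~x & HIGH) & M64

def candidate_slots(mask):
    """Decode the match mask into slot indices."""
    return [i for i in range(BUCKET_SIZE) if mask & (0x80 << (8 * i))]
```

With real SIMD this is one compare plus a movemask instead of a per-slot loop; either way, a whole bucket's fingerprints are filtered before any key comparison happens.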

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~28 minutes

Poem

🐰 Hops of logic, swiftly bound,
Buckets cached, no probes around!
Eight not one hundred, fast we race,
SIMD dances—benchmarks embrace!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: a 5.2x lookup performance improvement over std.HashMap at 1M elements, which is the primary objective documented in the PR summary. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


The previous abseil numbers used operator[] (find-or-insert) instead of
emplace()/find(). The corrected comparison shows we roughly match abseil
at 99% load on lookup/insert, and beat it 3.9x on delete. At lower load
factors and larger sizes, abseil wins.
@joshuaisaact joshuaisaact marked this pull request as ready for review March 23, 2026 07:40

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/hybrid.zig (3)

339-362: ⚠️ Potential issue | 🟠 Major

Add cleanup guards for partial allocation failures in init().

If any allocation after tier_starts fails, init() returns with the earlier buffers leaked. This constructor should free partial state on the error path.

Possible fix
```diff
         const tier_starts = try allocator.alloc(usize, num_tiers);
+        errdefer allocator.free(tier_starts);
         const tier_bucket_counts = try allocator.alloc(usize, num_tiers);
+        errdefer allocator.free(tier_bucket_counts);
         const tier_slot_counts = try allocator.alloc(usize, num_tiers);
+        errdefer allocator.free(tier_slot_counts);
@@
         const fingerprints = try allocator.alloc([BUCKET_SIZE]u8, total_buckets);
+        errdefer allocator.free(fingerprints);
         const keys = try allocator.alloc([BUCKET_SIZE]u64, total_buckets);
+        errdefer allocator.free(keys);
         const values = try allocator.alloc([BUCKET_SIZE]u64, total_buckets);
+        errdefer allocator.free(values);
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/hybrid.zig` around lines 339 - 362, In init(), add cleanup for partial
allocations so earlier buffers aren't leaked when a later alloc fails: after
allocating tier_starts, tier_bucket_counts, tier_slot_counts, fingerprints,
keys, and values (in that order) ensure any subsequent allocation failure frees
all previously allocated buffers (preferably in reverse order) before returning
the error; implement this by using defer or explicit frees tied to the same
allocator inside pub fn init to release tier_starts, tier_bucket_counts,
tier_slot_counts, fingerprints, keys, and values as appropriate so the function
leaves no leaked memory on error.

553-574: ⚠️ Potential issue | 🔴 Critical

Keep a multi-tier fallback in get() and remove().

Insertion still spills into lower tiers via insertWithBatch() and insertAnyTier(), but the public lookup/delete path now only probes tier 0. Once a key lands in tier 1+, get()/remove() return null/false even though getWithProbes() can still find it.

Also applies to: 576-591

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/hybrid.zig` around lines 553 - 574, get() (and similarly remove())
currently only probes tier 0 which causes lookups/deletes to miss keys spilled
into lower tiers; change both methods to loop over all tiers instead of only
using tier0_bucket_mask: for each tier index use that tier's bucket mask (e.g.,
self.tierN_bucket_mask or equivalent per-tier mask), compute bucket0 = h & mask
and run the same probe loop (using findKeyInBucket and returning
self.values[...] or false/null) up to MAX_PROBES for each tier; mirror the probe
logic used in getWithProbes()/insertWithBatch()/insertAnyTier() so keys in
tier1+ are discovered by get() and removed by remove().

496-507: ⚠️ Potential issue | 🟠 Major

Don't let tombstone reuse inflate tier_slot_counts.

remove() leaves tier_slot_counts unchanged, and all three insert paths increment it again when they overwrite a tombstone. After enough delete/reinsert churn, getEmptyFraction() drifts toward “full” and batch selection stops reflecting actual reusable capacity.

Possible fix
```diff
             if (self.findEmptyOrTombstoneInBucket(abs_bucket_idx)) |slot| {
+                const was_tombstone = self.fingerprints[abs_bucket_idx][slot] == TOMBSTONE;
                 self.insertAt(abs_bucket_idx, slot, key, value, fp);
-                self.tier_slot_counts[tier] += 1;
+                if (!was_tombstone) self.tier_slot_counts[tier] += 1;
                 self.count += 1;
                 return;
             }
```

Apply the same guard in insertIntoTier(), tryInsertWithLimit(), and insertAnyTier().

Also applies to: 514-525, 532-546, 576-587

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/hybrid.zig` around lines 496 - 507, The bug is that overwriting a
tombstone currently increments tier_slot_counts and count, causing counts to
drift after delete/reinsert churn; in insertIntoTier (and likewise in
tryInsertWithLimit and insertAnyTier) only increment self.tier_slot_counts[tier]
and self.count when the chosen slot was truly empty, not when it was a
tombstone—so after findEmptyOrTombstoneInBucket(abs_bucket_idx) returns |slot|,
detect whether that slot is a tombstone (use the existing tombstone check used
by remove()/find functions or compare the slot marker used in the codebase) and
only call self.insertAt(...) plus the increments when the slot was empty; if it
was a tombstone, call insertAt without increasing tier_slot_counts/tier count,
and apply the same guard in the other functions named above.
🧹 Nitpick comments (5)
insights.md (1)

10-27: Add language identifier to fenced code block.

The pseudocode block lacks a language specifier. While it's custom notation, adding text or plaintext satisfies markdown linting and improves rendering consistency.

📝 Suggested fix
-```
+```text
 get(key):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@insights.md` around lines 10 - 27, The fenced pseudocode block containing
get(key) and findKeyInBucket should include a language identifier to satisfy
markdown linting and improve rendering; open the code fence with ```text (or
```plaintext) instead of just ``` so the block starting with "get(key):" is
labeled, keeping the existing content (including symbols like get(key),
findKeyInBucket, tier0_bucket_mask, and MAX_PROBES) unchanged.
src/autobench.zig (1)

88-91: Minor inconsistency: fetchRemove result not wrapped in doNotOptimizeAway.

The hybrid remove call (line 61) uses doNotOptimizeAway, but fetchRemove here doesn't. While this is unlikely to affect results (the removal still happens), wrapping it would be consistent.

♻️ Suggested fix for consistency
```diff
 for (0..fill / 2) |i| {
-    _ = map.fetchRemove(i);
+    std.mem.doNotOptimizeAway(map.fetchRemove(i));
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/autobench.zig` around lines 88 - 91, The timing loop calls
map.fetchRemove(i) but doesn't protect the return value from compiler
optimizations like the earlier remove call does; wrap the result in the same
helper used elsewhere (doNotOptimizeAway) to ensure the value isn't optimized
away. Locate the loop that iterates over fetchRemove (the map.fetchRemove call)
and pass its result into doNotOptimizeAway just like the earlier remove path so
behavior and measurement consistency are preserved.
bench.sh (1)

14-14: Consider portability of -P flag.

The grep -oP uses Perl-compatible regex which requires GNU grep. This may fail on macOS (BSD grep) or minimal Linux environments. Since this is an internal benchmark script, this is likely acceptable, but worth noting.

♻️ Portable alternative using sed
```diff
-grep "RESULT" bench.log | grep "n=1048576" | grep -oP 'lookup_ratio=\K[0-9.]+'
+grep "RESULT" bench.log | grep "n=1048576" | sed -n 's/.*lookup_ratio=\([0-9.]*\).*/\1/p'
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bench.sh` at line 14, The grep invocation uses GNU-only Perl regex via "grep
-oP" which breaks on BSD/macOS; update the bench log parsing to a portable form
(for example use sed, awk, or grep -E with a capturing pattern) instead of "grep
-oP 'lookup_ratio=\K[0-9.]+'". Locate the pipeline that begins with grep
"RESULT" bench.log | grep "n=1048576" and replace the final "grep -oP" stage
with a POSIX-compatible command (e.g., use sed to extract the value after
"lookup_ratio=" or awk to print the matched field) so the script works on
systems without GNU grep.
program.md (1)

75-77: Add language identifier to fenced code block.

The TSV format example lacks a language specifier. Adding tsv or text satisfies markdown linting.

📝 Suggested fix
-   ```
+   ```tsv
    <commit_hash>\t<lookup_ratio_1M>\t<insert_ratio_1M>\t<delete_ratio_1M>\t<keep|revert|crash>\t<description>
    ```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@program.md` around lines 75 - 77, Add a language identifier to the fenced
code block for the TSV example so markdown linting passes: change the opening
fence from ``` to include a language like `tsv` or `text` (the block containing
"<commit_hash>\t<lookup_ratio_1M>\t<insert_ratio_1M>\t<delete_ratio_1M>\t<keep|revert|crash>\t<description>")
so the block becomes ```tsv (or ```text).
results.md (1)

292-312: Add language identifier to fenced code block.

The before/after architecture comparison lacks a language specifier. Consider using text or diff for better rendering.

📝 Suggested fix
-```
+```text
 Before:
   hash: wyhash (128-bit multiply)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@results.md` around lines 292 - 312, Add a language identifier to the fenced
code block containing the "Before/After" architecture comparison so the renderer
highlights it correctly; locate the triple-backtick that opens the block around
the listed lines (the block starting with "Before:" and ending after "insert
probe cap: MAX_PROBES") and change it from ``` to ```text (or ```diff) so the
entire block is tagged, leaving the closing ``` unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c78c7460-d30a-4d28-9c07-8beabb15f222

📥 Commits

Reviewing files that changed from the base of the PR and between 3e37318 and 1393871.

⛔ Files ignored due to path filters (1)
  • results.tsv is excluded by !**/*.tsv
📒 Files selected for processing (9)
  • .gitignore
  • README.md
  • bench.sh
  • build.zig
  • insights.md
  • program.md
  • results.md
  • src/autobench.zig
  • src/hybrid.zig

@@ -0,0 +1,56 @@
# Autoresearch Insights (58 experiments)


⚠️ Potential issue | 🟡 Minor

Experiment count mismatch.

The title says "58 experiments" but results.md documents 62 experiments. Consider updating the title to match.

📝 Suggested fix

```diff
-# Autoresearch Insights (58 experiments)
+# Autoresearch Insights (62 experiments)
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Autoresearch Insights (58 experiments)
# Autoresearch Insights (62 experiments)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@insights.md` at line 1, Update the header in insights.md that currently reads
"Autoresearch Insights (58 experiments)" to reflect the actual experiment count
documented in results.md by changing the number to 62 so the title becomes
"Autoresearch Insights (62 experiments)"; ensure the updated header string
exactly matches the format used ("# Autoresearch Insights (62 experiments)").


```diff
 pub const BUCKET_SIZE = 16;
-const MAX_PROBES = 100;
+const MAX_PROBES = 8;
```


⚠️ Potential issue | 🔴 Critical

Don't silently lose inserts after the new 8-probe cap.

With MAX_PROBES now 8, insertIntoTier() and insertAnyTier() can exhaust their probe windows and just return. That drops the write without any signal, even when unprobed buckets still have capacity.

Also applies to: 496-512, 532-551

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/hybrid.zig` at line 8, insertions are being silently dropped when the new
MAX_PROBES (const MAX_PROBES = 8) cap is hit because insertIntoTier() and
insertAnyTier() simply return after exhausting the fixed probe count; change
them to not silently bail: remove the early return on probe exhaustion and
instead continue scanning the remainder of the target bucket region (or perform
a full linear probe within that tier) until you either find a free slot or
confirm the tier is full, and then return a clear error/flag indicating insert
failure. Update both insertIntoTier and insertAnyTier to use MAX_PROBES as the
initial probe window but fall back to scanning the rest of the tier (or return
Result/Err) so no inserts are dropped silently. Ensure callers handle the
explicit error/flag instead of assuming success.

joshuaisaact added a commit that referenced this pull request Mar 24, 2026
With the matchEmpty miss optimization, elastic hash now ranks #1 on
hit lookup, miss lookup, insert, and delete at 50% load against
abseil, Rust hashbrown+ahash, Go swiss.Map, and Go builtin map.
Abseil only wins on misses at 75%+ load where tier 0 fills up.
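The matchEmpty idea in a nutshell, as a Python sketch (hypothetical names; tombstones are ignored here, which a real delete-supporting table must not do):

```python
EMPTY = 0   # fingerprint reserved for never-used slots

def probe_bucket(fps, keys, key, fp):
    """One lookup step over a bucket's slots. If the key is absent but the
    bucket still has an EMPTY slot, inserts never overflowed this bucket,
    so the lookup can report a definite miss without probing further."""
    saw_empty = False
    for slot, slot_fp in enumerate(fps):
        if slot_fp == fp and keys[slot] == key:
            return ("hit", slot)
        if slot_fp == EMPTY:
            saw_empty = True
    return ("miss", None) if saw_empty else ("continue", None)
```

Buckets with no empty slot force the probe to continue, which is consistent with the observation above: the miss win holds while tier 0 still has empty slots and fades at 75%+ load as tier 0 fills up.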
