
Optimize lookup: 5x faster than std.HashMap at 1M elements#1

Open
joshuaisaact wants to merge 85 commits into main from autoresearch/lookup-optimization

Conversation

@joshuaisaact
Owner

@joshuaisaact joshuaisaact commented Mar 23, 2026

Why

Elastic hashing (based on the paper) was 4-8x faster than std.HashMap on insert at 99% load, but ~25% slower on lookup at 1M+ elements due to tier-jumping cache misses.

What

62 experiments optimizing HybridElasticHash.get() and supporting functions. Key changes to src/hybrid.zig:

  • MAX_PROBES 100 -> 8, single-tier lookup, removed all prefetching
  • Linear probing, single-multiply hash, separated probe 0 from loop
  • Fingerprint bit selection to reduce FP/bucket correlation
  • Tuned batch threshold and tier sizing
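As a rough illustration of these changes, here is a Python sketch of the lookup path. Everything here is hypothetical: the real implementation is Zig in src/hybrid.zig, and the mixing constant, fingerprint bit range, and flat table layout are stand-ins rather than the PR's actual choices.

```python
MAX_PROBES = 8
EMPTY = 0            # fingerprint value reserved for empty slots
M64 = (1 << 64) - 1

def mix(key: int) -> int:
    # Single-multiply hash: one 64-bit multiply by an odd constant,
    # then a shift. The constant is illustrative, not the PR's.
    return ((key * 0x9E3779B97F4A7C15) & M64) >> 13

def fingerprint(h: int) -> int:
    # Take fingerprint bits disjoint from the bucket-index bits, so
    # fingerprint and bucket are decorrelated.
    fp = (h >> 40) & 0xFF
    return fp or 1   # never collide with the EMPTY marker

def insert(fps, keys, mask, key):
    h = mix(key)
    i = h & mask
    for p in range(MAX_PROBES):
        j = (i + p) & mask
        if fps[j] == EMPTY:
            fps[j], keys[j] = fingerprint(h), key
            return j
    raise RuntimeError("probe window exhausted")  # the failure mode the review flags

def lookup(fps, keys, mask, key):
    h = mix(key)
    fp = fingerprint(h)
    i = h & mask
    # Probe 0 peeled out of the loop: the common hit case pays no loop setup.
    if fps[i] == fp and keys[i] == key:
        return i
    for p in range(1, MAX_PROBES):
        j = (i + p) & mask
        if fps[j] == EMPTY:          # an empty slot terminates the probe chain
            return None
        if fps[j] == fp and keys[j] == key:
            return j
    return None
```

The peeled probe 0 and the empty-slot early exit are the two structural changes; the probe-exhaustion branch in `insert` is the silent-drop case the review's critical comment is about.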

vs Zig std.HashMap (1M elements, 99% load)

| Operation | Ratio (std / elastic) |
| --- | --- |
| Lookup | 5.2x faster |
| Insert | 3.1x faster |
| Delete | 1.8x faster |

vs Google abseil flat_hash_map (1M elements, 99% load)

Rigorous comparison using correct APIs (emplace/find/erase), `-O3 -march=native -DNDEBUG`, 10 measured runs:

| Operation | Elastic Hash (µs) | Abseil (µs) | Notes |
| --- | --- | --- | --- |
| Lookup | 8,550-9,000 | 8,200-8,850 | Tied (overlapping stddevs) |
| Insert | 16,800 | 15,685 | Abseil ~7% faster |
| Delete | 2,100-2,200 | 8,200-8,400 | Elastic 3.9x faster |

At lower load factors (10-75%), abseil wins on lookup and insert. The elastic hash advantage is specific to high load factors and delete-heavy workloads.

Full experiment log: results.md

References

Summary by CodeRabbit

  • New Features

    • Added automated benchmarking infrastructure for consistent performance evaluation across different table sizes
  • Performance Improvements

    • Optimized hash table lookup at extreme load factors (99%); improved delete performance versus alternatives
  • Documentation

    • Updated README with refreshed benchmark methodology and results; added architectural insights and research documentation

- results.md: full writeup of all 62 experiments
- results.tsv: machine-readable experiment log
- insights.md: optimization patterns and architecture notes
- bench.sh: benchmark runner script
- program.md: autoresearch specification
- src/autobench.zig: benchmark harness (fixed for Zig 0.16 nightly)
- .gitignore + build.zig: minor updates for autobench support
@coderabbitai

coderabbitai bot commented Mar 23, 2026

📝 Walkthrough

This pull request optimizes an elastic hash table implementation through architectural changes and adds comprehensive benchmarking infrastructure. Changes include reduced probe counts, cached tier-0 bucket metadata, updated hash and fingerprint functions, new benchmark harness, build integration, documentation of experimental results, and scripts for running performance evaluations.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Build & Configuration: `.gitignore`, `build.zig` | Added ignored build artifacts (zig-out, bench.log). Integrated new autobench executable and build step into build graph. |
| Documentation: `README.md`, `insights.md`, `program.md`, `results.md` | Updated README with refreshed benchmark methodology and results versus absl::flat_hash_map. Added new docs: insights on lookup architecture (58 experiments), autonomous research agent workflow, and comprehensive experiment log with performance data (62 experiments). |
| Benchmarking Infrastructure: `bench.sh`, `src/autobench.zig` | Added bash script to build and run benchmarks, logging results to bench.log. Added harness comparing HybridElasticHash against std.HashMap across multiple sizes and fill factors, measuring insert/lookup/delete latency and reporting ratios. |
| Core Hash Table Optimization: `src/hybrid.zig` | Reduced MAX_PROBES from 100 to 8. Reworked tier-0 indexing with cached tier0_bucket_mask and tier0_bucket_count fields. Updated hash mixing and fingerprint derivation. Modified lookup and removal to probe only tier 0. Added SIMD fast path in findKeyInBucket. Adjusted insert threshold for tier transitions and probe capping. |
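The "SIMD fast path in findKeyInBucket" can be approximated in scalar code with the classic SWAR byte-compare trick. This Python sketch uses a hypothetical 8-slot bucket packed into one 64-bit word (the PR's buckets hold 16 slots, and the real code is Zig vector ops):

```python
BUCKET_SIZE = 8                 # slots per word in this sketch
LOW = 0x0101010101010101
HIGH = 0x8080808080808080
M64 = (1 << 64) - 1

def pack(fps):
    """Pack one fingerprint byte per slot into a word (slot 0 = low byte)."""
    w = 0
    for i, fp in enumerate(fps):
        w |= fp << (8 * i)
    return w

def match_fingerprint(word, fp):
    """Set bit 7 of every byte of `word` equal to `fp`, in one pass.
    Borrow propagation can over-approximate, so candidates must still be
    confirmed against the stored keys -- which the lookup does anyway."""
    x = word ^ (fp * LOW)
    return ((x - LOW) & ~x & HIGH) & M64

def candidate_slots(mask):
    """Decode the match mask into slot indices."""
    return [i for i in range(BUCKET_SIZE) if mask & (0x80 << (8 * i))]
```

With real SIMD this is one compare plus a movemask instead of a per-slot loop; either way, a whole bucket's fingerprints are filtered before any key comparison happens.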

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~28 minutes

Poem

🐰 Hops of logic, swiftly bound,
Buckets cached, no probes around!
Eight not one hundred, fast we race,
SIMD dances—benchmarks embrace!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: a 5.2x lookup performance improvement over std.HashMap at 1M elements, which is the primary objective documented in the PR summary. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


The previous abseil numbers used operator[] (find-or-insert) instead of
emplace()/find(). The corrected comparison shows we roughly match abseil
at 99% load on lookup/insert, and beat it 3.9x on delete. At lower load
factors and larger sizes, abseil wins.
@joshuaisaact joshuaisaact marked this pull request as ready for review March 23, 2026 07:40

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/hybrid.zig (3)

339-362: ⚠️ Potential issue | 🟠 Major

Add cleanup guards for partial allocation failures in init().

If any allocation after tier_starts fails, init() returns with the earlier buffers leaked. This constructor should free partial state on the error path.

Possible fix
```diff
         const tier_starts = try allocator.alloc(usize, num_tiers);
+        errdefer allocator.free(tier_starts);
         const tier_bucket_counts = try allocator.alloc(usize, num_tiers);
+        errdefer allocator.free(tier_bucket_counts);
         const tier_slot_counts = try allocator.alloc(usize, num_tiers);
+        errdefer allocator.free(tier_slot_counts);
@@
         const fingerprints = try allocator.alloc([BUCKET_SIZE]u8, total_buckets);
+        errdefer allocator.free(fingerprints);
         const keys = try allocator.alloc([BUCKET_SIZE]u64, total_buckets);
+        errdefer allocator.free(keys);
         const values = try allocator.alloc([BUCKET_SIZE]u64, total_buckets);
+        errdefer allocator.free(values);
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/hybrid.zig` around lines 339 - 362, In init(), add cleanup for partial
allocations so earlier buffers aren't leaked when a later alloc fails: after
allocating tier_starts, tier_bucket_counts, tier_slot_counts, fingerprints,
keys, and values (in that order) ensure any subsequent allocation failure frees
all previously allocated buffers (preferably in reverse order) before returning
the error; implement this by using defer or explicit frees tied to the same
allocator inside pub fn init to release tier_starts, tier_bucket_counts,
tier_slot_counts, fingerprints, keys, and values as appropriate so the function
leaves no leaked memory on error.

553-574: ⚠️ Potential issue | 🔴 Critical

Keep a multi-tier fallback in get() and remove().

Insertion still spills into lower tiers via insertWithBatch() and insertAnyTier(), but the public lookup/delete path now only probes tier 0. Once a key lands in tier 1+, get()/remove() return null/false even though getWithProbes() can still find it.

Also applies to: 576-591

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/hybrid.zig` around lines 553 - 574, get() (and similarly remove())
currently only probes tier 0 which causes lookups/deletes to miss keys spilled
into lower tiers; change both methods to loop over all tiers instead of only
using tier0_bucket_mask: for each tier index use that tier's bucket mask (e.g.,
self.tierN_bucket_mask or equivalent per-tier mask), compute bucket0 = h & mask
and run the same probe loop (using findKeyInBucket and returning
self.values[...] or false/null) up to MAX_PROBES for each tier; mirror the probe
logic used in getWithProbes()/insertWithBatch()/insertAnyTier() so keys in
tier1+ are discovered by get() and removed by remove().

496-507: ⚠️ Potential issue | 🟠 Major

Don't let tombstone reuse inflate tier_slot_counts.

remove() leaves tier_slot_counts unchanged, and all three insert paths increment it again when they overwrite a tombstone. After enough delete/reinsert churn, getEmptyFraction() drifts toward “full” and batch selection stops reflecting actual reusable capacity.

Possible fix
```diff
             if (self.findEmptyOrTombstoneInBucket(abs_bucket_idx)) |slot| {
+                const was_tombstone = self.fingerprints[abs_bucket_idx][slot] == TOMBSTONE;
                 self.insertAt(abs_bucket_idx, slot, key, value, fp);
-                self.tier_slot_counts[tier] += 1;
+                if (!was_tombstone) self.tier_slot_counts[tier] += 1;
                 self.count += 1;
                 return;
             }
```

Apply the same guard in insertIntoTier(), tryInsertWithLimit(), and insertAnyTier().

Also applies to: 514-525, 532-546, 576-587

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/hybrid.zig` around lines 496 - 507, The bug is that overwriting a
tombstone currently increments tier_slot_counts and count, causing counts to
drift after delete/reinsert churn; in insertIntoTier (and likewise in
tryInsertWithLimit and insertAnyTier) only increment self.tier_slot_counts[tier]
and self.count when the chosen slot was truly empty, not when it was a
tombstone—so after findEmptyOrTombstoneInBucket(abs_bucket_idx) returns |slot|,
detect whether that slot is a tombstone (use the existing tombstone check used
by remove()/find functions or compare the slot marker used in the codebase) and
only call self.insertAt(...) plus the increments when the slot was empty; if it
was a tombstone, call insertAt without increasing tier_slot_counts/tier count,
and apply the same guard in the other functions named above.
🧹 Nitpick comments (5)
insights.md (1)

10-27: Add language identifier to fenced code block.

The pseudocode block lacks a language specifier. While it's custom notation, adding text or plaintext satisfies markdown linting and improves rendering consistency.

📝 Suggested fix
-```
+```text
 get(key):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@insights.md` around lines 10 - 27, The fenced pseudocode block containing
get(key) and findKeyInBucket should include a language identifier to satisfy
markdown linting and improve rendering; open the code fence with ```text (or
```plaintext) instead of just ``` so the block starting with "get(key):" is
labeled, keeping the existing content (including symbols like get(key),
findKeyInBucket, tier0_bucket_mask, and MAX_PROBES) unchanged.
src/autobench.zig (1)

88-91: Minor inconsistency: fetchRemove result not wrapped in doNotOptimizeAway.

The hybrid remove call (line 61) uses doNotOptimizeAway, but fetchRemove here doesn't. While this is unlikely to affect results (the removal still happens), wrapping it would be consistent.

♻️ Suggested fix for consistency
```diff
 for (0..fill / 2) |i| {
-    _ = map.fetchRemove(i);
+    std.mem.doNotOptimizeAway(map.fetchRemove(i));
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/autobench.zig` around lines 88 - 91, The timing loop calls
map.fetchRemove(i) but doesn't protect the return value from compiler
optimizations like the earlier remove call does; wrap the result in the same
helper used elsewhere (doNotOptimizeAway) to ensure the value isn't optimized
away. Locate the loop that iterates over fetchRemove (the map.fetchRemove call)
and pass its result into doNotOptimizeAway just like the earlier remove path so
behavior and measurement consistency are preserved.
bench.sh (1)

14-14: Consider portability of -P flag.

The grep -oP uses Perl-compatible regex which requires GNU grep. This may fail on macOS (BSD grep) or minimal Linux environments. Since this is an internal benchmark script, this is likely acceptable, but worth noting.

♻️ Portable alternative using sed
```diff
-grep "RESULT" bench.log | grep "n=1048576" | grep -oP 'lookup_ratio=\K[0-9.]+'
+grep "RESULT" bench.log | grep "n=1048576" | sed -n 's/.*lookup_ratio=\([0-9.]*\).*/\1/p'
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bench.sh` at line 14, The grep invocation uses GNU-only Perl regex via "grep
-oP" which breaks on BSD/macOS; update the bench log parsing to a portable form
(for example use sed, awk, or grep -E with a capturing pattern) instead of "grep
-oP 'lookup_ratio=\K[0-9.]+'". Locate the pipeline that begins with grep
"RESULT" bench.log | grep "n=1048576" and replace the final "grep -oP" stage
with a POSIX-compatible command (e.g., use sed to extract the value after
"lookup_ratio=" or awk to print the matched field) so the script works on
systems without GNU grep.
program.md (1)

75-77: Add language identifier to fenced code block.

The TSV format example lacks a language specifier. Adding tsv or text satisfies markdown linting.

📝 Suggested fix
-   ```
+   ```tsv
    <commit_hash>\t<lookup_ratio_1M>\t<insert_ratio_1M>\t<delete_ratio_1M>\t<keep|revert|crash>\t<description>
    ```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@program.md` around lines 75 - 77, Add a language identifier to the fenced
code block for the TSV example so markdown linting passes: change the opening
fence from ``` to include a language like `tsv` or `text` (the block containing
"<commit_hash>\t<lookup_ratio_1M>\t<insert_ratio_1M>\t<delete_ratio_1M>\t<keep|revert|crash>\t<description>")
so the block becomes ```tsv (or ```text).
results.md (1)

292-312: Add language identifier to fenced code block.

The before/after architecture comparison lacks a language specifier. Consider using text or diff for better rendering.

📝 Suggested fix
-```
+```text
 Before:
   hash: wyhash (128-bit multiply)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@results.md` around lines 292 - 312, Add a language identifier to the fenced
code block containing the "Before/After" architecture comparison so the renderer
highlights it correctly; locate the triple-backtick that opens the block around
the listed lines (the block starting with "Before:" and ending after "insert
probe cap: MAX_PROBES") and change it from ``` to ```text (or ```diff) so the
entire block is tagged, leaving the closing ``` unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c78c7460-d30a-4d28-9c07-8beabb15f222

📥 Commits

Reviewing files that changed from the base of the PR and between 3e37318 and 1393871.

⛔ Files ignored due to path filters (1)
  • results.tsv is excluded by !**/*.tsv
📒 Files selected for processing (9)
  • .gitignore
  • README.md
  • bench.sh
  • build.zig
  • insights.md
  • program.md
  • results.md
  • src/autobench.zig
  • src/hybrid.zig

@@ -0,0 +1,56 @@
# Autoresearch Insights (58 experiments)


⚠️ Potential issue | 🟡 Minor

Experiment count mismatch.

The title says "58 experiments" but results.md documents 62 experiments. Consider updating the title to match.

📝 Suggested fix

```diff
-# Autoresearch Insights (58 experiments)
+# Autoresearch Insights (62 experiments)
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Autoresearch Insights (58 experiments)
# Autoresearch Insights (62 experiments)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@insights.md` at line 1, Update the header in insights.md that currently reads
"Autoresearch Insights (58 experiments)" to reflect the actual experiment count
documented in results.md by changing the number to 62 so the title becomes
"Autoresearch Insights (62 experiments)"; ensure the updated header string
exactly matches the format used ("# Autoresearch Insights (62 experiments)").


```diff
 pub const BUCKET_SIZE = 16;
-const MAX_PROBES = 100;
+const MAX_PROBES = 8;
```


⚠️ Potential issue | 🔴 Critical

Don't silently lose inserts after the new 8-probe cap.

With MAX_PROBES now 8, insertIntoTier() and insertAnyTier() can exhaust their probe windows and just return. That drops the write without any signal, even when unprobed buckets still have capacity.

Also applies to: 496-512, 532-551

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/hybrid.zig` at line 8, insertions are being silently dropped when the new
MAX_PROBES (const MAX_PROBES = 8) cap is hit because insertIntoTier() and
insertAnyTier() simply return after exhausting the fixed probe count; change
them to not silently bail: remove the early return on probe exhaustion and
instead continue scanning the remainder of the target bucket region (or perform
a full linear probe within that tier) until you either find a free slot or
confirm the tier is full, and then return a clear error/flag indicating insert
failure. Update both insertIntoTier and insertAnyTier to use MAX_PROBES as the
initial probe window but fall back to scanning the rest of the tier (or return
Result/Err) so no inserts are dropped silently. Ensure callers handle the
explicit error/flag instead of assuming success.

joshuaisaact added a commit that referenced this pull request Mar 24, 2026
With the matchEmpty miss optimization, elastic hash now ranks #1 on
hit lookup, miss lookup, insert, and delete at 50% load against
abseil, Rust hashbrown+ahash, Go swiss.Map, and Go builtin map.
Abseil only wins on misses at 75%+ load where tier 0 fills up.
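The matchEmpty idea in a nutshell, as a Python sketch (hypothetical names; tombstones are ignored here, which a real delete-supporting table must not do):

```python
EMPTY = 0   # fingerprint reserved for never-used slots

def probe_bucket(fps, keys, key, fp):
    """One lookup step over a bucket's slots. If the key is absent but the
    bucket still has an EMPTY slot, inserts never overflowed this bucket,
    so the lookup can report a definite miss without probing further."""
    saw_empty = False
    for slot, slot_fp in enumerate(fps):
        if slot_fp == fp and keys[slot] == key:
            return ("hit", slot)
        if slot_fp == EMPTY:
            saw_empty = True
    return ("miss", None) if saw_empty else ("continue", None)
```

Buckets with no empty slot force the probe to continue, which is consistent with the observation above: the miss win holds while tier 0 still has empty slots and fades at 75%+ load as tier 0 fills up.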
