string-key elastic hash: 36-97% faster than abseil on lookup by joshuaisaact · Pull Request #3 · joshuaisaact/elastic-hash

joshuaisaact · 2026-03-24T07:01:21Z

Why

The u64 benchmark showed elastic hash 15-20% faster than abseil at normal loads. But most real hash tables use string keys. This PR answers: does the advantage hold with strings?

What

StringElasticHash (src/string_hybrid.zig): Same tiered SIMD architecture as HybridElasticHash, adapted for []const u8 keys with wyhash. Stores key pointer + length + value per entry.

Benchmarks: 16-byte hex string keys (splitmix64 → hex), both sides use string_view semantics (no copy). Shuffled access verified.

Results

Shuffled hit lookup at n=1,048,576 (string_view keys, verified)

Load	Gap	Meaning
10%	1.97	Elastic 97% faster
25%	1.86	Elastic 86% faster
50%	1.74	Elastic 74% faster
75%	1.61	Elastic 61% faster
90%	1.50	Elastic 50% faster
99%	1.36	Elastic 36% faster

The advantage is much larger than u64 keys (which showed 15-20% shuffled). With strings, fingerprint filtering eliminates expensive 16-byte memcmp comparisons that abseil has to perform on every probe match. Delete is still 2-3x faster. Miss lookups are still 2-3x slower (structural).

Why strings help more

Fingerprint filtering is more valuable. With u64 keys, a false-positive key comparison costs ~1 cycle (single compare). With string keys, it costs ~5-10 cycles (memcmp). Our 8-bit fingerprints (1/254 false positive rate) filter out most of these.
Hash cost is equalized. Both sides use wyhash variants for strings, removing the hash-speed advantage we had with u64.
Cache density still matters. Our 1MB tier-0 fingerprints vs abseil's 2MB control bytes — same L2 vs L3 story as u64.

References

Parent PR: optimize lookup to compete with abseil flat_hash_map #2 (u64 key optimizations)
Optimal Bounds for Open Addressing Without Reordering

Summary by CodeRabbit

New Features
- Added a string-keyed elastic hash map and a faster tier-0 lookup fast path
- Multiple new realistic and string-key benchmarks comparing against Abseil
Tests
- New tests and verification programs for string-hash correctness and performance
Documentation
- README rewritten with Abseil comparisons, architecture, and optimization logs
- Extensive insights, verification, and experiment program documentation added
Build
- New benchmark and test build targets and automated benchmark runner scripts
Chores
- Updated .gitignore to exclude additional benchmark/build artifacts

This reverts commit a4a2c7a.

This reverts commit 330ddb6.

This reverts commit 18b4ce8.

This reverts commit d42e9ee.

This reverts commit 9530f6c.

This reverts commit 907f4f8.

This reverts commit 6e5b3dc.

This reverts commit 5f62379.

This reverts commit f7328f9.

This reverts commit 8c9b749.

…oad)" This reverts commit b725080.

…load)" This reverts commit 65aee5b.

This reverts commit 93a99b8.

This reverts commit e5abd48.

…n cascade)

…lot)" This reverts commit 32dd07a.

- src/string_hybrid.zig: StringElasticHash with wyhash, same tiered architecture - src/autobench-strings.zig: Zig benchmark with 16-byte hex keys - bench-abseil-strings.cpp: abseil benchmark with string_view keys - bench-strings.sh: comparison runner - build.zig: added string test and benchmark steps

… access Shuffled hit lookup gaps at 1M (string_view keys, 16-byte hex): 10%: 1.97x 25%: 1.86x 50%: 1.74x 75%: 1.61x 90%: 1.50x 99%: 1.36x The advantage is LARGER than u64 keys because: - Fingerprint filtering saves expensive 16-byte memcmp comparisons - Both sides pay more for wyhash, but our table structure saves more per probe

coderabbitai · 2026-03-24T07:01:36Z

📝 Walkthrough

Walkthrough

Adds multi-language benchmarking and verification infrastructure comparing an "elastic" Zig hash against Abseil, introduces a string-keyed elastic map, refactors hybrid get() with a tier-0 fast path and overflow hook, and supplies many automation scripts, capacity/hash verification tools, and experimental documentation.

Changes

Cohort / File(s)	Summary
Ignore & Build `\.gitignore`, `build.zig`	Ignore additional build/benchmark artifacts; add Zig build targets `autobench`, `autobench-strings`, and a new test dependency for `string_hybrid.zig`.
Top-level Documentation & Programs `README.md`, `AGENTS.md`, `program.md`, `program-v2.md`, `program-v3.md`, `program-v3-verify.md`, `program-v4-verify-strings.md`	New and revised README and contributor/experiment programs specifying frozen files, experiment loops, benchmarking setup, and acceptance criteria.
Insights / Results / Verification Docs `results.md`, `insights.md`, `insights-v2.md`, `insights-v3.md`, `verify-results.md`, `verify-strings-results.md`	Extensive experiment logs, optimization notes, verification findings, and summarized performance claims across multiple autoresearch phases.
C++ Benchmarks (Abseil & Verification) `bench-abseil.cpp`, `bench-abseil-strings.cpp`, `bench-realistic.cpp`, `bench-verify.cpp`, `bench-strings-verify.cpp`, `bench-strings-*.cpp`	Added multiple C++ benchmark programs measuring absl::flat_hash_map for integer and string keys across hit/miss/insert/delete and realistic workloads, with deterministic key generators and median-of-runs reporting.
C++ Analysis Utilities `verify-capacity.cpp`, `verify-hash-cost.cpp`, `verify-*.cpp`	Tools to inspect Abseil capacity behavior and to measure per-hash cost for comparison.
Zig Benchmark Executables `src/autobench.zig`, `src/autobench-verify.zig`, `src/autobench-strings.zig`, `src/autobench-strings-verify.zig`, `src/autobench-realistic.zig`	Five Zig benchmark programs driving `HybridElasticHash` and `StringElasticHash` across ordered/shuffled and realistic workloads, reporting median timings.
Benchmark Automation Scripts `bench.sh`, `bench-v2.sh`, `bench-strings.sh`	Shell scripts to build/run Abseil and Zig benchmarks, collect logs, parse `RESULT` lines, and compute per-operation gap ratios.
Core Hybrid Refactor `src/hybrid.zig`	Refactored `HybridElasticHash`: changed storage to per-bucket `Entry` arrays, reduced `MAX_PROBES` from 100→7, added tier-0 fast-path probing with `tier0_bucket_mask`/`shift`, replaced nested multi-tier probe loops with an overflow function pointer for tier-1+, adjusted insertion/removal logic and thresholds.
String-Keyed Implementation `src/string_hybrid.zig`	New `StringElasticHash` implementation (pub API): multi-tier layout, Wyhash-based hashing, 8-bit fingerprints, SIMD-style fingerprint matching, overflow hook, insertion/get/remove APIs, and tests.
Verify / Utility Zig Sources `src/verify-strings.zig`, `src/...`	Added Zig verification for string-key correctness and hashing behavior; deterministic key creation and measurement harnesses.

Sequence Diagram(s)

sequenceDiagram
  participant Script as bench-v2.sh (runner)
  participant Build as Builder (g++, zig)
  participant Abseil as bench-abseil (C++)
  participant Elastic as autobench (Zig)
  participant Logs as Logs (abseil-v2.log / elastic-v2.log)

  Script->>Build: compile `bench-abseil.cpp` (pkg-config, g++)
  Script->>Build: build `autobench` (`zig build autobench`)
  Build-->>Abseil: produces `bench-abseil`
  Build-->>Elastic: produces `autobench` binary
  Script->>Abseil: run, capture stdout → `abseil-v2.log`
  Script->>Elastic: run, capture stderr → `elastic-v2.log`
  Abseil-->>Logs: append RESULT lines
  Elastic-->>Logs: append RESULT lines
  Script->>Logs: parse RESULT lines, match by `n`/`load`
  Script->>Script: compute gap ratios and print comparison table

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐇 I debugged and probed by moonlight's gleam,

Buckets sharpened, fingerprints in stream,
Scripts that race and C++ that sings,
Zig's elastic leaps on nimble springs,
A rabbit cheers: new benches and dreams!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 18.60% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately captures the main addition: a string-key elastic hash implementation that is significantly faster than Abseil on lookup operations, with specific performance improvements quantified (36-97% range).

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch autoresearch/string-keys

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 17

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/hybrid.zig (1)

611-628: ⚠️ Potential issue | 🔴 Critical

remove() only searches tier 0, leaving tier 1+ elements inaccessible.

The remove() function (lines 611-628) probes only tier 0, but elements overflow to tier 1 when the tier 0 bucket fills up during insertion (line 222: if (tier + 1 < num_tiers)). Unlike get() which delegates to defaultGetOverflow() for tier 1+ lookups, remove() returns false immediately, effectively making any element in tier 1 unretrievable.

The test "hybrid remove at high load" (line 743) inserts 800 elements (78% load) and expects remove() to succeed on 400 removals (line 754). At this load factor with MAX_PROBES=7, some insertions will cascade to tier 1, causing the test to fail or creating correctness issues where removed elements appear gone via get() but are still counted in map.count.

Add a removeOverflow() function to search tier 1+ tiers, mirroring the pattern used in defaultGetOverflow():

Proposed fix

    pub fn remove(self: *Self, key: u64) bool {
        const h = hash(key);
        const fp = fingerprint(h);
        const mask = self.tier0_bucket_mask;
        const bucket_base = h >> self.tier0_bucket_shift;

        var probe: usize = 0;
        while (probe < MAX_PROBES) : (probe += 1) {
            const bucket_idx = (bucket_base +% `@as`(u64, probe)) & mask;

            if (self.findKeyInBucket(bucket_idx, key, fp)) |slot| {
                self.fingerprints[bucket_idx][slot] = TOMBSTONE;
                self.count -= 1;
                return true;
            }
        }
+        return self.removeOverflow(h, key, fp);
+    }
+
+    fn removeOverflow(self: *Self, h: u64, key: u64, fp: u8) bool {
+        if (self.num_tiers <= 1) return false;
+        const num_buckets = self.tier_bucket_counts[1];
+        const tier_start = self.tier_starts[1];
+        for (0..@min(MAX_PROBES, num_buckets)) |probe| {
+            const abs_idx = tier_start + bucketIndex(h, probe, num_buckets);
+            if (self.findKeyInBucket(abs_idx, key, fp)) |slot| {
+                self.fingerprints[abs_idx][slot] = TOMBSTONE;
+                self.count -= 1;
+                return true;
+            }
+            if (matchEmpty(&self.fingerprints[abs_idx]) != 0) return false;
+        }
         return false;
     }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/hybrid.zig` around lines 611 - 628, remove() only searches tier 0 leaving
items that overflowed into tier 1+ unremoved; implement a removeOverflow(self:
*Self, key: u64) bool that mirrors defaultGetOverflow()’s tiered probing logic
and uses findKeyInBucket(bucket_idx, key, fp) to locate the slot in higher
tiers, set the slot to TOMBSTONE and decrement self.count when found, and then
call removeOverflow from remove() when the tier0 probe loop fails to find the
key; make sure removeOverflow uses the same bucket index computation
(bucket_base, probe, mask/shift per tier) and MAX_PROBES behavior so its
semantics match defaultGetOverflow and maintain consistent counting.

🧹 Nitpick comments (9)

program-v2.md (2)
21-39: Add language identifier to pseudo-code block.
📝 Suggested fix
-```
+```text
 get(key):
     h = key * 0x517cc1b727220a95; h ^= h >> 32
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@program-v2.md` around lines 21 - 39, The pseudo-code block lacks a language
identifier which affects syntax highlighting and readability; update the fenced
code block starting with the get(key) snippet to include a language tag (e.g.,
"text" or "pseudo") so tools render it properly, ensuring the block that
contains get(key), h calculation, fp, tier0_bucket_mask, bucket0, the probe loop
(MAX_PROBES=8) and findKeyInBucket description is changed to ```text (or another
appropriate identifier) at the opening fence and keep the rest of the content
unchanged.
125-127: Add language identifier to TSV format block.
📝 Suggested fix
-```
+```tsv
 commit\tlookup_gap_99\tlookup_gap_50\tinsert_gap_99\tstatus\tdescription
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against the current code and only fix it if needed.

In @program-v2.md around lines 125 - 127, The fenced code block containing the
TSV header line
"commit\tlookup_gap_99\tlookup_gap_50\tinsert_gap_99\tstatus\tdescription" lacks
a language identifier; update the opening fence from totsv so the block
is labeled as TSV (e.g., change the code fence that wraps the header line in
program-v2.md to use ```tsv).
</details>

</blockquote></details>
<details>
<summary>program-v3.md (2)</summary><blockquote>

`162-164`: **Add language identifier for TSV format block.**


<details>
<summary>📝 Suggested fix</summary>

```diff
-```
+```tsv
 commit	find_rate	lookup_gap_99	miss_gap_99	insert_gap_99	status	description
 ```
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against the current code and only fix it if needed.

In @program-v3.md around lines 162 - 164, The TSV code block showing the header
line
"commit\tfind_rate\tlookup_gap_99\tmiss_gap_99\tinsert_gap_99\tstatus\tdescription"
needs a language identifier; update the fenced code block that currently uses
to instead usetsv so the block becomes a TSV-marked code fence (i.e.,
change the opening fence for the block containing the header line to ```tsv).
</details>

---

`92-99`: **Consider adding language identifiers to fenced code blocks.**

The memory layout diagrams could use a `text` or `plaintext` language identifier to satisfy markdown linting and improve rendering consistency.


<details>
<summary>📝 Suggested fix</summary>

```diff
-```
+```text
 fingerprints: [tier0 bucket 0][tier0 bucket 1]...[tier1 bucket 0][tier1 bucket 1]...
 ```
 
 Interleave so that tier 0 and tier 1 for the same hash region are adjacent:
-```
+```text
 fingerprints: [tier0 bucket 0][tier1 bucket 0][tier0 bucket 1][tier1 bucket 1]...
 ```
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against the current code and only fix it if needed.

In @program-v3.md around lines 92 - 99, Update the two fenced code blocks that
show the memory layout (the lines starting with "fingerprints: [tier0 bucket
0]...") to include a language identifier (use "text" or "plaintext") on the
opening fence; change the opening fences for the original and interleaved
examples (the blocks containing "fingerprints: [tier0 bucket 0][tier0 bucket
1]..." and "fingerprints: [tier0 bucket 0][tier1 bucket 0]...") from ``` to
src/autobench-strings-verify.zig (1)
45-57: Consider formatting the Fisher-Yates shuffle for readability.

The shuffle logic on line 56 is compressed into a single line, making it harder to read and debug compared to the multi-line version in autobench-verify.zig.
♻️ Suggested formatting
     var rng: u64 = 42;
     var idx: usize = fill;
-    while (idx > 1) { idx -= 1; const j = splitmix64(&rng) % (idx + 1); const tmp = shuffle_order[idx]; shuffle_order[idx] = shuffle_order[j]; shuffle_order[j] = tmp; }
+    while (idx > 1) {
+        idx -= 1;
+        const j = splitmix64(&rng) % (idx + 1);
+        const tmp = shuffle_order[idx];
+        shuffle_order[idx] = shuffle_order[j];
+        shuffle_order[j] = tmp;
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/autobench-strings-verify.zig` around lines 45 - 57, The Fisher–Yates
shuffle inside function bench is written as a single compressed line which
reduces readability; refactor the while loop that mutates shuffle_order, rng,
idx and uses splitmix64 into a multi-line, well-indented block (e.g., decrement
idx, compute j = splitmix64(&rng) % (idx + 1), swap shuffle_order[idx] and
shuffle_order[j]) so it matches the clearer style used in autobench-verify.zig
while keeping the same logic and using the existing symbols (bench,
shuffle_order, splitmix64, rng, idx).
insights.md (1)
10-27: Add language identifier to pseudo-code block.
📝 Suggested fix
-```
+```text
 get(key):
     h = key * 0x517cc1b727220a95; h ^= h >> 32
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@insights.md` around lines 10 - 27, The fenced pseudo-code block lacks a
language identifier; update the block delimiter to include one (e.g., ```text or
```pseudo) so code renderers apply monospaced formatting and no highlighting
errors occur; locate the block that documents get(key) and findKeyInBucket
(mentions MAX_PROBES, probe loop, bucket0, fp) and change the opening ``` to
```text (or another appropriate identifier) to satisfy the reviewer suggestion.
verify-hash-cost.cpp (1)
35-69: Use the median here, not the per-hasher minimum.

The rest of the suite is warmup+median, but this verifier reports the single fastest run for each side. That makes the final ratio much more sensitive to timer noise and weakens the “hash cost is comparable” conclusion.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@verify-hash-cost.cpp` around lines 35 - 69, The timing currently takes the
per-hasher minimum (best_ours / best_abseil); change this to compute the median
of the RUNS durations instead. In the loops that measure elastic_hash and
abseil_hasher, collect each run's microsecond duration into vectors (e.g.,
ours_times, abseil_times) rather than updating best_ours/best_abseil; after the
runs compute the median (sort or use std::nth_element) and use those median
values in the printf and ratio calculation. Keep the existing warmup behavior
and use the same measurement codepaths (elastic_hash, abseil_hasher,
do_not_optimize) but replace the min-based best_* logic with median-based
selection.
insights-v3.md (1)
49-68: Add language specifier to fenced code block.

The pseudocode block lacks a language identifier, which affects syntax highlighting and accessibility.
📝 Proposed fix
-```
+```text
 get(key):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@insights-v3.md` around lines 49 - 68, Add a language specifier to the fenced
code block containing the pseudocode (the block starting with get(key): and
including get_overflow(self, h, key, fp):) to enable proper syntax
highlighting/accessibility; update the opening fence from ``` to something like
```text (or ```pseudo) so the entire block is treated as a specified language
without changing the pseudocode content.
src/string_hybrid.zig (1)
39-47: Document key lifetime requirements for StringEntry.

StringEntry stores only pointers to key data. Callers must ensure the key memory outlives the map entry. This is intentional for performance (avoiding copies), but the contract should be documented.
📝 Proposed documentation addition
+/// Entry storing a string key reference and value.
+/// IMPORTANT: The key memory is NOT owned by this struct. Callers must ensure
+/// the key data outlives the entry in the map. This design avoids allocations
+/// for string_view-like semantics matching the Abseil benchmark.
 const StringEntry = struct {
     key_ptr: [*]const u8,
     key_len: usize,
     value: u64,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/string_hybrid.zig` around lines 39 - 47, Add documentation to the
StringEntry declaration explaining that it stores only a raw pointer (key_ptr)
and length (key_len) and does not own or copy the key bytes, so callers must
guarantee the pointed-to key memory outlives the StringEntry/map entry; mention
this contract near the StringEntry struct and decorate the key(self:
StringEntry) method with a brief note about the borrowed lifetime and the
performance rationale for avoiding copies.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bench-realistic.cpp`:
- Around line 48-63: The benchmark currently constructs a std::mt19937_64 rng
and uses it to pick op types/keys (see rng, OPS, op_types, op_keys,
initial_keys), which differs from the Zig harness that uses splitmix64; replace
the std::mt19937_64 usage with the same splitmix64-family generator and seeding
(reuse ks or an equivalent seed variable) so the random sequence and key draws
(calls that produce r and indices into initial_keys) mirror the Zig
workload—update all occurrences (the rng declaration and loops at the sites
using rng, e.g., the block that sets op_types/op_keys and the other two noted
blocks) to call the project's splitmix64 generator consistently.

In `@bench-strings.sh`:
- Around line 32-48: The script currently skips benchmark rows silently when
elastic-strings.log lacks a matching RESULT line or parsed fields (e_line,
e_ins, e_get, e_del, e_miss) are empty/invalid; change the behavior to fail
loudly instead: in the block that checks e_line and the parsed e_* variables, if
e_line is empty or any of e_ins/e_get/e_del/e_miss are missing or
non-numeric/zero, print a clear error message including n_val and load_val and
exit with a non-zero status (exit 1) rather than continuing; similarly, if the
numeric checks for >0 fail before computing *_gap, emit an error and exit 1 so
missing/partial rows cannot be silently omitted. Ensure the checks reference the
exact variables e_line, e_ins, e_get, e_del, e_miss and the loop variables
n_val/load_val so the failure message pinpoints the failing configuration.
- Around line 5-7: The g++ invocation that builds bench-abseil-strings.cpp in
bench-strings.sh omits a C++ standard flag causing std::string_view usage in
bench-abseil-strings.cpp to fail on compilers defaulting to pre-C++17; update
the g++ command that produces bench-abseil-strings to include -std=c++17 (e.g.,
add -std=c++17 alongside -O3 -march=native ...) so the build supports
std::string_view and other C++17 features.

In `@bench-v2.sh`:
- Around line 31-47: The script currently skips missing/malformed benchmark rows
(when e_line is empty or extracted vars e_ins/e_get/e_del/e_miss are
missing/non-numeric/zero) which can hide a stopped benchmark; change the
behavior in the block that builds e_line/e_ins/e_get/e_del/e_miss so that
instead of continuing silently it prints a clear error (including n=${n_val} and
load=${load_val} and the offending e_line) and exits non‑zero. Concretely:
replace the "if [ -z \"$e_line\" ]; then continue; fi" with a failure path that
logs the missing RESULT row and runs "exit 1"; also tighten the subsequent
validation of e_ins/e_get/e_del/e_miss (ensure they are non-empty and numeric
and >0) to log a descriptive error and exit 1 rather than silently skipping,
keeping the awk field-extraction logic (e_line, e_ins, e_get, e_del, e_miss) and
the later gap calculations unchanged.

In `@bench.sh`:
- Around line 13-14: The metric extraction fails because the script greps for
"lookup_ratio=..." but the log emits "lookup_us=..."; update the extraction in
bench.sh (the line using grep "RESULT" bench.log | grep "n=1048576" | grep -oP
'lookup_ratio=\K[0-9.]+') to instead extract the emitted field (e.g.,
"lookup_us") or add a fallback that checks for both "lookup_ratio" and
"lookup_us" so the script doesn't fail under set -euo pipefail; ensure the echo
"=== Primary metric ..." line matches the chosen metric name.

In `@program-v3-verify.md`:
- Around line 67-72: Update section 6 to include string-key verification by
adding the new string-key path and the bench-strings* artifacts into the frozen
verification scope: mention running the bench-strings* benchmarks (both Zig and
C++ variants) as part of the verification, describe a simple cross-language
comparison approach (e.g., normalize reported ops/time and document expected
overhead differences), and record any remaining limitations; reference the
"string-key path", "bench-strings*" artifacts and the "frozen verification
scope" so reviewers can locate and run the new checks.

In `@program.md`:
- Around line 75-77: The fenced code block starting with ``` in program.md lacks
a language tag, causing MD040; update that fenced block delimiter to include a
language tag (e.g., "tsv") so the block becomes ```tsv and keep the inner line
"<commit_hash>\t<lookup_ratio_1M>\t<insert_ratio_1M>\t<delete_ratio_1M>\t<keep|revert|crash>\t<description>"
unchanged.

In `@README.md`:
- Around line 90-110: Update the README to advertise the new string-key path:
add the new implementation entry (e.g., mention StringElasticHash and its source
file such as src/string.zig or the actual filename containing StringElasticHash)
to the "Files" list and add the string benchmark runner (bench-strings.sh) and
any string-specific benchmark targets (e.g., src/bench-strings.zig or
src/strings.zig and src/autobench-strings.zig if present) to the "Usage"
section; ensure you reference the symbol StringElasticHash and the script
bench-strings.sh so users can find and run the string-key benchmark path
(include the exact run command like bash bench-strings.sh).
- Around line 38-39: The README's cache-size claim is inconsistent with the
test-machine L2 size; update the text around the "Hit lookups at 500K-2M
elements..." sentence and the related lines (the "1MB for tier 0" and the later
note about "roughly 512KB L2") so they agree: either change the claim that
tier-0 fits in L2 to state it fits in L3 (or in CPU L2+L3 combined) or revise
the footprint to match a 512KB L2, and mirror the same correction in the other
referenced paragraph (lines ~56-58) so both the description of "1MB for tier 0"
and the machine L2 size are consistent and clear.

In `@results.md`:
- Around line 367-370: The sentence claiming Abseil does "rehash-on-delete" is
incorrect; update the paragraph comparing Elastic hash to Abseil's flat_hash_map
to state that flat_hash_map::erase is O(1) and does not trigger a full-table
rehash (cite Abseil docs), remove the "rehash-on-delete" assertion, and instead
explain the observed 3–4x delete speed advantage by attributing it to
implementation factors in Elastic hash (e.g., tombstone marking vs in-place
removal strategies, memory layout/cache locality, probing or chaining
differences, or lazy clean-up behavior); if unsure, add a short note that the
exact cause is unconfirmed and suggest adding microbenchmarks or profiling to
isolate whether tombstone marking, layout, or another factor drives the
difference.

In `@src/autobench-realistic.zig`:
- Around line 25-29: The usElapsed function currently multiplies seconds by
1_000_000_000 and only divides the nanoseconds delta by 1_000, yielding wrong
microseconds and incorrect handling of negative ns_diff; fix usElapsed by
computing the full time difference in nanoseconds using signed arithmetic (e.g.,
compute sec_diff as a signed integer, add ns_diff to sec_diff * 1_000_000_000 to
form total_ns) and then divide the combined total_ns by 1_000 to get
microseconds, finally casting the result to u64; make these changes inside fn
usElapsed (use linux.timespec values) and avoid casting ns_diff to u64 before
the division so negative deltas are preserved.

In `@src/autobench-strings-verify.zig`:
- Around line 34-38: The usElapsed function computing microseconds between two
linux.timespec values can panic when end.nsec < start.nsec; change the logic in
usElapsed to compute seconds and nanoseconds differences with a borrow: compute
sec_diff = end.sec - start.sec as signed (or u64 after borrow), compute ns_diff
= end.nsec - start.nsec as i64, and if ns_diff < 0 then decrement sec_diff by 1
and add 1_000_000_000 to ns_diff; then calculate total microseconds as sec_diff
* 1_000_000_000 + ns_diff and convert to u64 before dividing by 1_000 so no
negative value is cast to u64. Ensure you update the usElapsed function
signature/locals that reference linux.timespec accordingly.

In `@src/autobench-strings.zig`:
- Around line 40-44: The usElapsed function can panic when end.nsec < start.nsec
because it casts a negative ns_diff to u64; fix by computing differences in a
signed integer, normalize nanoseconds when ns_diff is negative (decrement the
seconds difference and add 1_000_000_000 to ns_diff) and only then combine into
a non-negative total before dividing by 1_000; specifically, in usElapsed use a
signed seconds delta (e.g., sec_diff_i64 = end.sec - start.sec), compute ns_diff
as i64, if ns_diff < 0 then sec_diff_i64 -= 1 and ns_diff += 1_000_000_000, then
produce the final microsecond result from sec_diff_i64 * 1_000_000_000 + ns_diff
cast safely to u64.

In `@src/autobench-verify.zig`:
- Around line 26-31: The usElapsed function can panic when end.nsec < start.nsec
because a negative ns_diff gets cast to u64; update usElapsed to compute the
seconds and nanosecond difference with proper borrow handling: compute sec_diff
= end.sec - start.sec and if end.nsec < start.nsec decrement sec_diff by 1 and
set ns_diff = end.nsec + 1_000_000_000 - start.nsec (all kept in unsigned types
after the borrow), then combine total_ns = sec_diff * 1_000_000_000 + ns_diff
and return total_ns / 1_000 to produce microseconds; apply this change inside fn
usElapsed(start: linux.timespec, end: linux.timespec) to avoid casting negative
i64 to u64.

In `@src/autobench.zig`:
- Around line 59-60: The reported "load" is inconsistent because benchOne() uses
fill / n while HybridElasticHash.init(allocator, n) creates tier0 plus overflow
tiers (HybridElasticHash.init), giving more physical slots than n; fix by
normalizing capacity: either adjust the init parameter to allocate only the
equivalent physical slots that match absl::flat_hash_map::reserve(n), or
(preferable) query the hybrid map's actual capacity after init (e.g., call its
capacity/size method) and use fill / effective_capacity when labeling load in
benchOne(); apply the same normalization change wherever
HybridElasticHash.init(...) is used for benchmarking (including the other
occurrence mentioned) so Abseil reserve(n) and HybridElasticHash capacity are
compared on the same physical-slot basis.
- Around line 28-32: The usElapsed function incorrectly casts a negative ns_diff
to u64 causing huge values when end.nsec < start.nsec; fix by normalizing the
borrow: compute s and ns_diff as signed values, and if ns_diff < 0 then
decrement s by 1 and add 1_000_000_000 to ns_diff so ns_diff becomes
non-negative, then compute total_ns = s * 1_000_000_000 + ns_diff using
appropriate casts to u64 (ensure s is non-negative before the final cast) and
return total_ns / 1_000 to preserve microseconds; update the variables in the
usElapsed function (s, ns_diff, total_ns) accordingly.

In `@src/string_hybrid.zig`:
- Around line 314-330: The remove() implementation only searches tier 0; update
it to scan all tiers (tier 0 through the overflow tiers) the same way
insertion/search does: loop over each tier index (e.g., 0 .. self.tier_count),
for each tier use that tier's bucket mask/shift (instead of always
tier0_bucket_mask/tier0_bucket_shift) to compute bucket_base and run the same
probe loop (probe < MAX_PROBES) calling findKeyInBucket(bucket_idx, key, fp);
when a slot is found, write TOMBSTONE into the correct fingerprints array for
that tier (e.g., self.fingerprints[tier][bucket_idx][slot]), decrement
self.count and return true; if no tiers find the key return false. Ensure you
reference and use the same symbols: remove, findKeyInBucket, fingerprint,
MAX_PROBES, TOMBSTONE, and self.count.

---

Outside diff comments:
In `@src/hybrid.zig`:
- Around line 611-628: remove() only searches tier 0 leaving items that
overflowed into tier 1+ unremoved; implement a removeOverflow(self: *Self, key:
u64) bool that mirrors defaultGetOverflow()’s tiered probing logic and uses
findKeyInBucket(bucket_idx, key, fp) to locate the slot in higher tiers, set the
slot to TOMBSTONE and decrement self.count when found, and then call
removeOverflow from remove() when the tier0 probe loop fails to find the key;
make sure removeOverflow uses the same bucket index computation (bucket_base,
probe, mask/shift per tier) and MAX_PROBES behavior so its semantics match
defaultGetOverflow and maintain consistent counting.

---

Nitpick comments:
In `@insights-v3.md`:
- Around line 49-68: Add a language specifier to the fenced code block
containing the pseudocode (the block starting with get(key): and including
get_overflow(self, h, key, fp):) to enable proper syntax
highlighting/accessibility; update the opening fence from ``` to something like
```text (or ```pseudo) so the entire block is treated as a specified language
without changing the pseudocode content.

In `@insights.md`:
- Around line 10-27: The fenced pseudo-code block lacks a language identifier;
update the block delimiter to include one (e.g., ```text or ```pseudo) so code
renderers apply monospaced formatting and no highlighting errors occur; locate
the block that documents get(key) and findKeyInBucket (mentions MAX_PROBES,
probe loop, bucket0, fp) and change the opening ``` to ```text (or another
appropriate identifier) to satisfy the reviewer suggestion.

In `@program-v2.md`:
- Around line 21-39: The pseudo-code block lacks a language identifier which
affects syntax highlighting and readability; update the fenced code block
starting with the get(key) snippet to include a language tag (e.g., "text" or
"pseudo") so tools render it properly, ensuring the block that contains
get(key), h calculation, fp, tier0_bucket_mask, bucket0, the probe loop
(MAX_PROBES=8) and findKeyInBucket description is changed to ```text (or another
appropriate identifier) at the opening fence and keep the rest of the content
unchanged.
- Around line 125-127: The fenced code block containing the TSV header line
"commit\tlookup_gap_99\tlookup_gap_50\tinsert_gap_99\tstatus\tdescription" lacks
a language identifier; update the opening fence from ``` to ```tsv so the block
is labeled as TSV (e.g., change the code fence that wraps the header line in
program-v2.md to use ```tsv).

In `@program-v3.md`:
- Around line 162-164: The TSV code block showing the header line
"commit\tfind_rate\tlookup_gap_99\tmiss_gap_99\tinsert_gap_99\tstatus\tdescription"
needs a language identifier; update the fenced code block that currently uses
``` to instead use ```tsv so the block becomes a TSV-marked code fence (i.e.,
change the opening fence for the block containing the header line to ```tsv).
- Around line 92-99: Update the two fenced code blocks that show the memory
layout (the lines starting with "fingerprints: [tier0 bucket 0]...") to include
a language identifier (use "text" or "plaintext") on the opening fence; change
the opening fences for the original and interleaved examples (the blocks
containing "fingerprints: [tier0 bucket 0][tier0 bucket 1]..." and
"fingerprints: [tier0 bucket 0][tier1 bucket 0]...") from ``` to ```text so
markdown linters/renderers treat them as plain text.

In `@src/autobench-strings-verify.zig`:
- Around line 45-57: The Fisher–Yates shuffle inside function bench is written
as a single compressed line which reduces readability; refactor the while loop
that mutates shuffle_order, rng, idx and uses splitmix64 into a multi-line,
well-indented block (e.g., decrement idx, compute j = splitmix64(&rng) % (idx +
1), swap shuffle_order[idx] and shuffle_order[j]) so it matches the clearer
style used in autobench-verify.zig while keeping the same logic and using the
existing symbols (bench, shuffle_order, splitmix64, rng, idx).

In `@src/string_hybrid.zig`:
- Around line 39-47: Add documentation to the StringEntry declaration explaining
that it stores only a raw pointer (key_ptr) and length (key_len) and does not
own or copy the key bytes, so callers must guarantee the pointed-to key memory
outlives the StringEntry/map entry; mention this contract near the StringEntry
struct and decorate the key(self: StringEntry) method with a brief note about
the borrowed lifetime and the performance rationale for avoiding copies.

In `@verify-hash-cost.cpp`:
- Around line 35-69: The timing currently takes the per-hasher minimum
(best_ours / best_abseil); change this to compute the median of the RUNS
durations instead. In the loops that measure elastic_hash and abseil_hasher,
collect each run's microsecond duration into vectors (e.g., ours_times,
abseil_times) rather than updating best_ours/best_abseil; after the runs compute
the median (sort or use std::nth_element) and use those median values in the
printf and ratio calculation. Keep the existing warmup behavior and use the same
measurement codepaths (elastic_hash, abseil_hasher, do_not_optimize) but replace
the min-based best_* logic with median-based selection.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 446ffdcc-c540-4a72-bf1b-ec3dedd629ab

📥 Commits

Reviewing files that changed from the base of the PR and between 3e37318 and 27303d9.

⛔ Files ignored due to path filters (6)

abseil-strings.log is excluded by !**/*.log
bench-v2.log is excluded by !**/*.log
elastic-strings.log is excluded by !**/*.log
results-v2.tsv is excluded by !**/*.tsv
results-v3.tsv is excluded by !**/*.tsv
results.tsv is excluded by !**/*.tsv

📒 Files selected for processing (37)

.gitignore
AGENTS.md
README.md
bench-abseil-clang
bench-abseil-strings
bench-abseil-strings.cpp
bench-abseil.cpp
bench-realistic
bench-realistic.cpp
bench-strings-verify
bench-strings-verify.cpp
bench-strings.sh
bench-v2.sh
bench-verify
bench-verify.cpp
bench.sh
build.zig
insights-v2.md
insights-v3.md
insights.md
program-v2.md
program-v3-verify.md
program-v3.md
program.md
results.md
src/autobench-realistic.zig
src/autobench-strings-verify.zig
src/autobench-strings.zig
src/autobench-verify.zig
src/autobench.zig
src/hybrid.zig
src/string_hybrid.zig
verify-capacity
verify-capacity.cpp
verify-hash-cost
verify-hash-cost.cpp
verify-results.md

coderabbitai · 2026-03-24T07:14:22Z

bench-realistic.cpp

+    std::mt19937_64 rng(12345);
+    for (size_t i = 0; i < OPS; i++) {
+        int r = rng() % 100;
+        if (r < 80) { // 80% hit lookup
+            op_types[i] = 0;
+            op_keys[i] = initial_keys[rng() % initial_size];
+        } else if (r < 90) { // 10% miss lookup
+            op_types[i] = 1;
+            op_keys[i] = splitmix64(ks); // very unlikely to be in table
+        } else if (r < 95) { // 5% insert
+            op_types[i] = 2;
+            op_keys[i] = splitmix64(ks);
+        } else { // 5% delete
+            op_types[i] = 3;
+            op_keys[i] = initial_keys[rng() % initial_size];
+        }


⚠️ Potential issue | 🟠 Major

Use the same RNG family as Zig to keep workload parity.

At Line 48, Line 115, and Line 157, this file uses std::mt19937_64, while src/autobench-realistic.zig generates equivalent workloads with splitmix64. That makes cross-language comparisons non-equivalent and can skew reported gaps.

Proposed alignment diff

- std::mt19937_64 rng(12345); + uint64_t rng = 12345; for (size_t i = 0; i < OPS; i++) { - int r = rng() % 100; + int r = splitmix64(rng) % 100; if (r < 80) { // 80% hit lookup op_types[i] = 0; - op_keys[i] = initial_keys[rng() % initial_size]; + op_keys[i] = initial_keys[splitmix64(rng) % initial_size]; } else if (r < 90) { // 10% miss lookup op_types[i] = 1; op_keys[i] = splitmix64(ks); // very unlikely to be in table } else if (r < 95) { // 5% insert op_types[i] = 2; op_keys[i] = splitmix64(ks); } else { // 5% delete op_types[i] = 3; - op_keys[i] = initial_keys[rng() % initial_size]; + op_keys[i] = initial_keys[splitmix64(rng) % initial_size]; } } - std::mt19937_64 rng(42); + uint64_t rng = 42; size_t hot_count = n / 5; // top 20% for (size_t i = 0; i < OPS; i++) { - if (rng() % 100 < 80) { - lookup_keys[i] = keys[rng() % hot_count]; // hot key + if (splitmix64(rng) % 100 < 80) { + lookup_keys[i] = keys[splitmix64(rng) % hot_count]; // hot key } else { - lookup_keys[i] = keys[rng() % n]; // any key + lookup_keys[i] = keys[splitmix64(rng) % n]; // any key } } - std::mt19937_64 rng(99); - for (size_t i = 0; i < n * READ_MULT; i++) read_order[i] = rng() % n; + uint64_t rng = 99; + for (size_t i = 0; i < n * READ_MULT; i++) read_order[i] = splitmix64(rng) % n;

Also applies to: 115-123, 157-158

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@bench-realistic.cpp` around lines 48 - 63, The benchmark currently constructs a std::mt19937_64 rng and uses it to pick op types/keys (see rng, OPS, op_types, op_keys, initial_keys), which differs from the Zig harness that uses splitmix64; replace the std::mt19937_64 usage with the same splitmix64-family generator and seeding (reuse ks or an equivalent seed variable) so the random sequence and key draws (calls that produce r and indices into initial_keys) mirror the Zig workload—update all occurrences (the rng declaration and loops at the sites using rng, e.g., the block that sets op_types/op_keys and the other two noted blocks) to call the project's splitmix64 generator consistently.

coderabbitai · 2026-03-24T07:14:22Z

bench-strings.sh

+g++ -O3 -march=native -DNDEBUG -DABSL_HASHTABLEZ_SAMPLE_PARAMETER=0 \
+    bench-abseil-strings.cpp -o bench-abseil-strings \
+    $(pkg-config --cflags --libs absl_hash absl_raw_hash_set absl_hashtablez_sampler)


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Verify that the string benchmark sources use std::string_view while the build command omits -std=... rg -n --type=cpp '\bstd::string_view\b' bench-abseil-strings.cpp bench-strings-verify.cpp rg -n --type=sh 'g\+\+ .*bench-abseil-strings\.cpp' bench-strings.sh

Repository: joshuaisaact/elastic-hash

Length of output: 735

🏁 Script executed:

# Check the exact content of bench-strings.sh around lines 5-7 head -n 10 bench-strings.sh # Verify that g++ command lacks -std flag grep -n "g++" bench-strings.sh

Repository: joshuaisaact/elastic-hash

Length of output: 519

🏁 Script executed:

# Double-check that std::string_view requires C++17 # and verify no other implicit standard-setting mechanism exists file bench-strings.sh stat bench-strings.sh

Repository: joshuaisaact/elastic-hash

Length of output: 156

Add -std=c++17 to the build command for the string benchmark.

bench-abseil-strings.cpp uses std::string_view (lines 56, 58-59, 65), which requires C++17. The current build command omits the -std flag and will fail on GCC 9/10 systems that default to C++14 or earlier.

🛠️ Suggested change

-g++ -O3 -march=native -DNDEBUG -DABSL_HASHTABLEZ_SAMPLE_PARAMETER=0 \ +g++ -std=c++17 -O3 -march=native -DNDEBUG -DABSL_HASHTABLEZ_SAMPLE_PARAMETER=0 \

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

g++ -O3 -march=native -DNDEBUG -DABSL_HASHTABLEZ_SAMPLE_PARAMETER=0 \

bench-abseil-strings.cpp -o bench-abseil-strings \

$(pkg-config --cflags --libs absl_hash absl_raw_hash_set absl_hashtablez_sampler)

g++ -std=c++17 -O3 -march=native -DNDEBUG -DABSL_HASHTABLEZ_SAMPLE_PARAMETER=0 \

bench-abseil-strings.cpp -o bench-abseil-strings \

$(pkg-config --cflags --libs absl_hash absl_raw_hash_set absl_hashtablez_sampler)

🧰 Tools

🪛 Shellcheck (0.11.0)

[warning] 7-7: Quote this to prevent word splitting.

(SC2046)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@bench-strings.sh` around lines 5 - 7, The g++ invocation that builds bench-abseil-strings.cpp in bench-strings.sh omits a C++ standard flag causing std::string_view usage in bench-abseil-strings.cpp to fail on compilers defaulting to pre-C++17; update the g++ command that produces bench-abseil-strings to include -std=c++17 (e.g., add -std=c++17 alongside -O3 -march=native ...) so the build supports std::string_view and other C++17 features.

coderabbitai · 2026-03-24T07:14:22Z

bench-strings.sh

+    e_line=$(grep "n=${n_val}" elastic-strings.log | grep "load=${load_val}" | head -1) || true
+    if [ -z "$e_line" ]; then continue; fi
+
+    e_ins=$(echo "$e_line" | awk -F'\t' '{for(i=1;i<=NF;i++) if($i ~ /^insert_us=/) print substr($i,11)}')
+    e_get=$(echo "$e_line" | awk -F'\t' '{for(i=1;i<=NF;i++) if($i ~ /^lookup_us=/) print substr($i,11)}')
+    e_del=$(echo "$e_line" | awk -F'\t' '{for(i=1;i<=NF;i++) if($i ~ /^delete_us=/) print substr($i,11)}')
+    e_miss=$(echo "$e_line" | awk -F'\t' '{for(i=1;i<=NF;i++) if($i ~ /^miss_us=/) print substr($i,9)}')
+
+    if [ -n "$e_ins" ] && [ -n "$e_get" ] && [ -n "$e_del" ] && [ -n "$e_miss" ] && \
+       [ "$e_ins" -gt 0 ] && [ "$e_get" -gt 0 ] && [ "$e_del" -gt 0 ] && [ "$e_miss" -gt 0 ]; then
+        ins_gap=$(awk "BEGIN {printf \"%.3f\", $a_ins / $e_ins}")
+        get_gap=$(awk "BEGIN {printf \"%.3f\", $a_get / $e_get}")
+        del_gap=$(awk "BEGIN {printf \"%.3f\", $a_del / $e_del}")
+        mis_gap=$(awk "BEGIN {printf \"%.3f\", $a_miss / $e_miss}")
+        printf "%-8s  %-6s  %8s %8s %8s %8s  %8s %8s %8s %8s  %8s %8s %8s %8s\n" \
+            "$n_val" "$load_val" "$a_ins" "$a_get" "$a_del" "$a_miss" "$e_ins" "$e_get" "$e_del" "$e_miss" "$ins_gap" "$get_gap" "$del_gap" "$mis_gap"
+    fi


⚠️ Potential issue | 🟠 Major

Don't silently drop missing benchmark rows.

If either side stops emitting a RESULT line, or the field parsing breaks, this script just omits that configuration and still exits successfully. That makes it too easy to publish a partial comparison table without noticing.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@bench-strings.sh` around lines 32 - 48, The script currently skips benchmark rows silently when elastic-strings.log lacks a matching RESULT line or parsed fields (e_line, e_ins, e_get, e_del, e_miss) are empty/invalid; change the behavior to fail loudly instead: in the block that checks e_line and the parsed e_* variables, if e_line is empty or any of e_ins/e_get/e_del/e_miss are missing or non-numeric/zero, print a clear error message including n_val and load_val and exit with a non-zero status (exit 1) rather than continuing; similarly, if the numeric checks for >0 fail before computing *_gap, emit an error and exit 1 so missing/partial rows cannot be silently omitted. Ensure the checks reference the exact variables e_line, e_ins, e_get, e_del, e_miss and the loop variables n_val/load_val so the failure message pinpoints the failing configuration.

coderabbitai · 2026-03-24T07:14:23Z

bench-v2.sh

+    e_line=$(grep "n=${n_val}" elastic-v2.log | grep "load=${load_val}" | head -1) || true
+    if [ -z "$e_line" ]; then continue; fi
+
+    e_ins=$(echo "$e_line" | awk -F'\t' '{for(i=1;i<=NF;i++) if($i ~ /^insert_us=/) print substr($i,11)}')
+    e_get=$(echo "$e_line" | awk -F'\t' '{for(i=1;i<=NF;i++) if($i ~ /^lookup_us=/) print substr($i,11)}')
+    e_del=$(echo "$e_line" | awk -F'\t' '{for(i=1;i<=NF;i++) if($i ~ /^delete_us=/) print substr($i,11)}')
+    e_miss=$(echo "$e_line" | awk -F'\t' '{for(i=1;i<=NF;i++) if($i ~ /^miss_us=/) print substr($i,9)}')
+
+    if [ -n "$e_ins" ] && [ -n "$e_get" ] && [ -n "$e_del" ] && [ -n "$e_miss" ] && \
+       [ "$e_ins" -gt 0 ] && [ "$e_get" -gt 0 ] && [ "$e_del" -gt 0 ] && [ "$e_miss" -gt 0 ]; then
+        ins_gap=$(awk "BEGIN {printf \"%.3f\", $a_ins / $e_ins}")
+        get_gap=$(awk "BEGIN {printf \"%.3f\", $a_get / $e_get}")
+        del_gap=$(awk "BEGIN {printf \"%.3f\", $a_del / $e_del}")
+        mis_gap=$(awk "BEGIN {printf \"%.3f\", $a_miss / $e_miss}")
+        printf "%-8s  %-6s  %8s %8s %8s %8s  %8s %8s %8s %8s  %8s %8s %8s %8s\n" \
+            "$n_val" "$load_val" "$a_ins" "$a_get" "$a_del" "$a_miss" "$e_ins" "$e_get" "$e_del" "$e_miss" "$ins_gap" "$get_gap" "$del_gap" "$mis_gap"
+    fi


⚠️ Potential issue | 🟠 Major

Fail fast on missing or malformed result rows.

If one benchmark stops emitting a matching RESULT line, this script currently just skips that configuration and keeps going. That can produce partial tables that look valid enough to trust even though one side never reported the row.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@bench-v2.sh` around lines 31 - 47, The script currently skips missing/malformed benchmark rows (when e_line is empty or extracted vars e_ins/e_get/e_del/e_miss are missing/non-numeric/zero) which can hide a stopped benchmark; change the behavior in the block that builds e_line/e_ins/e_get/e_del/e_miss so that instead of continuing silently it prints a clear error (including n=${n_val} and load=${load_val} and the offending e_line) and exits non‑zero. Concretely: replace the "if [ -z \"$e_line\" ]; then continue; fi" with a failure path that logs the missing RESULT row and runs "exit 1"; also tighten the subsequent validation of e_ins/e_get/e_del/e_miss (ensure they are non-empty and numeric and >0) to log a descriptive error and exit 1 rather than silently skipping, keeping the awk field-extraction logic (e_line, e_ins, e_get, e_del, e_miss) and the later gap calculations unchanged.

coderabbitai · 2026-03-24T07:14:23Z

bench.sh

+echo "=== Primary metric (lookup_ratio at n=1048576) ==="
+grep "RESULT" bench.log | grep "n=1048576" | grep -oP 'lookup_ratio=\K[0-9.]+'


⚠️ Potential issue | 🔴 Critical

Metric extraction is broken (lookup_ratio is not emitted).

At Line 14, the parser looks for lookup_ratio=..., but current RESULT output uses lookup_us=.... With set -euo pipefail, this causes the script to fail after running benchmarks.

Proposed fix

-echo "=== Primary metric (lookup_ratio at n=1048576) ===" -grep "RESULT" bench.log | grep "n=1048576" | grep -oP 'lookup_ratio=\K[0-9.]+' +echo "=== Primary metric (lookup_us at n=1048576, load=99) ===" +grep "RESULT" bench.log | grep "n=1048576" | grep "load=99" | grep -oP 'lookup_us=\K[0-9]+'

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@bench.sh` around lines 13 - 14, The metric extraction fails because the script greps for "lookup_ratio=..." but the log emits "lookup_us=..."; update the extraction in bench.sh (the line using grep "RESULT" bench.log | grep "n=1048576" | grep -oP 'lookup_ratio=\K[0-9.]+') to instead extract the emitted field (e.g., "lookup_us") or add a fallback that checks for both "lookup_ratio" and "lookup_us" so the script doesn't fail under set -euo pipefail; ensure the echo "=== Primary metric ..." line matches the chosen metric name.

coderabbitai · 2026-03-24T07:14:23Z

src/autobench-strings.zig

+fn usElapsed(start: linux.timespec, end: linux.timespec) u64 {
+    const s: u64 = @intCast(end.sec - start.sec);
+    const ns_diff: i64 = @as(i64, @intCast(end.nsec)) - @as(i64, @intCast(start.nsec));
+    return (s * 1_000_000_000 + @as(u64, @intCast(ns_diff))) / 1_000;
+}


⚠️ Potential issue | 🟡 Minor

Potential integer overflow when nsec wraps around.

When end.nsec < start.nsec (second boundary crossing), ns_diff is negative. The expression s * 1_000_000_000 + @as(u64, @intcast(ns_diff)) attempts to cast a negative i64 to u64, which will panic in safe mode or produce incorrect results.

🐛 Proposed fix

fn usElapsed(start: linux.timespec, end: linux.timespec) u64 { - const s: u64 = `@intCast`(end.sec - start.sec); - const ns_diff: i64 = `@as`(i64, `@intCast`(end.nsec)) - `@as`(i64, `@intCast`(start.nsec)); - return (s * 1_000_000_000 + `@as`(u64, `@intCast`(ns_diff))) / 1_000; + const end_ns: u64 = `@as`(u64, `@intCast`(end.sec)) * 1_000_000_000 + `@as`(u64, `@intCast`(end.nsec)); + const start_ns: u64 = `@as`(u64, `@intCast`(start.sec)) * 1_000_000_000 + `@as`(u64, `@intCast`(start.nsec)); + return (end_ns - start_ns) / 1_000; }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/autobench-strings.zig` around lines 40 - 44, The usElapsed function can panic when end.nsec < start.nsec because it casts a negative ns_diff to u64; fix by computing differences in a signed integer, normalize nanoseconds when ns_diff is negative (decrement the seconds difference and add 1_000_000_000 to ns_diff) and only then combine into a non-negative total before dividing by 1_000; specifically, in usElapsed use a signed seconds delta (e.g., sec_diff_i64 = end.sec - start.sec), compute ns_diff as i64, if ns_diff < 0 then sec_diff_i64 -= 1 and ns_diff += 1_000_000_000, then produce the final microsecond result from sec_diff_i64 * 1_000_000_000 + ns_diff cast safely to u64.

coderabbitai · 2026-03-24T07:14:23Z

src/autobench-verify.zig

+fn usElapsed(start: linux.timespec, end: linux.timespec) u64 {
+    const s: u64 = @intCast(end.sec - start.sec);
+    const ns_diff: i64 = @as(i64, @intCast(end.nsec)) - @as(i64, @intCast(start.nsec));
+    const total_ns: u64 = s * 1_000_000_000 + @as(u64, @intCast(ns_diff));
+    return total_ns / 1_000;
+}


⚠️ Potential issue | 🟠 Major

Potential panic when end.nsec < start.nsec.

When end.nsec < start.nsec (which happens whenever the seconds increment), ns_diff will be negative. Casting a negative i64 to u64 with @intCast will cause a runtime panic in safe modes or undefined behavior.

🐛 Proposed fix to handle nanosecond borrow correctly

fn usElapsed(start: linux.timespec, end: linux.timespec) u64 { - const s: u64 = `@intCast`(end.sec - start.sec); - const ns_diff: i64 = `@as`(i64, `@intCast`(end.nsec)) - `@as`(i64, `@intCast`(start.nsec)); - const total_ns: u64 = s * 1_000_000_000 + `@as`(u64, `@intCast`(ns_diff)); - return total_ns / 1_000; + const s_diff = end.sec - start.sec; + const ns_diff = end.nsec - start.nsec; + const total_ns: i128 = `@as`(i128, s_diff) * 1_000_000_000 + ns_diff; + return `@intCast`(`@divFloor`(total_ns, 1_000)); }

Alternatively, use the standard borrow pattern:

fn usElapsed(start: linux.timespec, end: linux.timespec) u64 { + var s: i64 = end.sec - start.sec; + var ns: i64 = end.nsec - start.nsec; + if (ns < 0) { + s -= 1; + ns += 1_000_000_000; + } + return `@intCast`(s * 1_000_000 + `@divFloor`(ns, 1_000)); - const s: u64 = `@intCast`(end.sec - start.sec); - const ns_diff: i64 = `@as`(i64, `@intCast`(end.nsec)) - `@as`(i64, `@intCast`(start.nsec)); - const total_ns: u64 = s * 1_000_000_000 + `@as`(u64, `@intCast`(ns_diff)); - return total_ns / 1_000; }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

fn usElapsed(start: linux.timespec, end: linux.timespec) u64 {

const s: u64 = @intCast(end.sec - start.sec);

const ns_diff: i64 = @as(i64, @intCast(end.nsec)) - @as(i64, @intCast(start.nsec));

const total_ns: u64 = s * 1_000_000_000 + @as(u64, @intCast(ns_diff));

return total_ns / 1_000;

}

fn usElapsed(start: linux.timespec, end: linux.timespec) u64 {

const s_diff = end.sec - start.sec;

const ns_diff = end.nsec - start.nsec;

const total_ns: i128 = `@as`(i128, s_diff) * 1_000_000_000 + ns_diff;

return `@intCast`(`@divFloor`(total_ns, 1_000));

}

Suggested change

fn usElapsed(start: linux.timespec, end: linux.timespec) u64 {

const s: u64 = @intCast(end.sec - start.sec);

const ns_diff: i64 = @as(i64, @intCast(end.nsec)) - @as(i64, @intCast(start.nsec));

const total_ns: u64 = s * 1_000_000_000 + @as(u64, @intCast(ns_diff));

return total_ns / 1_000;

}

fn usElapsed(start: linux.timespec, end: linux.timespec) u64 {

var s: i64 = end.sec - start.sec;

var ns: i64 = end.nsec - start.nsec;

if (ns < 0) {

s -= 1;

ns += 1_000_000_000;

}

return `@intCast`(s * 1_000_000 + `@divFloor`(ns, 1_000));

}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/autobench-verify.zig` around lines 26 - 31, The usElapsed function can panic when end.nsec < start.nsec because a negative ns_diff gets cast to u64; update usElapsed to compute the seconds and nanosecond difference with proper borrow handling: compute sec_diff = end.sec - start.sec and if end.nsec < start.nsec decrement sec_diff by 1 and set ns_diff = end.nsec + 1_000_000_000 - start.nsec (all kept in unsigned types after the borrow), then combine total_ns = sec_diff * 1_000_000_000 + ns_diff and return total_ns / 1_000 to produce microseconds; apply this change inside fn usElapsed(start: linux.timespec, end: linux.timespec) to avoid casting negative i64 to u64.

coderabbitai · 2026-03-24T07:14:23Z

src/autobench.zig

+fn usElapsed(start: linux.timespec, end: linux.timespec) u64 {
+    const s: u64 = @intCast(end.sec - start.sec);
+    const ns_diff: i64 = @as(i64, @intCast(end.nsec)) - @as(i64, @intCast(start.nsec));
+    const total_ns: u64 = s * 1_000_000_000 + @as(u64, @intCast(ns_diff));
+    return total_ns / 1_000;


⚠️ Potential issue | 🟠 Major

Normalize the nanosecond borrow in usElapsed().

When end.nsec < start.nsec, ns_diff becomes negative and the cast to u64 turns it into a huge positive number. Any timed section that crosses a second boundary will report nonsense durations.

🛠️ Suggested change

fn usElapsed(start: linux.timespec, end: linux.timespec) u64 { - const s: u64 = `@intCast`(end.sec - start.sec); - const ns_diff: i64 = `@as`(i64, `@intCast`(end.nsec)) - `@as`(i64, `@intCast`(start.nsec)); - const total_ns: u64 = s * 1_000_000_000 + `@as`(u64, `@intCast`(ns_diff)); + var sec = end.sec - start.sec; + var ns_diff: i64 = `@as`(i64, `@intCast`(end.nsec)) - `@as`(i64, `@intCast`(start.nsec)); + if (ns_diff < 0) { + sec -= 1; + ns_diff += 1_000_000_000; + } + const total_ns: u64 = `@as`(u64, `@intCast`(sec)) * 1_000_000_000 + `@as`(u64, `@intCast`(ns_diff)); return total_ns / 1_000; }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

fn usElapsed(start: linux.timespec, end: linux.timespec) u64 {

const s: u64 = @intCast(end.sec - start.sec);

const ns_diff: i64 = @as(i64, @intCast(end.nsec)) - @as(i64, @intCast(start.nsec));

const total_ns: u64 = s * 1_000_000_000 + @as(u64, @intCast(ns_diff));

return total_ns / 1_000;

fn usElapsed(start: linux.timespec, end: linux.timespec) u64 {

var sec = end.sec - start.sec;

var ns_diff: i64 = `@as`(i64, `@intCast`(end.nsec)) - `@as`(i64, `@intCast`(start.nsec));

if (ns_diff < 0) {

sec -= 1;

ns_diff += 1_000_000_000;

}

const total_ns: u64 = `@as`(u64, `@intCast`(sec)) * 1_000_000_000 + `@as`(u64, `@intCast`(ns_diff));

return total_ns / 1_000;

}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/autobench.zig` around lines 28 - 32, The usElapsed function incorrectly casts a negative ns_diff to u64 causing huge values when end.nsec < start.nsec; fix by normalizing the borrow: compute s and ns_diff as signed values, and if ns_diff < 0 then decrement s by 1 and add 1_000_000_000 to ns_diff so ns_diff becomes non-negative, then compute total_ns = s * 1_000_000_000 + ns_diff using appropriate casts to u64 (ensure s is non-negative before the final cast) and return total_ns / 1_000 to preserve microseconds; update the variables in the usElapsed function (s, ns_diff, total_ns) accordingly.

coderabbitai · 2026-03-24T07:14:23Z

src/autobench.zig

+    for (0..TOTAL_RUNS) |r| {
+        var map = HybridElasticHash.init(allocator, n) catch @panic("init failed");


⚠️ Potential issue | 🟠 Major

These “load” rows are not measuring the same capacity model as Abseil.

benchOne() labels load as fill / n, but HybridElasticHash.init(allocator, n) allocates tier 0 plus extra overflow tiers (src/hybrid.zig:347-370). So the elastic side owns substantially more slots than n, while absl::flat_hash_map::reserve(n) has a different spare-capacity curve. That makes the reported lookup/delete gaps materially apples-to-oranges until capacity is normalized or the effective physical load is reported.

Also applies to: 157-164

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/autobench.zig` around lines 59 - 60, The reported "load" is inconsistent because benchOne() uses fill / n while HybridElasticHash.init(allocator, n) creates tier0 plus overflow tiers (HybridElasticHash.init), giving more physical slots than n; fix by normalizing capacity: either adjust the init parameter to allocate only the equivalent physical slots that match absl::flat_hash_map::reserve(n), or (preferable) query the hybrid map's actual capacity after init (e.g., call its capacity/size method) and use fill / effective_capacity when labeling load in benchOne(); apply the same normalization change wherever HybridElasticHash.init(...) is used for benchmarking (including the other occurrence mentioned) so Abseil reserve(n) and HybridElasticHash capacity are compared on the same physical-slot basis.

coderabbitai · 2026-03-24T07:14:23Z

src/string_hybrid.zig

+    pub fn remove(self: *Self, key: []const u8) bool {
+        const h = hash(key);
+        const fp = fingerprint(h);
+        const mask = self.tier0_bucket_mask;
+        const bucket_base = h >> self.tier0_bucket_shift;
+
+        var probe: usize = 0;
+        while (probe < MAX_PROBES) : (probe += 1) {
+            const bucket_idx = (bucket_base +% @as(u64, probe)) & mask;
+            if (self.findKeyInBucket(bucket_idx, key, fp)) |slot| {
+                self.fingerprints[bucket_idx][slot] = TOMBSTONE;
+                self.count -= 1;
+                return true;
+            }
+        }
+        return false;
+    }


⚠️ Potential issue | 🟠 Major

remove() only searches tier 0, same issue as HybridElasticHash.

This mirrors the same limitation in src/hybrid.zig:611-628. Elements inserted into tier 1+ will not be found for removal.

🐛 Proposed fix - search overflow tier

pub fn remove(self: *Self, key: []const u8) bool { const h = hash(key); const fp = fingerprint(h); const mask = self.tier0_bucket_mask; const bucket_base = h >> self.tier0_bucket_shift; var probe: usize = 0; while (probe < MAX_PROBES) : (probe += 1) { const bucket_idx = (bucket_base +% `@as`(u64, probe)) & mask; if (self.findKeyInBucket(bucket_idx, key, fp)) |slot| { self.fingerprints[bucket_idx][slot] = TOMBSTONE; self.count -= 1; return true; } } + return self.removeOverflow(h, key, fp); + } + + fn removeOverflow(self: *Self, h: u64, key: []const u8, fp: u8) bool { + if (self.num_tiers <= 1) return false; + const num_buckets = self.tier_bucket_counts[1]; + const tier_start = self.tier_starts[1]; + for (0..@min(MAX_PROBES, num_buckets)) |probe| { + const abs_idx = tier_start + bucketIndex(h, probe, num_buckets); + if (self.findKeyInBucket(abs_idx, key, fp)) |slot| { + self.fingerprints[abs_idx][slot] = TOMBSTONE; + self.count -= 1; + return true; + } + if (matchEmpty(&self.fingerprints[abs_idx]) != 0) return false; + } return false; }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/string_hybrid.zig` around lines 314 - 330, The remove() implementation only searches tier 0; update it to scan all tiers (tier 0 through the overflow tiers) the same way insertion/search does: loop over each tier index (e.g., 0 .. self.tier_count), for each tier use that tier's bucket mask/shift (instead of always tier0_bucket_mask/tier0_bucket_shift) to compute bucket_base and run the same probe loop (probe < MAX_PROBES) calling findKeyInBucket(bucket_idx, key, fp); when a slot is found, write TOMBSTONE into the correct fingerprints array for that tier (e.g., self.fingerprints[tier][bucket_idx][slot]), decrement self.count and return true; if no tiers find the key return false. Ensure you reference and use the same symbols: remove, findKeyInBucket, fingerprint, MAX_PROBES, TOMBSTONE, and self.count.

…, sizes, despite slower hash

coderabbitai

Actionable comments posted: 5

🧹 Nitpick comments (1)

verify-strings.cpp (1)

103-106: Fail explicitly on unexpected misses before dereferencing the iterator.

These loops turn a lookup miss into UB via it->second. In a verification binary, that should be a clear failure instead.

Minimal hardening pattern

+// add near the top of the file
+#include <cstdlib>
...
-                auto it = map.find(keys[order[i]]);
-                do_not_optimize(it->second);
+                auto it = map.find(keys[order[i]]);
+                if (it == map.end()) std::abort();
+                do_not_optimize(it->second);

Apply the same guard in the keys, sv_keys, and owned_keys lookup loops.

Also applies to: 147-149, 193-195, 210-212

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@verify-strings.cpp` around lines 103 - 106, The lookup loops use auto it =
map.find(keys[order[i]]) and then immediately access it->second which leads to
UB on misses; update each lookup loop (the ones iterating over keys, sv_keys,
and owned_keys that use map.find, e.g., the loop with variables order, fill, and
do_not_optimize) to check if it == map.end() and, if so, fail explicitly (e.g.,
abort(), throw runtime_error, or call a test-failure helper) with a clear
message before dereferencing; do this guard in all similar loops (the
occurrences around the current find/dereference pairs) so unexpected misses
produce a deterministic failure instead of undefined behavior.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@program-v4-verify-strings.md`:
- Around line 16-17: The checklist under "### 4. Different table sizes" is
inconsistent with the experiment list — update the experiment/checklist so both
the section header and the detailed experiment list include the same set of
sizes: 100K, 500K, 1M, 2M, and 4M (remove the duplicate/missing 1M/4M mismatch).
Locate the "### 4. Different table sizes" heading and the corresponding
experiment list block (the checklist around the experiment items) and make them
identical so someone following either will run 100K, 500K, 1M, 2M, and 4M.

In `@src/verify-strings.zig`:
- Around line 165-170: The verification currently prints counts but still exits
successfully; modify the end-of-run logic in verify-strings.zig so that after
printing the "Inserted/Found/Correct" counts you check whether found == fill and
correct == fill (or simply correct == fill) and if not, terminate with a
non-zero exit (e.g., call std.os.exit(1) or return a non-zero status from main)
instead of letting execution continue to the final "DONE" print; use the
existing local variables fill, found, correct and replace or augment the final
std.debug.print("DONE", .{}) path to return/exit non‑zero on failure and only
print/exit 0 when verification passes.
- Around line 73-79: Prevalidate that every shuffled lookup is actually a hit
before you measure timing: before capturing start in the loop that iterates
shuffle_order and calls map.get(key_buf[ki * klen .. (ki + 1) * klen]), run a
lightweight validation pass over the same shuffled indices to ensure map.get
returns the expected value/Some and that the retrieved value matches the
expected 16-byte key (use existing key_buf, klen, shuffle_order and
doNotOptimizeAway to avoid optimizer removal); fail fast (panic or record an
error) if any lookup is a miss so the timed section (now(), usElapsed, times,
WARMUP) only measures confirmed hits.
- Around line 37-40: The usElapsed function currently computes ns_diff and casts
it to u64, which overflows when start.nsec > end.nsec; update usElapsed to
compute a signed second difference (e.g., s_diff = end.sec - start.sec as a
signed integer) and a signed ns_diff, then if ns_diff < 0 decrement s_diff by 1
and add 1_000_000_000 to ns_diff so both parts are non-negative; finally compute
microseconds by converting s_diff and adjusted ns_diff to u64 (seconds *
1_000_000 + ns_diff / 1_000) to avoid casting negative values to u64 and to
handle second-boundary crossings correctly in usElapsed.

In `@verify-strings-results.md`:
- Around line 63-66: The summary line that claims "36-97% faster on shuffled hit
lookups at 1M" is inconsistent with the benchmark details; update the
sentence(s) referencing "shuffled hit lookup" so they correctly state whether
the numbers came from ordered-access or shuffled-access benchmarks (e.g.,
replace the contradictory line 63 text with a clear statement that the 36–97%
figures are from ordered-access and add the correct shuffled-access numbers:
1.24x at 16B keys 50% load, 1.35x at 16B keys 50% load 1M size, and 1.62x at
256B keys), and ensure the exact quoted phrase "36-97% faster on shuffled hit
lookups at 1M" is either removed or moved to an ordered-access context to
eliminate the contradiction.

---

Nitpick comments:
In `@verify-strings.cpp`:
- Around line 103-106: The lookup loops use auto it = map.find(keys[order[i]])
and then immediately access it->second which leads to UB on misses; update each
lookup loop (the ones iterating over keys, sv_keys, and owned_keys that use
map.find, e.g., the loop with variables order, fill, and do_not_optimize) to
check if it == map.end() and, if so, fail explicitly (e.g., abort(), throw
runtime_error, or call a test-failure helper) with a clear message before
dereferencing; do this guard in all similar loops (the occurrences around the
current find/dereference pairs) so unexpected misses produce a deterministic
failure instead of undefined behavior.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dde4a2e8-9a45-4468-89dc-5ebfb049739d

📥 Commits

Reviewing files that changed from the base of the PR and between 27303d9 and 06ba5b9.

📒 Files selected for processing (5)

program-v4-verify-strings.md
src/verify-strings.zig
verify-strings-abseil
verify-strings-results.md
verify-strings.cpp

coderabbitai · 2026-03-24T08:17:39Z

program-v4-verify-strings.md

+### 4. Different table sizes
+We only tested 1M. Need 100K, 500K, 2M, 4M.


⚠️ Potential issue | 🟡 Minor

Keep the checklist aligned with the gap you call out above.

Lines 17 and 35 disagree: section 4 says 1M is already covered and 4M is still missing, but the experiment list reintroduces 1M and drops 4M. Someone following only the checklist would miss the largest-size validation.

Also applies to: 33-35

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@program-v4-verify-strings.md` around lines 16 - 17, The checklist under "### 4. Different table sizes" is inconsistent with the experiment list — update the experiment/checklist so both the section header and the detailed experiment list include the same set of sizes: 100K, 500K, 1M, 2M, and 4M (remove the duplicate/missing 1M/4M mismatch). Locate the "### 4. Different table sizes" heading and the corresponding experiment list block (the checklist around the experiment items) and make them identical so someone following either will run 100K, 500K, 1M, 2M, and 4M.

coderabbitai · 2026-03-24T08:17:39Z

src/verify-strings.zig

+fn usElapsed(start: linux.timespec, end: linux.timespec) u64 {
+    const s: u64 = @intCast(end.sec - start.sec);
+    const ns_diff: i64 = @as(i64, @intCast(end.nsec)) - @as(i64, @intCast(start.nsec));
+    return (s * 1_000_000_000 + @as(u64, @intCast(ns_diff))) / 1_000;


⚠️ Potential issue | 🟠 Major

Fix usElapsed() for second-boundary crossings.

Line 40 casts a possibly negative nanosecond delta to u64. If start.nsec > end.nsec, the elapsed time wraps to a huge value and can poison the median.

Suggested fix

fn usElapsed(start: linux.timespec, end: linux.timespec) u64 { - const s: u64 = `@intCast`(end.sec - start.sec); - const ns_diff: i64 = `@as`(i64, `@intCast`(end.nsec)) - `@as`(i64, `@intCast`(start.nsec)); - return (s * 1_000_000_000 + `@as`(u64, `@intCast`(ns_diff))) / 1_000; + const sec_ns: i128 = + (`@as`(i128, `@intCast`(end.sec)) - `@as`(i128, `@intCast`(start.sec))) * 1_000_000_000; + const ns_diff: i128 = + `@as`(i128, `@intCast`(end.nsec)) - `@as`(i128, `@intCast`(start.nsec)); + return `@intCast`((sec_ns + ns_diff) / 1_000); }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

fn usElapsed(start: linux.timespec, end: linux.timespec) u64 {

const s: u64 = @intCast(end.sec - start.sec);

const ns_diff: i64 = @as(i64, @intCast(end.nsec)) - @as(i64, @intCast(start.nsec));

return (s * 1_000_000_000 + @as(u64, @intCast(ns_diff))) / 1_000;

fn usElapsed(start: linux.timespec, end: linux.timespec) u64 {

const sec_ns: i128 =

(`@as`(i128, `@intCast`(end.sec)) - `@as`(i128, `@intCast`(start.sec))) * 1_000_000_000;

const ns_diff: i128 =

`@as`(i128, `@intCast`(end.nsec)) - `@as`(i128, `@intCast`(start.nsec));

return `@intCast`((sec_ns + ns_diff) / 1_000);

}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/verify-strings.zig` around lines 37 - 40, The usElapsed function currently computes ns_diff and casts it to u64, which overflows when start.nsec > end.nsec; update usElapsed to compute a signed second difference (e.g., s_diff = end.sec - start.sec as a signed integer) and a signed ns_diff, then if ns_diff < 0 decrement s_diff by 1 and add 1_000_000_000 to ns_diff so both parts are non-negative; finally compute microseconds by converting s_diff and adjusted ns_diff to u64 (seconds * 1_000_000 + ns_diff / 1_000) to avoid casting negative values to u64 and to handle second-boundary crossings correctly in usElapsed.

coderabbitai · 2026-03-24T08:17:39Z

src/verify-strings.zig

+        const start = now();
+        for (0..fill) |i| {
+            const ki = shuffle_order[i];
+            std.mem.doNotOptimizeAway(map.get(key_buf[ki * klen .. (ki + 1) * klen]));
+        }
+        const end = now();
+        if (r >= WARMUP) times[r - WARMUP] = usElapsed(start, end);


⚠️ Potential issue | 🟠 Major

Prevalidate shuffled hits before timing.

Line 76 benchmarks the raw ?u64, so a miss can still look like a valid sample. Test 4 only validates 16-byte keys, which leaves the length and size sweeps unverified.

One low-overhead way to fail fast

+ for (0..fill) |i| { + const ki = shuffle_order[i]; + const v = map.get(key_buf[ki * klen .. (ki + 1) * klen]) orelse `@panic`("unexpected miss"); + if (v != ki) `@panic`("unexpected value"); + } + const start = now(); for (0..fill) |i| { const ki = shuffle_order[i]; std.mem.doNotOptimizeAway(map.get(key_buf[ki * klen .. (ki + 1) * klen])); }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

const start = now();

for (0..fill) |i| {

const ki = shuffle_order[i];

std.mem.doNotOptimizeAway(map.get(key_buf[ki * klen .. (ki + 1) * klen]));

}

const end = now();

if (r >= WARMUP) times[r - WARMUP] = usElapsed(start, end);

for (0..fill) |i| {

const ki = shuffle_order[i];

const v = map.get(key_buf[ki * klen .. (ki + 1) * klen]) orelse `@panic`("unexpected miss");

if (v != ki) `@panic`("unexpected value");

}

const start = now();

for (0..fill) |i| {

const ki = shuffle_order[i];

std.mem.doNotOptimizeAway(map.get(key_buf[ki * klen .. (ki + 1) * klen]));

}

const end = now();

if (r >= WARMUP) times[r - WARMUP] = usElapsed(start, end);

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/verify-strings.zig` around lines 73 - 79, Prevalidate that every shuffled lookup is actually a hit before you measure timing: before capturing start in the loop that iterates shuffle_order and calls map.get(key_buf[ki * klen .. (ki + 1) * klen]), run a lightweight validation pass over the same shuffled indices to ensure map.get returns the expected value/Some and that the retrieved value matches the expected 16-byte key (use existing key_buf, klen, shuffle_order and doNotOptimizeAway to avoid optimizer removal); fail fast (panic or record an error) if any lookup is a miss so the timed section (now(), usElapsed, times, WARMUP) only measures confirmed hits.

coderabbitai · 2026-03-24T08:17:39Z

src/verify-strings.zig

+        std.debug.print("  Inserted: {d}\n", .{fill});
+        std.debug.print("  Found:    {d} ({d:.1}%)\n", .{ found, @as(f64, @floatFromInt(found)) / @as(f64, @floatFromInt(fill)) * 100 });
+        std.debug.print("  Correct:  {d} ({d:.1}%)\n\n", .{ correct, @as(f64, @floatFromInt(correct)) / @as(f64, @floatFromInt(fill)) * 100 });
+    }
+
+    std.debug.print("DONE\n", .{});


⚠️ Potential issue | 🟠 Major

Return a failing status when verification fails.

This block prints the counts but still exits 0 and reaches DONE, so automation cannot distinguish a verified run from a broken one.

Suggested fix

std.debug.print(" Inserted: {d}\n", .{fill}); std.debug.print(" Found: {d} ({d:.1}%)\n", .{ found, `@as`(f64, `@floatFromInt`(found)) / `@as`(f64, `@floatFromInt`(fill)) * 100 }); std.debug.print(" Correct: {d} ({d:.1}%)\n\n", .{ correct, `@as`(f64, `@floatFromInt`(correct)) / `@as`(f64, `@floatFromInt`(fill)) * 100 }); + if (found != fill or correct != fill) { + return error.VerificationFailed; + } } std.debug.print("DONE\n", .{});

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

std.debug.print(" Inserted: {d}\n", .{fill});

std.debug.print(" Found: {d} ({d:.1}%)\n", .{ found, @as(f64, @floatFromInt(found)) / @as(f64, @floatFromInt(fill)) * 100 });

std.debug.print(" Correct: {d} ({d:.1}%)\n\n", .{ correct, @as(f64, @floatFromInt(correct)) / @as(f64, @floatFromInt(fill)) * 100 });

}

std.debug.print("DONE\n", .{});

std.debug.print(" Inserted: {d}\n", .{fill});

std.debug.print(" Found: {d} ({d:.1}%)\n", .{ found, `@as`(f64, `@floatFromInt`(found)) / `@as`(f64, `@floatFromInt`(fill)) * 100 });

std.debug.print(" Correct: {d} ({d:.1}%)\n\n", .{ correct, `@as`(f64, `@floatFromInt`(correct)) / `@as`(f64, `@floatFromInt`(fill)) * 100 });

if (found != fill or correct != fill) {

return error.VerificationFailed;

}

}

std.debug.print("DONE\n", .{});

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/verify-strings.zig` around lines 165 - 170, The verification currently prints counts but still exits successfully; modify the end-of-run logic in verify-strings.zig so that after printing the "Inserted/Found/Correct" counts you check whether found == fill and correct == fill (or simply correct == fill) and if not, terminate with a non-zero exit (e.g., call std.os.exit(1) or return a non-zero status from main) instead of letting execution continue to the final "DONE" print; use the existing local variables fill, found, correct and replace or augment the final std.debug.print("DONE", .{}) path to return/exit non‑zero on failure and only print/exit 0 when verification passes.

coderabbitai · 2026-03-24T08:17:39Z

verify-strings-results.md

+The claim "36-97% faster on shuffled hit lookups at 1M" was from the ordered-access benchmark. With shuffled access the numbers are:
+- **1.24x** at 16B keys, 50% load (24% faster)
+- **1.35x** at 16B keys, 50% load, 1M size (35% faster)
+- **1.62x** at 256B keys (62% faster)


⚠️ Potential issue | 🟡 Minor

Clarify whether this headline came from ordered or shuffled access.

Line 63 says the "shuffled hit lookup" claim came from an ordered-access benchmark. That sentence contradicts itself and makes the summary hard to trust.

Possible wording

-The claim "36-97% faster on shuffled hit lookups at 1M" was from the ordered-access benchmark. With shuffled access the numbers are: +The 36-97% headline came from the 1M shuffled-hit benchmark across load factors. For the specific shuffled scenarios above, the numbers are:

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

The claim "36-97% faster on shuffled hit lookups at 1M" was from the ordered-access benchmark. With shuffled access the numbers are:

- **1.24x** at 16B keys, 50% load (24% faster)

- **1.35x** at 16B keys, 50% load, 1M size (35% faster)

- **1.62x** at 256B keys (62% faster)

The 36-97% headline came from the 1M shuffled-hit benchmark across load factors. For the specific shuffled scenarios above, the numbers are:

- **1.24x** at 16B keys, 50% load (24% faster)

- **1.35x** at 16B keys, 50% load, 1M size (35% faster)

- **1.62x** at 256B keys (62% faster)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@verify-strings-results.md` around lines 63 - 66, The summary line that claims "36-97% faster on shuffled hit lookups at 1M" is inconsistent with the benchmark details; update the sentence(s) referencing "shuffled hit lookup" so they correctly state whether the numbers came from ordered-access or shuffled-access benchmarks (e.g., replace the contradictory line 63 text with a clear statement that the 36–97% figures are from ordered-access and add the correct shuffled-access numbers: 1.24x at 16B keys 50% load, 1.35x at 16B keys 50% load 1M size, and 1.62x at 256B keys), and ensure the exact quoted phrase "36-97% faster on shuffled hit lookups at 1M" is either removed or moved to an ordered-access context to eliminate the contradiction.

joshuaisaact added 30 commits March 23, 2026 01:02

tier-first search order in get() for better cache locality

a4a2c7a

Revert "tier-first search order in get() for better cache locality"

25c5de2

This reverts commit a4a2c7a.

cross-tier prefetching in get() to hide inter-tier latency

330ddb6

Revert "cross-tier prefetching in get() to hide inter-tier latency"

d5d671a

This reverts commit 330ddb6.

remove all prefetching from get() to test if it helps or hurts

f9fc247

add tier-0 probe-0 fast path in get()

18b4ce8

Revert "add tier-0 probe-0 fast path in get()"

446e450

This reverts commit 18b4ce8.

reduce MAX_PROBES from 32 to 24

b48b0a3

reduce MAX_PROBES from 24 to 20

6bbdd09

cap get() tier search to 8 tiers

6ca9ddc

reduce MAX_LOOKUP_TIERS from 8 to 6

d42e9ee

Revert "reduce MAX_LOOKUP_TIERS from 8 to 6"

33c9feb

This reverts commit d42e9ee.

use fixed-size arrays for tier metadata instead of heap slices

9530f6c

Revert "use fixed-size arrays for tier metadata instead of heap slices"

1c52805

This reverts commit 9530f6c.

switch hash to Stafford variant 13 (splitmix64 finalizer)

907f4f8

Revert "switch hash to Stafford variant 13 (splitmix64 finalizer)"

a8629ca

This reverts commit 907f4f8.

switch to linear probing for better cache locality

bbf67f0

add early exit on empty bucket slots in get()

6e5b3dc

Revert "add early exit on empty bucket slots in get()"

e580524

This reverts commit 6e5b3dc.

combined fp+empty SIMD check with per-tier early exit

5f62379

Revert "combined fp+empty SIMD check with per-tier early exit"

ac8b602

This reverts commit 5f62379.

double tier 0 size to concentrate elements for faster lookup

6f57a0a

quadruple tier 0 size (2x capacity / BUCKET_SIZE)

f7328f9

Revert "quadruple tier 0 size (2x capacity / BUCKET_SIZE)"

be1c046

This reverts commit f7328f9.

reduce MAX_LOOKUP_TIERS from 8 to 4 with larger tier 0

710938e

reduce MAX_LOOKUP_TIERS from 4 to 3

3d427e3

reduce MAX_LOOKUP_TIERS from 3 to 2

6188a77

reduce MAX_LOOKUP_TIERS to 1 (tier 0 only)

c9ea0fe

simplify get() to single tier 0 loop, no tier iteration

8c9b749

Revert "simplify get() to single tier 0 loop, no tier iteration"

eaf5bbe

This reverts commit 8c9b749.

joshuaisaact added 24 commits March 24, 2026 05:39

Revert "limit getSlowPath to tier 1 only (tiers 2+ are empty at 99% l…

65aee5b

…oad)" This reverts commit b725080.

Reapply "limit getSlowPath to tier 1 only (tiers 2+ are empty at 99% …

48345e4

…load)" This reverts commit 65aee5b.

restore tier-0-only get() as clean base for insert-side experiments

0c05a9b

MAX_PROBES 7 -> 8 (more elements fit in tier 0)

2696554

batch threshold 0.05 (95% fill) with MAX_PROBES=8

93a99b8

Revert "batch threshold 0.05 (95% fill) with MAX_PROBES=8"

e5abd48

This reverts commit 93a99b8.

Reapply "batch threshold 0.05 (95% fill) with MAX_PROBES=8"

3a03edf

This reverts commit e5abd48.

restore clean v3 baseline (MAX_PROBES=7, threshold=0.12)

c6f6467

paper-faithful get(): noinline cold-hinted getOtherTiers for tier 1+

7736f8b

use opaque function pointer for tier 1+ overflow (prevent LLVM codege…

b2448b0

…n cascade)

limit overflow to tier 1 only (tiers 2+ empty at 99% load)

906ae65

log v3 experiments

776cd39

early termination in overflow function (safe with 100% find rate)

6f7d750

early termination in tier-0 loop (jump to overflow on empty slot)

32dd07a

Revert "early termination in tier-0 loop (jump to overflow on empty s…

9837b5d

…lot)" This reverts commit 32dd07a.

log v3 early termination experiments

1c47413

v3 insights: elastic hash 40-65% faster than abseil at normal loads

94c1d05

add verification program: are these results real?

ad1bfed

verification checks 1-3: capacity, shuffled access, size independence

ab25178

complete verification: realistic workloads, hash cost, shuffled access

71f0fb9

update README and PR with verified benchmark results

7e7a711

verify: gcc vs clang abseil - no unfair compiler advantage

65683f7

coderabbitai bot reviewed Mar 24, 2026

View reviewed changes

comprehensive string key verification: advantage holds across lengths…

06ba5b9

…, sizes, despite slower hash

joshuaisaact mentioned this pull request Mar 24, 2026

cross-language benchmark: elastic hash 2-3x faster on both x86 and Apple M4 #4

Open

coderabbitai bot reviewed Mar 24, 2026

View reviewed changes

joshuaisaact mentioned this pull request Mar 24, 2026

abseil-inspired fast string hash: 1.87x at 50% load (up from 1.74x) #5

Open

		echo "=== Primary metric (lookup_ratio at n=1048576) ==="
		grep "RESULT" bench.log \| grep "n=1048576" \| grep -oP 'lookup_ratio=\K[0-9.]+'

		for (0..TOTAL_RUNS) \|r\| {
		var map = HybridElasticHash.init(allocator, n) catch @panic("init failed");

		### 4. Different table sizes
		We only tested 1M. Need 100K, 500K, 2M, 4M.

Conversation

joshuaisaact commented Mar 24, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why

What

Results

Shuffled hit lookup at n=1,048,576 (string_view keys, verified)

Why strings help more

References

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

joshuaisaact commented Mar 24, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Mar 24, 2026 •

edited

Loading