Skip to content

string-key elastic hash: 36-97% faster than abseil on lookup#3

Open
joshuaisaact wants to merge 190 commits intomainfrom
autoresearch/string-keys
Open

string-key elastic hash: 36-97% faster than abseil on lookup#3
joshuaisaact wants to merge 190 commits intomainfrom
autoresearch/string-keys

Conversation

@joshuaisaact
Copy link
Copy Markdown
Owner

@joshuaisaact joshuaisaact commented Mar 24, 2026

Why

The u64 benchmark showed elastic hash 15-20% faster than abseil at normal loads. But most real hash tables use string keys. This PR answers: does the advantage hold with strings?

What

StringElasticHash (src/string_hybrid.zig): Same tiered SIMD architecture as HybridElasticHash, adapted for []const u8 keys with wyhash. Stores key pointer + length + value per entry.

Benchmarks: 16-byte hex string keys (splitmix64 → hex), both sides use string_view semantics (no copy). Shuffled access verified.

Results

Shuffled hit lookup at n=1,048,576 (string_view keys, verified)

Load Gap Meaning
10% 1.97 Elastic 97% faster
25% 1.86 Elastic 86% faster
50% 1.74 Elastic 74% faster
75% 1.61 Elastic 61% faster
90% 1.50 Elastic 50% faster
99% 1.36 Elastic 36% faster

The advantage is much larger than u64 keys (which showed 15-20% shuffled). With strings, fingerprint filtering eliminates expensive 16-byte memcmp comparisons that abseil has to perform on every probe match. Delete is still 2-3x faster. Miss lookups are still 2-3x slower (structural).

Why strings help more

  1. Fingerprint filtering is more valuable. With u64 keys, a false-positive key comparison costs ~1 cycle (single compare). With string keys, it costs ~5-10 cycles (memcmp). Our 8-bit fingerprints (1/254 false positive rate) filter out most of these.
  2. Hash cost is equalized. Both sides use wyhash variants for strings, removing the hash-speed advantage we had with u64.
  3. Cache density still matters. Our 1MB tier-0 fingerprints vs abseil's 2MB control bytes — same L2 vs L3 story as u64.

References

Summary by CodeRabbit

  • New Features

    • Added a string-keyed elastic hash map and a faster tier-0 lookup fast path
    • Multiple new realistic and string-key benchmarks comparing against Abseil
  • Tests

    • New tests and verification programs for string-hash correctness and performance
  • Documentation

    • README rewritten with Abseil comparisons, architecture, and optimization logs
    • Extensive insights, verification, and experiment program documentation added
  • Build

    • New benchmark and test build targets and automated benchmark runner scripts
  • Chores

    • Updated .gitignore to exclude additional benchmark/build artifacts

- src/string_hybrid.zig: StringElasticHash with wyhash, same tiered architecture
- src/autobench-strings.zig: Zig benchmark with 16-byte hex keys
- bench-abseil-strings.cpp: abseil benchmark with string_view keys
- bench-strings.sh: comparison runner
- build.zig: added string test and benchmark steps
… access

Shuffled hit lookup gaps at 1M (string_view keys, 16-byte hex):
  10%: 1.97x  25%: 1.86x  50%: 1.74x  75%: 1.61x  90%: 1.50x  99%: 1.36x

The advantage is LARGER than u64 keys because:
- Fingerprint filtering saves expensive 16-byte memcmp comparisons
- Both sides pay more for wyhash, but our table structure saves more per probe
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 24, 2026

📝 Walkthrough

Walkthrough

Adds multi-language benchmarking and verification infrastructure comparing an "elastic" Zig hash against Abseil, introduces a string-keyed elastic map, refactors hybrid get() with a tier-0 fast path and overflow hook, and supplies many automation scripts, capacity/hash verification tools, and experimental documentation.

Changes

Cohort / File(s) Summary
Ignore & Build
\.gitignore, build.zig
Ignore additional build/benchmark artifacts; add Zig build targets autobench, autobench-strings, and a new test dependency for string_hybrid.zig.
Top-level Documentation & Programs
README.md, AGENTS.md, program.md, program-v2.md, program-v3.md, program-v3-verify.md, program-v4-verify-strings.md
New and revised README and contributor/experiment programs specifying frozen files, experiment loops, benchmarking setup, and acceptance criteria.
Insights / Results / Verification Docs
results.md, insights.md, insights-v2.md, insights-v3.md, verify-results.md, verify-strings-results.md
Extensive experiment logs, optimization notes, verification findings, and summarized performance claims across multiple autoresearch phases.
C++ Benchmarks (Abseil & Verification)
bench-abseil.cpp, bench-abseil-strings.cpp, bench-realistic.cpp, bench-verify.cpp, bench-strings-verify.cpp, bench-strings-*.cpp
Added multiple C++ benchmark programs measuring absl::flat_hash_map for integer and string keys across hit/miss/insert/delete and realistic workloads, with deterministic key generators and median-of-runs reporting.
C++ Analysis Utilities
verify-capacity.cpp, verify-hash-cost.cpp, verify-*.cpp
Tools to inspect Abseil capacity behavior and to measure per-hash cost for comparison.
Zig Benchmark Executables
src/autobench.zig, src/autobench-verify.zig, src/autobench-strings.zig, src/autobench-strings-verify.zig, src/autobench-realistic.zig
Five Zig benchmark programs driving HybridElasticHash and StringElasticHash across ordered/shuffled and realistic workloads, reporting median timings.
Benchmark Automation Scripts
bench.sh, bench-v2.sh, bench-strings.sh
Shell scripts to build/run Abseil and Zig benchmarks, collect logs, parse RESULT lines, and compute per-operation gap ratios.
Core Hybrid Refactor
src/hybrid.zig
Refactored HybridElasticHash: changed storage to per-bucket Entry arrays, reduced MAX_PROBES from 100→7, added tier-0 fast-path probing with tier0_bucket_mask/shift, replaced nested multi-tier probe loops with an overflow function pointer for tier-1+, adjusted insertion/removal logic and thresholds.
String-Keyed Implementation
src/string_hybrid.zig
New StringElasticHash implementation (pub API): multi-tier layout, Wyhash-based hashing, 8-bit fingerprints, SIMD-style fingerprint matching, overflow hook, insertion/get/remove APIs, and tests.
Verify / Utility Zig Sources
src/verify-strings.zig, src/...
Added Zig verification for string-key correctness and hashing behavior; deterministic key creation and measurement harnesses.

Sequence Diagram(s)

sequenceDiagram
  participant Script as bench-v2.sh (runner)
  participant Build as Builder (g++, zig)
  participant Abseil as bench-abseil (C++)
  participant Elastic as autobench (Zig)
  participant Logs as Logs (abseil-v2.log / elastic-v2.log)

  Script->>Build: compile `bench-abseil.cpp` (pkg-config, g++)
  Script->>Build: build `autobench` (`zig build autobench`)
  Build-->>Abseil: produces `bench-abseil`
  Build-->>Elastic: produces `autobench` binary
  Script->>Abseil: run, capture stdout → `abseil-v2.log`
  Script->>Elastic: run, capture stderr → `elastic-v2.log`
  Abseil-->>Logs: append RESULT lines
  Elastic-->>Logs: append RESULT lines
  Script->>Logs: parse RESULT lines, match by `n`/`load`
  Script->>Script: compute gap ratios and print comparison table
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐇 I debugged and probed by moonlight's gleam,

Buckets sharpened, fingerprints in stream,
Scripts that race and C++ that sings,
Zig's elastic leaps on nimble springs,
A rabbit cheers: new benches and dreams!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 18.60% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately captures the main addition: a string-key elastic hash implementation that is significantly faster than Abseil on lookup operations, with specific performance improvements quantified (36-97% range).

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch autoresearch/string-keys

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 17

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/hybrid.zig (1)

611-628: ⚠️ Potential issue | 🔴 Critical

remove() only searches tier 0, leaving tier 1+ elements inaccessible.

The remove() function (lines 611-628) probes only tier 0, but elements overflow to tier 1 when the tier 0 bucket fills up during insertion (line 222: if (tier + 1 < num_tiers)). Unlike get() which delegates to defaultGetOverflow() for tier 1+ lookups, remove() returns false immediately, effectively making any element in tier 1 unretrievable.

The test "hybrid remove at high load" (line 743) inserts 800 elements (78% load) and expects remove() to succeed on 400 removals (line 754). At this load factor with MAX_PROBES=7, some insertions will cascade to tier 1, causing the test to fail or creating correctness issues where removed elements appear gone via get() but are still counted in map.count.

Add a removeOverflow() function to search tier 1+ tiers, mirroring the pattern used in defaultGetOverflow():

Proposed fix
    pub fn remove(self: *Self, key: u64) bool {
        const h = hash(key);
        const fp = fingerprint(h);
        const mask = self.tier0_bucket_mask;
        const bucket_base = h >> self.tier0_bucket_shift;

        var probe: usize = 0;
        while (probe < MAX_PROBES) : (probe += 1) {
            const bucket_idx = (bucket_base +% `@as`(u64, probe)) & mask;

            if (self.findKeyInBucket(bucket_idx, key, fp)) |slot| {
                self.fingerprints[bucket_idx][slot] = TOMBSTONE;
                self.count -= 1;
                return true;
            }
        }
+        return self.removeOverflow(h, key, fp);
+    }
+
+    fn removeOverflow(self: *Self, h: u64, key: u64, fp: u8) bool {
+        if (self.num_tiers <= 1) return false;
+        const num_buckets = self.tier_bucket_counts[1];
+        const tier_start = self.tier_starts[1];
+        for (0..@min(MAX_PROBES, num_buckets)) |probe| {
+            const abs_idx = tier_start + bucketIndex(h, probe, num_buckets);
+            if (self.findKeyInBucket(abs_idx, key, fp)) |slot| {
+                self.fingerprints[abs_idx][slot] = TOMBSTONE;
+                self.count -= 1;
+                return true;
+            }
+            if (matchEmpty(&self.fingerprints[abs_idx]) != 0) return false;
+        }
         return false;
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/hybrid.zig` around lines 611 - 628, remove() only searches tier 0 leaving
items that overflowed into tier 1+ unremoved; implement a removeOverflow(self:
*Self, key: u64) bool that mirrors defaultGetOverflow()’s tiered probing logic
and uses findKeyInBucket(bucket_idx, key, fp) to locate the slot in higher
tiers, set the slot to TOMBSTONE and decrement self.count when found, and then
call removeOverflow from remove() when the tier0 probe loop fails to find the
key; make sure removeOverflow uses the same bucket index computation
(bucket_base, probe, mask/shift per tier) and MAX_PROBES behavior so its
semantics match defaultGetOverflow and maintain consistent counting.
🧹 Nitpick comments (9)
program-v2.md (2)

21-39: Add language identifier to pseudo-code block.

📝 Suggested fix
-```
+```text
 get(key):
     h = key * 0x517cc1b727220a95; h ^= h >> 32
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@program-v2.md` around lines 21 - 39, The pseudo-code block lacks a language
identifier which affects syntax highlighting and readability; update the fenced
code block starting with the get(key) snippet to include a language tag (e.g.,
"text" or "pseudo") so tools render it properly, ensuring the block that
contains get(key), h calculation, fp, tier0_bucket_mask, bucket0, the probe loop
(MAX_PROBES=8) and findKeyInBucket description is changed to ```text (or another
appropriate identifier) at the opening fence and keep the rest of the content
unchanged.

125-127: Add language identifier to TSV format block.

📝 Suggested fix
-```
+```tsv
 commit\tlookup_gap_99\tlookup_gap_50\tinsert_gap_99\tstatus\tdescription
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @program-v2.md around lines 125 - 127, The fenced code block containing the
TSV header line
"commit\tlookup_gap_99\tlookup_gap_50\tinsert_gap_99\tstatus\tdescription" lacks
a language identifier; update the opening fence from totsv so the block
is labeled as TSV (e.g., change the code fence that wraps the header line in
program-v2.md to use ```tsv).


</details>

</blockquote></details>
<details>
<summary>program-v3.md (2)</summary><blockquote>

`162-164`: **Add language identifier for TSV format block.**


<details>
<summary>📝 Suggested fix</summary>

```diff
-```
+```tsv
 commit	find_rate	lookup_gap_99	miss_gap_99	insert_gap_99	status	description
 ```
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @program-v3.md around lines 162 - 164, The TSV code block showing the header
line
"commit\tfind_rate\tlookup_gap_99\tmiss_gap_99\tinsert_gap_99\tstatus\tdescription"
needs a language identifier; update the fenced code block that currently uses
to instead usetsv so the block becomes a TSV-marked code fence (i.e.,
change the opening fence for the block containing the header line to ```tsv).


</details>

---

`92-99`: **Consider adding language identifiers to fenced code blocks.**

The memory layout diagrams could use a `text` or `plaintext` language identifier to satisfy markdown linting and improve rendering consistency.


<details>
<summary>📝 Suggested fix</summary>

```diff
-```
+```text
 fingerprints: [tier0 bucket 0][tier0 bucket 1]...[tier1 bucket 0][tier1 bucket 1]...
 ```
 
 Interleave so that tier 0 and tier 1 for the same hash region are adjacent:
-```
+```text
 fingerprints: [tier0 bucket 0][tier1 bucket 0][tier0 bucket 1][tier1 bucket 1]...
 ```
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @program-v3.md around lines 92 - 99, Update the two fenced code blocks that
show the memory layout (the lines starting with "fingerprints: [tier0 bucket
0]...") to include a language identifier (use "text" or "plaintext") on the
opening fence; change the opening fences for the original and interleaved
examples (the blocks containing "fingerprints: [tier0 bucket 0][tier0 bucket
1]..." and "fingerprints: [tier0 bucket 0][tier1 bucket 0]...") from ``` to

src/autobench-strings-verify.zig (1)

45-57: Consider formatting the Fisher-Yates shuffle for readability.

The shuffle logic on line 56 is compressed into a single line, making it harder to read and debug compared to the multi-line version in autobench-verify.zig.

♻️ Suggested formatting
     var rng: u64 = 42;
     var idx: usize = fill;
-    while (idx > 1) { idx -= 1; const j = splitmix64(&rng) % (idx + 1); const tmp = shuffle_order[idx]; shuffle_order[idx] = shuffle_order[j]; shuffle_order[j] = tmp; }
+    while (idx > 1) {
+        idx -= 1;
+        const j = splitmix64(&rng) % (idx + 1);
+        const tmp = shuffle_order[idx];
+        shuffle_order[idx] = shuffle_order[j];
+        shuffle_order[j] = tmp;
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/autobench-strings-verify.zig` around lines 45 - 57, The Fisher–Yates
shuffle inside function bench is written as a single compressed line which
reduces readability; refactor the while loop that mutates shuffle_order, rng,
idx and uses splitmix64 into a multi-line, well-indented block (e.g., decrement
idx, compute j = splitmix64(&rng) % (idx + 1), swap shuffle_order[idx] and
shuffle_order[j]) so it matches the clearer style used in autobench-verify.zig
while keeping the same logic and using the existing symbols (bench,
shuffle_order, splitmix64, rng, idx).
insights.md (1)

10-27: Add language identifier to pseudo-code block.

📝 Suggested fix
-```
+```text
 get(key):
     h = key * 0x517cc1b727220a95; h ^= h >> 32
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@insights.md` around lines 10 - 27, The fenced pseudo-code block lacks a
language identifier; update the block delimiter to include one (e.g., ```text or
```pseudo) so code renderers apply monospaced formatting and no highlighting
errors occur; locate the block that documents get(key) and findKeyInBucket
(mentions MAX_PROBES, probe loop, bucket0, fp) and change the opening ``` to
```text (or another appropriate identifier) to satisfy the reviewer suggestion.
verify-hash-cost.cpp (1)

35-69: Use the median here, not the per-hasher minimum.

The rest of the suite is warmup+median, but this verifier reports the single fastest run for each side. That makes the final ratio much more sensitive to timer noise and weakens the “hash cost is comparable” conclusion.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@verify-hash-cost.cpp` around lines 35 - 69, The timing currently takes the
per-hasher minimum (best_ours / best_abseil); change this to compute the median
of the RUNS durations instead. In the loops that measure elastic_hash and
abseil_hasher, collect each run's microsecond duration into vectors (e.g.,
ours_times, abseil_times) rather than updating best_ours/best_abseil; after the
runs compute the median (sort or use std::nth_element) and use those median
values in the printf and ratio calculation. Keep the existing warmup behavior
and use the same measurement codepaths (elastic_hash, abseil_hasher,
do_not_optimize) but replace the min-based best_* logic with median-based
selection.
insights-v3.md (1)

49-68: Add language specifier to fenced code block.

The pseudocode block lacks a language identifier, which affects syntax highlighting and accessibility.

📝 Proposed fix
-```
+```text
 get(key):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@insights-v3.md` around lines 49 - 68, Add a language specifier to the fenced
code block containing the pseudocode (the block starting with get(key): and
including get_overflow(self, h, key, fp):) to enable proper syntax
highlighting/accessibility; update the opening fence from ``` to something like
```text (or ```pseudo) so the entire block is treated as a specified language
without changing the pseudocode content.
src/string_hybrid.zig (1)

39-47: Document key lifetime requirements for StringEntry.

StringEntry stores only pointers to key data. Callers must ensure the key memory outlives the map entry. This is intentional for performance (avoiding copies), but the contract should be documented.

📝 Proposed documentation addition
+/// Entry storing a string key reference and value.
+/// IMPORTANT: The key memory is NOT owned by this struct. Callers must ensure
+/// the key data outlives the entry in the map. This design avoids allocations
+/// for string_view-like semantics matching the Abseil benchmark.
 const StringEntry = struct {
     key_ptr: [*]const u8,
     key_len: usize,
     value: u64,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/string_hybrid.zig` around lines 39 - 47, Add documentation to the
StringEntry declaration explaining that it stores only a raw pointer (key_ptr)
and length (key_len) and does not own or copy the key bytes, so callers must
guarantee the pointed-to key memory outlives the StringEntry/map entry; mention
this contract near the StringEntry struct and decorate the key(self:
StringEntry) method with a brief note about the borrowed lifetime and the
performance rationale for avoiding copies.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bench-realistic.cpp`:
- Around line 48-63: The benchmark currently constructs a std::mt19937_64 rng
and uses it to pick op types/keys (see rng, OPS, op_types, op_keys,
initial_keys), which differs from the Zig harness that uses splitmix64; replace
the std::mt19937_64 usage with the same splitmix64-family generator and seeding
(reuse ks or an equivalent seed variable) so the random sequence and key draws
(calls that produce r and indices into initial_keys) mirror the Zig
workload—update all occurrences (the rng declaration and loops at the sites
using rng, e.g., the block that sets op_types/op_keys and the other two noted
blocks) to call the project's splitmix64 generator consistently.

In `@bench-strings.sh`:
- Around line 32-48: The script currently skips benchmark rows silently when
elastic-strings.log lacks a matching RESULT line or parsed fields (e_line,
e_ins, e_get, e_del, e_miss) are empty/invalid; change the behavior to fail
loudly instead: in the block that checks e_line and the parsed e_* variables, if
e_line is empty or any of e_ins/e_get/e_del/e_miss are missing or
non-numeric/zero, print a clear error message including n_val and load_val and
exit with a non-zero status (exit 1) rather than continuing; similarly, if the
numeric checks for >0 fail before computing *_gap, emit an error and exit 1 so
missing/partial rows cannot be silently omitted. Ensure the checks reference the
exact variables e_line, e_ins, e_get, e_del, e_miss and the loop variables
n_val/load_val so the failure message pinpoints the failing configuration.
- Around line 5-7: The g++ invocation that builds bench-abseil-strings.cpp in
bench-strings.sh omits a C++ standard flag causing std::string_view usage in
bench-abseil-strings.cpp to fail on compilers defaulting to pre-C++17; update
the g++ command that produces bench-abseil-strings to include -std=c++17 (e.g.,
add -std=c++17 alongside -O3 -march=native ...) so the build supports
std::string_view and other C++17 features.

In `@bench-v2.sh`:
- Around line 31-47: The script currently skips missing/malformed benchmark rows
(when e_line is empty or extracted vars e_ins/e_get/e_del/e_miss are
missing/non-numeric/zero) which can hide a stopped benchmark; change the
behavior in the block that builds e_line/e_ins/e_get/e_del/e_miss so that
instead of continuing silently it prints a clear error (including n=${n_val} and
load=${load_val} and the offending e_line) and exits non‑zero. Concretely:
replace the "if [ -z \"$e_line\" ]; then continue; fi" with a failure path that
logs the missing RESULT row and runs "exit 1"; also tighten the subsequent
validation of e_ins/e_get/e_del/e_miss (ensure they are non-empty and numeric
and >0) to log a descriptive error and exit 1 rather than silently skipping,
keeping the awk field-extraction logic (e_line, e_ins, e_get, e_del, e_miss) and
the later gap calculations unchanged.

In `@bench.sh`:
- Around line 13-14: The metric extraction fails because the script greps for
"lookup_ratio=..." but the log emits "lookup_us=..."; update the extraction in
bench.sh (the line using grep "RESULT" bench.log | grep "n=1048576" | grep -oP
'lookup_ratio=\K[0-9.]+') to instead extract the emitted field (e.g.,
"lookup_us") or add a fallback that checks for both "lookup_ratio" and
"lookup_us" so the script doesn't fail under set -euo pipefail; ensure the echo
"=== Primary metric ..." line matches the chosen metric name.

In `@program-v3-verify.md`:
- Around line 67-72: Update section 6 to include string-key verification by
adding the new string-key path and the bench-strings* artifacts into the frozen
verification scope: mention running the bench-strings* benchmarks (both Zig and
C++ variants) as part of the verification, describe a simple cross-language
comparison approach (e.g., normalize reported ops/time and document expected
overhead differences), and record any remaining limitations; reference the
"string-key path", "bench-strings*" artifacts and the "frozen verification
scope" so reviewers can locate and run the new checks.

In `@program.md`:
- Around line 75-77: The fenced code block starting with ``` in program.md lacks
a language tag, causing MD040; update that fenced block delimiter to include a
language tag (e.g., "tsv") so the block becomes ```tsv and keep the inner line
"<commit_hash>\t<lookup_ratio_1M>\t<insert_ratio_1M>\t<delete_ratio_1M>\t<keep|revert|crash>\t<description>"
unchanged.

In `@README.md`:
- Around line 90-110: Update the README to advertise the new string-key path:
add the new implementation entry (e.g., mention StringElasticHash and its source
file such as src/string.zig or the actual filename containing StringElasticHash)
to the "Files" list and add the string benchmark runner (bench-strings.sh) and
any string-specific benchmark targets (e.g., src/bench-strings.zig or
src/strings.zig and src/autobench-strings.zig if present) to the "Usage"
section; ensure you reference the symbol StringElasticHash and the script
bench-strings.sh so users can find and run the string-key benchmark path
(include the exact run command like bash bench-strings.sh).
- Around line 38-39: The README's cache-size claim is inconsistent with the
test-machine L2 size; update the text around the "Hit lookups at 500K-2M
elements..." sentence and the related lines (the "1MB for tier 0" and the later
note about "roughly 512KB L2") so they agree: either change the claim that
tier-0 fits in L2 to state it fits in L3 (or in CPU L2+L3 combined) or revise
the footprint to match a 512KB L2, and mirror the same correction in the other
referenced paragraph (lines ~56-58) so both the description of "1MB for tier 0"
and the machine L2 size are consistent and clear.

In `@results.md`:
- Around line 367-370: The sentence claiming Abseil does "rehash-on-delete" is
incorrect; update the paragraph comparing Elastic hash to Abseil's flat_hash_map
to state that flat_hash_map::erase is O(1) and does not trigger a full-table
rehash (cite Abseil docs), remove the "rehash-on-delete" assertion, and instead
explain the observed 3–4x delete speed advantage by attributing it to
implementation factors in Elastic hash (e.g., tombstone marking vs in-place
removal strategies, memory layout/cache locality, probing or chaining
differences, or lazy clean-up behavior); if unsure, add a short note that the
exact cause is unconfirmed and suggest adding microbenchmarks or profiling to
isolate whether tombstone marking, layout, or another factor drives the
difference.

In `@src/autobench-realistic.zig`:
- Around line 25-29: The usElapsed function currently multiplies seconds by
1_000_000_000 and only divides the nanoseconds delta by 1_000, yielding wrong
microseconds and incorrect handling of negative ns_diff; fix usElapsed by
computing the full time difference in nanoseconds using signed arithmetic (e.g.,
compute sec_diff as a signed integer, add ns_diff to sec_diff * 1_000_000_000 to
form total_ns) and then divide the combined total_ns by 1_000 to get
microseconds, finally casting the result to u64; make these changes inside fn
usElapsed (use linux.timespec values) and avoid casting ns_diff to u64 before
the division so negative deltas are preserved.

In `@src/autobench-strings-verify.zig`:
- Around line 34-38: The usElapsed function computing microseconds between two
linux.timespec values can panic when end.nsec < start.nsec; change the logic in
usElapsed to compute seconds and nanoseconds differences with a borrow: compute
sec_diff = end.sec - start.sec as signed (or u64 after borrow), compute ns_diff
= end.nsec - start.nsec as i64, and if ns_diff < 0 then decrement sec_diff by 1
and add 1_000_000_000 to ns_diff; then calculate total microseconds as sec_diff
* 1_000_000_000 + ns_diff and convert to u64 before dividing by 1_000 so no
negative value is cast to u64. Ensure you update the usElapsed function
signature/locals that reference linux.timespec accordingly.

In `@src/autobench-strings.zig`:
- Around line 40-44: The usElapsed function can panic when end.nsec < start.nsec
because it casts a negative ns_diff to u64; fix by computing differences in a
signed integer, normalize nanoseconds when ns_diff is negative (decrement the
seconds difference and add 1_000_000_000 to ns_diff) and only then combine into
a non-negative total before dividing by 1_000; specifically, in usElapsed use a
signed seconds delta (e.g., sec_diff_i64 = end.sec - start.sec), compute ns_diff
as i64, if ns_diff < 0 then sec_diff_i64 -= 1 and ns_diff += 1_000_000_000, then
produce the final microsecond result from sec_diff_i64 * 1_000_000_000 + ns_diff
cast safely to u64.

In `@src/autobench-verify.zig`:
- Around line 26-31: The usElapsed function can panic when end.nsec < start.nsec
because a negative ns_diff gets cast to u64; update usElapsed to compute the
seconds and nanosecond difference with proper borrow handling: compute sec_diff
= end.sec - start.sec and if end.nsec < start.nsec decrement sec_diff by 1 and
set ns_diff = end.nsec + 1_000_000_000 - start.nsec (all kept in unsigned types
after the borrow), then combine total_ns = sec_diff * 1_000_000_000 + ns_diff
and return total_ns / 1_000 to produce microseconds; apply this change inside fn
usElapsed(start: linux.timespec, end: linux.timespec) to avoid casting negative
i64 to u64.

In `@src/autobench.zig`:
- Around line 59-60: The reported "load" is inconsistent because benchOne() uses
fill / n while HybridElasticHash.init(allocator, n) creates tier0 plus overflow
tiers (HybridElasticHash.init), giving more physical slots than n; fix by
normalizing capacity: either adjust the init parameter to allocate only the
equivalent physical slots that match absl::flat_hash_map::reserve(n), or
(preferable) query the hybrid map's actual capacity after init (e.g., call its
capacity/size method) and use fill / effective_capacity when labeling load in
benchOne(); apply the same normalization change wherever
HybridElasticHash.init(...) is used for benchmarking (including the other
occurrence mentioned) so Abseil reserve(n) and HybridElasticHash capacity are
compared on the same physical-slot basis.
- Around line 28-32: The usElapsed function incorrectly casts a negative ns_diff
to u64 causing huge values when end.nsec < start.nsec; fix by normalizing the
borrow: compute s and ns_diff as signed values, and if ns_diff < 0 then
decrement s by 1 and add 1_000_000_000 to ns_diff so ns_diff becomes
non-negative, then compute total_ns = s * 1_000_000_000 + ns_diff using
appropriate casts to u64 (ensure s is non-negative before the final cast) and
return total_ns / 1_000 to preserve microseconds; update the variables in the
usElapsed function (s, ns_diff, total_ns) accordingly.

In `@src/string_hybrid.zig`:
- Around line 314-330: The remove() implementation only searches tier 0; update
it to scan all tiers (tier 0 through the overflow tiers) the same way
insertion/search does: loop over each tier index (e.g., 0 .. self.tier_count),
for each tier use that tier's bucket mask/shift (instead of always
tier0_bucket_mask/tier0_bucket_shift) to compute bucket_base and run the same
probe loop (probe < MAX_PROBES) calling findKeyInBucket(bucket_idx, key, fp);
when a slot is found, write TOMBSTONE into the correct fingerprints array for
that tier (e.g., self.fingerprints[tier][bucket_idx][slot]), decrement
self.count and return true; if no tiers find the key return false. Ensure you
reference and use the same symbols: remove, findKeyInBucket, fingerprint,
MAX_PROBES, TOMBSTONE, and self.count.

---

Outside diff comments:
In `@src/hybrid.zig`:
- Around line 611-628: remove() only searches tier 0 leaving items that
overflowed into tier 1+ unremoved; implement a removeOverflow(self: *Self, key:
u64) bool that mirrors defaultGetOverflow()’s tiered probing logic and uses
findKeyInBucket(bucket_idx, key, fp) to locate the slot in higher tiers, set the
slot to TOMBSTONE and decrement self.count when found, and then call
removeOverflow from remove() when the tier0 probe loop fails to find the key;
make sure removeOverflow uses the same bucket index computation (bucket_base,
probe, mask/shift per tier) and MAX_PROBES behavior so its semantics match
defaultGetOverflow and maintain consistent counting.

---

Nitpick comments:
In `@insights-v3.md`:
- Around line 49-68: Add a language specifier to the fenced code block
containing the pseudocode (the block starting with get(key): and including
get_overflow(self, h, key, fp):) to enable proper syntax
highlighting/accessibility; update the opening fence from ``` to something like
```text (or ```pseudo) so the entire block is treated as a specified language
without changing the pseudocode content.

In `@insights.md`:
- Around line 10-27: The fenced pseudo-code block lacks a language identifier;
update the block delimiter to include one (e.g., ```text or ```pseudo) so code
renderers apply monospaced formatting and no highlighting errors occur; locate
the block that documents get(key) and findKeyInBucket (mentions MAX_PROBES,
probe loop, bucket0, fp) and change the opening ``` to ```text (or another
appropriate identifier) to satisfy the reviewer suggestion.

In `@program-v2.md`:
- Around line 21-39: The pseudo-code block lacks a language identifier which
affects syntax highlighting and readability; update the fenced code block
starting with the get(key) snippet to include a language tag (e.g., "text" or
"pseudo") so tools render it properly, ensuring the block that contains
get(key), h calculation, fp, tier0_bucket_mask, bucket0, the probe loop
(MAX_PROBES=8) and findKeyInBucket description is changed to ```text (or another
appropriate identifier) at the opening fence and keep the rest of the content
unchanged.
- Around line 125-127: The fenced code block containing the TSV header line
"commit\tlookup_gap_99\tlookup_gap_50\tinsert_gap_99\tstatus\tdescription" lacks
a language identifier; update the opening fence from ``` to ```tsv so the block
is labeled as TSV (e.g., change the code fence that wraps the header line in
program-v2.md to use ```tsv).

In `@program-v3.md`:
- Around line 162-164: The TSV code block showing the header line
"commit\tfind_rate\tlookup_gap_99\tmiss_gap_99\tinsert_gap_99\tstatus\tdescription"
needs a language identifier; update the fenced code block that currently uses
``` to instead use ```tsv so the block becomes a TSV-marked code fence (i.e.,
change the opening fence for the block containing the header line to ```tsv).
- Around line 92-99: Update the two fenced code blocks that show the memory
layout (the lines starting with "fingerprints: [tier0 bucket 0]...") to include
a language identifier (use "text" or "plaintext") on the opening fence; change
the opening fences for the original and interleaved examples (the blocks
containing "fingerprints: [tier0 bucket 0][tier0 bucket 1]..." and
"fingerprints: [tier0 bucket 0][tier1 bucket 0]...") from ``` to ```text so
markdown linters/renderers treat them as plain text.

In `@src/autobench-strings-verify.zig`:
- Around line 45-57: The Fisher–Yates shuffle inside function bench is written
as a single compressed line which reduces readability; refactor the while loop
that mutates shuffle_order, rng, idx and uses splitmix64 into a multi-line,
well-indented block (e.g., decrement idx, compute j = splitmix64(&rng) % (idx +
1), swap shuffle_order[idx] and shuffle_order[j]) so it matches the clearer
style used in autobench-verify.zig while keeping the same logic and using the
existing symbols (bench, shuffle_order, splitmix64, rng, idx).

In `@src/string_hybrid.zig`:
- Around line 39-47: Add documentation to the StringEntry declaration explaining
that it stores only a raw pointer (key_ptr) and length (key_len) and does not
own or copy the key bytes, so callers must guarantee the pointed-to key memory
outlives the StringEntry/map entry; mention this contract near the StringEntry
struct and decorate the key(self: StringEntry) method with a brief note about
the borrowed lifetime and the performance rationale for avoiding copies.

In `@verify-hash-cost.cpp`:
- Around line 35-69: The timing currently takes the per-hasher minimum
(best_ours / best_abseil); change this to compute the median of the RUNS
durations instead. In the loops that measure elastic_hash and abseil_hasher,
collect each run's microsecond duration into vectors (e.g., ours_times,
abseil_times) rather than updating best_ours/best_abseil; after the runs compute
the median (sort or use std::nth_element) and use those median values in the
printf and ratio calculation. Keep the existing warmup behavior and use the same
measurement codepaths (elastic_hash, abseil_hasher, do_not_optimize) but replace
the min-based best_* logic with median-based selection.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 446ffdcc-c540-4a72-bf1b-ec3dedd629ab

📥 Commits

Reviewing files that changed from the base of the PR and between 3e37318 and 27303d9.

⛔ Files ignored due to path filters (6)
  • abseil-strings.log is excluded by !**/*.log
  • bench-v2.log is excluded by !**/*.log
  • elastic-strings.log is excluded by !**/*.log
  • results-v2.tsv is excluded by !**/*.tsv
  • results-v3.tsv is excluded by !**/*.tsv
  • results.tsv is excluded by !**/*.tsv
📒 Files selected for processing (37)
  • .gitignore
  • AGENTS.md
  • README.md
  • bench-abseil-clang
  • bench-abseil-strings
  • bench-abseil-strings.cpp
  • bench-abseil.cpp
  • bench-realistic
  • bench-realistic.cpp
  • bench-strings-verify
  • bench-strings-verify.cpp
  • bench-strings.sh
  • bench-v2.sh
  • bench-verify
  • bench-verify.cpp
  • bench.sh
  • build.zig
  • insights-v2.md
  • insights-v3.md
  • insights.md
  • program-v2.md
  • program-v3-verify.md
  • program-v3.md
  • program.md
  • results.md
  • src/autobench-realistic.zig
  • src/autobench-strings-verify.zig
  • src/autobench-strings.zig
  • src/autobench-verify.zig
  • src/autobench.zig
  • src/hybrid.zig
  • src/string_hybrid.zig
  • verify-capacity
  • verify-capacity.cpp
  • verify-hash-cost
  • verify-hash-cost.cpp
  • verify-results.md

Comment on lines +48 to +63
std::mt19937_64 rng(12345);
for (size_t i = 0; i < OPS; i++) {
int r = rng() % 100;
if (r < 80) { // 80% hit lookup
op_types[i] = 0;
op_keys[i] = initial_keys[rng() % initial_size];
} else if (r < 90) { // 10% miss lookup
op_types[i] = 1;
op_keys[i] = splitmix64(ks); // very unlikely to be in table
} else if (r < 95) { // 5% insert
op_types[i] = 2;
op_keys[i] = splitmix64(ks);
} else { // 5% delete
op_types[i] = 3;
op_keys[i] = initial_keys[rng() % initial_size];
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Use the same RNG family as Zig to keep workload parity.

At Line 48, Line 115, and Line 157, this file uses std::mt19937_64, while src/autobench-realistic.zig generates equivalent workloads with splitmix64. That makes cross-language comparisons non-equivalent and can skew reported gaps.

Proposed alignment diff
-    std::mt19937_64 rng(12345);
+    uint64_t rng = 12345;
     for (size_t i = 0; i < OPS; i++) {
-        int r = rng() % 100;
+        int r = splitmix64(rng) % 100;
         if (r < 80) { // 80% hit lookup
             op_types[i] = 0;
-            op_keys[i] = initial_keys[rng() % initial_size];
+            op_keys[i] = initial_keys[splitmix64(rng) % initial_size];
         } else if (r < 90) { // 10% miss lookup
             op_types[i] = 1;
             op_keys[i] = splitmix64(ks); // very unlikely to be in table
         } else if (r < 95) { // 5% insert
             op_types[i] = 2;
             op_keys[i] = splitmix64(ks);
         } else { // 5% delete
             op_types[i] = 3;
-            op_keys[i] = initial_keys[rng() % initial_size];
+            op_keys[i] = initial_keys[splitmix64(rng) % initial_size];
         }
     }

-    std::mt19937_64 rng(42);
+    uint64_t rng = 42;
     size_t hot_count = n / 5; // top 20%
     for (size_t i = 0; i < OPS; i++) {
-        if (rng() % 100 < 80) {
-            lookup_keys[i] = keys[rng() % hot_count]; // hot key
+        if (splitmix64(rng) % 100 < 80) {
+            lookup_keys[i] = keys[splitmix64(rng) % hot_count]; // hot key
         } else {
-            lookup_keys[i] = keys[rng() % n]; // any key
+            lookup_keys[i] = keys[splitmix64(rng) % n]; // any key
         }
     }

-    std::mt19937_64 rng(99);
-    for (size_t i = 0; i < n * READ_MULT; i++) read_order[i] = rng() % n;
+    uint64_t rng = 99;
+    for (size_t i = 0; i < n * READ_MULT; i++) read_order[i] = splitmix64(rng) % n;

Also applies to: 115-123, 157-158

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bench-realistic.cpp` around lines 48 - 63, The benchmark currently constructs
a std::mt19937_64 rng and uses it to pick op types/keys (see rng, OPS, op_types,
op_keys, initial_keys), which differs from the Zig harness that uses splitmix64;
replace the std::mt19937_64 usage with the same splitmix64-family generator and
seeding (reuse ks or an equivalent seed variable) so the random sequence and key
draws (calls that produce r and indices into initial_keys) mirror the Zig
workload—update all occurrences (the rng declaration and loops at the sites
using rng, e.g., the block that sets op_types/op_keys and the other two noted
blocks) to call the project's splitmix64 generator consistently.

Comment on lines +5 to +7
g++ -O3 -march=native -DNDEBUG -DABSL_HASHTABLEZ_SAMPLE_PARAMETER=0 \
bench-abseil-strings.cpp -o bench-abseil-strings \
$(pkg-config --cflags --libs absl_hash absl_raw_hash_set absl_hashtablez_sampler)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify that the string benchmark sources use std::string_view while the build command omits -std=...
rg -n --type=cpp '\bstd::string_view\b' bench-abseil-strings.cpp bench-strings-verify.cpp
rg -n --type=sh 'g\+\+ .*bench-abseil-strings\.cpp' bench-strings.sh

Repository: joshuaisaact/elastic-hash

Length of output: 735


🏁 Script executed:

# Check the exact content of bench-strings.sh around lines 5-7
head -n 10 bench-strings.sh

# Verify that g++ command lacks -std flag
grep -n "g++" bench-strings.sh

Repository: joshuaisaact/elastic-hash

Length of output: 519


🏁 Script executed:

# Double-check that std::string_view requires C++17
# and verify no other implicit standard-setting mechanism exists
file bench-strings.sh
stat bench-strings.sh

Repository: joshuaisaact/elastic-hash

Length of output: 156


Add -std=c++17 to the build command for the string benchmark.

bench-abseil-strings.cpp uses std::string_view (lines 56, 58-59, 65), which requires C++17. The current build command omits the -std flag and will fail on GCC 9/10 systems that default to C++14 or earlier.

🛠️ Suggested change
-g++ -O3 -march=native -DNDEBUG -DABSL_HASHTABLEZ_SAMPLE_PARAMETER=0 \
+g++ -std=c++17 -O3 -march=native -DNDEBUG -DABSL_HASHTABLEZ_SAMPLE_PARAMETER=0 \
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
g++ -O3 -march=native -DNDEBUG -DABSL_HASHTABLEZ_SAMPLE_PARAMETER=0 \
bench-abseil-strings.cpp -o bench-abseil-strings \
$(pkg-config --cflags --libs absl_hash absl_raw_hash_set absl_hashtablez_sampler)
g++ -std=c++17 -O3 -march=native -DNDEBUG -DABSL_HASHTABLEZ_SAMPLE_PARAMETER=0 \
bench-abseil-strings.cpp -o bench-abseil-strings \
$(pkg-config --cflags --libs absl_hash absl_raw_hash_set absl_hashtablez_sampler)
🧰 Tools
🪛 Shellcheck (0.11.0)

[warning] 7-7: Quote this to prevent word splitting.

(SC2046)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bench-strings.sh` around lines 5 - 7, The g++ invocation that builds
bench-abseil-strings.cpp in bench-strings.sh omits a C++ standard flag causing
std::string_view usage in bench-abseil-strings.cpp to fail on compilers
defaulting to pre-C++17; update the g++ command that produces
bench-abseil-strings to include -std=c++17 (e.g., add -std=c++17 alongside -O3
-march=native ...) so the build supports std::string_view and other C++17
features.

Comment on lines +32 to +48
e_line=$(grep "n=${n_val}" elastic-strings.log | grep "load=${load_val}" | head -1) || true
if [ -z "$e_line" ]; then continue; fi

e_ins=$(echo "$e_line" | awk -F'\t' '{for(i=1;i<=NF;i++) if($i ~ /^insert_us=/) print substr($i,11)}')
e_get=$(echo "$e_line" | awk -F'\t' '{for(i=1;i<=NF;i++) if($i ~ /^lookup_us=/) print substr($i,11)}')
e_del=$(echo "$e_line" | awk -F'\t' '{for(i=1;i<=NF;i++) if($i ~ /^delete_us=/) print substr($i,11)}')
e_miss=$(echo "$e_line" | awk -F'\t' '{for(i=1;i<=NF;i++) if($i ~ /^miss_us=/) print substr($i,9)}')

if [ -n "$e_ins" ] && [ -n "$e_get" ] && [ -n "$e_del" ] && [ -n "$e_miss" ] && \
[ "$e_ins" -gt 0 ] && [ "$e_get" -gt 0 ] && [ "$e_del" -gt 0 ] && [ "$e_miss" -gt 0 ]; then
ins_gap=$(awk "BEGIN {printf \"%.3f\", $a_ins / $e_ins}")
get_gap=$(awk "BEGIN {printf \"%.3f\", $a_get / $e_get}")
del_gap=$(awk "BEGIN {printf \"%.3f\", $a_del / $e_del}")
mis_gap=$(awk "BEGIN {printf \"%.3f\", $a_miss / $e_miss}")
printf "%-8s %-6s %8s %8s %8s %8s %8s %8s %8s %8s %8s %8s %8s %8s\n" \
"$n_val" "$load_val" "$a_ins" "$a_get" "$a_del" "$a_miss" "$e_ins" "$e_get" "$e_del" "$e_miss" "$ins_gap" "$get_gap" "$del_gap" "$mis_gap"
fi
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't silently drop missing benchmark rows.

If either side stops emitting a RESULT line, or the field parsing breaks, this script just omits that configuration and still exits successfully. That makes it too easy to publish a partial comparison table without noticing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bench-strings.sh` around lines 32 - 48, The script currently skips benchmark
rows silently when elastic-strings.log lacks a matching RESULT line or parsed
fields (e_line, e_ins, e_get, e_del, e_miss) are empty/invalid; change the
behavior to fail loudly instead: in the block that checks e_line and the parsed
e_* variables, if e_line is empty or any of e_ins/e_get/e_del/e_miss are missing
or non-numeric/zero, print a clear error message including n_val and load_val
and exit with a non-zero status (exit 1) rather than continuing; similarly, if
the numeric checks for >0 fail before computing *_gap, emit an error and exit 1
so missing/partial rows cannot be silently omitted. Ensure the checks reference
the exact variables e_line, e_ins, e_get, e_del, e_miss and the loop variables
n_val/load_val so the failure message pinpoints the failing configuration.

Comment on lines +31 to +47
e_line=$(grep "n=${n_val}" elastic-v2.log | grep "load=${load_val}" | head -1) || true
if [ -z "$e_line" ]; then continue; fi

e_ins=$(echo "$e_line" | awk -F'\t' '{for(i=1;i<=NF;i++) if($i ~ /^insert_us=/) print substr($i,11)}')
e_get=$(echo "$e_line" | awk -F'\t' '{for(i=1;i<=NF;i++) if($i ~ /^lookup_us=/) print substr($i,11)}')
e_del=$(echo "$e_line" | awk -F'\t' '{for(i=1;i<=NF;i++) if($i ~ /^delete_us=/) print substr($i,11)}')
e_miss=$(echo "$e_line" | awk -F'\t' '{for(i=1;i<=NF;i++) if($i ~ /^miss_us=/) print substr($i,9)}')

if [ -n "$e_ins" ] && [ -n "$e_get" ] && [ -n "$e_del" ] && [ -n "$e_miss" ] && \
[ "$e_ins" -gt 0 ] && [ "$e_get" -gt 0 ] && [ "$e_del" -gt 0 ] && [ "$e_miss" -gt 0 ]; then
ins_gap=$(awk "BEGIN {printf \"%.3f\", $a_ins / $e_ins}")
get_gap=$(awk "BEGIN {printf \"%.3f\", $a_get / $e_get}")
del_gap=$(awk "BEGIN {printf \"%.3f\", $a_del / $e_del}")
mis_gap=$(awk "BEGIN {printf \"%.3f\", $a_miss / $e_miss}")
printf "%-8s %-6s %8s %8s %8s %8s %8s %8s %8s %8s %8s %8s %8s %8s\n" \
"$n_val" "$load_val" "$a_ins" "$a_get" "$a_del" "$a_miss" "$e_ins" "$e_get" "$e_del" "$e_miss" "$ins_gap" "$get_gap" "$del_gap" "$mis_gap"
fi
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Fail fast on missing or malformed result rows.

If one benchmark stops emitting a matching RESULT line, this script currently just skips that configuration and keeps going. That can produce partial tables that look valid enough to trust even though one side never reported the row.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bench-v2.sh` around lines 31 - 47, The script currently skips
missing/malformed benchmark rows (when e_line is empty or extracted vars
e_ins/e_get/e_del/e_miss are missing/non-numeric/zero) which can hide a stopped
benchmark; change the behavior in the block that builds
e_line/e_ins/e_get/e_del/e_miss so that instead of continuing silently it prints
a clear error (including n=${n_val} and load=${load_val} and the offending
e_line) and exits non‑zero. Concretely: replace the "if [ -z \"$e_line\" ]; then
continue; fi" with a failure path that logs the missing RESULT row and runs
"exit 1"; also tighten the subsequent validation of e_ins/e_get/e_del/e_miss
(ensure they are non-empty and numeric and >0) to log a descriptive error and
exit 1 rather than silently skipping, keeping the awk field-extraction logic
(e_line, e_ins, e_get, e_del, e_miss) and the later gap calculations unchanged.

Comment on lines +13 to +14
echo "=== Primary metric (lookup_ratio at n=1048576) ==="
grep "RESULT" bench.log | grep "n=1048576" | grep -oP 'lookup_ratio=\K[0-9.]+'
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Metric extraction is broken (lookup_ratio is not emitted).

At Line 14, the parser looks for lookup_ratio=..., but current RESULT output uses lookup_us=.... With set -euo pipefail, this causes the script to fail after running benchmarks.

Proposed fix
-echo "=== Primary metric (lookup_ratio at n=1048576) ==="
-grep "RESULT" bench.log | grep "n=1048576" | grep -oP 'lookup_ratio=\K[0-9.]+'
+echo "=== Primary metric (lookup_us at n=1048576, load=99) ==="
+grep "RESULT" bench.log | grep "n=1048576" | grep "load=99" | grep -oP 'lookup_us=\K[0-9]+'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bench.sh` around lines 13 - 14, The metric extraction fails because the
script greps for "lookup_ratio=..." but the log emits "lookup_us=..."; update
the extraction in bench.sh (the line using grep "RESULT" bench.log | grep
"n=1048576" | grep -oP 'lookup_ratio=\K[0-9.]+') to instead extract the emitted
field (e.g., "lookup_us") or add a fallback that checks for both "lookup_ratio"
and "lookup_us" so the script doesn't fail under set -euo pipefail; ensure the
echo "=== Primary metric ..." line matches the chosen metric name.

Comment on lines +40 to +44
fn usElapsed(start: linux.timespec, end: linux.timespec) u64 {
const s: u64 = @intCast(end.sec - start.sec);
const ns_diff: i64 = @as(i64, @intCast(end.nsec)) - @as(i64, @intCast(start.nsec));
return (s * 1_000_000_000 + @as(u64, @intCast(ns_diff))) / 1_000;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Potential integer overflow when nsec wraps around.

When end.nsec < start.nsec (second boundary crossing), ns_diff is negative. The expression s * 1_000_000_000 + @as(u64, @intcast(ns_diff)) attempts to cast a negative i64 to u64, which will panic in safe mode or produce incorrect results.

🐛 Proposed fix
 fn usElapsed(start: linux.timespec, end: linux.timespec) u64 {
-    const s: u64 = `@intCast`(end.sec - start.sec);
-    const ns_diff: i64 = `@as`(i64, `@intCast`(end.nsec)) - `@as`(i64, `@intCast`(start.nsec));
-    return (s * 1_000_000_000 + `@as`(u64, `@intCast`(ns_diff))) / 1_000;
+    const end_ns: u64 = `@as`(u64, `@intCast`(end.sec)) * 1_000_000_000 + `@as`(u64, `@intCast`(end.nsec));
+    const start_ns: u64 = `@as`(u64, `@intCast`(start.sec)) * 1_000_000_000 + `@as`(u64, `@intCast`(start.nsec));
+    return (end_ns - start_ns) / 1_000;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/autobench-strings.zig` around lines 40 - 44, The usElapsed function can
panic when end.nsec < start.nsec because it casts a negative ns_diff to u64; fix
by computing differences in a signed integer, normalize nanoseconds when ns_diff
is negative (decrement the seconds difference and add 1_000_000_000 to ns_diff)
and only then combine into a non-negative total before dividing by 1_000;
specifically, in usElapsed use a signed seconds delta (e.g., sec_diff_i64 =
end.sec - start.sec), compute ns_diff as i64, if ns_diff < 0 then sec_diff_i64
-= 1 and ns_diff += 1_000_000_000, then produce the final microsecond result
from sec_diff_i64 * 1_000_000_000 + ns_diff cast safely to u64.

Comment on lines +26 to +31
fn usElapsed(start: linux.timespec, end: linux.timespec) u64 {
const s: u64 = @intCast(end.sec - start.sec);
const ns_diff: i64 = @as(i64, @intCast(end.nsec)) - @as(i64, @intCast(start.nsec));
const total_ns: u64 = s * 1_000_000_000 + @as(u64, @intCast(ns_diff));
return total_ns / 1_000;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Potential panic when end.nsec < start.nsec.

When end.nsec < start.nsec (which happens whenever the seconds increment), ns_diff will be negative. Casting a negative i64 to u64 with @intCast will cause a runtime panic in safe modes or undefined behavior.

🐛 Proposed fix to handle nanosecond borrow correctly
 fn usElapsed(start: linux.timespec, end: linux.timespec) u64 {
-    const s: u64 = `@intCast`(end.sec - start.sec);
-    const ns_diff: i64 = `@as`(i64, `@intCast`(end.nsec)) - `@as`(i64, `@intCast`(start.nsec));
-    const total_ns: u64 = s * 1_000_000_000 + `@as`(u64, `@intCast`(ns_diff));
-    return total_ns / 1_000;
+    const s_diff = end.sec - start.sec;
+    const ns_diff = end.nsec - start.nsec;
+    const total_ns: i128 = `@as`(i128, s_diff) * 1_000_000_000 + ns_diff;
+    return `@intCast`(`@divFloor`(total_ns, 1_000));
 }

Alternatively, use the standard borrow pattern:

 fn usElapsed(start: linux.timespec, end: linux.timespec) u64 {
+    var s: i64 = end.sec - start.sec;
+    var ns: i64 = end.nsec - start.nsec;
+    if (ns < 0) {
+        s -= 1;
+        ns += 1_000_000_000;
+    }
+    return `@intCast`(s * 1_000_000 + `@divFloor`(ns, 1_000));
-    const s: u64 = `@intCast`(end.sec - start.sec);
-    const ns_diff: i64 = `@as`(i64, `@intCast`(end.nsec)) - `@as`(i64, `@intCast`(start.nsec));
-    const total_ns: u64 = s * 1_000_000_000 + `@as`(u64, `@intCast`(ns_diff));
-    return total_ns / 1_000;
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
fn usElapsed(start: linux.timespec, end: linux.timespec) u64 {
const s: u64 = @intCast(end.sec - start.sec);
const ns_diff: i64 = @as(i64, @intCast(end.nsec)) - @as(i64, @intCast(start.nsec));
const total_ns: u64 = s * 1_000_000_000 + @as(u64, @intCast(ns_diff));
return total_ns / 1_000;
}
fn usElapsed(start: linux.timespec, end: linux.timespec) u64 {
const s_diff = end.sec - start.sec;
const ns_diff = end.nsec - start.nsec;
const total_ns: i128 = `@as`(i128, s_diff) * 1_000_000_000 + ns_diff;
return `@intCast`(`@divFloor`(total_ns, 1_000));
}
Suggested change
fn usElapsed(start: linux.timespec, end: linux.timespec) u64 {
const s: u64 = @intCast(end.sec - start.sec);
const ns_diff: i64 = @as(i64, @intCast(end.nsec)) - @as(i64, @intCast(start.nsec));
const total_ns: u64 = s * 1_000_000_000 + @as(u64, @intCast(ns_diff));
return total_ns / 1_000;
}
fn usElapsed(start: linux.timespec, end: linux.timespec) u64 {
var s: i64 = end.sec - start.sec;
var ns: i64 = end.nsec - start.nsec;
if (ns < 0) {
s -= 1;
ns += 1_000_000_000;
}
return `@intCast`(s * 1_000_000 + `@divFloor`(ns, 1_000));
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/autobench-verify.zig` around lines 26 - 31, The usElapsed function can
panic when end.nsec < start.nsec because a negative ns_diff gets cast to u64;
update usElapsed to compute the seconds and nanosecond difference with proper
borrow handling: compute sec_diff = end.sec - start.sec and if end.nsec <
start.nsec decrement sec_diff by 1 and set ns_diff = end.nsec + 1_000_000_000 -
start.nsec (all kept in unsigned types after the borrow), then combine total_ns
= sec_diff * 1_000_000_000 + ns_diff and return total_ns / 1_000 to produce
microseconds; apply this change inside fn usElapsed(start: linux.timespec, end:
linux.timespec) to avoid casting negative i64 to u64.

Comment on lines +28 to +32
fn usElapsed(start: linux.timespec, end: linux.timespec) u64 {
const s: u64 = @intCast(end.sec - start.sec);
const ns_diff: i64 = @as(i64, @intCast(end.nsec)) - @as(i64, @intCast(start.nsec));
const total_ns: u64 = s * 1_000_000_000 + @as(u64, @intCast(ns_diff));
return total_ns / 1_000;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Normalize the nanosecond borrow in usElapsed().

When end.nsec < start.nsec, ns_diff becomes negative and the cast to u64 turns it into a huge positive number. Any timed section that crosses a second boundary will report nonsense durations.

🛠️ Suggested change
 fn usElapsed(start: linux.timespec, end: linux.timespec) u64 {
-    const s: u64 = `@intCast`(end.sec - start.sec);
-    const ns_diff: i64 = `@as`(i64, `@intCast`(end.nsec)) - `@as`(i64, `@intCast`(start.nsec));
-    const total_ns: u64 = s * 1_000_000_000 + `@as`(u64, `@intCast`(ns_diff));
+    var sec = end.sec - start.sec;
+    var ns_diff: i64 = `@as`(i64, `@intCast`(end.nsec)) - `@as`(i64, `@intCast`(start.nsec));
+    if (ns_diff < 0) {
+        sec -= 1;
+        ns_diff += 1_000_000_000;
+    }
+    const total_ns: u64 = `@as`(u64, `@intCast`(sec)) * 1_000_000_000 + `@as`(u64, `@intCast`(ns_diff));
     return total_ns / 1_000;
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
fn usElapsed(start: linux.timespec, end: linux.timespec) u64 {
const s: u64 = @intCast(end.sec - start.sec);
const ns_diff: i64 = @as(i64, @intCast(end.nsec)) - @as(i64, @intCast(start.nsec));
const total_ns: u64 = s * 1_000_000_000 + @as(u64, @intCast(ns_diff));
return total_ns / 1_000;
fn usElapsed(start: linux.timespec, end: linux.timespec) u64 {
var sec = end.sec - start.sec;
var ns_diff: i64 = `@as`(i64, `@intCast`(end.nsec)) - `@as`(i64, `@intCast`(start.nsec));
if (ns_diff < 0) {
sec -= 1;
ns_diff += 1_000_000_000;
}
const total_ns: u64 = `@as`(u64, `@intCast`(sec)) * 1_000_000_000 + `@as`(u64, `@intCast`(ns_diff));
return total_ns / 1_000;
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/autobench.zig` around lines 28 - 32, The usElapsed function incorrectly
casts a negative ns_diff to u64 causing huge values when end.nsec < start.nsec;
fix by normalizing the borrow: compute s and ns_diff as signed values, and if
ns_diff < 0 then decrement s by 1 and add 1_000_000_000 to ns_diff so ns_diff
becomes non-negative, then compute total_ns = s * 1_000_000_000 + ns_diff using
appropriate casts to u64 (ensure s is non-negative before the final cast) and
return total_ns / 1_000 to preserve microseconds; update the variables in the
usElapsed function (s, ns_diff, total_ns) accordingly.

Comment on lines +59 to +60
for (0..TOTAL_RUNS) |r| {
var map = HybridElasticHash.init(allocator, n) catch @panic("init failed");
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

These “load” rows are not measuring the same capacity model as Abseil.

benchOne() labels load as fill / n, but HybridElasticHash.init(allocator, n) allocates tier 0 plus extra overflow tiers (src/hybrid.zig:347-370). So the elastic side owns substantially more slots than n, while absl::flat_hash_map::reserve(n) has a different spare-capacity curve. That makes the reported lookup/delete gaps materially apples-to-oranges until capacity is normalized or the effective physical load is reported.

Also applies to: 157-164

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/autobench.zig` around lines 59 - 60, The reported "load" is inconsistent
because benchOne() uses fill / n while HybridElasticHash.init(allocator, n)
creates tier0 plus overflow tiers (HybridElasticHash.init), giving more physical
slots than n; fix by normalizing capacity: either adjust the init parameter to
allocate only the equivalent physical slots that match
absl::flat_hash_map::reserve(n), or (preferable) query the hybrid map's actual
capacity after init (e.g., call its capacity/size method) and use fill /
effective_capacity when labeling load in benchOne(); apply the same
normalization change wherever HybridElasticHash.init(...) is used for
benchmarking (including the other occurrence mentioned) so Abseil reserve(n) and
HybridElasticHash capacity are compared on the same physical-slot basis.

Comment on lines +314 to +330
pub fn remove(self: *Self, key: []const u8) bool {
const h = hash(key);
const fp = fingerprint(h);
const mask = self.tier0_bucket_mask;
const bucket_base = h >> self.tier0_bucket_shift;

var probe: usize = 0;
while (probe < MAX_PROBES) : (probe += 1) {
const bucket_idx = (bucket_base +% @as(u64, probe)) & mask;
if (self.findKeyInBucket(bucket_idx, key, fp)) |slot| {
self.fingerprints[bucket_idx][slot] = TOMBSTONE;
self.count -= 1;
return true;
}
}
return false;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

remove() only searches tier 0, same issue as HybridElasticHash.

This mirrors the same limitation in src/hybrid.zig:611-628. Elements inserted into tier 1+ will not be found for removal.

🐛 Proposed fix - search overflow tier
     pub fn remove(self: *Self, key: []const u8) bool {
         const h = hash(key);
         const fp = fingerprint(h);
         const mask = self.tier0_bucket_mask;
         const bucket_base = h >> self.tier0_bucket_shift;

         var probe: usize = 0;
         while (probe < MAX_PROBES) : (probe += 1) {
             const bucket_idx = (bucket_base +% `@as`(u64, probe)) & mask;
             if (self.findKeyInBucket(bucket_idx, key, fp)) |slot| {
                 self.fingerprints[bucket_idx][slot] = TOMBSTONE;
                 self.count -= 1;
                 return true;
             }
         }
+        return self.removeOverflow(h, key, fp);
+    }
+
+    fn removeOverflow(self: *Self, h: u64, key: []const u8, fp: u8) bool {
+        if (self.num_tiers <= 1) return false;
+        const num_buckets = self.tier_bucket_counts[1];
+        const tier_start = self.tier_starts[1];
+        for (0..@min(MAX_PROBES, num_buckets)) |probe| {
+            const abs_idx = tier_start + bucketIndex(h, probe, num_buckets);
+            if (self.findKeyInBucket(abs_idx, key, fp)) |slot| {
+                self.fingerprints[abs_idx][slot] = TOMBSTONE;
+                self.count -= 1;
+                return true;
+            }
+            if (matchEmpty(&self.fingerprints[abs_idx]) != 0) return false;
+        }
         return false;
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/string_hybrid.zig` around lines 314 - 330, The remove() implementation
only searches tier 0; update it to scan all tiers (tier 0 through the overflow
tiers) the same way insertion/search does: loop over each tier index (e.g., 0 ..
self.tier_count), for each tier use that tier's bucket mask/shift (instead of
always tier0_bucket_mask/tier0_bucket_shift) to compute bucket_base and run the
same probe loop (probe < MAX_PROBES) calling findKeyInBucket(bucket_idx, key,
fp); when a slot is found, write TOMBSTONE into the correct fingerprints array
for that tier (e.g., self.fingerprints[tier][bucket_idx][slot]), decrement
self.count and return true; if no tiers find the key return false. Ensure you
reference and use the same symbols: remove, findKeyInBucket, fingerprint,
MAX_PROBES, TOMBSTONE, and self.count.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (1)
verify-strings.cpp (1)

103-106: Fail explicitly on unexpected misses before dereferencing the iterator.

These loops turn a lookup miss into UB via it->second. In a verification binary, that should be a clear failure instead.

Minimal hardening pattern
+// add near the top of the file
+#include <cstdlib>
...
-                auto it = map.find(keys[order[i]]);
-                do_not_optimize(it->second);
+                auto it = map.find(keys[order[i]]);
+                if (it == map.end()) std::abort();
+                do_not_optimize(it->second);

Apply the same guard in the keys, sv_keys, and owned_keys lookup loops.

Also applies to: 147-149, 193-195, 210-212

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@verify-strings.cpp` around lines 103 - 106, The lookup loops use auto it =
map.find(keys[order[i]]) and then immediately access it->second which leads to
UB on misses; update each lookup loop (the ones iterating over keys, sv_keys,
and owned_keys that use map.find, e.g., the loop with variables order, fill, and
do_not_optimize) to check if it == map.end() and, if so, fail explicitly (e.g.,
abort(), throw runtime_error, or call a test-failure helper) with a clear
message before dereferencing; do this guard in all similar loops (the
occurrences around the current find/dereference pairs) so unexpected misses
produce a deterministic failure instead of undefined behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@program-v4-verify-strings.md`:
- Around line 16-17: The checklist under "### 4. Different table sizes" is
inconsistent with the experiment list — update the experiment/checklist so both
the section header and the detailed experiment list include the same set of
sizes: 100K, 500K, 1M, 2M, and 4M (remove the duplicate/missing 1M/4M mismatch).
Locate the "### 4. Different table sizes" heading and the corresponding
experiment list block (the checklist around the experiment items) and make them
identical so someone following either will run 100K, 500K, 1M, 2M, and 4M.

In `@src/verify-strings.zig`:
- Around line 165-170: The verification currently prints counts but still exits
successfully; modify the end-of-run logic in verify-strings.zig so that after
printing the "Inserted/Found/Correct" counts you check whether found == fill and
correct == fill (or simply correct == fill) and if not, terminate with a
non-zero exit (e.g., call std.os.exit(1) or return a non-zero status from main)
instead of letting execution continue to the final "DONE" print; use the
existing local variables fill, found, correct and replace or augment the final
std.debug.print("DONE", .{}) path to return/exit non‑zero on failure and only
print/exit 0 when verification passes.
- Around line 73-79: Prevalidate that every shuffled lookup is actually a hit
before you measure timing: before capturing start in the loop that iterates
shuffle_order and calls map.get(key_buf[ki * klen .. (ki + 1) * klen]), run a
lightweight validation pass over the same shuffled indices to ensure map.get
returns the expected value/Some and that the retrieved value matches the
expected 16-byte key (use existing key_buf, klen, shuffle_order and
doNotOptimizeAway to avoid optimizer removal); fail fast (panic or record an
error) if any lookup is a miss so the timed section (now(), usElapsed, times,
WARMUP) only measures confirmed hits.
- Around line 37-40: The usElapsed function currently computes ns_diff and casts
it to u64, which overflows when start.nsec > end.nsec; update usElapsed to
compute a signed second difference (e.g., s_diff = end.sec - start.sec as a
signed integer) and a signed ns_diff, then if ns_diff < 0 decrement s_diff by 1
and add 1_000_000_000 to ns_diff so both parts are non-negative; finally compute
microseconds by converting s_diff and adjusted ns_diff to u64 (seconds *
1_000_000 + ns_diff / 1_000) to avoid casting negative values to u64 and to
handle second-boundary crossings correctly in usElapsed.

In `@verify-strings-results.md`:
- Around line 63-66: The summary line that claims "36-97% faster on shuffled hit
lookups at 1M" is inconsistent with the benchmark details; update the
sentence(s) referencing "shuffled hit lookup" so they correctly state whether
the numbers came from ordered-access or shuffled-access benchmarks (e.g.,
replace the contradictory line 63 text with a clear statement that the 36–97%
figures are from ordered-access and add the correct shuffled-access numbers:
1.24x at 16B keys 50% load, 1.35x at 16B keys 50% load 1M size, and 1.62x at
256B keys), and ensure the exact quoted phrase "36-97% faster on shuffled hit
lookups at 1M" is either removed or moved to an ordered-access context to
eliminate the contradiction.

---

Nitpick comments:
In `@verify-strings.cpp`:
- Around line 103-106: The lookup loops use auto it = map.find(keys[order[i]])
and then immediately access it->second which leads to UB on misses; update each
lookup loop (the ones iterating over keys, sv_keys, and owned_keys that use
map.find, e.g., the loop with variables order, fill, and do_not_optimize) to
check if it == map.end() and, if so, fail explicitly (e.g., abort(), throw
runtime_error, or call a test-failure helper) with a clear message before
dereferencing; do this guard in all similar loops (the occurrences around the
current find/dereference pairs) so unexpected misses produce a deterministic
failure instead of undefined behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dde4a2e8-9a45-4468-89dc-5ebfb049739d

📥 Commits

Reviewing files that changed from the base of the PR and between 27303d9 and 06ba5b9.

📒 Files selected for processing (5)
  • program-v4-verify-strings.md
  • src/verify-strings.zig
  • verify-strings-abseil
  • verify-strings-results.md
  • verify-strings.cpp

Comment on lines +16 to +17
### 4. Different table sizes
We only tested 1M. Need 100K, 500K, 2M, 4M.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Keep the checklist aligned with the gap you call out above.

Lines 17 and 35 disagree: section 4 says 1M is already covered and 4M is still missing, but the experiment list reintroduces 1M and drops 4M. Someone following only the checklist would miss the largest-size validation.

Also applies to: 33-35

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@program-v4-verify-strings.md` around lines 16 - 17, The checklist under "###
4. Different table sizes" is inconsistent with the experiment list — update the
experiment/checklist so both the section header and the detailed experiment list
include the same set of sizes: 100K, 500K, 1M, 2M, and 4M (remove the
duplicate/missing 1M/4M mismatch). Locate the "### 4. Different table sizes"
heading and the corresponding experiment list block (the checklist around the
experiment items) and make them identical so someone following either will run
100K, 500K, 1M, 2M, and 4M.

Comment on lines +37 to +40
fn usElapsed(start: linux.timespec, end: linux.timespec) u64 {
const s: u64 = @intCast(end.sec - start.sec);
const ns_diff: i64 = @as(i64, @intCast(end.nsec)) - @as(i64, @intCast(start.nsec));
return (s * 1_000_000_000 + @as(u64, @intCast(ns_diff))) / 1_000;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Fix usElapsed() for second-boundary crossings.

Line 40 casts a possibly negative nanosecond delta to u64. If start.nsec > end.nsec, the elapsed time wraps to a huge value and can poison the median.

Suggested fix
 fn usElapsed(start: linux.timespec, end: linux.timespec) u64 {
-    const s: u64 = `@intCast`(end.sec - start.sec);
-    const ns_diff: i64 = `@as`(i64, `@intCast`(end.nsec)) - `@as`(i64, `@intCast`(start.nsec));
-    return (s * 1_000_000_000 + `@as`(u64, `@intCast`(ns_diff))) / 1_000;
+    const sec_ns: i128 =
+        (`@as`(i128, `@intCast`(end.sec)) - `@as`(i128, `@intCast`(start.sec))) * 1_000_000_000;
+    const ns_diff: i128 =
+        `@as`(i128, `@intCast`(end.nsec)) - `@as`(i128, `@intCast`(start.nsec));
+    return `@intCast`((sec_ns + ns_diff) / 1_000);
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
fn usElapsed(start: linux.timespec, end: linux.timespec) u64 {
const s: u64 = @intCast(end.sec - start.sec);
const ns_diff: i64 = @as(i64, @intCast(end.nsec)) - @as(i64, @intCast(start.nsec));
return (s * 1_000_000_000 + @as(u64, @intCast(ns_diff))) / 1_000;
fn usElapsed(start: linux.timespec, end: linux.timespec) u64 {
const sec_ns: i128 =
(`@as`(i128, `@intCast`(end.sec)) - `@as`(i128, `@intCast`(start.sec))) * 1_000_000_000;
const ns_diff: i128 =
`@as`(i128, `@intCast`(end.nsec)) - `@as`(i128, `@intCast`(start.nsec));
return `@intCast`((sec_ns + ns_diff) / 1_000);
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/verify-strings.zig` around lines 37 - 40, The usElapsed function
currently computes ns_diff and casts it to u64, which overflows when start.nsec
> end.nsec; update usElapsed to compute a signed second difference (e.g., s_diff
= end.sec - start.sec as a signed integer) and a signed ns_diff, then if ns_diff
< 0 decrement s_diff by 1 and add 1_000_000_000 to ns_diff so both parts are
non-negative; finally compute microseconds by converting s_diff and adjusted
ns_diff to u64 (seconds * 1_000_000 + ns_diff / 1_000) to avoid casting negative
values to u64 and to handle second-boundary crossings correctly in usElapsed.

Comment on lines +73 to +79
const start = now();
for (0..fill) |i| {
const ki = shuffle_order[i];
std.mem.doNotOptimizeAway(map.get(key_buf[ki * klen .. (ki + 1) * klen]));
}
const end = now();
if (r >= WARMUP) times[r - WARMUP] = usElapsed(start, end);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Prevalidate shuffled hits before timing.

Line 76 benchmarks the raw ?u64, so a miss can still look like a valid sample. Test 4 only validates 16-byte keys, which leaves the length and size sweeps unverified.

One low-overhead way to fail fast
+        for (0..fill) |i| {
+            const ki = shuffle_order[i];
+            const v = map.get(key_buf[ki * klen .. (ki + 1) * klen]) orelse `@panic`("unexpected miss");
+            if (v != ki) `@panic`("unexpected value");
+        }
+
         const start = now();
         for (0..fill) |i| {
             const ki = shuffle_order[i];
             std.mem.doNotOptimizeAway(map.get(key_buf[ki * klen .. (ki + 1) * klen]));
         }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const start = now();
for (0..fill) |i| {
const ki = shuffle_order[i];
std.mem.doNotOptimizeAway(map.get(key_buf[ki * klen .. (ki + 1) * klen]));
}
const end = now();
if (r >= WARMUP) times[r - WARMUP] = usElapsed(start, end);
for (0..fill) |i| {
const ki = shuffle_order[i];
const v = map.get(key_buf[ki * klen .. (ki + 1) * klen]) orelse `@panic`("unexpected miss");
if (v != ki) `@panic`("unexpected value");
}
const start = now();
for (0..fill) |i| {
const ki = shuffle_order[i];
std.mem.doNotOptimizeAway(map.get(key_buf[ki * klen .. (ki + 1) * klen]));
}
const end = now();
if (r >= WARMUP) times[r - WARMUP] = usElapsed(start, end);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/verify-strings.zig` around lines 73 - 79, Prevalidate that every shuffled
lookup is actually a hit before you measure timing: before capturing start in
the loop that iterates shuffle_order and calls map.get(key_buf[ki * klen .. (ki
+ 1) * klen]), run a lightweight validation pass over the same shuffled indices
to ensure map.get returns the expected value/Some and that the retrieved value
matches the expected 16-byte key (use existing key_buf, klen, shuffle_order and
doNotOptimizeAway to avoid optimizer removal); fail fast (panic or record an
error) if any lookup is a miss so the timed section (now(), usElapsed, times,
WARMUP) only measures confirmed hits.

Comment on lines +165 to +170
std.debug.print(" Inserted: {d}\n", .{fill});
std.debug.print(" Found: {d} ({d:.1}%)\n", .{ found, @as(f64, @floatFromInt(found)) / @as(f64, @floatFromInt(fill)) * 100 });
std.debug.print(" Correct: {d} ({d:.1}%)\n\n", .{ correct, @as(f64, @floatFromInt(correct)) / @as(f64, @floatFromInt(fill)) * 100 });
}

std.debug.print("DONE\n", .{});
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Return a failing status when verification fails.

This block prints the counts but still exits 0 and reaches DONE, so automation cannot distinguish a verified run from a broken one.

Suggested fix
         std.debug.print("  Inserted: {d}\n", .{fill});
         std.debug.print("  Found:    {d} ({d:.1}%)\n", .{ found, `@as`(f64, `@floatFromInt`(found)) / `@as`(f64, `@floatFromInt`(fill)) * 100 });
         std.debug.print("  Correct:  {d} ({d:.1}%)\n\n", .{ correct, `@as`(f64, `@floatFromInt`(correct)) / `@as`(f64, `@floatFromInt`(fill)) * 100 });
+        if (found != fill or correct != fill) {
+            return error.VerificationFailed;
+        }
     }
 
     std.debug.print("DONE\n", .{});
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
std.debug.print(" Inserted: {d}\n", .{fill});
std.debug.print(" Found: {d} ({d:.1}%)\n", .{ found, @as(f64, @floatFromInt(found)) / @as(f64, @floatFromInt(fill)) * 100 });
std.debug.print(" Correct: {d} ({d:.1}%)\n\n", .{ correct, @as(f64, @floatFromInt(correct)) / @as(f64, @floatFromInt(fill)) * 100 });
}
std.debug.print("DONE\n", .{});
std.debug.print(" Inserted: {d}\n", .{fill});
std.debug.print(" Found: {d} ({d:.1}%)\n", .{ found, `@as`(f64, `@floatFromInt`(found)) / `@as`(f64, `@floatFromInt`(fill)) * 100 });
std.debug.print(" Correct: {d} ({d:.1}%)\n\n", .{ correct, `@as`(f64, `@floatFromInt`(correct)) / `@as`(f64, `@floatFromInt`(fill)) * 100 });
if (found != fill or correct != fill) {
return error.VerificationFailed;
}
}
std.debug.print("DONE\n", .{});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/verify-strings.zig` around lines 165 - 170, The verification currently
prints counts but still exits successfully; modify the end-of-run logic in
verify-strings.zig so that after printing the "Inserted/Found/Correct" counts
you check whether found == fill and correct == fill (or simply correct == fill)
and if not, terminate with a non-zero exit (e.g., call std.os.exit(1) or return
a non-zero status from main) instead of letting execution continue to the final
"DONE" print; use the existing local variables fill, found, correct and replace
or augment the final std.debug.print("DONE", .{}) path to return/exit non‑zero
on failure and only print/exit 0 when verification passes.

Comment on lines +63 to +66
The claim "36-97% faster on shuffled hit lookups at 1M" was from the ordered-access benchmark. With shuffled access the numbers are:
- **1.24x** at 16B keys, 50% load (24% faster)
- **1.35x** at 16B keys, 50% load, 1M size (35% faster)
- **1.62x** at 256B keys (62% faster)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Clarify whether this headline came from ordered or shuffled access.

Line 63 says the "shuffled hit lookup" claim came from an ordered-access benchmark. That sentence contradicts itself and makes the summary hard to trust.

Possible wording
-The claim "36-97% faster on shuffled hit lookups at 1M" was from the ordered-access benchmark. With shuffled access the numbers are:
+The 36-97% headline came from the 1M shuffled-hit benchmark across load factors. For the specific shuffled scenarios above, the numbers are:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
The claim "36-97% faster on shuffled hit lookups at 1M" was from the ordered-access benchmark. With shuffled access the numbers are:
- **1.24x** at 16B keys, 50% load (24% faster)
- **1.35x** at 16B keys, 50% load, 1M size (35% faster)
- **1.62x** at 256B keys (62% faster)
The 36-97% headline came from the 1M shuffled-hit benchmark across load factors. For the specific shuffled scenarios above, the numbers are:
- **1.24x** at 16B keys, 50% load (24% faster)
- **1.35x** at 16B keys, 50% load, 1M size (35% faster)
- **1.62x** at 256B keys (62% faster)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@verify-strings-results.md` around lines 63 - 66, The summary line that claims
"36-97% faster on shuffled hit lookups at 1M" is inconsistent with the benchmark
details; update the sentence(s) referencing "shuffled hit lookup" so they
correctly state whether the numbers came from ordered-access or shuffled-access
benchmarks (e.g., replace the contradictory line 63 text with a clear statement
that the 36–97% figures are from ordered-access and add the correct
shuffled-access numbers: 1.24x at 16B keys 50% load, 1.35x at 16B keys 50% load
1M size, and 1.62x at 256B keys), and ensure the exact quoted phrase "36-97%
faster on shuffled hit lookups at 1M" is either removed or moved to an
ordered-access context to eliminate the contradiction.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant