cross-language benchmark: elastic hash 2-3x faster on both x86 and Apple M4 #4

joshuaisaact wants to merge 195 commits into `main`
Conversation
…lot)" This reverts commit 32dd07a.
- src/string_hybrid.zig: StringElasticHash with wyhash, same tiered architecture
- src/autobench-strings.zig: Zig benchmark with 16-byte hex keys
- bench-abseil-strings.cpp: abseil benchmark with string_view keys
- bench-strings.sh: comparison runner
- build.zig: added string test and benchmark steps
… access. Shuffled hit lookup gaps at 1M (string_view keys, 16-byte hex): 10%: 1.97x, 25%: 1.86x, 50%: 1.74x, 75%: 1.61x, 90%: 1.50x, 99%: 1.36x. The advantage is larger than with u64 keys because:
- Fingerprint filtering saves expensive 16-byte memcmp comparisons
- Both sides pay more for wyhash, but our table structure saves more per probe
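The fingerprint-filtering effect claimed above can be sketched in C++. This is a minimal illustration of the technique under assumed names (`Bucket`, `fingerprint`, and `find_in_bucket` are hypothetical, not the actual `src/string_hybrid.zig` layout):

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical bucket of 8 slots: one 1-byte fingerprint per slot plus the
// full keys. Checking one byte first rejects almost all non-matching slots
// without running the 16-byte memcmp that dominates string-key probe cost.
struct Bucket {
    uint8_t fingerprints[8];
    const char* keys[8];  // 16-byte hex keys owned by the caller
};

// Derive a 1-byte fingerprint from the high bits of the 64-bit hash.
inline uint8_t fingerprint(uint64_t h) { return (uint8_t)(h >> 56) | 1; }

// Return the slot index of a match, or -1; memcmp runs only on fingerprint hits.
inline int find_in_bucket(const Bucket& b, uint64_t h, const char* key) {
    uint8_t fp = fingerprint(h);
    for (int s = 0; s < 8; s++) {
        if (b.fingerprints[s] == fp && std::memcmp(b.keys[s], key, 16) == 0)
            return s;
    }
    return -1;
}
```

With u64 keys each avoided comparison is a single 8-byte compare, so filtering helps less; with 16-byte string keys each skipped memcmp is worth more, which is consistent with the larger gap reported above.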
…, sizes, despite slower hash
…swiss.Map. Elastic hash wins on string-key hit lookups against all competitors at every load factor. Fair comparison (vs abseil): 36-97% faster. Rust/Go comparisons have fairness caveats (SipHash, allocation overhead).
Note: Reviews paused. This branch is under active development, so CodeRabbit has automatically paused its review to avoid an influx of comments from new commits.
📝 Walkthrough

Added many cross-language benchmarks, runner scripts, and verification/research documents; introduced multiple new Zig benchmark executables and a string-key hybrid map; made substantial behavioral and public-layout changes to the Zig hybrid hash implementation.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Runner as "bench-v2.sh"
    participant BuilderC as "g++ / pkg-config"
    participant BuilderZ as "zig build"
    participant Abseil as "bench-abseil"
    participant Elastic as "autobench"
    participant Parser as "log-parser"
    Runner->>BuilderC: compile bench-abseil.cpp
    Runner->>BuilderZ: build autobench (ReleaseFast)
    BuilderC-->>Abseil: produce Abseil binary
    BuilderZ-->>Elastic: produce Elastic binary
    Runner->>Abseil: run -> write abseil-v2.log
    Runner->>Elastic: run -> write elastic-v2.log
    Runner->>Parser: parse RESULT lines from logs
    Parser-->>Runner: match n/load and compute per-op gaps
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (warning)
Actionable comments posted: 19
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/hybrid.zig (1)
611-628: ⚠️ Potential issue | 🔴 Critical

`remove()` only searches tier 0, but elements can exist in tier 1+.

The `remove()` function uses `tier0_bucket_mask` and `tier0_bucket_shift` to search only tier 0 buckets. However, `insert()` can place elements into tier 1+ (via `insertIntoTier`, `insertAnyTier`). Elements inserted into higher tiers will never be found by `remove()`, which returns `false` and leaves the element in the table.

Compare with `get()`, which delegates to `get_overflow_fn` for tier 1+ lookups; `remove()` needs equivalent overflow handling.

🐛 Proposed fix sketch

```diff
 pub fn remove(self: *Self, key: u64) bool {
     const h = hash(key);
     const fp = fingerprint(h);
     const mask = self.tier0_bucket_mask;
     const bucket_base = h >> self.tier0_bucket_shift;
     var probe: usize = 0;
     while (probe < MAX_PROBES) : (probe += 1) {
         const bucket_idx = (bucket_base +% @as(u64, probe)) & mask;
         if (self.findKeyInBucket(bucket_idx, key, fp)) |slot| {
             self.fingerprints[bucket_idx][slot] = TOMBSTONE;
             self.count -= 1;
             return true;
         }
     }
-    return false;
+    // Search tier 1+ (similar to defaultGetOverflow)
+    return self.removeOverflow(h, key, fp);
 }
+
+fn removeOverflow(self: *Self, h: u64, key: u64, fp: u8) bool {
+    if (self.num_tiers <= 1) return false;
+    const num_buckets = self.tier_bucket_counts[1];
+    const tier_start = self.tier_starts[1];
+    for (0..@min(MAX_PROBES, num_buckets)) |probe| {
+        const abs_idx = tier_start + bucketIndex(h, probe, num_buckets);
+        if (self.findKeyInBucket(abs_idx, key, fp)) |slot| {
+            self.fingerprints[abs_idx][slot] = TOMBSTONE;
+            self.tier_slot_counts[1] -= 1;
+            self.count -= 1;
+            return true;
+        }
+    }
+    return false;
+}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/hybrid.zig` around lines 611 - 628, remove() only probes tier0 (uses tier0_bucket_mask/tier0_bucket_shift) so it never finds entries inserted into higher tiers by insertIntoTier/insertAnyTier; modify remove (or add a helper) to mirror get() behavior by delegating to the overflow lookup for tiers >0: for each tier attempt the same probe loop using that tier's bucket_mask and bucket_shift (or call get_overflow_fn-style logic) and when findKeyInBucket(bucket_idx, key, fp) returns a slot, set fingerprints[bucket_idx][slot] = TOMBSTONE and decrement count and return true; if no tier finds the key after probing MAX_PROBES per tier return false. Ensure you reference and use findKeyInBucket, MAX_PROBES, TOMBSTONE, fingerprints, count, and the per-tier bucket_mask/bucket_shift logic rather than always using tier0_*.
♻️ Duplicate comments (1)
src/autobench-verify.zig (1)
26-31: ⚠️ Potential issue | 🟡 Minor

Same potential underflow issue as in `autobench-strings.zig`.

The `ns_diff` cast to `u64` could panic if negative. See the earlier comment on `autobench-strings.zig` for a fix.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/autobench-verify.zig` around lines 26 - 31, The usElapsed function can underflow when ns_diff is negative; change the math to use signed arithmetic and only cast to u64 after combining seconds and nanoseconds: treat the seconds difference as a signed value (e.g., s_i64 = `@as`(i64, end.sec - start.sec)), keep ns_diff as i64, compute total_ns in a sufficiently-wide signed type (e.g., i128) as total_ns_signed = s_i64 * 1_000_000_000 + ns_diff, then cast total_ns_signed to u64 once (or assert/non-negatively check) before dividing by 1_000; update the usElapsed function in src/autobench-verify.zig accordingly to avoid casting a negative ns_diff to u64.
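A C++ rendering of the signed-arithmetic fix this prompt asks for (the Zig `usElapsed` itself is not reproduced; the timespec-style `sec`/`nsec` fields are an assumption):

```cpp
#include <cassert>
#include <cstdint>

// Timespec-style timestamp, as returned by clock_gettime-like APIs.
struct Timestamp { int64_t sec; int64_t nsec; };

// Elapsed microseconds computed entirely in signed arithmetic, so a negative
// nanosecond difference (end.nsec < start.nsec) cannot wrap around.
// Widening to __int128 guards sec * 1e9 against overflow for huge spans.
int64_t us_elapsed(Timestamp start, Timestamp end) {
    __int128 total_ns = (__int128)(end.sec - start.sec) * 1'000'000'000
                      + (end.nsec - start.nsec);
    assert(total_ns >= 0 && "end must not precede start");
    return (int64_t)(total_ns / 1'000);
}
```

The key point is to combine seconds and nanoseconds in a signed type first, and only convert to an unsigned microsecond count at the very end.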
🧹 Nitpick comments (20)
bench-rust/src/main.rs (1)
70-77: Remove redundant `as &[u8]` casts.

The casts are unnecessary since `&keys[order[i]]` (type `&[u8; 16]`) automatically coerces to `&[u8]` when passed to `map.get()`.

♻️ Proposed simplification

```diff
 // Shuffled hit lookup
 let start = Instant::now();
 for i in 0..fill {
-    black_box(map.get(&keys[order[i]] as &[u8]));
+    black_box(map.get(&keys[order[i]][..]));
 }
 let lookup_us = start.elapsed().as_micros() as u64;

 // Shuffled miss lookup
 let start = Instant::now();
 for i in 0..fill {
-    black_box(map.get(&miss_keys[order[i]] as &[u8]));
+    black_box(map.get(&miss_keys[order[i]][..]));
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bench-rust/src/main.rs` around lines 70 - 77, The calls to map.get currently include redundant casts like `&keys[order[i]] as &[u8]`; remove the unnecessary `as &[u8]` so you pass `&keys[order[i]]` and `&miss_keys[order[i]]` directly to map.get. Update the lookup loops that use `map.get(&keys[order[i]] as &[u8])` and `map.get(&miss_keys[order[i]] as &[u8])` (in the blocks computing `lookup_us` and the shuffled miss lookup) to call `map.get(&keys[order[i]])` and `map.get(&miss_keys[order[i]])` respectively.

bench-realistic.cpp (2)
132-134: Iterator dereference assumes key always exists.
`it->second` is dereferenced without checking `it != map.end()`. This is safe currently because all `lookup_keys` are drawn from `keys[]`, which are all inserted, but the code is fragile if the key generation logic changes. Consider adding an assertion or defensive check for safety:

♻️ Proposed defensive check

```diff
 for (size_t i = 0; i < OPS; i++) {
     auto it = map.find(lookup_keys[i]);
-    do_not_optimize(it->second);
+    do_not_optimize(it != map.end() ? it->second : 0);
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bench-realistic.cpp` around lines 132 - 134, The loop dereferences it->second after auto it = map.find(lookup_keys[i]) without verifying the iterator is valid; update the code around map.find/it to either assert(it != map.end()) or add a defensive check (if (it == map.end()) handle error/continue) before calling do_not_optimize(it->second) so lookups from lookup_keys are safe if key generation changes.
34-35: Clarify the workload comment.

The comment says "90% lookup, 5% insert, 5% delete" but doesn't mention the hit/miss split. The actual breakdown is 80% hit lookup + 10% miss lookup + 5% insert + 5% delete. Consider updating the comment to reflect this:

```diff
-// Workload 1: Mixed read/write (90% lookup, 5% insert, 5% delete)
+// Workload 1: Mixed read/write (80% hit lookup, 10% miss lookup, 5% insert, 5% delete)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bench-realistic.cpp` around lines 34 - 35, Update the "Workload 1" comment to explicitly state the hit/miss split instead of just "90% lookup, 5% insert, 5% delete" — replace it with a line such as "Workload 1: Mixed read/write (80% hit lookup, 10% miss lookup, 5% insert, 5% delete) — simulates a cache or index under steady-state load" so the actual 80/10/5/5 breakdown is clear; locate the comment near the Workload 1 block in bench-realistic.cpp to make the change.

verify-hash-cost.cpp (1)
64-69: Use portable format specifiers for `uint64_t`.

The `%lu` format specifier assumes `uint64_t` is `unsigned long`, which isn't portable across platforms (e.g., Windows uses `unsigned long long`). Use `PRIu64` from `<cinttypes>` for portability.

♻️ Proposed fix

```diff
 #include <cstdint>
 #include <cstdio>
+#include <cinttypes>
 #include <chrono>

-printf("  Elastic (multiply+xor-shift): %lu us (%.1f ns/hash)\n",
+printf("  Elastic (multiply+xor-shift): %" PRIu64 " us (%.1f ns/hash)\n",
        best_ours, best_ours * 1000.0 / N);
-printf("  Abseil (absl::Hash<u64>): %lu us (%.1f ns/hash)\n",
+printf("  Abseil (absl::Hash<u64>): %" PRIu64 " us (%.1f ns/hash)\n",
        best_abseil, best_abseil * 1000.0 / N);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@verify-hash-cost.cpp` around lines 64 - 69, The printf calls use %lu for variables best_ours and best_abseil which is not portable for uint64_t; include <cinttypes> (or <inttypes.h>) and switch the format specifiers to use PRIu64 (e.g. "%" PRIu64) for best_ours and best_abseil, and ensure any arithmetic involving N is cast appropriately (or printed with a matching specifier) so the expressions best_ours * 1000.0 / N and best_abseil * 1000.0 / N still format correctly; update the three printf lines referencing best_ours, best_abseil and the ratio accordingly.

src/autobench.zig (1)
84-126: Consider gating diagnostic output behind a compile-time flag.

The brute-force scan and detailed tracing are valuable for debugging but add noise to benchmark output. Consider using a `comptime` flag to enable/disable this block.

♻️ Optional: Add compile-time toggle

```diff
+const ENABLE_DIAGNOSTICS = false;
+
 // One-time tier/probe distribution print (BEFORE delete)
-if (r == 0 and load_pct == 99 and n == 1_048_576) {
+if (ENABLE_DIAGNOSTICS and r == 0 and load_pct == 99 and n == 1_048_576) {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/autobench.zig` around lines 84 - 126, Introduce a comptime toggle (e.g., comptime const SHOW_DISTRIBUTION = false) and wrap the entire diagnostic block (the if (r == 0 and load_pct == 99 and n == 1_048_576) { ... } including the brute-force scan that references map, map.get, map.getWithProbes, map.entries, map.fingerprints, KEY_SEED, and splitmix64) in if (comptime SHOW_DISTRIBUTION) { ... } so the prints and scan are compiled out by default and can be enabled at compile time for debugging.

src/hybrid.zig (1)
403-406: Hash function differs from `ComptimeHybridElasticHash` version.

The runtime `hash()` uses a simpler single-multiply-XOR scheme, while `ComptimeHybridElasticHash.hash()` (lines 82-89) uses a more thorough wyhash-style double multiplication. This may be intentional for performance, but could affect distribution quality differently between the two implementations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/hybrid.zig` around lines 403 - 406, The runtime hash function inline fn hash(key: u64) currently uses a single-multiply-XOR scheme that differs from the ComptimeHybridElasticHash.hash() implementation; make them consistent by replacing the runtime hash body with the same wyhash-style double-multiply/xor/shift sequence used in ComptimeHybridElasticHash.hash() (same constants and operations) so both runtime and comptime hashes produce the same distribution, or if the simpler runtime variant is intentional, add a TODO/comment in inline fn hash referencing ComptimeHybridElasticHash.hash and include a brief note to keep behavior divergence deliberate and benchmarked; locate the function by name (inline fn hash) and the comptime implementation (ComptimeHybridElasticHash.hash) to copy or document the logic.

program.md (1)
74-77: Add language specifier to fenced code block.

The code block showing the TSV format lacks a language specifier, which can cause rendering issues in some markdown viewers.

📝 Proposed fix

````diff
 9. **Log.** Append a line to `results.tsv`:
-   ```
+   ```tsv
    <commit_hash>\t<lookup_ratio_1M>\t<insert_ratio_1M>\t<delete_ratio_1M>\t<keep|revert|crash>\t<description>
    ```
````

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@program.md` around lines 74 - 77, Update the fenced code block in the "Log. Append a line to `results.tsv`" section so it includes a language specifier (e.g., tsv) immediately after the opening backticks; locate the block showing "<commit_hash>\t<lookup_ratio_1M>\t<insert_ratio_1M>\t<delete_ratio_1M>\t<keep|revert|crash>\t<description>" and change the opening fence to include the specifier to ensure proper markdown rendering.

bench.sh (1)
14-14: `grep -P` is not portable to macOS/BSD.

The `-P` (Perl regex) flag is a GNU grep extension. On macOS (which uses BSD grep by default), this will fail with "grep: invalid option -- P". If cross-platform support is needed, consider using `sed` or `awk` instead.

Portable alternative using awk

```diff
-grep "RESULT" bench.log | grep "n=1048576" | grep -oP 'lookup_ratio=\K[0-9.]+'
+grep "RESULT" bench.log | grep "n=1048576" | awk -F'lookup_ratio=' '{print $2}' | awk '{print $1}'
```

Or using sed:

```sh
grep "RESULT" bench.log | grep "n=1048576" | sed -n 's/.*lookup_ratio=\([0-9.]*\).*/\1/p'
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bench.sh` at line 14, The current pipeline using grep -oP 'lookup_ratio=\K[0-9.]+' is not portable because grep -P is a GNU-only (Perl) extension; replace the trailing grep -oP step with a portable sed or awk expression to extract the numeric value from lines matched by grep "RESULT" and grep "n=1048576" (i.e., update the pipeline that currently reads grep "RESULT" bench.log | grep "n=1048576" | grep -oP 'lookup_ratio=\K[0-9.]+'). Use sed -n with a capture group or awk to print the captured lookup_ratio value so the script works on macOS/BSD as well.

bench-go/go.mod (1)
1-5: Remove stale duplicate `bench-rust/bench-go/go.mod`.

The directory contains an outdated duplicate go.mod file with Go version 1.25 (compared to 1.25.4 in the main bench-go directory) and no dependencies. No Go source files are present in this directory, confirming it's a leftover artifact that should be deleted.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bench-go/go.mod` around lines 1 - 5, Delete the stale duplicate go.mod file that contains "module bench-go" and "go 1.25" (this leftover has no Go source files and is inconsistent with the main bench-go go.mod). Remove the file from the repo (e.g., git rm), commit the deletion, and ensure there are no references or imports relying on that duplicate; if CI or tooling references it, update those references to the main bench-go/go.mod.

bench-strings-verify.cpp (1)
61-66: Unchecked iterator dereference assumes all lookups are hits.

Lines 61 and 66 dereference `it->second` without checking if `it != map.end()`. This is correct for hit benchmarks where all keys exist, but could segfault if any key is missing due to a bug. This is acceptable for controlled benchmark code, but adding a debug assertion would catch potential issues:

```cpp
auto it = map.find(keys[i]);
assert(it != map.end());
do_not_optimize(it->second);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bench-strings-verify.cpp` around lines 61 - 66, The benchmark currently dereferences iterators returned by map.find(...) (see use of map.find, keys, shuffle_order and do_not_optimize) without checking for map.end(), which can segfault if a key is missing; add a debug-only assertion before the dereference in both loops (after auto it = map.find(...)) to assert(it != map.end()) so hits are still assumed in release but missing-key bugs are caught during development.

insights.md (1)
10-27: Add language specifier to pseudocode block.

📝 Proposed fix

````diff
-```
+```text
 get(key): h = key * 0x517cc1b727220a95; h ^= h >> 32
````
Verify each finding against the current code and only fix it if needed. In `@insights.md` around lines 10 - 27, The fenced pseudocode block that shows get(key) and findKeyInBucket lacks a language specifier, so update the opening fence to include one (e.g., text or pseudo) to ensure consistent rendering/highlighting; leave the content of get(key) and findKeyInBucket unchanged and keep the closing fence intact.

program-v3.md (2)
92-99: Add language specifier to fenced code blocks.

The code blocks showing memory layouts lack language specifiers. While these are pseudocode/diagrams, adding a specifier (e.g., `text` or `plaintext`) improves markdown rendering consistency.

📝 Proposed fix

````diff
-```
+```text
 fingerprints: [tier0 bucket 0][tier0 bucket 1]...[tier1 bucket 0][tier1 bucket 1]...
````

````diff
-```
+```text
 fingerprints: [tier0 bucket 0][tier1 bucket 0][tier0 bucket 1][tier1 bucket 1]...
````

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@program-v3.md` around lines 92 - 99, The fenced diagram blocks showing the memory layout (the lines starting with "fingerprints: [tier0 bucket 0]...") should include a Markdown language specifier for consistent rendering; update both fenced code blocks that display the fingerprints layout to use a plain text specifier (e.g., change ``` to ```text) so the blocks that contain "fingerprints: [tier0 bucket 0][tier0 bucket 1]...[tier1 bucket 0][tier1 bucket 1]..." and the interleaved version "fingerprints: [tier0 bucket 0][tier1 bucket 0][tier0 bucket 1][tier1 bucket 1]..." are fenced as ```text.
162-164: Add language specifier for TSV header block.

📝 Proposed fix
````diff
-```
+```tsv
 commit find_rate lookup_gap_99 miss_gap_99 insert_gap_99 status description
````

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@program-v3.md` around lines 162 - 164, The fenced code block containing the TSV header line "commit\tfind_rate\tlookup_gap_99\tmiss_gap_99\tinsert_gap_99\tstatus\tdescription" should include a language specifier for TSV; update the opening fence to include "tsv" so the header block is marked as TSV (locate the block that contains the exact header string to apply the change).

src/string_hybrid.zig (1)

39-47: `StringEntry` stores raw pointer without lifetime guarantees.

`StringEntry` stores `key_ptr: [*]const u8` and `key_len: usize`, which point to memory owned by the caller. If the caller frees or modifies the original key buffer, lookups will access invalid memory or return incorrect results. This is acceptable for benchmark code where key buffers outlive the map, but should be documented for general use.

📝 Suggested documentation

```diff
 const StringEntry = struct {
+    /// IMPORTANT: The map does not own or copy key data.
+    /// Callers must ensure key memory remains valid for the map's lifetime.
     key_ptr: [*]const u8,
     key_len: usize,
     value: u64,
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/string_hybrid.zig` around lines 39 - 47, StringEntry currently stores a raw pointer (key_ptr) and length (key_len) that reference caller-owned memory, which can become invalid; update StringEntry to either (A) store an owned slice or copy of the key (e.g., allocate and store []const u8 or an Allocator-managed buffer) or (B) explicitly document the lifetime requirement and provide a safe constructor that takes ownership or clones the key; adjust the StringEntry.key(self) method accordingly to return the owned slice, and add a clear comment on the struct (or add a create/copy function) describing that callers must ensure the backing buffer outlives the entry if you keep the current raw-pointer approach.

src/verify-strings.zig (1)

62-62: Same readability suggestion: expand the single-line shuffle.

This matches the pattern in `autobench-strings-verify.zig`. Consider extracting a shared shuffle helper if this pattern is used across multiple files.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/verify-strings.zig` at line 62, The one-line Fisher-Yates shuffle inside verify-strings.zig is hard to read; expand the single-line loop into a multi-line, well-indented while loop using the same variables (idx, rng, splitmix64, shuffle_order, tmp) so each operation is on its own line, and then optionally move that logic into a shared helper function (e.g., shuffleArray or fyShuffle) that accepts the slice and rng pointer so autobench-strings-verify.zig and verify-strings.zig can both call it.

bench-strings.sh (2)

5-8: Quote the `pkg-config` output to prevent word splitting issues.

The shell will split the output on whitespace and glob characters. While typically safe for pkg-config, quoting is more robust.

🔧 Proposed fix

```diff
-g++ -O3 -march=native -DNDEBUG -DABSL_HASHTABLEZ_SAMPLE_PARAMETER=0 \
-  bench-abseil-strings.cpp -o bench-abseil-strings \
-  $(pkg-config --cflags --libs absl_hash absl_raw_hash_set absl_hashtablez_sampler)
+g++ -O3 -march=native -DNDEBUG -DABSL_HASHTABLEZ_SAMPLE_PARAMETER=0 \
+  bench-abseil-strings.cpp -o bench-abseil-strings \
+  "$(pkg-config --cflags --libs absl_hash absl_raw_hash_set absl_hashtablez_sampler)"
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@bench-strings.sh` around lines 5 - 8, The pkg-config invocation in the g++ command is unquoted which can cause word-splitting/globbing issues; update the g++ invocation that builds bench-abseil-strings to wrap the pkg-config substitution in double quotes (i.e., quote the $(pkg-config --cflags --libs absl_hash absl_raw_hash_set absl_hashtablez_sampler) part) so the flags are treated as a single expansion and not split by the shell.

40-48: Integer comparison may fail on non-numeric values.

If the awk extraction produces an unexpected non-numeric string (e.g., malformed log line), the `-gt` comparison will error. This is unlikely given controlled benchmark output, but a defensive check could help debugging.

🛡️ Optional: Add numeric validation

```diff
+# Validate numeric values
+is_numeric() { [[ "$1" =~ ^[0-9]+$ ]]; }
+
-if [ -n "$e_ins" ] && [ -n "$e_get" ] && [ -n "$e_del" ] && [ -n "$e_miss" ] && \
-   [ "$e_ins" -gt 0 ] && [ "$e_get" -gt 0 ] && [ "$e_del" -gt 0 ] && [ "$e_miss" -gt 0 ]; then
+if is_numeric "$e_ins" && is_numeric "$e_get" && is_numeric "$e_del" && is_numeric "$e_miss" && \
+   [ "$e_ins" -gt 0 ] && [ "$e_get" -gt 0 ] && [ "$e_del" -gt 0 ] && [ "$e_miss" -gt 0 ]; then
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@bench-strings.sh` around lines 40 - 48, The current conditional that checks e_ins/e_get/e_del/e_miss using -gt can fail if any extracted value is non-numeric; before performing the integer comparisons in the if block (the line starting with "if [ -n "$e_ins" ] && ... && [ "$e_miss" -gt 0 ]; then"), add a simple numeric validation for the four extracted variables (e_ins, e_get, e_del, e_miss) — e.g., test each against a numeric regex (allow integers or decimals) or coerce/validate with awk — and only proceed to compute ins_gap/get_gap/del_gap/mis_gap and printf when all four pass the numeric check and are >0. Ensure you reference and validate the same variables used later (a_ins/a_get/a_del/a_miss and e_*) so the awk division and printf are only executed with safe numeric inputs.

README.md (1)

106-110: Specify language for fenced code block.

Adding a language identifier enables syntax highlighting and satisfies linting rules.

🔧 Add language specifier

````diff
-```
+```bash
 zig build test    # run tests
 zig build bench   # full benchmark
 bash bench-v2.sh  # comparison vs abseil (requires abseil-cpp)
 ```
````

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@README.md` around lines 106 - 110, Update the fenced code block that contains the three commands starting with "zig build test" to include a language identifier by changing the opening fence to use "bash" so the block reads as a bash snippet; locate the triple-backtick fenced block which encloses "zig build test", "zig build bench", and "bash bench-v2.sh" and add the "bash" specifier for proper syntax highlighting and lint compliance.

src/autobench-strings-verify.zig (1)
56-56: Consider expanding the shuffle loop for readability.

The single-line Fisher-Yates implementation is correct but dense. Breaking it across multiple lines would improve maintainability.

♻️ Expanded shuffle loop

```diff
-while (idx > 1) { idx -= 1; const j = splitmix64(&rng) % (idx + 1); const tmp = shuffle_order[idx]; shuffle_order[idx] = shuffle_order[j]; shuffle_order[j] = tmp; }
+while (idx > 1) {
+    idx -= 1;
+    const j = splitmix64(&rng) % (idx + 1);
+    const tmp = shuffle_order[idx];
+    shuffle_order[idx] = shuffle_order[j];
+    shuffle_order[j] = tmp;
+}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/autobench-strings-verify.zig` at line 56, The one-line Fisher-Yates shuffle in the loop is correct but hard to read; refactor the while loop that uses idx, rng, splitmix64, and shuffle_order into multiple lines: decrement idx on its own line, compute j = splitmix64(&rng) % (idx + 1) on the next line, then swap using a temporary variable (tmp) across two lines so each step (decrement, index calc, swap) is explicit; keep the existing variable names (idx, rng, j, tmp, shuffle_order) and preserve the same logic and boundaries so behavior remains unchanged.

bench-verify.cpp (1)
80-83: Consider using `PRIu64` for portable `uint64_t` formatting.

`%lu` assumes `uint64_t` is `unsigned long`, which isn't guaranteed on all platforms (e.g., Windows MSVC uses `unsigned long long`).

🔧 Portable format specifier

```diff
+#include <cinttypes>
 ...
-printf("ABSEIL\t%s\tn=%zu\tload=%zu\tordered_us=%lu\tshuffled_us=%lu\n",
+printf("ABSEIL\t%s\tn=%zu\tload=%zu\tordered_us=%" PRIu64 "\tshuffled_us=%" PRIu64 "\n",
        label, n, fill * 100 / n,
        median(lkp_ordered, MEASURED), median(lkp_shuffled, MEASURED));
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bench-verify.cpp` around lines 80 - 83, The printf in bench-verify.cpp that prints median(lkp_ordered, MEASURED) and median(lkp_shuffled, MEASURED) uses %lu which is not portable for uint64_t; update the format to use the PRIu64 macro (e.g., "%" PRIu64) for those two fields and ensure you `#include` <inttypes.h> at the top of the file so PRIu64 is defined; keep the existing %zu for size_t (n, load) and adjust nothing else.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bench-abseil-strings.cpp`:
- Around line 75-81: The benchmark currently performs hit lookups in insertion
order by iterating keys[i] in the loop (using variables keys, map.find, fill,
do_not_optimize), so change it to perform shuffled lookups: create a
vector<size_t> indices of length fill, populate with 0..fill-1, shuffle it with
std::shuffle and a std::mt19937 seeded appropriately, then iterate using
keys[indices[i]] (and call map.find on that) to measure lookup_us; ensure the
shuffle happens once before the timing window so only lookup cost is measured.
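The suggested change can be sketched in C++; `std::unordered_map` stands in for `absl::flat_hash_map` so the snippet builds without Abseil, and the seed value and helper name are illustrative:

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <numeric>
#include <random>
#include <unordered_map>
#include <vector>

// Shuffle an index vector once, OUTSIDE the timed window, then look keys
// up in that order so only the lookup cost is measured.
uint64_t shuffled_lookup_us(std::unordered_map<uint64_t, uint64_t>& map,
                            const std::vector<uint64_t>& keys) {
    std::vector<size_t> indices(keys.size());
    std::iota(indices.begin(), indices.end(), 0);
    std::mt19937 rng(42);  // fixed seed for reproducibility
    std::shuffle(indices.begin(), indices.end(), rng);

    auto start = std::chrono::steady_clock::now();
    uint64_t sink = 0;
    for (size_t i : indices)
        sink += map.find(keys[i])->second;  // all keys are hits by construction
    auto end = std::chrono::steady_clock::now();
    volatile uint64_t keep = sink;  // crude stand-in for do_not_optimize
    (void)keep;
    return (uint64_t)std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}
```

Shuffling before the timing window matters: insertion-order lookups give the prefetcher and cache an unrealistic advantage, which is exactly what the review comment flags.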
In `@bench-abseil.cpp`:
- Around line 62-68: The lookup loop in bench-abseil.cpp uses insertion-order
keys (keys[i]) for hit lookups; change it to perform shuffled random access like
the other benchmarks by indexing keys with the existing shuffle_order
permutation (or by shuffling keys prior). Specifically, in the block that
measures lookup_us, replace the direct keys[i] access with
keys[shuffle_order[i]] (or apply std::shuffle to keys before the loop) so the
map.find calls use the shuffled order; keep other variables (map, fill,
do_not_optimize, lookup_us) unchanged.
In `@bench-go/go.mod`:
- Line 5: The go.mod entry for github.com/cockroachdb/swiss is marked as
indirect but main.go imports it directly; remove the "// indirect" marker or add
the dependency as a direct requirement by running `go get
github.com/cockroachdb/swiss@v0.0.0-20251224182025-b0f6560f979b` (or update the
version) so the require line for github.com/cockroachdb/swiss becomes a direct
dependency; check main.go to confirm the import and then run `go mod tidy` to
ensure module file consistency.
In `@bench-go/main.go`:
- Around line 113-114: The RESULT line in bench-go/main.go currently prints an
extra "impl=go-swiss" field and places miss_us before delete_us, which breaks
bench-strings.sh parsing; update the fmt.Printf call (the RESULT formatter in
main.go) to remove the "impl=..." token and reorder the placeholders so
delete_us is printed before miss_us (i.e., format: RESULT\t
n=...\tload=...\tinsert_us=...\tlookup_us=...\tdelete_us=...\tmiss_us=...),
ensuring the field names and order exactly match what bench-strings.sh expects;
alternatively, if you prefer supporting both formats, add parsing logic to
bench-strings.sh to accept the "impl=" field and both field orders, but prefer
standardizing main.go's PRINTF first.
In `@bench-rust/target/.rustc_info.json`:
- Line 1: The bench build artifacts (the target directory and files like
.rustc_info.json) were committed; update the repo by adding the
bench-rust/target/ entry to .gitignore and stop tracking those files, then
remove the committed artifacts from git (unstage/remove from index) and commit
the change so the target files are no longer tracked; ensure .rustc_info.json
and any other files under the target directory are removed from the repository
history or at least untracked going forward.
In `@bench-rust/target/CACHEDIR.TAG`:
- Around line 1-3: Remove the committed Cargo build output by deleting the
CACHEDIR.TAG and other files under the bench-rust/target directory from version
control and ignore future artifacts: add an entry to .gitignore to exclude the
Rust target directory (e.g., a pattern matching target/ or bench-rust/target/),
then remove the already-committed files from the index (untrack them) and commit
the changes so the build outputs are no longer in the repository.
In
`@bench-rust/target/release/.fingerprint/bench-hashbrown-f2a29ac529dfc934/bin-bench-hashbrown.json`:
- Line 1: Add bench-rust/target/ to .gitignore and remove the already committed
build artifacts: update the repository .gitignore (at repo root or bench-rust/)
to include a line "target/" and then remove the committed files from git
tracking (e.g., git rm --cached) for entries under bench-rust/target/, including
bench-rust/target/release/.fingerprint/.../bin-bench-hashbrown.json so they
remain locally but are no longer tracked or pushed.
In `@bench-verify.cpp`:
- Around line 42-46: The shuffle in bench-verify.cpp currently instantiates
std::mt19937_64 rng(42) and uses it in the Fisher–Yates loop that swaps
lookup_order; replace the RNG with the same splitmix64 implementation and
seeding (seed 42) used by src/autobench-verify.zig so the generated pseudorandom
values match across implementations; keep the existing loop and swap logic (the
for loop touching lookup_order and swapping by index j) but replace the rng
type/constructor and RNG call site to use splitmix64.next() (or the C++
splitmix64 equivalent) so shuffled_us comparisons are valid.
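For reference, a C++ splitmix64 with the standard Steele/Lea/Flood constants (assuming `src/autobench-verify.zig` uses the reference recurrence, as the review implies), plus a Fisher–Yates shuffle driven by it; the `lookup_order` name is taken from the review text:

```cpp
#include <cstddef>
#include <cstdint>

// Reference splitmix64; seeding both C++ and Zig sides with 42 then
// yields identical pseudorandom streams.
uint64_t splitmix64(uint64_t *state) {
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

// Fisher–Yates shuffle of lookup_order using splitmix64 instead of
// std::mt19937_64, keeping the existing swap-by-index-j structure.
void splitmix_shuffle(size_t *lookup_order, size_t n, uint64_t seed) {
    uint64_t rng = seed;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)(splitmix64(&rng) % (i + 1));
        size_t tmp = lookup_order[i];
        lookup_order[i] = lookup_order[j];
        lookup_order[j] = tmp;
    }
}
```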
In `@insights-v3.md`:
- Around line 28-30: Update the paragraph that claims Abseil is "thread-safe
with some overhead" to instead state that these benchmarks are single-threaded
and that absl::flat_hash_map (and Abseil containers) follow STL-style semantics
and require external synchronization for concurrent mutation; replace the
phrasing with a clear caveat that concurrent use of absl::flat_hash_map needs
external locking (e.g., mutexes) and is not intrinsically thread-safe while
keeping the rest of the note about single-threaded benchmarking and absl::Hash
vs integer-specialized multiply-hash intact.
In `@program-v2.md`:
- Around line 95-100: Update the bench-abseil.cpp setup to avoid the misleading
reserve(fill) instruction: in the file/benchmark setup where it currently says
to call reserve(fill) before inserts, change it to the corrected reservation
scheme reserve(n) (or clearly mark the reserve(fill) line as historical/legacy)
so the benchmark uses the final, unbiased approach described in insights-v2.md;
locate the text referring to reserve(fill) in the bench-abseil.cpp instructions
and replace it with the reserve(n) wording (or add an explicit “historical”
comment) so readers replaying v2 will run the corrected phase-1.
In `@program-v5-cross-lang.md`:
- Around line 25-28: The benchmark currently labels the target "Rust hashbrown"
but uses std::collections::HashMap (which uses RandomState/SipHash), conflating
hasher cost with table implementation; update the plan to either (A) explicitly
benchmark hashbrown::HashMap with a consistent fast hasher (e.g., use
hashbrown::HashMap with ahash/FxHasher configured) so the comparison isolates
table performance, or (B) rename the target to "Rust std::HashMap" and note that
results include RandomState/SipHash overhead; refer to HashMap,
hashbrown::HashMap, RandomState, and SipHash when making the change so the
intent and measurement are unambiguous.
In `@src/autobench-realistic.zig`:
- Around line 25-29: The usElapsed function computes microseconds but due to
operator precedence only ns_diff is divided by 1_000, making the seconds term
remain in nanoseconds; fix by computing total_ns = s * 1_000_000_000 + `@as`(i64,
`@intCast`(ns_diff)) (or properly cast ns_diff to i64 and add), then return
`@as`(u64, total_ns / 1_000); update the usElapsed implementation to build
total_ns first (using symbols s, ns_diff, and usElapsed) and perform a single
division to produce microseconds.
In `@src/autobench-strings.zig`:
- Around line 40-44: The usElapsed function is unsafe because it casts a
potentially negative ns_diff into u64 which will panic; update usElapsed to
perform the nanosecond arithmetic in a signed wide integer (e.g., i128 or i64)
by combining (end.sec - start.sec) * 1_000_000_000 with ns_diff as a signed
value, then divide by 1_000 and finally cast the non-negative result to u64;
target the function name usElapsed and the local ns_diff/end.sec/start.sec
symbols when making the change and ensure you guard/cast only after the combined
total is computed and verified non-negative.
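The fix pattern suggested for both `usElapsed` functions can be sketched in C++ against `timespec`: combine seconds and a possibly negative nanosecond difference into one signed total before the single division, so nothing negative ever reaches an unsigned cast.

```cpp
#include <cstdint>
#include <ctime>

// Mirror of the suggested usElapsed fix: build a signed total_ns first,
// then divide once to microseconds.
uint64_t us_elapsed(const timespec &start, const timespec &end) {
    int64_t s = (int64_t)end.tv_sec - (int64_t)start.tv_sec;
    int64_t ns_diff = (int64_t)end.tv_nsec - (int64_t)start.tv_nsec;  // may be < 0
    int64_t total_ns = s * 1000000000LL + ns_diff;  // signed-wide combine first
    return (uint64_t)(total_ns / 1000);             // single division at the end
}
```

Note the operator-precedence trap the review flags: `s * 1_000_000_000 + ns_diff / 1_000` divides only `ns_diff`, which is why the combine-then-divide order matters.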
In `@src/hybrid.zig`:
- Around line 595-609: defaultGetOverflow currently only scans tier 1, causing
misses for elements inserted into tier 2+ by insertAnyTier; change it to mirror
getWithProbes by iterating all tiers from 1 to self.num_tiers-1 and for each
tier use that tier's tier_bucket_counts[i] and tier_starts[i], probe up to
`@min`(MAX_PROBES, num_buckets) using bucketIndex, call findValueInBucket(abs_idx,
key, fp) and return the value if found, and perform the same per-tier early
termination using matchEmpty(&self.fingerprints[abs_idx]) to return null for
that tier; only return null after all tiers/probes are exhausted. Ensure you
reference defaultGetOverflow, getWithProbes, insertAnyTier, num_tiers,
tier_bucket_counts, tier_starts, MAX_PROBES, findValueInBucket, matchEmpty,
fingerprints, and bucketIndex when making the change.
In `@src/string_hybrid.zig`:
- Around line 302-312: defaultGetOverflow currently only inspects tier 1 which
misses items inserted by insertAnyTier into higher tiers; update
defaultGetOverflow(self: *const Self, h: u64, key: []const u8, fp: u8) to
iterate tiers 1..self.num_tiers, and for each tier set num_buckets =
self.tier_bucket_counts[tier] and tier_start = self.tier_starts[tier], then
perform the existing probe loop (using bucketIndex, findValueInBucket, and
matchEmpty) for that tier and return the found value or continue to the next
tier; keep the same MAX_PROBES logic per tier and return null if not found in
any overflow tier.
- Around line 314-330: remove() only probes tier 0 and doesn't update
tier_slot_counts, so items placed in higher tiers by insertAnyTier() cannot be
removed and tier_slot_counts[0] stays stale; update remove(self: *Self, key:
[]const u8) to probe across all tiers for the bucket (use the same bucket_idx
calculation but for each probe check each tier’s slot via findKeyInBucket or a
new helper), and when you tombstone an entry set fingerprints[bucket_idx][slot]
= TOMBSTONE, decrement self.count and also decrement
self.tier_slot_counts[tier_index] (use the tier you found the key in); ensure
the code updates the correct tier_slot_counts element and keeps existing
TOMBSTONE, count, and fingerprint semantics so getEmptyFraction() and insertion
logic remain correct (or if you intentionally won't support removal from
overflow tiers, document that limitation in insertAnyTier/remove).
In `@verify-results.md`:
- Around line 60-66: The document contains a contradictory note: the line
stating "clang++ not available for cross-check" is now stale because the later
"Update: clang 22.1.1 now tested." reports clang results; remove the earlier
"clang++ not available for cross-check" sentence (or reword it into a historical
note) so that the section only reflects the current clang testing status and
keeps the abseil vs clang/GCC comparison consistent.
In `@verify-strings.cpp`:
- Around line 41-43: median_val currently returns arr[n/2], which picks the
upper middle for even n and biases medians; after sorting the uint64_t* arr in
median_val(sort(arr, arr+n)), return the true median: if n is odd return
arr[n/2], otherwise compute the average of the two middle values (arr[n/2 - 1]
and arr[n/2]) and return that average as uint64_t to handle even-sized samples
correctly.
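A corrected `median_val` under that suggestion might look like this (a sketch; integer averaging of the two middles is assumed acceptable for microsecond samples):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// True median: middle element for odd n, average of the two middle
// elements for even n (avoids the upper-middle bias of arr[n/2] alone).
uint64_t median_val(uint64_t *arr, size_t n) {
    std::sort(arr, arr + n);
    if (n % 2 == 1) return arr[n / 2];
    return (arr[n / 2 - 1] + arr[n / 2]) / 2;
}
```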
- Around line 57-70: The microbenchmark currently uses the minimum runtime
(variable best) across 10 runs for absl::Hash<string_view>, producing an
optimistic outlier; change it to compute the median runtime after the same
warmup behavior used elsewhere: record each run's duration (from start/end
around the inner loop that uses hasher(std::string_view(keys[i & 1023], 16))),
skip the warmup runs as done in the other tests, sort the collected durations
and pick the median value to print (use that median in the printf and
nanoseconds/hash calculation instead of best). Ensure you reference and reuse N,
keys, hasher, and the 10-run loop so the change is localized.
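The min-of-runs to median-of-runs change can be sketched as a generic helper (the helper name and the `runs`/`warmup` parameters are illustrative, not taken from the benchmark):

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <vector>

// Time a workload `runs` times, discard the first `warmup` samples,
// and report the median rather than the optimistic minimum.
template <typename F>
uint64_t median_runtime_us(F workload, int runs = 10, int warmup = 2) {
    std::vector<uint64_t> samples;
    for (int r = 0; r < runs; r++) {
        auto start = std::chrono::steady_clock::now();
        workload();
        auto end = std::chrono::steady_clock::now();
        uint64_t us = (uint64_t)std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
        if (r >= warmup) samples.push_back(us);  // skip warmup runs
    }
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];
}
```

The minimum measures the best-case cache/frequency state; the median is a fairer summary when comparing hashers across runs.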
---
Outside diff comments:
In `@src/hybrid.zig`:
- Around line 611-628: remove() only probes tier0 (uses
tier0_bucket_mask/tier0_bucket_shift) so it never finds entries inserted into
higher tiers by insertIntoTier/insertAnyTier; modify remove (or add a helper) to
mirror get() behavior by delegating to the overflow lookup for tiers >0: for
each tier attempt the same probe loop using that tier's bucket_mask and
bucket_shift (or call get_overflow_fn-style logic) and when
findKeyInBucket(bucket_idx, key, fp) returns a slot, set
fingerprints[bucket_idx][slot] = TOMBSTONE and decrement count and return true;
if no tier finds the key after probing MAX_PROBES per tier return false. Ensure
you reference and use findKeyInBucket, MAX_PROBES, TOMBSTONE, fingerprints,
count, and the per-tier bucket_mask/bucket_shift logic rather than always using
tier0_*.
---
Duplicate comments:
In `@src/autobench-verify.zig`:
- Around line 26-31: The usElapsed function can underflow when ns_diff is
negative; change the math to use signed arithmetic and only cast to u64 after
combining seconds and nanoseconds: treat the seconds difference as a signed
value (e.g., s_i64 = `@as`(i64, end.sec - start.sec)), keep ns_diff as i64,
compute total_ns in a sufficiently-wide signed type (e.g., i128) as
total_ns_signed = s_i64 * 1_000_000_000 + ns_diff, then cast total_ns_signed to
u64 once (or assert/non-negatively check) before dividing by 1_000; update the
usElapsed function in src/autobench-verify.zig accordingly to avoid casting a
negative ns_diff to u64.
---
Nitpick comments:
In `@bench-go/go.mod`:
- Around line 1-5: Delete the stale duplicate go.mod file that contains "module
bench-go" and "go 1.25" (this leftover has no Go source files and is
inconsistent with the main bench-go go.mod). Remove the file from the repo
(e.g., git rm), commit the deletion, and ensure there are no references or
imports relying on that duplicate; if CI or tooling references it, update those
references to the main bench-go/go.mod.
In `@bench-realistic.cpp`:
- Around line 132-134: The loop dereferences it->second after auto it =
map.find(lookup_keys[i]) without verifying the iterator is valid; update the
code around map.find/it to either assert(it != map.end()) or add a defensive
check (if (it == map.end()) handle error/continue) before calling
do_not_optimize(it->second) so lookups from lookup_keys are safe if key
generation changes.
- Around line 34-35: Update the "Workload 1" comment to explicitly state the
hit/miss split instead of just "90% lookup, 5% insert, 5% delete" — replace it
with a line such as "Workload 1: Mixed read/write (80% hit lookup, 10% miss
lookup, 5% insert, 5% delete) — simulates a cache or index under steady-state
load" so the actual 80/10/5/5 breakdown is clear; locate the comment near the
Workload 1 block in bench-realistic.cpp to make the change.
In `@bench-rust/src/main.rs`:
- Around line 70-77: The calls to map.get currently include redundant casts like
`&keys[order[i]] as &[u8]`; remove the unnecessary `as &[u8]` so you pass
`&keys[order[i]]` and `&miss_keys[order[i]]` directly to map.get. Update the
lookup loops that use `map.get(&keys[order[i]] as &[u8])` and
`map.get(&miss_keys[order[i]] as &[u8])` (in the blocks computing `lookup_us`
and the shuffled miss lookup) to call `map.get(&keys[order[i]])` and
`map.get(&miss_keys[order[i]])` respectively.
In `@bench-strings-verify.cpp`:
- Around line 61-66: The benchmark currently dereferences iterators returned by
map.find(...) (see use of map.find, keys, shuffle_order and do_not_optimize)
without checking for map.end(), which can segfault if a key is missing; add a
debug-only assertion before the dereference in both loops (after auto it =
map.find(...)) to assert(it != map.end()) so hits are still assumed in release
but missing-key bugs are caught during development.
In `@bench-strings.sh`:
- Around line 5-8: The pkg-config invocation in the g++ command is unquoted
which can cause word-splitting/globbing issues; update the g++ invocation that
builds bench-abseil-strings to wrap the pkg-config substitution in double quotes
(i.e., quote the $(pkg-config --cflags --libs absl_hash absl_raw_hash_set
absl_hashtablez_sampler) part) so the flags are treated as a single expansion
and not split by the shell.
- Around line 40-48: The current conditional that checks
e_ins/e_get/e_del/e_miss using -gt can fail if any extracted value is
non-numeric; before performing the integer comparisons in the if block (the line
starting with "if [ -n "$e_ins" ] && ... && [ "$e_miss" -gt 0 ]; then"), add a
simple numeric validation for the four extracted variables (e_ins, e_get, e_del,
e_miss) — e.g., test each against a numeric regex (allow integers or decimals)
or coerce/validate with awk — and only proceed to compute
ins_gap/get_gap/del_gap/mis_gap and printf when all four pass the numeric check
and are >0. Ensure you reference and validate the same variables used later
(a_ins/a_get/a_del/a_miss and e_*) so the awk division and printf are only
executed with safe numeric inputs.
In `@bench-verify.cpp`:
- Around line 80-83: The printf in bench-verify.cpp that prints
median(lkp_ordered, MEASURED) and median(lkp_shuffled, MEASURED) uses %lu which
is not portable for uint64_t; update the format to use the PRIu64 macro (e.g.,
"%" PRIu64) for those two fields and ensure you `#include` <inttypes.h> at the top
of the file so PRIu64 is defined; keep the existing %zu for size_t (n, load) and
adjust nothing else.
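A minimal sketch of the portable format (the `format_result` wrapper is hypothetical; only the `PRIu64` usage is the point):

```cpp
#include <cinttypes>  // PRIu64
#include <cstdio>
#include <cstring>

// %lu only matches uint64_t where unsigned long happens to be 64 bits;
// "%" PRIu64 is correct on every platform. %zu stays for size_t fields.
int format_result(char *buf, size_t cap, size_t n, uint64_t ordered, uint64_t shuffled) {
    return snprintf(buf, cap,
                    "RESULT\tn=%zu\tordered_us=%" PRIu64 "\tshuffled_us=%" PRIu64 "\n",
                    n, ordered, shuffled);
}
```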
In `@bench.sh`:
- Line 14: The current pipeline using grep -oP 'lookup_ratio=\K[0-9.]+' is not
portable because grep -P is a GNU-only (Perl) extension; replace the trailing
grep -oP step with a portable sed or awk expression to extract the numeric value
from lines matched by grep "RESULT" and grep "n=1048576" (i.e., update the
pipeline that currently reads grep "RESULT" bench.log | grep "n=1048576" | grep
-oP 'lookup_ratio=\K[0-9.]+'). Use sed -n with a capture group or awk to print
the captured lookup_ratio value so the script works on macOS/BSD as well.
In `@insights.md`:
- Around line 10-27: The fenced pseudocode block that shows get(key) and
findKeyInBucket lacks a language specifier, so update the opening fence to
include one (e.g., ```text or ```pseudo) to ensure consistent
rendering/highlighting; leave the content of get(key) and findKeyInBucket
unchanged and keep the closing ``` intact.
In `@program-v3.md`:
- Around line 92-99: The fenced diagram blocks showing the memory layout (the
lines starting with "fingerprints: [tier0 bucket 0]...") should include a
Markdown language specifier for consistent rendering; update both fenced code
blocks that display the fingerprints layout to use a plain text specifier (e.g.,
change ``` to ```text) so the blocks that contain "fingerprints: [tier0 bucket
0][tier0 bucket 1]...[tier1 bucket 0][tier1 bucket 1]..." and the interleaved
version "fingerprints: [tier0 bucket 0][tier1 bucket 0][tier0 bucket 1][tier1
bucket 1]..." are fenced as ```text.
- Around line 162-164: The fenced code block containing the TSV header line
"commit\tfind_rate\tlookup_gap_99\tmiss_gap_99\tinsert_gap_99\tstatus\tdescription"
should include a language specifier for TSV; update the opening fence from "```"
to "```tsv" so the header block is marked as TSV (locate the block that contains
the exact header string to apply the change).
In `@program.md`:
- Around line 74-77: Update the fenced code block in the "Log. Append a line to
`results.tsv`" section so it includes a language specifier (e.g., ```tsv)
immediately after the opening backticks; locate the block showing
"<commit_hash>\t<lookup_ratio_1M>\t<insert_ratio_1M>\t<delete_ratio_1M>\t<keep|revert|crash>\t<description>"
and change the opening fence to include the specifier to ensure proper markdown
rendering.
In `@README.md`:
- Around line 106-110: Update the fenced code block that contains the three
commands starting with "zig build test" to include a language identifier by
changing the opening ``` to ```bash so the block reads as a bash snippet; locate
the triple-backtick fenced block which encloses "zig build test", "zig build
bench", and "bash bench-v2.sh" and add the "bash" specifier for proper syntax
highlighting and lint compliance.
In `@src/autobench-strings-verify.zig`:
- Line 56: The one-line Fisher‑Yates shuffle in the loop is correct but hard to
read; refactor the while loop that uses idx, rng, splitmix64, and shuffle_order
into multiple lines: decrement idx on its own line, compute j = splitmix64(&rng)
% (idx + 1) on the next line, then swap using a temporary variable (tmp) across
two lines so each step (decrement, index calc, swap) is explicit; keep the
existing variable names (idx, rng, j, tmp, shuffle_order) and preserve the same
logic and boundaries so behavior remains unchanged.
In `@src/autobench.zig`:
- Around line 84-126: Introduce a comptime toggle (e.g., comptime const
SHOW_DISTRIBUTION = false) and wrap the entire diagnostic block (the if (r == 0
and load_pct == 99 and n == 1_048_576) { ... } including the brute-force scan
that references map, map.get, map.getWithProbes, map.entries, map.fingerprints,
KEY_SEED, and splitmix64) in if (comptime SHOW_DISTRIBUTION) { ... } so the
prints and scan are compiled out by default and can be enabled at compile time
for debugging.
In `@src/hybrid.zig`:
- Around line 403-406: The runtime hash function inline fn hash(key: u64)
currently uses a single-multiply-XOR scheme that differs from the
ComptimeHybridElasticHash.hash() implementation; make them consistent by
replacing the runtime hash body with the same wyhash-style
double-multiply/xor/shift sequence used in ComptimeHybridElasticHash.hash()
(same constants and operations) so both runtime and comptime hashes produce the
same distribution, or if the simpler runtime variant is intentional, add a
TODO/comment in inline fn hash referencing ComptimeHybridElasticHash.hash and
include a brief note to keep behavior divergence deliberate and benchmarked;
locate the function by name (inline fn hash) and the comptime implementation
(ComptimeHybridElasticHash.hash) to copy or document the logic.
In `@src/string_hybrid.zig`:
- Around line 39-47: StringEntry currently stores a raw pointer (key_ptr) and
length (key_len) that reference caller-owned memory, which can become invalid;
update StringEntry to either (A) store an owned slice or copy of the key (e.g.,
allocate and store []const u8 or a Zig ArrayList/Allocator-managed buffer) or
(B) explicitly document the lifetime requirement and provide a safe constructor
that takes ownership or clones the key; adjust the StringEntry.key(self) method
accordingly to return the owned slice, and add a clear comment on the struct (or
add a create/copy function) describing that callers must ensure the backing
buffer outlives the entry if you keep the current raw-pointer approach.
In `@src/verify-strings.zig`:
- Line 62: The one-line Fisher–Yates shuffle inside verify-strings.zig is hard
to read; expand the single-line loop into a multi-line, well-indented while loop
using the same variables (idx, rng, splitmix64, shuffle_order, tmp) so each
operation is on its own line, and then optionally move that logic into a shared
helper function (e.g., shuffleArray or fyShuffle) that accepts the slice and rng
pointer so autobench-strings-verify.zig and verify-strings.zig can both call it.
In `@verify-hash-cost.cpp`:
- Around line 64-69: The printf calls use %lu for variables best_ours and
best_abseil which is not portable for uint64_t; include <cinttypes> (or
<inttypes.h>) and switch the format specifiers to use PRIu64 (e.g. "%" PRIu64)
for best_ours and best_abseil, and ensure any arithmetic involving N is cast
appropriately (or printed with a matching specifier) so the expressions
best_ours * 1000.0 / N and best_abseil * 1000.0 / N still format correctly;
update the three printf lines referencing best_ours, best_abseil and the ratio
accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 81e28ea9-888f-489b-96c2-8adfa31b31e5
⛔ Files ignored due to path filters (8)
- `abseil-strings.log` is excluded by `!**/*.log`
- `bench-go/go.sum` is excluded by `!**/*.sum`
- `bench-rust/Cargo.lock` is excluded by `!**/*.lock`
- `bench-v2.log` is excluded by `!**/*.log`
- `elastic-strings.log` is excluded by `!**/*.log`
- `results-v2.tsv` is excluded by `!**/*.tsv`
- `results-v3.tsv` is excluded by `!**/*.tsv`
- `results.tsv` is excluded by `!**/*.tsv`
📒 Files selected for processing (61)
`.gitignore`, `AGENTS.md`, `README.md`, `bench-abseil-clang`, `bench-abseil-strings`, `bench-abseil-strings.cpp`, `bench-abseil.cpp`, `bench-go/bench-go`, `bench-go/go.mod`, `bench-go/main.go`, `bench-realistic`, `bench-realistic.cpp`, `bench-rust/Cargo.toml`, `bench-rust/bench-go/go.mod`, `bench-rust/src/main.rs`, `bench-rust/target/.rustc_info.json`, `bench-rust/target/CACHEDIR.TAG`, `bench-rust/target/release/.cargo-lock`, `bench-rust/target/release/.fingerprint/bench-hashbrown-f2a29ac529dfc934/bin-bench-hashbrown`, `bench-rust/target/release/.fingerprint/bench-hashbrown-f2a29ac529dfc934/bin-bench-hashbrown.json`, `bench-rust/target/release/.fingerprint/bench-hashbrown-f2a29ac529dfc934/dep-bin-bench-hashbrown`, `bench-rust/target/release/.fingerprint/bench-hashbrown-f2a29ac529dfc934/invoked.timestamp`, `bench-rust/target/release/bench-hashbrown`, `bench-rust/target/release/bench-hashbrown.d`, `bench-rust/target/release/deps/bench_hashbrown-f2a29ac529dfc934`, `bench-rust/target/release/deps/bench_hashbrown-f2a29ac529dfc934.d`, `bench-strings-verify`, `bench-strings-verify.cpp`, `bench-strings.sh`, `bench-v2.sh`, `bench-verify`, `bench-verify.cpp`, `bench.sh`, `build.zig`, `cross-lang-results.md`, `insights-v2.md`, `insights-v3.md`, `insights.md`, `program-v2.md`, `program-v3-verify.md`, `program-v3.md`, `program-v4-verify-strings.md`, `program-v5-cross-lang.md`, `program.md`, `results.md`, `src/autobench-realistic.zig`, `src/autobench-strings-verify.zig`, `src/autobench-strings.zig`, `src/autobench-verify.zig`, `src/autobench.zig`, `src/hybrid.zig`, `src/string_hybrid.zig`, `src/verify-strings.zig`, `verify-capacity`, `verify-capacity.cpp`, `verify-hash-cost`, `verify-hash-cost.cpp`, `verify-results.md`, `verify-strings-abseil`, `verify-strings-results.md`, `verify-strings.cpp`
```cpp
start = std::chrono::steady_clock::now();
for (size_t i = 0; i < fill; i++) {
    auto it = map.find(keys[i]);
    do_not_optimize(it->second);
}
end = std::chrono::steady_clock::now();
uint64_t lookup_us = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
```
🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Check if autobench.zig or autobench-strings.zig shuffle hit lookups
echo "=== Checking Zig benchmark lookup patterns ==="
# Look for shuffle usage near lookup timing in Zig files
rg -n -A5 -B5 "shuffle" src/autobench.zig src/autobench-strings.zig 2>/dev/null || echo "Files not found or no shuffle"
# Check if hit lookups use shuffle_order
rg -n "shuffle_order\[" src/autobench.zig src/autobench-strings.zig 2>/dev/null || echo "No shuffle_order usage found"
```

Repository: joshuaisaact/elastic-hash
Length of output: 174

🏁 Script executed:

```bash
# First, let's look at bench-abseil.cpp around lines 62-68 to confirm the lookup pattern
head -80 bench-abseil.cpp | tail -25
```

Length of output: 1145

🏁 Script executed:

```bash
# Search for README and check for "shuffled random access" or similar claims
find . -iname "README*" -type f | head -5
```

Length of output: 80

🏁 Script executed:

```bash
# Check if there's any shuffling in C++ benchmarks
rg -n "shuffle" bench-*.cpp 2>/dev/null || echo "No shuffle found in C++ benchmarks"
```

Length of output: 1771

🏁 Script executed:

```bash
cat README.md | head -150
```

Length of output: 5535

🏁 Script executed:

```bash
# Search README for keywords about shuffled, random, access patterns
rg -n "shuffled|random.access|hit.lookup" README.md
```

Length of output: 502
Hit lookups use insertion order, not shuffled access.
The README documents "shuffled random access" for hit lookups (line 13), and the verification benchmarks (bench-verify.cpp, bench-strings-verify.cpp) implement this with shuffle_order. However, bench-abseil.cpp lines 63-66 iterate keys in insertion order (keys[i] in a simple loop) rather than shuffled random access. Align the main benchmark with the documented methodology.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@bench-abseil.cpp` around lines 62 - 68, The lookup loop in bench-abseil.cpp
uses insertion-order keys (keys[i]) for hit lookups; change it to perform
shuffled random access like the other benchmarks by indexing keys with the
existing shuffle_order permutation (or by shuffling keys prior). Specifically,
in the block that measures lookup_us, replace the direct keys[i] access with
keys[shuffle_order[i]] (or apply std::shuffle to keys before the loop) so the
map.find calls use the shuffled order; keep other variables (map, fill,
do_not_optimize, lookup_us) unchanged.
```text
go 1.25.4

require github.com/cockroachdb/swiss v0.0.0-20251224182025-b0f6560f979b // indirect
```
Dependency should be direct, not indirect.
The main.go directly imports github.com/cockroachdb/swiss, so this should be a direct dependency. The // indirect comment is incorrect and typically indicates the dependency is only transitively required.
Proposed fix

```diff
-require github.com/cockroachdb/swiss v0.0.0-20251224182025-b0f6560f979b // indirect
+require github.com/cockroachdb/swiss v0.0.0-20251224182025-b0f6560f979b
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```text
require github.com/cockroachdb/swiss v0.0.0-20251224182025-b0f6560f979b
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@bench-go/go.mod` at line 5, The go.mod entry for github.com/cockroachdb/swiss
is marked as indirect but main.go imports it directly; remove the "// indirect"
marker or add the dependency as a direct requirement by running `go get
github.com/cockroachdb/swiss@v0.0.0-20251224182025-b0f6560f979b` (or update the
version) so the require line for github.com/cockroachdb/swiss becomes a direct
dependency; check main.go to confirm the import and then run `go mod tidy` to
ensure module file consistency.
```go
fmt.Printf("RESULT\timpl=go-swiss\tn=%d\tload=%d\tinsert_us=%d\tlookup_us=%d\tmiss_us=%d\tdelete_us=%d\n",
	n, loadPct, median(ins), median(lkp), median(mis), median(del))
```
Output format differs from Abseil string benchmark.
The Go output includes an impl= field and has miss_us before delete_us, while bench-strings.sh expects no impl= field and delete_us before miss_us. This Go benchmark likely needs its own parser or the output should be standardized.
📋 Format comparison
Go output:
RESULT\timpl=go-swiss\tn=\tload=\tinsert_us=\tlookup_us=\tmiss_us=\tdelete_us=
bench-strings.sh expects:
RESULT\tn=\tload=\tinsert_us=\tlookup_us=\tdelete_us=\tmiss_us=
If you want to use bench-strings.sh with Go output, either add Go support to the parser or standardize the output format.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@bench-go/main.go` around lines 113 - 114, The RESULT line in bench-go/main.go
currently prints an extra "impl=go-swiss" field and places miss_us before
delete_us, which breaks bench-strings.sh parsing; update the fmt.Printf call
(the RESULT formatter in main.go) to remove the "impl=..." token and reorder the
placeholders so delete_us is printed before miss_us (i.e., format: RESULT\t
n=...\tload=...\tinsert_us=...\tlookup_us=...\tdelete_us=...\tmiss_us=...),
ensuring the field names and order exactly match what bench-strings.sh expects;
alternatively, if you prefer supporting both formats, add parsing logic to
bench-strings.sh to accept the "impl=" field and both field orders, but prefer
standardizing main.go's fmt.Printf format first.
```diff
@@ -0,0 +1 @@
+{"rustc_fingerprint":16647195171711883400,"outputs":{"17747080675513052775":{"success":true,"status":"","code":0,"stdout":"rustc 1.94.0 (4a4ef493e 2026-03-02) (Arch Linux rust 1:1.94.0-2)\nbinary: rustc\ncommit-hash: 4a4ef493e3a1488c6e321570238084b38948f6db\ncommit-date: 2026-03-02\nhost: x86_64-unknown-linux-gnu\nrelease: 1.94.0\nLLVM version: 22.1.0\n","stderr":""},"7971740275564407648":{"success":true,"status":"","code":0,"stdout":"___\nlib___.rlib\nlib___.so\nlib___.so\nlib___.a\nlib___.so\n/usr\noff\npacked\nunpacked\n___\ndebug_assertions\npanic=\"unwind\"\nproc_macro\ntarget_abi=\"\"\ntarget_arch=\"x86_64\"\ntarget_endian=\"little\"\ntarget_env=\"gnu\"\ntarget_family=\"unix\"\ntarget_feature=\"fxsr\"\ntarget_feature=\"sse\"\ntarget_feature=\"sse2\"\ntarget_has_atomic=\"16\"\ntarget_has_atomic=\"32\"\ntarget_has_atomic=\"64\"\ntarget_has_atomic=\"8\"\ntarget_has_atomic=\"ptr\"\ntarget_os=\"linux\"\ntarget_pointer_width=\"64\"\ntarget_vendor=\"unknown\"\nunix\n","stderr":""}},"successes":{}}
\ No newline at end of file
```
There was a problem hiding this comment.
Build artifact should not be committed.
The bench-rust/target/ directory contains Cargo build artifacts that are machine-specific and auto-generated. Add bench-rust/target/ to .gitignore and remove these files from the repository.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@bench-rust/target/.rustc_info.json` at line 1, The bench build artifacts (the
target directory and files like .rustc_info.json) were committed; update the
repo by adding the bench-rust/target/ entry to .gitignore and stop tracking
those files, then remove the committed artifacts from git (unstage/remove from
index) and commit the change so the target files are no longer tracked; ensure
.rustc_info.json and any other files under the target directory are removed from
the repository history or at least untracked going forward.
```zig
fn defaultGetOverflow(self: *const Self, h: u64, key: []const u8, fp: u8) ?u64 {
    if (self.num_tiers <= 1) return null;
    const num_buckets = self.tier_bucket_counts[1];
    const tier_start = self.tier_starts[1];
    for (0..@min(MAX_PROBES, num_buckets)) |probe| {
        const abs_idx = tier_start + bucketIndex(h, probe, num_buckets);
        if (self.findValueInBucket(abs_idx, key, fp)) |val| return val;
        if (matchEmpty(&self.fingerprints[abs_idx]) != 0) return null;
    }
    return null;
}
```
defaultGetOverflow only searches tier 1, but insertAnyTier can place elements in any tier.
insertAnyTier() (line 270) iterates for (0..self.num_tiers) |tier|, potentially placing elements in tiers 2+. However, defaultGetOverflow() only searches tier 1 (line 304), making elements in higher tiers invisible to get().
This could cause get() to return null for keys that were successfully inserted.
🐛 Proposed fix to search all overflow tiers

```diff
 fn defaultGetOverflow(self: *const Self, h: u64, key: []const u8, fp: u8) ?u64 {
-    if (self.num_tiers <= 1) return null;
-    const num_buckets = self.tier_bucket_counts[1];
-    const tier_start = self.tier_starts[1];
-    for (0..@min(MAX_PROBES, num_buckets)) |probe| {
-        const abs_idx = tier_start + bucketIndex(h, probe, num_buckets);
-        if (self.findValueInBucket(abs_idx, key, fp)) |val| return val;
-        if (matchEmpty(&self.fingerprints[abs_idx]) != 0) return null;
+    for (1..self.num_tiers) |tier| {
+        const num_buckets = self.tier_bucket_counts[tier];
+        const tier_start = self.tier_starts[tier];
+        for (0..@min(MAX_PROBES, num_buckets)) |probe| {
+            const abs_idx = tier_start + bucketIndex(h, probe, num_buckets);
+            if (self.findValueInBucket(abs_idx, key, fp)) |val| return val;
+        }
     }
     return null;
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/string_hybrid.zig` around lines 302 - 312, defaultGetOverflow currently
only inspects tier 1 which misses items inserted by insertAnyTier into higher
tiers; update defaultGetOverflow(self: *const Self, h: u64, key: []const u8, fp:
u8) to iterate tiers 1..self.num_tiers, and for each tier set num_buckets =
self.tier_bucket_counts[tier] and tier_start = self.tier_starts[tier], then
perform the existing probe loop (using bucketIndex, findValueInBucket, and
matchEmpty) for that tier and return the found value or continue to the next
tier; keep the same MAX_PROBES logic per tier and return null if not found in
any overflow tier.
```zig
pub fn remove(self: *Self, key: []const u8) bool {
    const h = hash(key);
    const fp = fingerprint(h);
    const mask = self.tier0_bucket_mask;
    const bucket_base = h >> self.tier0_bucket_shift;

    var probe: usize = 0;
    while (probe < MAX_PROBES) : (probe += 1) {
        const bucket_idx = (bucket_base +% @as(u64, probe)) & mask;
        if (self.findKeyInBucket(bucket_idx, key, fp)) |slot| {
            self.fingerprints[bucket_idx][slot] = TOMBSTONE;
            self.count -= 1;
            return true;
        }
    }
    return false;
}
```
**`remove()` only searches tier 0 and doesn't decrement `tier_slot_counts`.**

Two issues:

- Elements inserted via `insertAnyTier()` (lines 267-284) may reside in tiers 1+, but `remove()` only probes tier 0, leaving those elements un-removable.
- When removing from tier 0, `tier_slot_counts[0]` is not decremented, causing `getEmptyFraction()` to return incorrect values and potentially affecting insertion decisions.
🐛 Proposed fix to decrement tier_slot_counts

```diff
 pub fn remove(self: *Self, key: []const u8) bool {
     const h = hash(key);
     const fp = fingerprint(h);
     const mask = self.tier0_bucket_mask;
     const bucket_base = h >> self.tier0_bucket_shift;
     var probe: usize = 0;
     while (probe < MAX_PROBES) : (probe += 1) {
         const bucket_idx = (bucket_base +% @as(u64, probe)) & mask;
         if (self.findKeyInBucket(bucket_idx, key, fp)) |slot| {
             self.fingerprints[bucket_idx][slot] = TOMBSTONE;
+            self.tier_slot_counts[0] -= 1;
             self.count -= 1;
             return true;
         }
     }
     return false;
 }
```

For elements in higher tiers, consider adding overflow tier removal or documenting this as a known limitation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/string_hybrid.zig` around lines 314 - 330, remove() only probes tier 0
and doesn't update tier_slot_counts, so items placed in higher tiers by
insertAnyTier() cannot be removed and tier_slot_counts[0] stays stale; update
remove(self: *Self, key: []const u8) to probe across all tiers for the bucket
(use the same bucket_idx calculation but for each probe check each tier’s slot
via findKeyInBucket or a new helper), and when you tombstone an entry set
fingerprints[bucket_idx][slot] = TOMBSTONE, decrement self.count and also
decrement self.tier_slot_counts[tier_index] (use the tier you found the key in);
ensure the code updates the correct tier_slot_counts element and keeps existing
TOMBSTONE, count, and fingerprint semantics so getEmptyFraction() and insertion
logic remain correct (or if you intentionally won't support removal from
overflow tiers, document that limitation in insertAnyTier/remove).
- g++ (GCC) 15.2.1 for abseil
- Zig 0.16.0-dev (LLVM 19+ backend) for elastic hash
- clang++ not available for cross-check

Both use -O3 equivalent. The different compiler backends (GCC vs LLVM) could generate different code quality. This is a limitation we can't easily resolve without installing clang.

**Update: clang 22.1.1 now tested.** GCC and clang produce nearly identical abseil lookup performance at 10-90% load (within 3%). At 99% load, GCC is actually 10% faster than clang. The compiler difference is NOT giving us an unfair advantage -- GCC-compiled abseil is the harder target.
Remove the stale “clang++ not available” note.
This section first says clang was unavailable and then immediately reports clang 22.1.1 results. Leaving both statements in the final document makes the verification narrative self-contradictory.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@verify-results.md` around lines 60 - 66, The document contains a
contradictory note: the line stating "clang++ not available for cross-check" is
now stale because the later "Update: clang 22.1.1 now tested." reports clang
results; remove the earlier "clang++ not available for cross-check" sentence (or
reword it into a historical note) so that the section only reflects the current
clang testing status and keeps the abseil vs clang/GCC comparison consistent.
```cpp
static uint64_t median_val(uint64_t* arr, int n) {
    std::sort(arr, arr + n);
    return arr[n / 2];
}
```
Return a true median for even-sized samples.
With MEASURED == 10, arr[n / 2] picks the upper middle sample, not the median. That biases every reported “median of 10” in Tests 2-4 and can skew cross-implementation comparisons.
Proposed fix

```diff
 static uint64_t median_val(uint64_t* arr, int n) {
     std::sort(arr, arr + n);
-    return arr[n / 2];
+    if ((n & 1) != 0) return arr[n / 2];
+    const uint64_t lo = arr[(n / 2) - 1];
+    const uint64_t hi = arr[n / 2];
+    return lo + (hi - lo) / 2;
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@verify-strings.cpp` around lines 41 - 43, median_val currently returns
arr[n/2], which picks the upper middle for even n and biases medians; after
sorting the uint64_t* arr in median_val(sort(arr, arr+n)), return the true
median: if n is odd return arr[n/2], otherwise compute the average of the two
middle values (arr[n/2 - 1] and arr[n/2]) and return that average as uint64_t to
handle even-sized samples correctly.
```cpp
absl::Hash<std::string_view> hasher;
uint64_t best = UINT64_MAX;
for (int r = 0; r < 10; r++) {
    auto start = std::chrono::steady_clock::now();
    uint64_t acc = 0;
    for (size_t i = 0; i < N; i++) {
        acc += hasher(std::string_view(keys[i & 1023], 16));
    }
    auto end = std::chrono::steady_clock::now();
    do_not_optimize(acc);
    uint64_t us = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
    if (us < best) best = us;
}
printf("  absl::Hash<string_view>(16B): %lu us (%.1f ns/hash)\n\n", best, best * 1000.0 / N);
```
Use the same summary statistic for Test 1.
The hash-cost microbenchmark reports the minimum runtime, while the rest of this file reports medians after warmup. That makes Test 1 systematically more optimistic and not directly comparable to the lookup numbers you're using elsewhere.
Proposed fix

```diff
 absl::Hash<std::string_view> hasher;
-uint64_t best = UINT64_MAX;
-for (int r = 0; r < 10; r++) {
+uint64_t times[MEASURED];
+for (int r = 0; r < TOTAL_RUNS; r++) {
     auto start = std::chrono::steady_clock::now();
     uint64_t acc = 0;
     for (size_t i = 0; i < N; i++) {
         acc += hasher(std::string_view(keys[i & 1023], 16));
     }
     auto end = std::chrono::steady_clock::now();
     do_not_optimize(acc);
     uint64_t us = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
-    if (us < best) best = us;
+    if (r >= WARMUP) times[r - WARMUP] = us;
 }
-printf("  absl::Hash<string_view>(16B): %lu us (%.1f ns/hash)\n\n", best, best * 1000.0 / N);
+const uint64_t median = median_val(times, MEASURED);
+printf("  absl::Hash<string_view>(16B): %lu us (%.1f ns/hash)\n\n", median, median * 1000.0 / N);
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@verify-strings.cpp` around lines 57 - 70, The microbenchmark currently uses
the minimum runtime (variable best) across 10 runs for absl::Hash<string_view>,
producing an optimistic outlier; change it to compute the median runtime after
the same warmup behavior used elsewhere: record each run's duration (from
start/end around the inner loop that uses hasher(std::string_view(keys[i &
1023], 16))), skip the warmup runs as done in the other tests, sort the
collected durations and pick the median value to print (use that median in the
printf and nanoseconds/hash calculation instead of best). Ensure you reference
and reuse N, keys, hasher, and the 10-run loop so the change is localized.
…for Go

Rust+ahash is now the closest competitor (1.46x at 50%, 1.09x at 99%). Go swiss.Map improved from 6.27x to 2.28x with pre-allocated strings. Elastic hash still fastest at every load factor against every competitor.
Actionable comments posted: 1
♻️ Duplicate comments (1)

bench-go/main.go (1)

Lines 121-122: ⚠️ Potential issue | 🟡 Minor — RESULT schema still differs from parser-expected format.

Line 121 and Line 198 still emit `impl=...` and place `miss_us` before `delete_us`. This matches the previously reported parser-compatibility issue.

Suggested format normalization

```diff
-fmt.Printf("RESULT\timpl=go-swiss\tn=%d\tload=%d\tinsert_us=%d\tlookup_us=%d\tmiss_us=%d\tdelete_us=%d\n",
-	n, loadPct, median(ins), median(lkp), median(mis), median(del))
+fmt.Printf("RESULT\tn=%d\tload=%d\tinsert_us=%d\tlookup_us=%d\tdelete_us=%d\tmiss_us=%d\n",
+	n, loadPct, median(ins), median(lkp), median(del), median(mis))

-fmt.Printf("RESULT\timpl=go-builtin\tn=%d\tload=%d\tinsert_us=%d\tlookup_us=%d\tmiss_us=%d\tdelete_us=%d\n",
-	n, loadPct, median(ins), median(lkp), median(mis), median(del))
+fmt.Printf("RESULT\tn=%d\tload=%d\tinsert_us=%d\tlookup_us=%d\tdelete_us=%d\tmiss_us=%d\n",
+	n, loadPct, median(ins), median(lkp), median(del), median(mis))
```

Also applies to: lines 198-199.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bench-go/main.go` around lines 121 - 122, The RESULT output lines currently print "impl=go-swiss" and place miss_us before delete_us, which breaks the parser; update both fmt.Printf calls that emit "RESULT\timpl=go-swiss..." (the two print sites in main that format RESULT) to match the parser-expected schema by removing the "impl=..." token and reordering fields so delete_us appears before miss_us (i.e., RESULT\t n=...\tload=...\tinsert_us=...\tlookup_us=...\tdelete_us=...\tmiss_us=...), keeping the same numeric values and tab separators.
🧹 Nitpick comments (1)

cross-lang-results.md (1)

Lines 67-72: Add concrete environment metadata for reproducibility.

Consider adding CPU model, core pinning policy, OS/kernel, and exact compiler/runtime versions used for each language run. This will make the fairness caveats and rankings much easier to reproduce and defend.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cross-lang-results.md` around lines 67 - 72, Update the "## Remaining fairness caveats" section to include concrete environment metadata for each language benchmark run: list CPU model/frequency, core-pin/affinity policy used, OS and kernel version, exact compiler/toolchain version and flags (e.g., rustc/cargo, gcc/clang, Zig/C++ toolchains), and runtime versions (Go, JVM, etc.); add a short per-language table or bullet block under that header that records these fields for every measured run so readers can reproduce the results and defend the comparisons.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bench-go/main.go`:
- Around line 37-42: The median function (median, using arr and local s)
currently returns s[len(s)/2] which biases the result for even-length slices;
modify median to handle even counts by returning the average of the two middle
values: for n := len(s), if n==0 return 0 (or appropriate zero value), if n%2==1
return s[n/2], else return (s[n/2-1]+s[n/2])/2 using uint64 arithmetic to
preserve the function signature and avoid bias for even-sized samples.
---
Duplicate comments:
In `@bench-go/main.go`:
- Around line 121-122: The RESULT output lines currently print "impl=go-swiss"
and place miss_us before delete_us, which breaks the parser; update both
fmt.Printf calls that emit "RESULT\timpl=go-swiss..." (the two print sites in
main that format RESULT) to match the parser-expected schema by removing the
"impl=..." token and reordering fields so delete_us appears before miss_us
(i.e., RESULT\t
n=...\tload=...\tinsert_us=...\tlookup_us=...\tdelete_us=...\tmiss_us=...),
keeping the same numeric values and tab separators.
---
Nitpick comments:
In `@cross-lang-results.md`:
- Around line 67-72: Update the "## Remaining fairness caveats" section to
include concrete environment metadata for each language benchmark run: list CPU
model/frequency, core-pin/affinity policy used, OS and kernel version, exact
compiler/toolchain version and flags (e.g., rustc/cargo, gcc/clang, Zig/C++
toolchains), and runtime versions (Go, JVM, etc.); add a short per-language
table or bullet block under that header that records these fields for every
measured run so readers can reproduce the results and defend the comparisons.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d4b4bce8-2f62-461c-9d99-eb9bd886daf7
⛔ Files ignored due to path filters (1)
`bench-rust/Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (113)

(113 files: `bench-go/bench-go`, `bench-go/main.go`, `bench-rust/Cargo.toml`, `bench-rust/src/main.rs`, `cross-lang-results.md`, plus 108 build artifacts under `bench-rust/target/release/`.)
✅ Files skipped from review due to trivial changes (67)
(67 `bench-rust/target/release/` fingerprint and build-artifact files omitted here.)
🚧 Files skipped from review as they are similar to previous changes (2)
- bench-rust/Cargo.toml
- bench-rust/src/main.rs
```go
func median(arr []uint64) uint64 {
	s := make([]uint64, len(arr))
	copy(s, arr)
	sort.Slice(s, func(i, j int) bool { return s[i] < s[j] })
	return s[len(s)/2]
}
```
Median computation is biased for even-sized samples.
On Line 41, median returns s[len(s)/2]. With 10 measured runs, this picks the upper middle value (6th), not the true median-of-10.
Suggested fix
```diff
 func median(arr []uint64) uint64 {
 	s := make([]uint64, len(arr))
 	copy(s, arr)
 	sort.Slice(s, func(i, j int) bool { return s[i] < s[j] })
-	return s[len(s)/2]
+	mid := len(s) / 2
+	if len(s)%2 == 0 {
+		return (s[mid-1] + s[mid]) / 2
+	}
+	return s[mid]
 }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
func median(arr []uint64) uint64 {
	s := make([]uint64, len(arr))
	copy(s, arr)
	sort.Slice(s, func(i, j int) bool { return s[i] < s[j] })
	mid := len(s) / 2
	if len(s)%2 == 0 {
		return (s[mid-1] + s[mid]) / 2
	}
	return s[mid]
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@bench-go/main.go` around lines 37 - 42, The median function (median, using
arr and local s) currently returns s[len(s)/2] which biases the result for
even-length slices; modify median to handle even counts by returning the average
of the two middle values: for n := len(s), if n==0 return 0 (or appropriate zero
value), if n%2==1 return s[n/2], else return (s[n/2-1]+s[n/2])/2 using uint64
arithmetic to preserve the function signature and avoid bias for even-sized
samples.
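The reviewer's suggested fix can be exercised as a standalone program. This is a sketch, not the benchmark harness itself: the sample timings below are hypothetical, and only the corrected `median` helper mirrors the suggestion above.

```go
package main

import (
	"fmt"
	"sort"
)

// median returns the true median: for even-length input it averages
// the two middle values instead of taking the upper-middle element.
func median(arr []uint64) uint64 {
	s := make([]uint64, len(arr))
	copy(s, arr)
	sort.Slice(s, func(i, j int) bool { return s[i] < s[j] })
	mid := len(s) / 2
	if len(s)%2 == 0 {
		return (s[mid-1] + s[mid]) / 2
	}
	return s[mid]
}

func main() {
	// 10 runs, as in the harness: the upper-middle element would be 60,
	// while the averaged median is (50+60)/2 = 55.
	runs := []uint64{90, 10, 30, 70, 50, 20, 40, 60, 80, 100}
	fmt.Println(median(runs))
}
```

With 10 measured runs this shifts the reported value by at most half the gap between the two middle samples, which matters only when run-to-run jitter is large.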
Disproves cache-density hypothesis. On M4's 16MB L2 where both fingerprint arrays fit, elastic hash is *faster* relative to abseil than on x86. The win is cache lines touched per probe, not L2 vs L3 spill. Fixed linux-only timers in zig benchmarks for macOS/ARM.
Actionable comments posted: 2
🧹 Nitpick comments (4)
BENCHMARK-M4.md (2)
81-82: Unconventional stderr redirection pattern. Line 81 redirects stderr to the results log (`2> results-m4-elastic.log`), which suggests the benchmark output goes to stderr rather than stdout. This is unconventional since stderr is typically reserved for errors and warnings, not results. Consider updating the Zig benchmark to output results to stdout, then use standard output redirection:

```bash
zig build autobench-strings -Doptimize=ReleaseFast > results-m4-elastic.log 2>/dev/null
```

This pattern (used for Abseil at line 77 and others) is more conventional and makes it clearer where results vs errors are captured.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@BENCHMARK-M4.md` around lines 81 - 82, The benchmark invocation currently redirects stderr into the results file using the command string "zig build autobench-strings -Doptimize=ReleaseFast 2> results-m4-elastic.log"; change the benchmark to emit results to stdout and update the invocation to redirect stdout into results (and optionally silence stderr) so results-m4-elastic.log contains only benchmark output; update the documented command in BENCHMARK-M4.md (the "zig build autobench-strings -Doptimize=ReleaseFast" line) to use stdout redirection and suppress or separate stderr as appropriate.
37-61: Consider providing automated build scripts. The manual build commands are detailed but complex, especially the Abseil compilation with pkg-config fallback (lines 40-50). This complexity makes the setup error-prone and difficult to maintain.
💡 Consider creating a build script
Create a `build-benchmarks.sh` script that:
- Detects the platform (macOS/Linux)
- Tries pkg-config and falls back to explicit paths automatically
- Builds all benchmark executables in sequence
- Reports any build failures clearly
This would reduce user friction and make the setup more maintainable.
Example structure:
```bash
#!/bin/bash
set -e
echo "Building Abseil benchmark..."
if pkg-config --exists absl_hash; then
  g++ -O3 -march=native ... $(pkg-config ...)
else
  echo "pkg-config failed, trying explicit paths..."
  g++ -O3 -march=native ... -I/opt/homebrew/include ...
fi
echo "Building Zig benchmarks..."
zig build autobench-strings -Doptimize=ReleaseFast
# ... etc for Rust and Go
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@BENCHMARK-M4.md` around lines 37 - 61, The benchmark README currently lists complex manual build commands (the Abseil g++ command with pkg-config fallback and separate Zig/Rust/Go steps) which is error-prone; create a build-benchmarks.sh script that detects OS (macOS vs Linux), attempts pkg-config for Abseil and falls back to explicit include/lib paths, runs the Abseil g++ build (the long g++ line shown), then runs "zig build autobench-strings -Doptimize=ReleaseFast", "cd bench-rust && cargo build --release", and "cd bench-go && go build -o bench-go .", failing fast with clear error messages and exit codes so users can build all benchmarks automatically and the README can reference this script.

cross-lang-results.md (1)
110-115: Address the logical tension between emphasizing cross-language comparison and the "only apples-to-apples" caveat. The caveat in lines 114-115 states that "the only truly apples-to-apples comparison is elastic hash (Zig) vs abseil (C++)" - but this somewhat contradicts the document's primary focus on cross-language comparisons. Consider either:
- Reframing to emphasize the Zig vs C++ comparison as the most rigorous result, with cross-language data providing supplementary context
- Softening the caveat to note that while cross-language comparisons include runtime/compiler differences, they still offer valuable insights
Additionally, the document would benefit from discussing statistical significance or measurement variance. While "median of 10 runs" is mentioned, there's no analysis of variance or whether observed differences exceed noise levels.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cross-lang-results.md` around lines 110 - 115, Update the "Remaining fairness caveats" section to resolve the tension: explicitly state that the elastic hash (Zig) vs abseil (C++) comparison is the most rigorous "apples-to-apples" result while framing other cross-language results as supplementary context that reflect runtime/compiler differences; replace the absolute phrasing "the only truly apples-to-apples" with softer wording that highlights rigor (e.g., "the most controlled apples-to-apples") and call out that cross-language comparisons are informative but not strictly equivalent. Also add a short paragraph explaining the measurement uncertainty for "median of 10 runs": report standard deviation (or 95% confidence intervals) and whether observed differences exceed measurement noise, and suggest including a simple statistical indicator (stddev, CI, or boxplot) alongside medians to show variance. Ensure changes reference the existing section title "Remaining fairness caveats", the specific comparison "elastic hash (Zig) vs abseil (C++)", and the current practice "median of 10 runs".

src/autobench-strings.zig (1)
41-44: Minor: Returns upper-median for even-sized array. With `MEASURED = 10`, this returns the element at index 5 rather than averaging indices 4 and 5. For benchmark reporting this is acceptable (a consistent "upper median"), but worth noting for precision.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/autobench-strings.zig` around lines 41 - 44, The current median function (median, using sort.insertion and MEASURED) returns the upper-median for even-sized arrays; to fix, update median to handle even MEASURED by sorting as-is then computing the average of the two middle elements (use indices MEASURED/2 - 1 and MEASURED/2) and return their mean as u64 (handle potential overflow appropriately, e.g., (low / 2) + (high / 2) or use checked arithmetic); alternatively if you prefer keeping upper-median behavior, add a clear comment in the median function documenting that it intentionally returns the upper median for even MEASURED.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@BENCHMARK-M4.md`:
- Around line 103-107: The current manual swap of src/autobench.zig to run the
shuffled verification benchmark is fragile and lacks error handling; add a
proper build target in build.zig that compiles and runs
src/autobench-strings-verify.zig (e.g., create an executable named
"autobench-verify" and a run step that the "autobench-verify" step depends on)
and replace the swap-and-build shell sequence with a single invocation of that
new target (zig build autobench-verify), or if you must keep the swap approach
wrap the swap/restore in a safe cleanup handler (trap/EXIT) to always restore
src/autobench.zig and remove the .bak on exit/failure so autobench and
autobench-strings-verify.zig are not left out of sync.
In `@cross-lang-results.md`:
- Around line 9-34: Update the "Apple Silicon M4 results" section to clarify L2
cache details: state which M4 variant was tested (or note that the exact variant
is unspecified), change the phrase "~16MB shared L2" to explicitly say "≈16MB
P‑core cluster L2" (or "≈16MB per P‑core cluster L2"), and add a brief
parenthetical that different M4 variants have different P‑core L2 sizes (e.g.,
M4 Max ≈32MB) so readers understand the scope of the measurement; keep the
performance conclusion about cache lines/tiered metadata unchanged.
---
Nitpick comments:
In `@BENCHMARK-M4.md`:
- Around line 81-82: The benchmark invocation currently redirects stderr into
the results file using the command string "zig build autobench-strings
-Doptimize=ReleaseFast 2> results-m4-elastic.log"; change the benchmark to emit
results to stdout and update the invocation to redirect stdout into results (and
optionally silence stderr) so results-m4-elastic.log contains only benchmark
output; update the documented command in BENCHMARK-M4.md (the "zig build
autobench-strings -Doptimize=ReleaseFast" line) to use stdout redirection and
suppress or separate stderr as appropriate.
- Around line 37-61: The benchmark README currently lists complex manual build
commands (the Abseil g++ command with pkg-config fallback and separate
Zig/Rust/Go steps) which is error-prone; create a build-benchmarks.sh script
that detects OS (macOS vs Linux), attempts pkg-config for Abseil and falls back
to explicit include/lib paths, runs the Abseil g++ build (the long g++ line
shown), then runs "zig build autobench-strings -Doptimize=ReleaseFast", "cd
bench-rust && cargo build --release", and "cd bench-go && go build -o bench-go
.", failing fast with clear error messages and exit codes so users can build all
benchmarks automatically and the README can reference this script.
In `@cross-lang-results.md`:
- Around line 110-115: Update the "Remaining fairness caveats" section to
resolve the tension: explicitly state that the elastic hash (Zig) vs abseil
(C++) comparison is the most rigorous "apples-to-apples" result while framing
other cross-language results as supplementary context that reflect
runtime/compiler differences; replace the absolute phrasing "the only truly
apples-to-apples" with softer wording that highlights rigor (e.g., "the most
controlled apples-to-apples") and call out that cross-language comparisons are
informative but not strictly equivalent. Also add a short paragraph explaining
the measurement uncertainty for "median of 10 runs": report standard deviation
(or 95% confidence intervals) and whether observed differences exceed
measurement noise, and suggest including a simple statistical indicator (stddev,
CI, or boxplot) alongside medians to show variance. Ensure changes reference the
existing section title "Remaining fairness caveats", the specific comparison
"elastic hash (Zig) vs abseil (C++)", and the current practice "median of 10
runs".
In `@src/autobench-strings.zig`:
- Around line 41-44: The current median function (median, using sort.insertion
and MEASURED) returns the upper-median for even-sized arrays; to fix, update
median to handle even MEASURED by sorting as-is then computing the average of
the two middle elements (use indices MEASURED/2 - 1 and MEASURED/2) and return
their mean as u64 (handle potential overflow appropriately, e.g., (low / 2) +
(high / 2) or use checked arithmetic); alternatively if you prefer keeping
upper-median behavior, add a clear comment in the median function documenting
that it intentionally returns the upper median for even MEASURED.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6ad92223-b64f-4c85-90c4-af44175198bd
⛔ Files ignored due to path filters (4)
- `results-m4-abseil.log` is excluded by `!**/*.log`
- `results-m4-elastic.log` is excluded by `!**/*.log`
- `results-m4-go.log` is excluded by `!**/*.log`
- `results-m4-rust.log` is excluded by `!**/*.log`
📒 Files selected for processing (5)
- BENCHMARK-M4.md
- README.md
- cross-lang-results.md
- src/autobench-strings-verify.zig
- src/autobench-strings.zig
✅ Files skipped from review due to trivial changes (1)
- README.md
🚧 Files skipped from review as they are similar to previous changes (1)
- src/autobench-strings-verify.zig
## Apple Silicon M4 results

Tested on M4 (~16MB shared L2). The hypothesis was that elastic hash's advantage was cache-density-specific: our 1MB fingerprint array fits in x86's ~512KB L2 while abseil's 2MB control bytes spill to L3. On M4, both should fit in L2, so the advantage should shrink or disappear.

**It didn't.** It got bigger.

### Shuffled hit lookup (us, lower is better)

| Load | Elastic (Zig) | Abseil (C++) | Ratio |
|------|--------------|-------------|-------|
| 10% | **719** | 2,861 | **3.98x** |
| 25% | **2,276** | 10,169 | **4.47x** |
| 50% | **8,863** | 22,984 | **2.59x** |
| 75% | **15,972** | 33,624 | **2.11x** |
| 90% | **22,118** | 41,671 | **1.88x** |
| 99% | **25,748** | 46,543 | **1.81x** |

### x86 vs M4 comparison (shuffled hit lookup at 50% load)

| Platform | Elastic | Abseil | Gap |
|----------|---------|--------|-----|
| x86 (~512KB L2) | 11,119 | 19,312 | **1.74x** |
| M4 (~16MB L2) | 8,863 | 22,984 | **2.59x** |

The advantage grew from 1.74x to 2.59x. The tiered metadata advantage is not about L2 vs L3 spill -- it's about cache lines touched per probe. Separated, dense fingerprint arrays mean fewer cache line fetches under random access, regardless of which cache level they live in.
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# Check if documentation mentions specific CPU models
rg -i "cpu|processor|intel|amd|apple m[0-9]" --type=md -C 2
```

Repository: joshuaisaact/elastic-hash
Length of output: 1433
🌐 Web query:
Apple M4 chip L2 cache size specifications official
💡 Result:
Apple does not officially publish detailed L2 cache sizes for the M4 family chips in their specifications or press releases. Official Apple sources like the M4 announcement, iPad Pro specs, MacBook Pro tech specs, and MacBook Air specs mention CPU core counts, GPU, Neural Engine, memory bandwidth, and unified memory but no L2 cache details. Third-party sources based on analysis provide the following:

For base M4 (e.g., 9-10 core CPU: 3-4 P-cores + 4-6 E-cores):
- P-core L2 cache: 16 MB (shared per P-core cluster)
- E-core L2 cache: 4 MB (shared per E-core cluster)

For M4 Pro (12-14 core CPU: 8-10 P-cores + 4 E-cores):
- Similar per-cluster L2: P-cores likely 16 MB per cluster (two P-clusters for 10 P-cores), E-cores 4 MB.

For M4 Max (14-16 core CPU: 10-12 P-cores + 4 E-cores):
- P-core L2: 32 MB (per third-party analysis, possibly shared differently), E-core L2: 4 MB.

Sources cite Wikipedia, teardowns, and benchmarks. P-cores are clustered (e.g., base M4 has one 4 P-core cluster sharing 16 MB L2), E-cores share 4 MB L2. No official confirmation; Apple silicon cache details are typically reverse-engineered.
Citations:
- 1: https://lowendmac.com/1234/apple-silicon-m4-chip-specs/
- 2: https://www.apple.com/newsroom/2024/05/apple-introduces-m4-chip/
- 3: https://support.apple.com/en-us/121553
- 4: https://support.apple.com/en-gb/122209
- 5: https://en.wikipedia.org/wiki/Apple_M4
- 6: https://eclecticlight.co/2024/11/11/inside-m4-chips-p-cores/
- 7: https://everymac.com/systems/apple/macbook_pro/specs/macbook-pro-m4-max-14-core-cpu-32-core-gpu-16-2024-specs.html
- 8: https://lowendmac.com/2025/apple-silicon-m4-chip-specs/
- 9: https://lowendmac.com/1234/apple-silicon-m4-max-chip-specs/
🏁 Script executed:
```bash
# Check more context around the M4 and x86 cache mentions
rg -B5 -A5 "~16MB|M4.*L2|x86.*512KB" --type=md cross-lang-results.md
```

Repository: joshuaisaact/elastic-hash
Length of output: 1610
Clarify M4 cache hierarchy details; x86 specification is adequate.
The document already specifies "AMD/Intel ~512KB L2" for x86, so generalization is not a concern. However, the M4 claim needs clarification:
- M4 P-core L2 is indeed ~16MB, but this is per P-core cluster, not a single shared pool across all cores
- Different M4 variants have different L2 configurations (M4 Pro similar, M4 Max has 32MB P-core L2)
- The phrasing "~16MB shared L2" is imprecise without context
Consider noting which specific M4 variant was tested, or clarifying that the 16MB refers to the P-core cluster L2 cache. The performance conclusion about cache lines touched per probe remains sound regardless.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cross-lang-results.md` around lines 9 - 34, Update the "Apple Silicon M4
results" section to clarify L2 cache details: state which M4 variant was tested
(or note that the exact variant is unspecified), change the phrase "~16MB shared
L2" to explicitly say "≈16MB P‑core cluster L2" (or "≈16MB per P‑core cluster
L2"), and add a brief parenthetical that different M4 variants have different
P‑core L2 sizes (e.g., M4 Max ≈32MB) so readers understand the scope of the
measurement; keep the performance conclusion about cache lines/tiered metadata
unchanged.
…mpetitors Size sweep at 50% load with shuffled access on Apple M4. Elastic hash beats abseil (1.3-2.4x), Rust hashbrown+ahash (1.2-3x), and Go swiss.Map (1.8-3.5x) at every tested size. The x86 finding that small tables favored abseil does not hold on M4. Added proper build step for shuffled verification benchmark.
Actionable comments posted: 3
♻️ Duplicate comments (1)
cross-lang-results.md (1)
11-11: ⚠️ Potential issue | 🟡 Minor — Clarify cache-size claims; current sentence is self-contradictory and imprecise.
The phrase “1MB fingerprint array fits in x86’s ~512KB L2” is mathematically incorrect, and “M4 (~16MB shared L2)” should be scoped as P-core cluster L2 (variant-dependent). Please revise this sentence to avoid misleading the cache argument.
Proposed wording fix
```diff
-Tested on M4 (~16MB shared L2). The hypothesis was that elastic hash's advantage was cache-density-specific: our 1MB fingerprint array fits in x86's ~512KB L2 while abseil's 2MB control bytes spill to L3. On M4, both should fit in L2, so the advantage should shrink or disappear.
+Tested on Apple M4 (≈16MB P-core cluster L2 on base M4; cache layout varies by M4 variant). The hypothesis was that elastic hash's advantage was cache-density-specific: elastic's metadata footprint is smaller than Abseil's, so on x86 (~512KB L2) Abseil should spill earlier to lower cache levels. On M4, both metadata regions should fit more comfortably in upper caches, so the advantage should shrink or disappear.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cross-lang-results.md` at line 11, The sentence claiming “1MB fingerprint array fits in x86’s ~512KB L2” is incorrect and the M4 L2 description is ambiguous; update the sentence that discusses cache sizes (the line referencing “1MB fingerprint array” and “M4 (~16MB shared L2)”) to correctly state that a 1MB fingerprint array exceeds a typical per-core x86 L2 (~512KB) and instead note that on Apple M4 the P‑core cluster’s shared L2 (variant-dependent, ~16MB) can accommodate both our 1MB fingerprint array and abseil’s 2MB control bytes, so the L2-fitting advantage should shrink or disappear—also scope “M4” explicitly to the P‑core cluster L2.
🧹 Nitpick comments (2)
README.md (1)
73-76: Use architecture-neutral SIMD wording here. "SSE2 vector comparison" is x86-specific, but this README also presents Apple M4 measurements. Consider documenting this as generic SIMD (SSE2 on x86 / NEON on ARM) to keep the architecture section accurate.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@README.md` around lines 73 - 76, Update the "SIMD bucketed probing" wording to be architecture-neutral: replace the phrase "SSE2 vector comparison" with something like "SIMD vector comparison (e.g., SSE2 on x86, NEON on ARM)" in the README section titled "SIMD bucketed probing" so the description covers both x86 and ARM implementations while leaving the rest of the bullets (16-element buckets, 8-bit fingerprints) unchanged.

bench-rust/src/main.rs (1)
112-115: Output format inconsistency with ahash benchmark. The siphash benchmark outputs only `lookup_us` while the ahash version outputs all four metrics. If the comparison scripts expect uniform output, this will cause parsing issues. Consider either:
- Matching the ahash output format with all metrics
- Clearly documenting that siphash results are reference-only and excluded from automated parsing
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bench-rust/src/main.rs` around lines 112 - 115, The current println in the siphash benchmark only emits lookup_us (println!("RESULT\timpl=rust-siphash\tn={}\tload={}\tlookup_us={}", n, load_pct, median(&mut lkp))), causing output format mismatch with the ahash benchmark; update this print to emit the same four metrics the ahash runner prints (e.g., n, load_pct and the three timing metrics including lookup_us plus the other timing fields used by ahash) so automated parsers can consume siphash results identically, ensuring you read the same variables used in the benchmark (n, load_pct, median(&mut lkp) and the other timing/stat vars) and format them in the same RESULT\timpl=... schema.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bench-rust/src/main.rs`:
- Around line 84-88: The RESULT line printed by the println! macro currently
emits an extra "impl=rust-ahash" field and places miss_us before delete_us,
which breaks bench-v2.sh parsing; update the println! invocation (the formatting
string in main where println!(... ) is called) to match the expected order and
field names: emit "RESULT\t n={}\t load={}\t insert_us={}\t lookup_us={}\t
delete_us={}\t miss_us={}" (remove the impl= field and swap delete_us and
miss_us positions) while keeping the same variables (n, load_pct, median(&mut
ins), median(&mut lkp), median(&mut del), median(&mut mis)); if you still need
to identify the implementation, write that name to stderr separately instead of
in the RESULT line.
In `@bench-strings-verify.cpp`:
- Around line 73-74: The printf uses %lu for lkp_ordered and lkp_shuffled which
are uint64_t values; make the format portable by including <inttypes.h> and
using the PRIu64 macro (e.g., use "%" PRIu64) for those fields while keeping n
and load_pct as-is—update the printf that prints ABSEIL, n, load_pct,
median_val(lkp_ordered, MEASURED), median_val(lkp_shuffled, MEASURED) to use
PRIu64 for the two uint64_t results and add the inttypes.h include.
In `@README.md`:
- Around line 111-115: Update the fenced code block containing the commands "zig
build test", "zig build bench", and "bash bench-v2.sh" to include the language
identifier "bash" after the opening backticks (i.e., change ``` to ```bash) so
the block conforms to Markdown lint rule MD040 and improves syntax highlighting.
---
Duplicate comments:
In `@cross-lang-results.md`:
- Line 11: The sentence claiming “1MB fingerprint array fits in x86’s ~512KB L2”
is incorrect and the M4 L2 description is ambiguous; update the sentence that
discusses cache sizes (the line referencing “1MB fingerprint array” and “M4
(~16MB shared L2)”) to correctly state that a 1MB fingerprint array exceeds a
typical per-core x86 L2 (~512KB) and instead note that on Apple M4 the P‑core
cluster’s shared L2 (variant-dependent, ~16MB) can accommodate both our 1MB
fingerprint array and abseil’s 2MB control bytes, so the L2-fitting advantage
should shrink or disappear—also scope “M4” explicitly to the P‑core cluster L2.
---
Nitpick comments:
In `@bench-rust/src/main.rs`:
- Around line 112-115: The current println in the siphash benchmark only emits
lookup_us (println!("RESULT\timpl=rust-siphash\tn={}\tload={}\tlookup_us={}", n,
load_pct, median(&mut lkp))), causing output format mismatch with the ahash
benchmark; update this print to emit the same four metrics the ahash runner
prints (e.g., n, load_pct and the three timing metrics including lookup_us plus
the other timing fields used by ahash) so automated parsers can consume siphash
results identically, ensuring you read the same variables used in the benchmark
(n, load_pct, median(&mut lkp) and the other timing/stat vars) and format them
in the same RESULT\timpl=... schema.
In `@README.md`:
- Around line 73-76: Update the "SIMD bucketed probing" wording to be
architecture-neutral: replace the phrase "SSE2 vector comparison" with something
like "SIMD vector comparison (e.g., SSE2 on x86, NEON on ARM)" in the README
section titled "SIMD bucketed probing" so the description covers both x86 and
ARM implementations while leaving the rest of the bullets (16-element buckets,
8-bit fingerprints) unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2b7783f9-dcc6-44a1-9a40-ee024b6920e8
📒 Files selected for processing (7)
- README.md
- bench-go/main.go
- bench-rust/src/main.rs
- bench-strings-verify.cpp
- build.zig
- cross-lang-results.md
- src/autobench-strings-verify.zig
🚧 Files skipped from review as they are similar to previous changes (2)
- bench-go/main.go
- build.zig
```rust
println!(
    "RESULT\timpl=rust-ahash\tn={}\tload={}\tinsert_us={}\tlookup_us={}\tmiss_us={}\tdelete_us={}",
    n, load_pct,
    median(&mut ins), median(&mut lkp), median(&mut mis), median(&mut del)
);
```
RESULT output format incompatible with bench-v2.sh parser.
The output format differs from what bench-v2.sh expects in two ways:
- Extra `impl=` field: The parser expects `n=` as the 2nd field, but `impl=rust-ahash` shifts all positions.
- Field order mismatch: This outputs `miss_us` before `delete_us`, but the parser and C++ benchmark use `delete_us` before `miss_us`.
This will cause the comparison scripts to extract incorrect values or fail entirely.
Proposed fix to match the expected format
```diff
 println!(
-    "RESULT\timpl=rust-ahash\tn={}\tload={}\tinsert_us={}\tlookup_us={}\tmiss_us={}\tdelete_us={}",
+    "RESULT\tn={}\tload={}\tinsert_us={}\tlookup_us={}\tdelete_us={}\tmiss_us={}",
     n, load_pct,
-    median(&mut ins), median(&mut lkp), median(&mut mis), median(&mut del)
+    median(&mut ins), median(&mut lkp), median(&mut del), median(&mut mis)
 );
```

If you need to distinguish implementations in the output, consider updating bench-v2.sh to handle an optional `impl=` field, or output the implementation name to stderr separately.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@bench-rust/src/main.rs` around lines 84 - 88, The RESULT line printed by the
println! macro currently emits an extra "impl=rust-ahash" field and places
miss_us before delete_us, which breaks bench-v2.sh parsing; update the println!
invocation (the formatting string in main where println!(... ) is called) to
match the expected order and field names: emit "RESULT\t n={}\t load={}\t
insert_us={}\t lookup_us={}\t delete_us={}\t miss_us={}" (remove the impl= field
and swap delete_us and miss_us positions) while keeping the same variables (n,
load_pct, median(&mut ins), median(&mut lkp), median(&mut del), median(&mut
mis)); if you still need to identify the implementation, write that name to
stderr separately instead of in the RESULT line.
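The reviewer's alternative — teaching the parser to tolerate an optional `impl=` field — amounts to keying on field names rather than positions. A sketch of that idea (the sample line follows the rust-ahash format quoted in the review; this is not the actual bench-v2.sh parser, which is position-based):

```go
package main

import (
	"fmt"
	"strings"
)

// parseResult keys on field names instead of positions, so an optional
// impl= field or a swapped delete_us/miss_us order cannot skew parsing.
func parseResult(line string) map[string]string {
	out := make(map[string]string)
	for _, field := range strings.Split(strings.TrimSpace(line), "\t") {
		// The leading "RESULT" token has no "=", so Cut skips it.
		if k, v, ok := strings.Cut(field, "="); ok {
			out[k] = v
		}
	}
	return out
}

func main() {
	line := "RESULT\timpl=rust-ahash\tn=1000000\tload=50\tinsert_us=9000\tlookup_us=8863\tmiss_us=7000\tdelete_us=6500"
	r := parseResult(line)
	fmt.Println(r["impl"], r["n"], r["load"], r["lookup_us"])
}
```

Field-keyed parsing makes the emitters free to add or reorder fields without breaking the comparison scripts.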
```cpp
printf("ABSEIL\tn=%zu\tload=%d\tordered=%lu\tshuffled=%lu\n",
       n, load_pct, median_val(lkp_ordered, MEASURED), median_val(lkp_shuffled, MEASURED));
```
Use PRIu64 for portable uint64_t formatting.
%lu assumes uint64_t is unsigned long, which isn't true on all platforms (e.g., 32-bit systems or some Windows configurations where uint64_t is unsigned long long).
🔧 Suggested fix
Add the include:
```diff
 #include <cstdio>
+#include <cinttypes>
```

Then update the printf:
```diff
-    printf("ABSEIL\tn=%zu\tload=%d\tordered=%lu\tshuffled=%lu\n",
-           n, load_pct, median_val(lkp_ordered, MEASURED), median_val(lkp_shuffled, MEASURED));
+    printf("ABSEIL\tn=%zu\tload=%d\tordered=%" PRIu64 "\tshuffled=%" PRIu64 "\n",
+           n, load_pct, median_val(lkp_ordered, MEASURED), median_val(lkp_shuffled, MEASURED));
```

📝 Committable suggestion
```cpp
printf("ABSEIL\tn=%zu\tload=%d\tordered=%" PRIu64 "\tshuffled=%" PRIu64 "\n",
       n, load_pct, median_val(lkp_ordered, MEASURED), median_val(lkp_shuffled, MEASURED));
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@bench-strings-verify.cpp` around lines 73 - 74, The printf uses %lu for
lkp_ordered and lkp_shuffled which are uint64_t values; make the format portable
by including <inttypes.h> and using the PRIu64 macro (e.g., use "%" PRIu64) for
those fields while keeping n and load_pct as-is—update the printf that prints
ABSEIL, n, load_pct, median_val(lkp_ordered, MEASURED), median_val(lkp_shuffled,
MEASURED) to use PRIu64 for the two uint64_t results and add the inttypes.h
include.
````diff
 ```
-zig build bench
+zig build test       # run tests
+zig build bench      # full benchmark
+bash bench-v2.sh     # comparison vs abseil (requires abseil-cpp)
 ```
````
Add a language identifier to the fenced code block.
This trips markdown lint (MD040). Use bash for this command block.
Proposed fix
````diff
-```
+```bash
 zig build test       # run tests
 zig build bench      # full benchmark
 bash bench-v2.sh     # comparison vs abseil (requires abseil-cpp)
 ```
````
🧰 Tools: 🪛 markdownlint-cli2 (0.21.0)
[warning] 111-111: Fenced code blocks should have a language specified (MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @README.md around lines 111 - 115, update the fenced code block containing
the commands "zig build test", "zig build bench", and "bash bench-v2.sh" to
include the language identifier "bash" after the opening backticks (i.e.,
change the opening ``` to ```bash) so the block conforms to Markdown lint
rule MD040 and improves syntax highlighting.
Why
We verified elastic hash beats abseil by 36-97% on string key lookups on x86. But two questions remained: does the advantage hold against Rust's hashbrown and Go's swiss.Map when the comparison is fair? And is the advantage just a cache-density artifact of x86's small L2, or something deeper?
What
Results
The cache-density hypothesis was wrong
On M4, the advantage grew. At 50% load, shuffled hit lookup:
The win comes from cache lines touched per probe, not L2 vs L3 spill. Separated dense fingerprint arrays mean fewer cache line fetches under random access regardless of cache hierarchy.
Cross-language at 1M, 50% load, shuffled (M4)
Size sweep at 50% load, shuffled (M4)
Elastic hash wins at every table size from 16K to 4M. The x86 finding that small tables favored abseil does not hold on M4.
Where abseil still wins
References
cross-lang-results.md
BENCHMARK-M4.md
Summary by CodeRabbit
New Features
Documentation
Refactor
Tests
Chores