add cow_bytes: O(dirty_nodes) pairwise tree diff by dapplion · Pull Request #100 · sigp/milhouse

dapplion · 2026-04-06T05:33:06Z

Summary

Add cow_bytes(base, derived) for measuring the COW memory cost of a derived tree
relative to its base. Walks both trees in parallel — when Arc pointers match, the
subtree is shared and skipped immediately. Only divergent nodes are visited and counted.
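
The parallel walk can be illustrated on a toy tree. This is a minimal sketch, not milhouse's implementation: the two-variant `Tree` enum and the `build` and `set` helpers are hypothetical stand-ins, and sizes are approximated by `size_of::<Tree>()` per node, ignoring `Arc` reference-count overhead.

```rust
use std::mem::size_of;
use std::sync::Arc;

// Hypothetical toy tree; milhouse's real `Tree<T>` is richer than this.
#[allow(dead_code)]
enum Tree {
    Leaf(u64),
    Node(Arc<Tree>, Arc<Tree>),
}

/// Full size of a tree, counting every node once per position.
fn total_tree_bytes(tree: &Arc<Tree>) -> usize {
    size_of::<Tree>()
        + match &**tree {
            Tree::Leaf(_) => 0,
            Tree::Node(l, r) => total_tree_bytes(l) + total_tree_bytes(r),
        }
}

/// Bytes owned by `derived` that are not shared with `base`.
fn cow_tree_bytes(base: &Arc<Tree>, derived: &Arc<Tree>) -> usize {
    // Same allocation: the whole subtree is shared, skip it in O(1).
    if Arc::ptr_eq(base, derived) {
        return 0;
    }
    match (&**base, &**derived) {
        // Both internal: count the derived node, then compare children pairwise.
        (Tree::Node(bl, br), Tree::Node(dl, dr)) => {
            size_of::<Tree>() + cow_tree_bytes(bl, dl) + cow_tree_bytes(br, dr)
        }
        // Leaf or mismatched shape: nothing to pair up, count all of `derived`.
        _ => total_tree_bytes(derived),
    }
}

/// Build a full tree of the given depth with all leaves set to zero.
fn build(depth: u32) -> Arc<Tree> {
    if depth == 0 {
        Arc::new(Tree::Leaf(0))
    } else {
        Arc::new(Tree::Node(build(depth - 1), build(depth - 1)))
    }
}

/// Copy-on-write update of one leaf: only the root-to-leaf path is re-allocated.
fn set(tree: &Arc<Tree>, depth: u32, index: u64, value: u64) -> Arc<Tree> {
    match &**tree {
        Tree::Leaf(_) => Arc::new(Tree::Leaf(value)),
        Tree::Node(l, r) => {
            if (index >> (depth - 1)) & 1 == 0 {
                Arc::new(Tree::Node(set(l, depth - 1, index, value), r.clone()))
            } else {
                Arc::new(Tree::Node(l.clone(), set(r, depth - 1, index, value)))
            }
        }
    }
}

fn main() {
    let node = size_of::<Tree>();
    let a = build(10); // 1024 leaves, 2047 nodes
    let b = set(&a, 10, 0, 1);
    let c = set(&b, 10, 1023, 2);

    assert_eq!(cow_tree_bytes(&a, &a.clone()), 0); // clone costs nothing
    assert_eq!(cow_tree_bytes(&a, &b), 11 * node); // one dirty path: 10 nodes + 1 leaf
    assert_eq!(total_tree_bytes(&a), 2047 * node);
    // Chain A -> B -> C: C measured against A is at least C measured against B.
    assert!(cow_tree_bytes(&a, &c) >= cow_tree_bytes(&b, &c));
}
```

In this sketch a single dirty leaf touches 11 of 2047 nodes, which is the same effect the benchmarks below show at larger scale.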

This is intended for Lighthouse's state cache byte budget: measuring how many bytes
each cached state owns beyond the shared finalized base, cheaply enough to run on
every slot transition.

Benchmarks (1M u64 entries, capacity 2^40)

Dirty leaves	cow_bytes	MemoryTracker
0 (clone)	1.1 ns	450 ms
1	180 ns	450 ms
128	5.8 µs	450 ms
10,000	511 µs	450 ms
1,000,000 (all)	4.6 ms	450 ms

API

List::cow_bytes(&self, base: &Self) -> usize
List::total_tree_bytes(&self) -> usize
Vector::cow_bytes(&self, base: &Self) -> usize
Vector::total_tree_bytes(&self) -> usize
mem::cow_tree_bytes(base, derived) — raw Arc<Tree<T>> version
mem::total_tree_bytes(tree) — full size, no sharing baseline

Test plan

cow_bytes_identical_is_zero — clone has 0 cost
cow_bytes_single_mutation — single dirty leaf, cost << total
cow_bytes_all_dirty — all leaves dirty, cost ≈ total
cow_bytes_matches_tracker_differential — agrees with MemoryTracker within List struct overhead
cow_bytes_vector — works on Vector too
cow_bytes_chain — A→B→C: C vs A >= C vs B
total_tree_bytes_nonzero — sanity check
All 278 existing tests pass

…rement

Add `cow_bytes(base, derived)`: computes the bytes owned by `derived` that are not shared with `base` by walking both trees in parallel. When Arc pointers match, the subtree is shared and skipped immediately (O(1)). When they differ, the derived node is counted and children are recursed.

This is dramatically faster than MemoryTracker for measuring COW cost:
- 1M entries, 1 dirty leaf: 180ns vs 450ms (2,500,000x faster)
- 1M entries, 128 dirty: 5.8µs vs 450ms (76,000x faster)
- 1M entries, all dirty: 4.6ms vs 450ms (100x faster)

The key difference: no HashMap allocation (MemoryTracker spends 26% of time on HashMap insert/rehash), and shared subtrees are never visited (MemoryTracker must walk the entire base to build its seen-set).

API:
- `List::cow_bytes(&self, base: &Self) -> usize`
- `List::total_tree_bytes(&self) -> usize`
- `Vector::cow_bytes(&self, base: &Self) -> usize`
- `Vector::total_tree_bytes(&self) -> usize`
- `mem::cow_tree_bytes(base, derived)`: operates on raw `Arc<Tree<T>>`
- `mem::total_tree_bytes(tree)`: total bytes, no sharing baseline

codecov-commenter · 2026-04-06T05:44:37Z

Codecov Report

❌ Patch coverage is 65.75342% with 25 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.02%. Comparing base (8963bc5) to head (084972e).

Files with missing lines	Patch %	Lines
src/mem.rs	60.31%	25 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #100      +/-   ##
==========================================
+ Coverage   70.15%   71.02%   +0.86%     
==========================================
  Files          22       22              
  Lines        1280     1346      +66     
==========================================
+ Hits          898      956      +58     
- Misses        382      390       +8

…iple states

Walks multiple derived trees against a shared base, deduplicating COW'd nodes across states using a HashSet. Nodes shared with base are skipped via Arc::ptr_eq (no base walk). Nodes already counted from another derived state are skipped via HashSet lookup.

This is O(total_unique_dirty_nodes): each unique COW'd node is visited exactly once across all states. The HashSet only contains dirty nodes, keeping it small.

Benchmarks at 1M entries:

| Scenario                    | Cache | Time    | Dedup |
|-----------------------------|-------|---------|-------|
| Slot transitions (indep)    | 128   | 638 µs  | 1.0x  |
| Slot transitions (chained)  | 128   | 738 µs  | 27.9x |
| Effective balance updates   | 128   | 37 ms   | 1.0x  |
| Epoch transitions           | 8     | 232 ms  | 1.0x  |
| Mixed (4 epoch + 124 chain) | 128   | 109 ms  | -     |

API:
- `mem::total_unique_cow_tree_bytes(base, &[derived_roots])`: raw trees
- `List::tree_root()`: access inner Arc<Tree> for use with the above
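
The multi-state deduplication can be sketched on a toy tree as follows. This is an illustration under assumptions, not the code in this PR: the two-variant `Tree` enum, the `walk`, `build`, and `set` helpers, and the use of raw node addresses as HashSet keys are all hypothetical, and each node is costed at `size_of::<Tree>()`.

```rust
use std::collections::HashSet;
use std::mem::size_of;
use std::sync::Arc;

// Hypothetical toy tree standing in for milhouse's `Tree<T>`.
#[allow(dead_code)]
enum Tree {
    Leaf(u64),
    Node(Arc<Tree>, Arc<Tree>),
}

/// Count nodes of `derived` that are neither shared with `base` nor already seen.
fn walk(base: Option<&Arc<Tree>>, derived: &Arc<Tree>, seen: &mut HashSet<*const Tree>) -> usize {
    // Shared with base at this position: skip without walking base.
    if let Some(b) = base {
        if Arc::ptr_eq(b, derived) {
            return 0;
        }
    }
    // Already counted via another derived state: skip the whole subtree.
    if !seen.insert(Arc::as_ptr(derived)) {
        return 0;
    }
    match &**derived {
        Tree::Leaf(_) => size_of::<Tree>(),
        Tree::Node(dl, dr) => {
            // Descend into base in lockstep while its shape still matches.
            let (bl, br) = match base {
                Some(b) => match &**b {
                    Tree::Node(bl, br) => (Some(bl), Some(br)),
                    Tree::Leaf(_) => (None, None),
                },
                None => (None, None),
            };
            size_of::<Tree>() + walk(bl, dl, seen) + walk(br, dr, seen)
        }
    }
}

/// Unique bytes owned by all `derived` trees beyond `base`.
/// Node addresses are stable identities because the slice keeps every tree alive.
fn total_unique_cow_tree_bytes(base: &Arc<Tree>, derived: &[Arc<Tree>]) -> usize {
    let mut seen = HashSet::new();
    derived.iter().map(|d| walk(Some(base), d, &mut seen)).sum()
}

/// Build a full tree of the given depth with all leaves set to zero.
fn build(depth: u32) -> Arc<Tree> {
    if depth == 0 {
        Arc::new(Tree::Leaf(0))
    } else {
        Arc::new(Tree::Node(build(depth - 1), build(depth - 1)))
    }
}

/// Copy-on-write update of one leaf: only the root-to-leaf path is re-allocated.
fn set(tree: &Arc<Tree>, depth: u32, index: u64, value: u64) -> Arc<Tree> {
    match &**tree {
        Tree::Leaf(_) => Arc::new(Tree::Leaf(value)),
        Tree::Node(l, r) => {
            if (index >> (depth - 1)) & 1 == 0 {
                Arc::new(Tree::Node(set(l, depth - 1, index, value), r.clone()))
            } else {
                Arc::new(Tree::Node(l.clone(), set(r, depth - 1, index, value)))
            }
        }
    }
}

fn main() {
    let node = size_of::<Tree>();
    let base = build(10);
    let b = set(&base, 10, 0, 1); // 11 new nodes
    let c = set(&b, 10, 1023, 2); // chained on b: 11 more new nodes, reuses b's left path

    // Pairwise against base, c alone would cost 21 nodes; deduplicated, b and c cost 22.
    let unique = total_unique_cow_tree_bytes(&base, &[b, c]);
    assert_eq!(unique, 22 * node);
}
```

The chained case is where deduplication pays off: nodes that a later state inherits from an earlier one are counted once rather than once per state.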

Add MiniState with 4 fields (validators 128B×1M, balances u64×1M, inactivity u64×1M, participation u8×1M) matching the dominant BeaconState fields.

Results at 1M validators:
- 128 chained slot states: 1.9ms, 4 MB unique
- 128 independent slot states: 1.8ms, 4 MB unique
- 4 epoch + 124 slot chain: 225ms, 360 MB unique
- 4 epoch + 4 eff_bal + 120: 229ms, 362 MB unique

Epoch boundary states dominate: each has ~89 MB of fully-rewritten trees. Slot-only caches are sub-2ms.

michaelsproul · 2026-05-06T05:56:37Z

I would like to merge this, it looks good. Not sure why the memory size test is failing.

dapplion mentioned this pull request Apr 6, 2026

State cache: spec-derived byte-size estimation and budget-based eviction dapplion/lighthouse#71

Open

add test verifying cow_bytes includes leaf data, not just tree nodes

a52a4a6

dapplion force-pushed the cow-bytes branch from c79ed43 to a52a4a6 Compare April 6, 2026 16:24

dapplion and others added 4 commits April 6, 2026 22:53

add test: total_unique_cow_tree_bytes works correctly after finalization advance

89e905a

Fix CI maybe? Idk, it doesn't repro locally

084972e

dapplion wants to merge 6 commits into sigp:main from dapplion:cow-bytes
