Releases: ai-2070/net

💜 Net v0.27.6 — "Purple Rain"

19 Jun 02:03

Bindings & integration hardening — a full-workspace bug hunt at the FFI edge

v0.27.6 is the substantive counterpart to the v0.27.5 version-stamp: a full-workspace bug hunt across the net crate (~100k LOC Rust) plus the Go / Python / FFI binding layers, recorded in docs/misc/BUG_AUDIT_2026_06_18_BINDINGS.md. 34 of 37 findings are fixed and committed across 47 commits and three automated review rounds on bugfix/audit-2026-06-18.

The headline finding: every concrete bug in the first pass lived at the language-binding / FFI edge — three use-after-free races in the shipped Go module (github.com/ai-2070/net/go), reachable by ordinary context cancellation. Two deeper passes then narrowed the "core is clean" framing: a missing FFI panic-guard in two binding crates, one core data-path HIGH (a reliable-stream sequence gap under backpressure), and a tier of behavior / meshdb / RedEX correctness fixes. The Rust core, identity/security, RedEX recovery, and the core ffi/* memory-safety pass all came back clean and verified — most classic hazards already had named, tested mitigations.

No wire-format change, no C-ABI change, no public-API change. The Go fixes are source-level inside the binding module (same method signatures, now race-safe). Honest v0.27.5 / earlier peers interoperate freely.


Three use-after-free races in the shipped Go module

Three Go handle types — RpcStream (HIGH 1), MeshOsDaemonHandle (HIGH 2), and MeshBlobAdapter (HIGH 7) — guarded their native C handle with a check-then-use pattern (a bare atomic.Bool, or a mutex dropped before the cgo call) rather than a claim-then-use lock held across it. A concurrent free — the ctx-cancel watcher goroutine, an explicit Free()/Close(), or the GC finalizer once the handle becomes unreachable mid-call — could drop(Box::from_raw(...)) the native object while a Recv/Send/Store/NextControl was parked inside block_on. The result is a dereference of freed Rust memory: memory corruption / crash.

This is reachable on the documented happy path (CallStreaming(ctx, …), then a Recv() loop with ctx cancelled mid-recv), not an exotic double-close. MeshBlobAdapter was the sharpest case: its own struct doc claimed it already serialized _free against in-flight ops, while the code dropped the lock before every cgo call.

The fix gives each handle a refcount quiesce guard (streamHandleGuard): ops bracket the cgo call with enter() / leave() without holding a lock, the free runs once and only after the last op leaves, and the free path never blocks. The design evolved across review rounds, and the evolution is worth recording:

  • The first fix used an RWMutex held across the blocking cgo call — correct against the UAF, but it could wedge Close/Finish/Split on a deadline-less stream. Replaced with the non-blocking quiesce guard.
  • A review round caught that Split()'s post-split halves (DuplexSink/DuplexStream) were left on the original bare-atomic.Bool pattern — the exact race #1 closed. Both now carry their own guard.
  • A separate GC use-after-free surfaced in meshos.go PublishLog: it took unsafe.Pointer(&msgBytes[0]) in an inner block that closed before the cgo call without a runtime.KeepAlive, so the GC could reclaim the backing array mid-call. Hoisted + KeepAlive, matching the sibling PublishCapabilities.
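The quiesce idea described above can be sketched in a few lines. This is a Rust illustration of the general technique, not the shipped Go streamHandleGuard: the type name QuiesceGuard, the packed state word, and the boolean "you run the free" return values are all assumptions made for the sketch.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

/// High bit marks "close requested"; the low bits count in-flight ops.
const CLOSED: usize = 1 << (usize::BITS - 1);

/// Hypothetical refcount quiesce guard (illustrative, not the shipped code).
pub struct QuiesceGuard {
    state: AtomicUsize,
    freed: AtomicBool,
}

impl QuiesceGuard {
    pub fn new() -> Self {
        QuiesceGuard { state: AtomicUsize::new(0), freed: AtomicBool::new(false) }
    }

    /// Claim the handle for one op. Returns false once close was requested,
    /// in which case the caller must not touch the native handle.
    pub fn enter(&self) -> bool {
        let mut cur = self.state.load(Ordering::Acquire);
        loop {
            if cur & CLOSED != 0 {
                return false;
            }
            match self.state.compare_exchange_weak(
                cur, cur + 1, Ordering::AcqRel, Ordering::Acquire,
            ) {
                Ok(_) => return true,
                Err(seen) => cur = seen,
            }
        }
    }

    /// End one op. Returns true if this caller was the last op out after a
    /// close request and must therefore run the free.
    pub fn leave(&self) -> bool {
        let prev = self.state.fetch_sub(1, Ordering::AcqRel);
        prev == (CLOSED | 1) && self.claim_free()
    }

    /// Request the free without blocking. Returns true if no op was in
    /// flight, so the caller runs the free immediately.
    pub fn close(&self) -> bool {
        let prev = self.state.fetch_or(CLOSED, Ordering::AcqRel);
        prev == 0 && self.claim_free()
    }

    /// Belt-and-braces: only one caller ever wins the right to free.
    fn claim_free(&self) -> bool {
        !self.freed.swap(true, Ordering::AcqRel)
    }
}
```

No lock is held across the blocking call: an op that is parked between enter() and leave() only delays the free, and close() returns immediately either way.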

Verification caveat (honest): this build environment has no cgo C toolchain (CGO_ENABLED=0, no gcc), so the Go fixes are verified by gofmt + manual review + a pure-Go unit test for the quiesce guard (runs in CI), not a cgo compile/link. The patterns mirror already-compiling sibling handles.


FFI panic guards for rpc-ffi and compute-ffi

The rpc-ffi and compute-ffi binding crates had no catch_unwind at any entry point and called tokio's raw Runtime::block_on (HIGH 8). block_on panics ("Cannot start a runtime from within a runtime…") when invoked from a thread already inside a tokio runtime, and any internal panic does the same — the unwind then crosses the extern "C" boundary into Go/cgo, which is undefined behavior. This narrowed the first pass's "panic-across-FFI catches all sound" note: true for the core ffi/* crates, but not these two.

Every extern "C" body is now wrapped in ffi_guard! / catch_unwind, and block_on routes through the abort-on-reentry wrapper the sibling FFI crates already use. Two review-round corrections went with it:

  • The first pass defined the ffi_guard! macro in compute-ffi but never invoked it (P1) — so every one of the 80 entry points still unwound across the ABI. Now wrapped everywhere.
  • net_compute_runtime_daemon_count's caught-panic default was 0 — itself a valid count — so a panic read as "0 daemons" success rather than the -1 error sentinel the function uses. Default changed to the negative sentinel.
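A minimal sketch of the guard pattern and of why the caught-panic default matters. The helper guard and the entry point demo_daemon_count are hypothetical names for illustration; they are not the crate's ffi_guard! macro or its real exports.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Run `body`; if it panics, return `on_panic` instead of letting the
/// unwind cross the extern "C" boundary.
fn guard<T>(on_panic: T, body: impl FnOnce() -> T) -> T {
    panic::catch_unwind(AssertUnwindSafe(body)).unwrap_or(on_panic)
}

/// Hypothetical entry point. The caught-panic default is the -1 error
/// sentinel, because 0 would read as a valid "no daemons" answer.
pub extern "C" fn demo_daemon_count(simulate_panic: bool) -> i64 {
    guard(-1, || {
        if simulate_panic {
            panic!("simulated internal panic");
        }
        3
    })
}
```

Note that catch_unwind only helps when the crate is built with unwinding panics; under panic = "abort" the process terminates instead.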

Companion structural hardening from the same family: Go rpc-ffi out-params (write_response/find_service_nodes) gained null checks (#30); the len > isize::MAX guard before slice::from_raw_parts was extended across the *-ffi crates (#31), with three copy-paste siblings the first sweep missed picked up in review.


A reliable-stream sequence gap under backpressure

The one core data-path HIGH (#19, verified end to end). send_on_stream allocates a sequence number atomically with the byte credit, then builds, delivers, commits, and only afterwards registers the retransmit descriptor. For a scheduled stream, deliver_stream_packet has a second backpressure source — a full FairScheduler queue — surfaced as Backpressure after the seq was consumed. On that early return, TxSlotGuard::drop refunds the credit bytes (correct) but never rolls back tx_seq, and register_retransmit never runs — yet the packet was never put on the wire.

Impact on a reliable stream: a permanent, unrecoverable gap at the skipped seq. The receiver records the next packet out-of-order, never advances past the hole, and NACKs it forever; the sender's on_nack(seq) finds no descriptor and can't retransmit → eventual failed flag and a spurious StreamReset. Compounding it, any partial flush that did commit earlier in the same call is re-sent under new seqs on retry → duplicate delivery. This is the documented backpressure path under bulk load, not a rare edge.

Fixed by making the sequence refundable / rolled back on the failure path and not replaying already-committed events when send_with_retry re-enters. A review follow-up also bounded an unbounded committed-prefix retry: once the first packet of a multi-batch send commits, flush_stream_batch can't surface Backpressure (replay would duplicate), so it retried internally with no bound — a stalled receiver that never granted credit spun the sender forever. Now bounded by COMMITTED_FLUSH_STALL_BUDGET (30 s, the session-dead horizon): past it the peer is treated as dead and a terminal StreamError::Transport (which the caller does not replay) is surfaced. Paused-time unit tests pin both.
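The rollback half of the fix can be illustrated with a toy, single-threaded model. ToyStream, queue_free, and wire are invented for this sketch; the real path allocates the sequence atomically and is concurrent, so its rollback is necessarily more involved than a plain assignment.

```rust
#[derive(Debug, PartialEq)]
pub enum SendError {
    Backpressure,
}

/// Toy stream: `queue_free` models free scheduler slots, `wire` records
/// the sequence numbers that actually went out.
pub struct ToyStream {
    pub next_seq: u64,
    pub queue_free: usize,
    pub wire: Vec<u64>,
}

impl ToyStream {
    pub fn send(&mut self) -> Result<u64, SendError> {
        // Allocate the sequence number up front, as the real path does.
        let seq = self.next_seq;
        self.next_seq += 1;

        if self.queue_free == 0 {
            // Nothing reached the wire, so hand the sequence number back.
            // Without this line the receiver would see a permanent hole.
            self.next_seq = seq;
            return Err(SendError::Backpressure);
        }

        self.queue_free -= 1;
        self.wire.push(seq);
        Ok(seq)
    }
}
```

After a Backpressure error and a retry, the numbers on the wire stay contiguous (0, 1, ...) rather than skipping the sequence that was consumed by the failed attempt.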


The correctness tail — MEDIUM and LOW across behavior, meshdb, RedEX, and the FFI edge

The deeper passes turned up a tier of logic bugs away from the data path. Representative fixes:

  • meshdb executor (#20, #32). LEFT/RIGHT OUTER join silently dropped preserved-side rows whose join key was missing/non-scalar (they never entered the build table, so the unmatched-emit loop never saw them) — violating OUTER semantics. sort_merge_join had the same no-key drop. Both now emit no-key preserved rows unmatched, matching hash_join_full_outer.
  • load balancer (#14, #15/#29, #33). A half-open circuit probe slot could be permanently claimed if the caller skipped record_completion (and a circuit_recovery_time_ms == 0 collapsed the breaker entirely — now clamped to ≥ 1 ms); add_endpoint re-add leaked / clobbered ~150 stale hash-ring vnodes (a destructive collision-probe insert overwrote another node's vnode); weighted-round-robin starved endpoints when all effective weights were < 1.0.
  • aggregator daemon (#10, #13). A zero summary_interval panicked the spawned task (tokio::time::interval(0)), despite a comment claiming validation; filter_novel deduped on fold_kind only, re-publishing multi-row summaries every tick.
  • meshos reconcile / ICE (#9, #24, #28). Duplicate RequestEviction for one chain per tick (the count arm wrote the dedup set but never read it); MarkAvoid re-emitted every tick; ICE ThawCluster was blocked by the cluster-wide cooldown, violating the break-glass invariant.
  • deck streams (#3, #4, #25). deck-ffi reported genuine stream-end as a timeout for any non-zero timeout (unwrap_or_default() collapsed Err(Elapsed) and Ok(None) together) — livelocking the idiomatic Go polling loop; AuditStream/LogStream/FailureStream could park forever by not re-arming the waker after a consumed empty tick (now centralized in one helper); exported log timestamps printed an epoch hour-count instead of a 24h clock (missing % 24).
  • nRPC / routing (#26, #27, #34). A duplicate in-flight call_id overwrote the prior caller's response sender (guarded only by debug_assert); mint_random_call_id returned 0 on getrandom failure, so concurrent failing calls evicted each other; a route owner couldn't update its own route to a worse metric, pinning a stale route until TTL.
  • RedEX (#21, #35, #36, #5, #18, #22). Per-entry checksum covered the payload but not the header — review showed a corrupt payload_offset/len/flags is caught transitively (it reads the wrong region and fails the checksum), and only seq escapes, which is exactly why #21 added the seq-...
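The deck-ffi stream-end item in the list above comes down to one call. A small sketch, assuming a stand-in Elapsed type and a hypothetical PollOutcome enum (neither is the crate's real type), shows how unwrap_or_default() merges two distinct outcomes and how an explicit match keeps them apart:

```rust
/// Stand-in for a timeout error type.
#[derive(Debug)]
pub struct Elapsed;

#[derive(Debug, PartialEq)]
pub enum PollOutcome {
    Item(u32),
    TimedOut,
    End,
}

/// The buggy shape: a timeout and a genuine end of stream both become None.
pub fn collapsed(r: Result<Option<u32>, Elapsed>) -> Option<u32> {
    r.unwrap_or_default()
}

/// The explicit shape: each of the three outcomes stays distinguishable.
pub fn classify(r: Result<Option<u32>, Elapsed>) -> PollOutcome {
    match r {
        Ok(Some(v)) => PollOutcome::Item(v),
        Ok(None) => PollOutcome::End,
        Err(Elapsed) => PollOutcome::TimedOut,
    }
}
```

A polling loop that sees only None cannot tell "try again" from "stop", which is how the collapsed form can livelock a caller.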

💜 Net v0.27.5 — "Purple Rain"

13 Jun 16:34

A version-stamp release — no Rust dependency changes

v0.27.5 is the smallest kind of release. The net/crates/net/Cargo.lock diff against v0.27.4 contains nothing but the workspace version stamp — every net-* crate (net-mesh, net-mesh-sdk, net-node, net-cli, net-python, net-aggregator-daemon, the FFI crates, …) steps 0.27.4 → 0.27.5. No third-party Rust dependency was added, removed, or upgraded. The Rust core, the C/Go/Node/Python FFI, the SDK surface, and the wire are byte-for-byte unchanged. Drop-in for everyone.


What actually moved (and why it's out of scope here)

The only dependency churn in this window was documentation/web, via package-lock.json: react-hook-form → 7.79.0 and a routine renovate lock-file-maintenance pass. Neither touches the Rust crate or any runtime path, so — per the Cargo.lock-only scope — there is no release-relevant change to record.


Breaking changes

None. No wire-format change, no API/ABI change, no config change. v0.27.5 interoperates with honest v0.27.4 / v0.27.3 / earlier peers freely.


How to upgrade

Bump the dependency to 0.27.5 — pure drop-in. No atomic peer roll, no config changes, nothing to rebuild beyond the version stamp.


Dependency updates

None in net/crates/net/Cargo.lock beyond the internal workspace version bump (0.27.4 → 0.27.5). Web/docs-side, in package-lock.json: react-hook-form → 7.79.0 and a lock-file-maintenance pass — out of scope for the Rust crate and without runtime or wire impact.

💜 Net v0.27.4 — "Purple Rain"

13 Jun 00:53

A maintenance release — dependency bumps only

After two substantive Purple Rain turns — v0.27.2 (security) and v0.27.3 (performance + the ring AEAD swap) — v0.27.4 is a quiet maintenance release. There are no source changes to the Rust core, no API/ABI changes, and no wire-format change. The substance lives entirely in net/crates/net/Cargo.lock: the Python-binding stack steps to pyo3 0.29, zeroize to 1.9, and the rest is routine transitive churn. Drop-in for everyone; honest v0.27.3 / v0.27.2 / v0.27.1 peers are unaffected.


The Python binding completes its pyo3 0.29 migration

The headline of the lock diff is that the whole pyo3 stack steps from 0.28.3 → 0.29.0: pyo3, pyo3-ffi, pyo3-macros, pyo3-macros-backend, and pyo3-async-runtimes (0.28.0 → 0.29.0). The only consumer is the net-python crate; the Rust core, the C/Go/Node FFI, and the wire are all untouched.

This finishes what v0.27.3 started. That release bumped only the build-time helper pyo3-build-config to 0.29.0, which left the lock carrying two copies side by side — 0.28.3 (for the still-0.28.3 pyo3) and 0.29.0 (for net-python's direct dependency). v0.27.4 brings the rest of the stack up to 0.29.0, which collapses the lock onto a single pyo3-build-config 0.29.0 and drops pyo3-macros-backend's now-unneeded build-config dependency.

For Python wheel builders: rebuild against pyo3 0.29. There is no change to the Python-facing API surface in this release — it's a transitive consequence of the lock bump, not a binding-API change on our side.


Secret-zeroing and build tooling

  • zeroize 1.8.2 → 1.9.0 (+ zeroize_derive 1.4.3 → 1.5.0) — the crate backing the secure-wipe discipline on identity keys, PSKs, and other secret material. A routine minor bump that keeps the secret-hygiene path current.
  • cc 1.2.63 → 1.2.64 — the C-compiler driver. Worth a line only because v0.27.3's ring swap put C compilation on the build path (the zig cc musl-cross and aarch64-windows-clang jobs called out in that release's notes); this keeps that toolchain current. Patch bump.

Routine transitive bumps

None of these reach the datapath; all are pulled transitively by the WASM/browser targets and low-level utilities.

  • WASM / browser toolchain: wasm-bindgen 0.2.123 → 0.2.125 (with -macro / -macro-support / -shared), js-sys 0.3.100 → 0.3.102, web-sys 0.3.100 → 0.3.102, wasip2 1.0.3+wasi-0.2.9 → 1.0.4+wasi-0.2.12.
  • memchr 2.8.1 → 2.8.2.

Breaking changes

None. No wire-format change, no API/ABI change, no config change. v0.27.4 interoperates with honest v0.27.3 / v0.27.2 / v0.27.1 peers freely.


How to upgrade

Bump the dependency to 0.27.4 — drop-in, no atomic peer roll, no config changes. The only consumers with anything to do are those building the Python wheels, who should rebuild against pyo3 0.29 (a transitive effect of the lock bump, not an API change).


Dependency updates

All in net/crates/net/Cargo.lock:

| Crate | From | To | Note |
| --- | --- | --- | --- |
| pyo3 (+ -ffi, -macros, -macros-backend) | 0.28.3 | 0.29.0 | Python binding (net-python) |
| pyo3-async-runtimes | 0.28.0 | 0.29.0 | |
| pyo3-build-config | 0.28.3 + 0.29.0 | 0.29.0 | duplicate collapsed |
| zeroize / zeroize_derive | 1.8.2 / 1.4.3 | 1.9.0 / 1.5.0 | secret-zeroing |
| cc | 1.2.63 | 1.2.64 | C-compiler driver |
| wasm-bindgen (family) | 0.2.123 | 0.2.125 | WASM target |
| js-sys / web-sys | 0.3.100 | 0.3.102 | WASM target |
| wasip2 | 1.0.3 | 1.0.4 | WASM target |
| memchr | 2.8.1 | 2.8.2 | transitive |

Web/docs-side renovate bumps (eslint, tailwindcss, posthog, better-auth) also landed in the same window via package-lock.json; none carry runtime or wire impact and they are out of scope for the Rust crate.

💜 Net v0.27.3 — "Purple Rain"

12 Jun 03:34
2faa1f3

🟣 Packet-path AEAD swapped to ring

The June-9 flamegraph concluded "~5% AEAD, nothing to do." A raw-AEAD decomposition bench revised that: ~700 ns of the ~975 ns fixed per-message cost in the RustCrypto chacha20poly1305 stack was poly1305 0.8's AVX2 backend re-deriving the r¹..r⁴ key powers per message — paid on seal AND open, on every packet. ring's assembly AEAD has both a lower fixed cost and a higher bulk rate.

PacketCipher now backs onto ring::aead::LessSafeKey behind the same method surface (seal_in_place_separate_tag / open_in_place map 1:1 onto the detached/in-place API and the wire's ct||tag layout). The wire format is unchanged — both implement RFC 8439, ciphertexts are byte-identical. chacha20poly1305 stays a dependency for the IdentityEnvelope XChaCha sealed-box and as the cross-impl test oracle; the Noise handshake (cold path) is untouched.

Measured on i9-14900K, full PacketBuilder::build:

| Size | Before | After | Delta |
| --- | --- | --- | --- |
| 64 B | 1139 ns | 222 ns | −81% |
| 256 B | 1205 ns | 288 ns | −76% |
| 1 KiB | 1585 ns | 544 ns | −66% |
| 4 KiB | 3111 ns | 1538 ns | −51% |

ring: ~115 ns fixed + 0.31 ns/B (3.2 GB/s) vs RustCrypto ~950 ns fixed + 0.47 ns/B (2.1 GB/s) — wins at every size, both directions. The decrypt leg gets the same fixed-cost removal, so per round-trip this shaves ~1.6 µs off every small packet — unary nRPC, grants, acks, heartbeats. These are real, above-noise wins, not the below-loopback-floor µs items v0.27.2 landed on faith.
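The fixed-plus-per-byte figures above define a simple linear cost model, sketched here so the quoted rates can be checked by hand. The function names are illustrative, and the coefficients are just the rounded numbers from the paragraph, so this models the raw AEAD, not the full PacketBuilder::build timings in the table.

```rust
/// Linear AEAD cost model: a fixed per-message cost plus a per-byte cost.
pub fn aead_cost_ns(fixed_ns: f64, per_byte_ns: f64, len: usize) -> f64 {
    fixed_ns + per_byte_ns * len as f64
}

/// Bulk rate implied by a per-byte cost: 1 byte per nanosecond is 1 GB/s.
pub fn bulk_rate_gb_per_s(per_byte_ns: f64) -> f64 {
    1.0 / per_byte_ns
}
```

With the quoted coefficients, a 64-byte message costs about 115 + 0.31 × 64 ≈ 135 ns under ring versus 950 + 0.47 × 64 ≈ 980 ns under RustCrypto, and the per-byte costs correspond to roughly 3.2 GB/s and 2.1 GB/s. Because both coefficients are lower for ring, the model has no crossover size.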

It also retires one of v0.27.2's three parked levers. The "crypto SIMD" item (rebuild x86-64 with RUSTFLAGS="-C target-feature=+avx2", but a baked-in floor would SIGILL on pre-AVX2) is moot — ring dispatches to the best backend at runtime, so the AEAD fast path is on by default with no build-flag dance and no SIGILL risk.


Boxing the cipher key: regression fixed

Honestly told: the swap above bloated PacketBuilder. ring sizes its UnboundKey to the largest AEAD variant (AES-256-GCM key schedule + GHASH tables) — 544 bytes — even though this path only ever holds a 32-byte ChaCha20-Poly1305 key. PacketCipher is embedded by value in every PacketBuilder, and builders move through the packet pool's ArrayQueue on every get()/release(). The inline key grew the builder 304 → 816 bytes, so every pool pop/push memcpy'd ~2.7× more data — regressing the pure pool path ~70%, where no crypto runs at all. Pure struct bloat, caught before release.

The fix is one word: cipher: Box<LessSafeKey>. The pool now moves an 8-byte pointer and PacketBuilder is 264 bytes — leaner than the pre-swap 304. The heap allocation is paid only on cipher construction (pool pre-fill / refill / rekey — all cold), never on steady-state reuse; the extra indirection inside seal/open is negligible against the AEAD.
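The size effect is easy to reproduce in isolation. In this sketch BigKey is a 544-byte stand-in for the key type and the two builder structs are hypothetical, with an arbitrary 64-byte scratch field; they do not reproduce PacketBuilder's real layout or its 264-byte size.

```rust
use std::mem::size_of;

/// Stand-in for a key type sized for the largest AEAD variant.
pub struct BigKey(pub [u8; 544]);

/// Key embedded by value: every move of the builder copies the key too.
pub struct InlineBuilder {
    pub cipher: BigKey,
    pub scratch: [u8; 64],
}

/// Key behind a Box: a move of the builder copies only a pointer for it.
pub struct BoxedBuilder {
    pub cipher: Box<BigKey>,
    pub scratch: [u8; 64],
}

/// Returns (inline size, boxed size) in bytes.
pub fn builder_sizes() -> (usize, usize) {
    (size_of::<InlineBuilder>(), size_of::<BoxedBuilder>())
}
```

The trade is one heap allocation at construction in exchange for a much smaller value to move through a queue, which matches the cold-path versus steady-state argument above.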

| Operation | Before (post-swap) | After (boxed) |
| --- | --- | --- |
| net_packet_pool get/return | 88 ns | 38 ns |
| pool_comparison shared_pool_10x | 820 ns | 355 ns |
| pool_contention fast_acquire_release | | −35..−40% |

Net: recovered past the pre-swap baseline while keeping all of the encryption wins.


The full-crate sweep — six subsystems

The sweep covered the whole crate in six parallel passes — core bus, mesh transport datapath, routing/nRPC/reliability, behavior/capability folds, RedEX/CortEX/state, and the Dataforts blob layer. Of 70 findings: 56 fixed-and-tested, 1 folded into another fix, 1 accepted by design, and 12 deliberately parked (7 structural, 3 deferred, 2 partial). The full per-item resolution table is in the audit doc; the marquee items, by subsystem:

  • Dataforts store path (§6.1–§6.3). BLAKE3 and Reed-Solomon no longer run inline on the tokio runtime (now offloaded via spawn_blocking/rayon); dedup hits compare lengths instead of re-reading + re-hashing the whole existing chunk (a 16 MiB dedup hit was costing a 16 MiB read + ~5 ms hash); chunk store is prehashed. Multiplicative win on dedup-heavy ingest.
  • Replication (§5.1, §5.2/§5.4, §5.5). Leader catch-up is now bounded and budget-gated before the read (was O(N²) — read the whole backlog per request); replicated payloads thread through as Bytes end-to-end (removes 3 of 5 per-record copies); replica-apply fsync moved off the async worker via spawn_blocking.
  • nRPC / routing / reliability (§3). FairScheduler::dequeue no longer allocates a Vec + walks the whole DashMap per packet (now an ArcSwap active-stream snapshot); latency histograms are non-cumulative (14 contended atomic RMWs/RPC → 3); the client stream-grant path coalesces through one drainer instead of spawning a task + reliable packet per chunk; the per-call reply-subscription check is off the process-wide mutex; the retransmit window trims in order.
  • Behavior / capability folds (§4). synthesize_capability_set caches a change-generation-keyed Arc<CapabilitySet> instead of re-parsing tags on every call; fold primary store and inverted indexes moved off SipHash; the predicate planner no longer re-plans on every evaluate(); single-pass resource-axis extraction. (The ~40 ns fold-index lookup itself is by design — it scales to millions of nodes — and is untouched. These target the re-parsing and re-allocation around the index.)
  • Mesh transport datapath (§2). O(1) session_id reverse index (was an O(peers) scan per routed-local packet, on the single receive task); in-place relay forward (no per-packet copy + tokio::spawn); PacketBuilder frames events directly into the packet buffer (eliminates the second full-payload memcpy per built packet).
  • Core bus (§1). The global SeqCst in-flight counter is striped (was one cache line ping-ponged across all producers); bus stats derive from per-shard counters; dynamic-scaling metrics are subsampled; the FFI poll path splices raw event bytes via RawValue instead of parse-to-DOM-and-reserialize per event.

Security-relevant aside (§3.8). Call-id minting moved from a getrandom syscall per RPC to a thread-local pooled-entropy CSPRNG — a latency win, but the review pass also found the interim SplitMix64 it briefly used had an invertible public finalizer (a callee could recover the PRNG state from one call_id and predict every future id on that thread). The shipped version uses pooled OS entropy; no call_id predictability in the release.


The review pass

This keeps v0.27.2's discipline: after the 45 fix commits landed, every one was re-reviewed one-by-one (six parallel subsystem reviewers + a docs pass). 17 follow-up commits repaired 10 real bugs introduced by the fixes themselves, added 40+ regression tests, dispositioned 11 external review-bot (cubic) findings, and fixed 2 CI gates. The bugs spanned stale-data, liveness, correctness, security, and one data-loss: §6.7's new RedexFile handle cache wasn't invalidated by sweep_gc, so a post-sweep re-store hit the stale idempotent path and silently skipped the append. All repaired before release.

Validation at the end of the pass: clippy clean under the default feature set and --no-default-features --features {net, cortex}; RUSTDOCFLAGS="-D warnings" cargo doc clean; 4,300+ lib tests green; dir_transfer, integration_cortex_*, and integration_redex suites green.


Capability queries — borrowed index buckets, and a misread corrected

Single-constraint capability queries (tag-only / state-only / region-only) now borrow the index bucket (CandidateKeys::Borrowed) instead of cloning it; composite queries still own. The work also corrected a stale read: the "2.56 ms, linear scan, sad at fleet scale" figure turned out to be measuring the cost of returning half the fleet, not the lookup. With a fixed-cardinality probe (query_tag_rare — exactly 100 matches at every fleet size), the indexed lookup is 3.0 µs at 50,000 nodes — flat from 5K to 50K (a 50× fleet costs ~2× on a constant-cardinality discovery query). The borrowed fast path gave a real −20% at 10K on the half-fleet query; 1K/5K/50K moved within noise.


Benchmark hygiene

  • Multi-producer ingest bench de-skewed — and its first numbers invalidated. The new EventBus::ingest_raw multi-producer bench (the audit's first bench-coverage gap) initially cloned one RawEvent template; RawEvent caches its xxh3 routing hash, so every producer routed to a single shard — it measured shard-mutex contention, not the striped-counter layer it was written to expose. Fixed with a pool of 256 distinct templates, round-robined with per-thread stagger. The numbers in the first commit are not comparable — re-baseline before drawing conclusions.
  • Per-match throughput. query_tag / query_complex now report cost per match (not per query), so the half-fleet queries read correctly: ~24 ns/match at 1K drifting to ~50 ns/match at 50K (cache-footprint drift).
  • New benches and captures. raw_ring/{64,256,1024,4096} keeps the cipher-vs-cipher AEAD profile visible alongside the RustCrypto reference; query_tag_rare isolates index-lookup cost from result cardinality. Fresh i9-14900K and M1 Max benchmark sets were recorded.

Breaking changes

None on the wire, and none for honest peers. There is no wire-format change anywhere in this release; mixed-version meshes interoperate (the AEAD swap is byte-identical RFC 8439, proven by cross-impl tests + the interop smoke). The SDK and FFI surfaces are unchanged.

One build-time note for packagers (no API/ABI change): ring is now a build dependency of the packet path.

  • Native release targets — linux gnu x64/aarch64, macOS (both), win x64 — build ring routinely.
  • Two jobs gain a hard toolchain dependency to verify on the next release ...

💜 Net v0.27.2 — "Purple Rain"

10 Jun 12:58
1b07fbb

A security release — one critical auth fix, and the nRPC wire path keeps shrinking

Where v0.27.1 was pure performance and nothing on the wire moved, v0.27.2 leads with a four-pass security audit of the net crate and the fixes it surfaced — headlined by a critical authorization-bypass in the capability fold — then continues the hot-path work on the nRPC dispatch layer the hot-path audit opened. The full security review is recorded in docs/misc/SECURITY_AUDIT_2026_06_09_NET_CRATE.md; this log is the operator-facing summary.

The reassuring part first: the audit found the crate unusually well-hardened. Untrusted-wire parsing, token/chain auth, nonce/randomness handling, handshake identity binding, secret hygiene, and the filesystem/path surfaces all came back clean and verified (not assumed) — most classic hazards already had named, tested mitigations. One finding stood apart and is fixed below; the rest are medium/low hardening and defense-in-depth.

Interop: honest v0.27.1 peers are unaffected. The critical fix only rejects forged input that the old code silently trusted — a legitimate node always announced its own node_id, so nothing on the honest path changes. No wire-format change.


🔴 The critical fix — the capability fold now binds wire node_id to the verified signer

SignedAnnouncement::verify checked that an announcement carried a valid Ed25519 signature over a transcript that included node_id — but never that the claimed node_id was actually the signer's. The dispatch and apply layers then keyed all capability/reservation state on that attacker-supplied node_id.

The exploit chain (all four links confirmed against the code):

  1. Peer A — legitimately authenticated via PSK + Noise — signs a CapabilityMembership envelope with its own entity key but sets the internal node_id to victim C's.
  2. verify passes: it is a valid signature by A over those bytes; nothing required the node id to be A's.
  3. apply installs the entry under key (class_hash, C) — a forged capability now lives in C's state (e.g. tags:[nrpc:<service>], allowed_nodes:[A]).
  4. A calls the gated service; the callee gate reads by_node[C], finds the forged entry, and returns true.

Impact: complete bypass of the per-node nRPC capability allow-list — any authenticated participant could invoke any capability-gated service on any node — plus global forge/overwrite/strip of other nodes' advertised capabilities (cap-stripping DoS, scheduler-placement poisoning). The same unbound-node_id primitive hit ReservationFold, enabling reservation/lock hijacking on behalf of arbitrary node ids.

The fix is exactly the one the audit prescribed — surgical, outsized impact: verify / decode_and_verify (and the reservation path) now reject any envelope where ann.node_id != publisher.node_id(), returning WireError::NodeIdMismatch. This closes capability injection and reservation hijack simultaneously. The check is effectively free — Ed25519 verification (~50 µs) already dominates every inbound envelope. Pinned by a full-dispatch multi-publisher regression test, and the Fold::restore trust invariant ("only restore from local snapshots") is now documented alongside it.
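The shape of the added check, reduced to its essentials. Announcement and Publisher here are simplified stand-ins with a u64 id, and signature verification is omitted entirely; only the error name WireError::NodeIdMismatch comes from the text above.

```rust
#[derive(Debug, PartialEq)]
pub enum WireError {
    NodeIdMismatch,
}

/// Simplified stand-in: the node id the envelope claims to speak for.
pub struct Announcement {
    pub node_id: u64,
}

/// Simplified stand-in: the identity whose signature was verified.
pub struct Publisher {
    pub node_id: u64,
}

/// A valid signature is not enough: the claimed node id must also be the
/// verified signer's own id.
pub fn check_binding(ann: &Announcement, publisher: &Publisher) -> Result<(), WireError> {
    if ann.node_id != publisher.node_id {
        return Err(WireError::NodeIdMismatch);
    }
    Ok(())
}
```

In the exploit chain above, peer A's envelope would carry C's id while the verified signer is A, so the comparison fails before any state is keyed on the claimed id.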


FFI hardening — the aggregator handles join the crate's UAF protection

Two aggregator FFI handles (RegistryClientHandle, FoldQueryClientHandle) did an unconditional drop(Box::from_raw(handle)) on free, lacking the HandleGuard every other opaque handle in the crate carries — so a caller racing free against an in-flight op (a pattern the handles' own docs invite) could deallocate the client out from under a live read. Closed:

  • Both handles adopt the standard HandleGuard + leak-on-free + try_enter()-gated ops treatment, with quiesce-on-free, and no longer hold the guard across the blocking RPC.
  • net_registry_last_error_detail / net_fold_query_last_error_detail now return a caller-owned char* (freed with net_free_string) instead of a pointer into a Mutex-owned CString a concurrent erroring op could free out from under the reader; net.h documents the ownership, and the Go bindings free the returned strings.
  • Free now warns only on a genuine drain timeout (via begin_free_detailed), and a new-handle adoption checklist was added to handle_guard so the next FFI handle gets this by construction.

Filesystem — symlink-escape closed in directory reconstruction, including the subtle FS bypasses

fetch_dir sanitized a symlink's link path via safe_join but wrote its target verbatim from the attacker-controlled manifest — the classic "symlink in an archive" exposure. v0.27.2 rejects absolute / escaping symlink targets, and — crucially — closes the bypasses a naïve check misses:

  • Composed-link and symlinked-parent escapes (a link whose escape only materializes through an earlier-reconstructed link).
  • Case- and normalization-insensitive FS bypasses — default macOS APFS compares filenames both case- and normalization-insensitively, so the lexical traversal check now folds component case and applies NFC before comparing. (Reconstruction was already strictly ordered — dirs, then files, then symlinks last — so this was never a traversal write; the fix removes the residual risk to whatever later reads the tree.)
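A sketch of the basic lexical containment check for a symlink target. It covers only the absolute and ".."-escaping cases; the composed-link, case-folding, and NFC handling described above are deliberately left out, and the function name is hypothetical.

```rust
use std::path::{Component, Path};

/// Returns true if `target`, resolved lexically from the directory that
/// holds the link, stays inside the reconstruction root. `link_rel` is the
/// link's own path relative to that root.
pub fn symlink_target_is_contained(link_rel: &Path, target: &Path) -> bool {
    if target.is_absolute() {
        return false;
    }
    // Depth of the link's parent directory below the root.
    let mut depth: i64 = 0;
    let parent = link_rel.parent().unwrap_or(Path::new(""));
    for c in parent.components() {
        match c {
            Component::Normal(_) => depth += 1,
            Component::CurDir => {}
            // The link path itself must already be clean and relative.
            _ => return false,
        }
    }
    // Walk the target; stepping above the root at any point is an escape.
    for c in target.components() {
        match c {
            Component::Normal(_) => depth += 1,
            Component::CurDir => {}
            Component::ParentDir => {
                depth -= 1;
                if depth < 0 {
                    return false;
                }
            }
            // Root or drive-prefix components are never acceptable.
            _ => return false,
        }
    }
    true
}
```

A purely lexical check like this is exactly the naive form the release calls out: it cannot see an escape that only materializes through another link created earlier.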

Hardening grab-bag (medium/low, from the audit's backlog)

  • Constant-time secret compares. GroupId (32-byte) / SubnetId (16-byte) bearer secrets now compare via subtle::ConstantTimeEq instead of derived PartialEq / Vec::contains (early-exit, data-dependent timing). Remote timing recovery of a 128/256-bit secret was already impractical; closed for completeness.
  • PSK config permissions. The aggregator daemon now warns when its TOML config (which holds the mesh PSK) is group/world-readable — mirroring the 0600 discipline the CLI identity seed already enforces. The check runs before parse and warns on non-Unix too.
  • Cap-filters documented as advisory. subscribe_caps / publish_caps are self-asserted matchmaking, not an access boundary — the real boundary is require_token + token_roots (root-anchored TokenChain). This is now prominently documented so no one mistakes a cap-filter for access control.
  • Fuzz coverage widened. New fuzz targets for the nRPC request decode, channel-membership decode, migration bindings decode, and blob-transfer header decode — attacker-reachable, manually-hardened decoders that previously lacked a continuous regression guard. The fuzz crate gained the bytes / postcard deps and the cortex / dataforts features to reach them.

Correctness — a node no longer expires its own capability entry (self-inflicted outage)

Surfaced while baselining the nRPC QPS bench (audit §15): capability fold entries carry a TTL (default 300 s) and the sweeper reaps them on expiry, but nothing periodically re-announced the node's own entry; serve_rpc's announce is one-time. So any node serving RPC continuously past one TTL (≈5 min) without re-announcing would start rejecting all inbound calls (the callee-side cap gate finds no self-entry → CapabilityDenied) and drop off peer discovery. This was masked until now because every test/bench runs far under 300 s.

Fixed with a periodic re-announce loop (spawn_capability_reannounce_loop) that re-broadcasts the node's capabilities every MeshNodeConfig::capability_reannounce_interval (default 150 s), refreshing both the local self-index (callee gate) and peers' folds (discovery). Re-broadcasting needs an owned Arc, so MeshNode::start_arc(self: &Arc<Self>) stores a Weak the loop upgrades each tick; the SDK (Mesh::start) and FFI (net_mesh_start) call it. A bare start(&self) keeps its signature — no test-caller churn — and simply omits the loop. A/B tests pin both directions.
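The Weak-upgrade-per-tick pattern can be shown without a runtime. This sketch models a single tick of such a loop; Node, announce, and reannounce_tick are illustrative names, and the real loop is an async task driven by a timer rather than a plain function call.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Weak};

/// Stand-in node that just counts how often it announced itself.
pub struct Node {
    pub announces: AtomicU64,
}

impl Node {
    pub fn announce(&self) {
        self.announces.fetch_add(1, Ordering::Relaxed);
    }
}

/// One tick of a re-announce loop. Holding only a Weak means the loop never
/// keeps the node alive; it returns false once the node is gone, which is
/// the signal for the loop to exit.
pub fn reannounce_tick(node: &Weak<Node>) -> bool {
    match node.upgrade() {
        Some(n) => {
            n.announce();
            true
        }
        None => false,
    }
}
```

Storing a Weak rather than an Arc avoids a reference cycle between the node and its own background loop.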

A follow-up review hardened the TTL math: because announce_capabilities_with rate-limits the network broadcast to min_announce_interval, a re-announce interval set below it would have let peer entries expire before the throttle released the next broadcast. The stamped TTL is now 2 × max(reannounce_interval, min_announce_interval) — sized to the cadence peers actually see a refresh at, not the bare tick. (The local self-index is refreshed every call regardless, so only peers were ever at risk.)
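The TTL formula above is small enough to write out. A sketch, with a hypothetical function name; the 10 s and 30 s throttle values in the usage note are illustrative, not the crate's defaults:

```rust
use std::time::Duration;

/// TTL stamped on an announcement: twice the cadence at which peers
/// actually see a refresh, i.e. the slower of the two intervals.
pub fn stamped_ttl(reannounce_interval: Duration, min_announce_interval: Duration) -> Duration {
    2 * reannounce_interval.max(min_announce_interval)
}
```

With the default 150 s re-announce interval and a shorter throttle (say 10 s) this gives 300 s. With a 5 s interval under a 30 s throttle it gives 60 s rather than 10 s, so peers cannot expire the entry before the throttle releases the next broadcast.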

A companion bench-harness fix bounds call_*_retrying backpressure retries with a 20 s RETRY_DEADLINE, so a saturated benchmark bar fails fast (transport saturated … not measurable here) instead of livelocking past a TTL.


nRPC hot path — fewer wakeups, fewer allocations per round-trip

The audit's headline conclusion holds — the system is syscall- and wakeup-bound, not compute-bound (51% wake/scheduling, 22% transport syscalls, ~5% AEAD). v0.27.2 lands the contained, no-wire-change items that attack the wakeup count on the unary response leg:

  • §8a — one fewer tokio::spawn per response. The response emit closure used to spawn a task for every response publish (~1–2 µs of scheduling on a wake-bound path). It now builds the wire payload synchronously and hands an RpcResponseJob to a single per-service drain task (the same drainer pattern as grant-coalescing) — bounded channel, drop-on-overflow, FIFO. Streaming/duplex variants keep their per-emit spawn; unary is the QPS hot path.
  • §8b — reply-channel name cached per caller. format!("{service}.replies.{caller_origin:016x}") + ChannelName::new() was two heap allocs on every response, deterministic from (service, caller_origin). Now cached alongside the response path's origin cache — a cache hit is an Arc bump.
  • **T2.2 — `RpcPayload:...

💜 Net v0.27.1 — "Purple Rain"

09 Jun 14:38
8ec6aa4


A pure performance release — nothing on the wire moves

v0.27.1 ships no new systems, no new SDK surface, and no protocol changes. Every change either replaces an O(shards) operation with an O(1) atomic, swaps an O(n) full-scan for an index read, deletes an allocation, or corrects a benchmark fixture that was reporting fiction. The work is recorded in full in docs/misc/PERF_AUDIT_2026_06_08_BENCHMARK_WINS.md; this log is the operator-facing summary.

The organizing observation, the same shape as v0.27's: the substrate was answering cheap questions expensively. len(), node_count(), and stats() are called on admission gates and per-selection hot paths, and the default DashMap shards to 4 × num_cpus (128 on a 32-thread host), so every one of those calls locked and summed 128 shards regardless of how few entries the map held: a ~950 ns fixed cost to read a number the code could have maintained as it went. v0.27.1 maintains it as it goes.


DashMap::len() was a 128-shard walk on hot paths

The cross-cutting fix. Five subsystems carried AtomicUsize (and AtomicU64) counters that are now maintained exactly on every insert / remove / eviction, replacing the per-shard walk:

  • LocalGraph (swarm.rs) — num_nodes / num_edges / num_seen. The hot one: the seen_pingwaves soft-cap gate ran on every accepted pingwave, paying the shard walk per admission. local_graph/on_pingwave_duplicate drops from 974 ns → 16 ns (~60×).
  • ProximityGraph (behavior/proximity.rs) — num_nodes / num_edges / num_seen.
  • MetadataStore (behavior/metadata.rs) — node_count, and stats() now reads its inverted indexes (status / tier / continent) instead of full-scanning every node with a String allocation per entry.
  • FailureDetector (failure.rs) — num_nodes, plus check_all() now reads the monotonic clock once per sweep instead of once per node.
  • RoutingTable (route.rs) — num_routes / num_streams, including the per-novel-stream admission gate.

node_count() / len() / stats() reads collapse from ~950 ns to a sub-nanosecond atomic load. The FailureDetector per-status (healthy / suspected / failed) tally is deliberately kept as a scan — it's observability-only and node status is mutated in place, so a maintained per-status counter would silently drift. The scan is always exact.
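The maintained-counter pattern can be illustrated with a small sharded map built on std only. This is a sketch, not the crate's code: `CountedMap` and its shard count are hypothetical, and a `Mutex<HashMap>` per shard stands in for DashMap.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

const SHARDS: usize = 8;

pub struct CountedMap {
    shards: Vec<Mutex<HashMap<u64, String>>>,
    len: AtomicUsize, // maintained on every insert / remove
}

impl CountedMap {
    pub fn new() -> Self {
        Self {
            shards: (0..SHARDS).map(|_| Mutex::new(HashMap::new())).collect(),
            len: AtomicUsize::new(0),
        }
    }

    fn shard(&self, key: u64) -> &Mutex<HashMap<u64, String>> {
        &self.shards[(key as usize) % SHARDS]
    }

    pub fn insert(&self, key: u64, value: String) {
        let mut shard = self.shard(key).lock().unwrap();
        // Only a genuinely new key changes the count; an overwrite does not.
        if shard.insert(key, value).is_none() {
            self.len.fetch_add(1, Ordering::Relaxed);
        }
    }

    pub fn remove(&self, key: u64) -> Option<String> {
        let mut shard = self.shard(key).lock().unwrap();
        let removed = shard.remove(&key);
        if removed.is_some() {
            self.len.fetch_sub(1, Ordering::Relaxed);
        }
        removed
    }

    /// O(1): a single atomic load.
    pub fn len(&self) -> usize {
        self.len.load(Ordering::Relaxed)
    }

    /// The old shape: lock and sum every shard.
    pub fn len_by_walk(&self) -> usize {
        self.shards.iter().map(|s| s.lock().unwrap().len()).sum()
    }
}
```

The counter is updated while the shard lock is held, so the decision "was this key new?" and the increment cannot be split by a concurrent writer to the same key.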


Capability serialize — a one-word fix

sorted_tag_vec sorted capability tags with sort_by_key(|t| t.to_string()), which re-renders each Tag to a String on every comparison (~N log N allocations). Switched to sort_by_cached_key, which renders each tag exactly once (N allocations). Output order is byte-identical, so signed CapabilityAnnouncement bytes stay stable across peers — pinned by a regression test. capability_set/serialize drops 65.3 µs → 9.6 µs (~6.8×); capability_announcement/serialize 71.7 µs → 11.8 µs (~6.1×).
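The difference between the two std sort methods can be made visible by counting key renders. A sketch with a hypothetical `Tag` type (the crate's real `Tag` is richer); only `sort_by_key` and `sort_by_cached_key` are real std APIs:

```rust
use std::fmt;

#[derive(Clone, Debug, PartialEq)]
pub struct Tag {
    pub axis: &'static str,
    pub value: &'static str,
}

impl fmt::Display for Tag {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}:{}", self.axis, self.value)
    }
}

/// Sorts `tags` by their rendered string and returns how many times a tag
/// was rendered to a String along the way.
pub fn sort_counting_renders(tags: &mut [Tag], cached: bool) -> usize {
    let mut renders = 0usize;
    let key = |t: &Tag| {
        renders += 1;
        t.to_string()
    };
    if cached {
        // Key computed once per element, then reused for all comparisons.
        tags.sort_by_cached_key(key);
    } else {
        // Key recomputed for both sides of every comparison.
        tags.sort_by_key(key);
    }
    renders
}
```

Both calls produce the same order; the cached variant renders each tag exactly once, while the plain variant renders two keys per comparison.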


API registry — O(1) counts, index-derived stats, allocation-free path match

ApiRegistry (behavior/api.rs) got the same treatment plus an allocation fix:

  • len() / is_empty() / stats().total_nodes and the register capacity gate now read node_count / total_endpoints atomics. api_registry_basic/len: 1.42 µs → 0.20 ns.
  • stats() reads apis_by_name from the by_api_name inverted index (provider count per name, skipping empty buckets) rather than full-scanning every node and schema with a String clone per schema. api_registry_basic/stats: ~201 ms → ~7 µs.
  • find_by_endpoint called matches_path(..).is_some(), allocating two Vecs + a HashMap + a String per endpoint per node just to extract a bool. A new allocation-free ApiEndpoint::path_matches() -> bool replaces it at the three params-discarding call sites (the full scan is retained — it's correct for endpoints whose first path segment is a parameter, which a prefix index would miss). api_registry_query/find_by_endpoint: 6.98 ms → 1.88 ms (~3.7×), all from dropped allocation.
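An allocation-free boolean path match, as described in the last bullet, can be sketched by walking both segment iterators in lockstep. The `{name}` parameter syntax and the free-function form are assumptions for illustration; the crate's `ApiEndpoint::path_matches` may differ:

```rust
/// Returns true when `path` matches `pattern`, where a pattern segment
/// written as `{name}` matches any single path segment.
/// No Vec, HashMap, or String is built: both inputs are only iterated.
pub fn path_matches(pattern: &str, path: &str) -> bool {
    let mut pat = pattern.split('/').filter(|s| !s.is_empty());
    let mut seg = path.split('/').filter(|s| !s.is_empty());
    loop {
        match (pat.next(), seg.next()) {
            // Both exhausted at the same time: every segment matched.
            (None, None) => return true,
            (Some(p), Some(s)) => {
                let is_param = p.starts_with('{') && p.ends_with('}');
                if !is_param && p != s {
                    return false;
                }
            }
            // Different segment counts.
            _ => return false,
        }
    }
}
```

A pattern such as `/{tenant}/health` shows why a prefix index keyed on the first segment would miss matches and the full scan is retained.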

stats()'s apis_by_name is now distinct provider nodes per API name (the index is a provider set); this differs from the old per-schema-instance count only when one node advertises the same API name in two schemas — a degenerate case, documented and pinned by a test.


Load balancer — snapshot selection, right-sized hash ring

LoadBalancer::select (behavior/loadbalance.rs) is a per-dispatch hot path in GroupCoordinator, and get_available_endpoints iterated the endpoints DashMap via DashMap::iter — a 128-shard walk regardless of endpoint count.

  • Endpoint snapshot. The authoritative DashMap is kept for point lookups (reservation, health/metric updates); select / stats / endpoints / endpoint_count now iterate a flat ArcSwap<Vec<Arc<EndpointState>>> snapshot rebuilt only when the endpoint set changes. Per-endpoint atomic state (health, connections, circuit) stays live through the shared Arcs. lb_strategies/round_robin: 8.24 µs → ~340 ns (~24×); lb_scaling/select/10: 5.59 µs → ~370 ns (~15×).
  • Right-sized hash ring. consistent_hash selection walks the separate hash_ring DashMap, which the snapshot doesn't cover; it was over-sharded the same way. Pinning it to 8 shards (HASH_RING_SHARDS) cut lb_strategies/consistent_hash ~20% (49.1 µs → 39.8 µs), no new invariants.

A documented experiment (in the audit, "Snapshot vs. right-sized DashMap") confirmed the snapshot is not over-engineering: replacing it with a merely right-sized endpoints DashMap regressed select ~2× (a wait-free ArcSwap load over a contiguous Vec beats locking even 8 shards over scattered HashMap buckets on the iterate-heavy path). The snapshot stays; only the ring — which it doesn't cover — was right-sized.
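The snapshot design can be sketched with std primitives. This is an illustration under stated assumptions: `RwLock<Arc<Vec<..>>>` stands in for the wait-free `ArcSwap`, a `Mutex<HashMap>` stands in for the authoritative DashMap, and `Balancer` with its methods is hypothetical.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::{Arc, Mutex, RwLock};

pub struct EndpointState {
    pub id: u64,
    pub healthy: AtomicBool, // stays live through the shared Arc
}

pub struct Balancer {
    endpoints: Mutex<HashMap<u64, Arc<EndpointState>>>, // authoritative map
    snapshot: RwLock<Arc<Vec<Arc<EndpointState>>>>,     // flat view for select
    cursor: AtomicUsize,
}

impl Balancer {
    pub fn new() -> Self {
        Self {
            endpoints: Mutex::new(HashMap::new()),
            snapshot: RwLock::new(Arc::new(Vec::new())),
            cursor: AtomicUsize::new(0),
        }
    }

    /// Rebuilt only on membership change, while the map lock is held.
    fn rebuild(&self, map: &HashMap<u64, Arc<EndpointState>>) {
        let mut flat: Vec<Arc<EndpointState>> = map.values().cloned().collect();
        flat.sort_by_key(|e| e.id);
        *self.snapshot.write().unwrap() = Arc::new(flat);
    }

    pub fn add_endpoint(&self, id: u64) {
        let mut map = self.endpoints.lock().unwrap();
        map.insert(id, Arc::new(EndpointState { id, healthy: AtomicBool::new(true) }));
        self.rebuild(&map);
    }

    pub fn remove_endpoint(&self, id: u64) {
        let mut map = self.endpoints.lock().unwrap();
        if map.remove(&id).is_some() {
            self.rebuild(&map);
        }
    }

    pub fn set_healthy(&self, id: u64, healthy: bool) {
        if let Some(e) = self.endpoints.lock().unwrap().get(&id) {
            e.healthy.store(healthy, Ordering::Relaxed);
        }
    }

    /// Round-robin over the contiguous snapshot, skipping unhealthy entries.
    pub fn select(&self) -> Option<u64> {
        let snap: Arc<Vec<Arc<EndpointState>>> = self.snapshot.read().unwrap().clone();
        let n = snap.len();
        if n == 0 {
            return None;
        }
        let start = self.cursor.fetch_add(1, Ordering::Relaxed);
        for i in 0..n {
            let e = &snap[start.wrapping_add(i) % n];
            if e.healthy.load(Ordering::Relaxed) {
                return Some(e.id);
            }
        }
        None
    }
}
```

Health changes made through the map are visible to selectors immediately because the snapshot and the map share the same `Arc<EndpointState>` values; only membership changes pay for a rebuild.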


Concurrency hardening (correctness, shipped with the perf work)

The dual-store and counter changes drew a review pass that closed five latent races before they could ship:

  • LoadBalancer membership lock: add_endpoint / remove_endpoint now serialize the map mutation + snapshot rebuild under a Mutex, so concurrent membership changes can't store a stale snapshot last (which would silently drop a just-added endpoint from rotation). Off the hot path; select only reads.
  • Removed-endpoint flag — an EndpointState.removed bit, set on removal and checked in is_available(), so a selector reading a snapshot taken just before a concurrent removal filters the gone endpoint out instead of burning a reservation retry into a transient false NoEndpointsAvailable.
  • ApiRegistry::register made atomic per node — the read-old / re-index / insert sequence now runs under a single nodes entry lock (mirroring MetadataStore::upsert), so concurrent re-registration of the same node can't drift total_endpoints (which, decremented with fetch_sub, could otherwise underflow to a huge value).
  • ApiRegistry::clear drains instead of store(0) — per-key decrement through the same chokepoints the live paths use, so a concurrent unregister racing clear can't underflow the counters.
  • RoutingTable::get_stream_stats gated on the cap — it created a stream_stats entry for any id unconditionally, bypassing the MAX_STREAM_STATS soft cap the record_* paths enforce; now gated, returning Option.

All five carry regression tests (including multi-thread stress tests for the counter races).
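The underflow hazard named in the ApiRegistry bullets is easy to reproduce: `fetch_sub` on an atomic wraps rather than saturating. The sketch below shows the wrap and one defensive alternative, a checked decrement built on `fetch_update`. Note this is a different technique from the release's actual fix (which serializes under the entry lock and drains per key); the helper name is hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Decrements `counter` by `by` only if that would not underflow.
/// Returns false (and leaves the counter untouched) otherwise.
pub fn checked_decrement(counter: &AtomicUsize, by: usize) -> bool {
    counter
        .fetch_update(Ordering::AcqRel, Ordering::Acquire, |cur| cur.checked_sub(by))
        .is_ok()
}

/// Demonstrates the hazard: a plain fetch_sub at zero wraps around.
pub fn wrapped_value_after_underflow() -> usize {
    let raw = AtomicUsize::new(0);
    raw.fetch_sub(1, Ordering::Relaxed);
    raw.load(Ordering::Relaxed)
}
```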


Benchmark fixtures — corrections, not wins

Three of the largest "before" numbers were never real production costs — they were shared, growing Criterion fixtures bleeding into each other. The audit's §7 records them so nobody chases the wrong number, and the O(1)/fixture work makes them moot:

  • failure_detector/check_all (670 ms), failure_detector/stats (198 ms), and metadata_store_basic/stats (169 ms) were inflated by the heartbeat_new / register_new benches ballooning a shared detector/store that the later stats/check_all closures reused. check_all is genuinely O(n), so its bench got a dedicated growth_detector; the stats/len numbers are moot post-rework because those methods are now O(1) regardless of map size. Post-fix: check_all 16.7 µs, stats 16 µs, metadata stats 15.9 µs.

Measured results

Full table in the audit doc. Headline figures (Intel i9-14900K, Criterion defaults):

| Benchmark | Before | After | Change |
|---|---|---|---|
| local_graph/node_count | 958 ns | 0.20 ns | ~4770× |
| local_graph/stats | 2.89 µs | 0.33 ns | ~8850× |
| local_graph/on_pingwave_duplicate | 974 ns | 16 ns | ~60× |
| metadata_store_basic/len | 956 ns | 0.20 ns | ~4750× |
| routing_table/aggregate_stats | 13.1 µs | 6.07 µs | ~2.2× |
| capability_set/serialize | 65.3 µs | 9.63 µs | ~6.8× |
| api_registry_basic/len | 1.42 µs | 0.20 ns | ~6970× |
| api_registry_query/find_by_endpoint | 6.98 ms | 1.88 ms | ~3.7× |
| lb_strategies/round_robin | 8.24 µs | ~340 ns | ~24× |
| lb_scaling/select/10 | 5.59 µs | ~370 ns | ~15× |
| lb_strategies/consistent_hash | 50.6 µs | 39.8 µs | ~1.27× |

Absolute "after" figures on the sub-µs select/lb rows carry ±40–50% run-to-run variance on the dev box; they're representative, not precise, and the audit's re-verification note documents the spread. The multipliers and the order-of-magnitude wins are stable.

SIMD crypto (documented, opt-in). The audit's highest-leverage item — the ChaCha20-Poly1305 AEAD running on the software backend rather than AVX2 — is documented but deliberately not enforced in committed config: a baked-in +avx2 floor would SIGILL on pre-AVX2 x86-64 and is meaningless on ARM. Operators opt in per target class via RUSTFLAGS="-C target-feature=+avx2" (or target-cpu=native); default builds keep the so...


💜 Net v0.27 — "Purple Rain"

07 Jun 08:12
fb376b7


Named after Prince's 1984 closer to the album and the film — the eight-minute power ballad cut live one August night at First Avenue in Minneapolis in 1983 and never re-recorded, the Wendy Melvoin guitar lead and the Lisa Coleman piano answering each other under a vocal Prince once said started life as a Bob Seger country song before he heard it differently. "Dearly beloved, we are gathered here today to get through this thing called life." The film's last shot is the Kid walking offstage after the band has finally come together; the album's last note is the long-decaying piano chord that follows. v0.27 is the substrate's same shape: the long-running reliability, security, and concurrency threads that have been running through the codebase since v0.21 close out together. Stream retransmit wires every piece of the reliable-stream machinery the substrate had implemented but never connected. The reliable-stream hardening pass closes the cluster of deficiencies that wiring surfaced. The channel-auth audit replaces bare-token credentials with root-anchored token chains. The capability fold's bulk-query path turns 100 ms scans into 100-µs lookups. The polling-to-event-driven SDK migration ends a class of CPU waste across every language tier. And a new fair-scheduler transport primitive ships datafort blob transfer in the same release as the SDK that exposes it. Same nRPC, same fold, same fold-driven discovery — but the substrate finally plays the chord that resolves the act.

A long act closes; a new transport primitive opens the next one

The v0.27 release converges a stack of work that has been threading through the codebase since v0.21. None of it introduces a new system — every piece either finishes wiring a substrate machine the codebase had built but never connected, hardens a path that has been carrying production traffic, or strips waste from a layer that has been overpaying. The public type surface bumps in a handful of places where the shape change earns its keep many times over (root-anchored token chains, the new scheduled flag on StreamConfig); everything else lands under the hood.

The organizing observation: the substrate already had every primitive it needed, just not always connected to itself. The reliability layer's retransmit code had shipped in isolation and lived in a separate code path the MeshNode receive loop didn't use; v0.27 wires it. The fair scheduler arbitrated relayed traffic but not originating sends; v0.27 adds an opt-in scheduled flag and a new blob-transfer subprotocol that rides it. The capability fold had a query surface but cloned the whole CapabilityMembership payload to extract a NodeId from each match; v0.27 makes the bulk-query path index-driven and the payload clone go away. The MeshOS snapshot publisher fired its change signal every tick regardless of whether anything structural had changed; v0.27 gates the signal on a structural-view diff while keeping the snapshot itself live. The substrate stops paying for what it isn't using, finishes what it's using halfway, and ships a new SDK surface for what's been sitting under it.

Below: the wins, grouped by where they fire.


Reliable streams — retransmit wired end-to-end, full hardening pass

MeshNode reliable streams provided dedup + in-order accounting + flow control, but did not retransmit lost packets — a dropped packet was a permanent gap, and the receiver stalled to the 30-second transfer timeout. The machinery existed (ReliableStream::{on_send, on_nack, get_timed_out, build_nack} were all implemented), but send_on_stream never called on_send, there was no retransmit loop, and the receive path never emitted a NACK. v0.27 connects every piece and then closes the cluster of deficiencies the connection surfaced.

Retransmit wired end-to-end. MeshNode::send_on_stream now registers a RetransmitDescriptor on every reliable send. The receive path emits a NackPayload-carrying SUBPROTOCOL_STREAM_NACK packet whenever an out-of-order arrival opens a gap, coalesced per (session, stream) through a per-mesh drainer. The sender consumes NACKs and resends from its descriptor window. A timeout backstop walks active reliable streams every RTO interval, resending tail packets a lost-final-packet case can't NACK. Verified by a test that drives a multi-MiB transfer under 1-in-10 drop and asserts byte-for-byte completion.

Retransmit window auto-sized to the tx-window. Pre-v0.27 the retransmit window was fixed at 32 entries; a tx-credit window admitting more than 32 in-flight packets silently evicted unacked descriptors and lost them permanently. v0.27 derives max_pending from the tx-window so the invariant tx-window ≤ retransmit-window holds for any window. Eviction-as-silent-loss is now a misconfiguration the runtime won't reach by default.

untracked_evictions surfaced. The eviction counter that should have been a metric for years gets a rate-limited warn! (first occurrence + every 64th) and an untracked_evictions() accessor so production loss is visible in dashboards.

Hard-failure signal on retransmit give-up. A descriptor past max_retries now flags the stream failed and emits a SUBPROTOCOL_STREAM_RESET to the peer; the receiver's blob-transfer engine maps the reset to BlobError and fails its pending read promptly instead of stalling to the caller's 30-second timeout.

Ack-driven pruning of the retransmit window. The retransmit window was never pruned on the happy path: packets lingered until the RTO and were then spuriously resent. The receiver's next_expected is now piggybacked on StreamWindow grants (now 24 bytes, +ack_seq); the sender prunes via ReliableStream::on_ack. Without this, the new give-up signal would have turned the spurious resend into a spurious give-up.

Proactive gap NACKs. A receiver whose consumption stalls on a gap can't drive a grant-piggybacked NACK. The retransmit loop now calls collect_gap_nacks per tick so recovery happens within an RTO instead of waiting on the sender's timeout backstop.
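The interplay of send registration, gap NACKs, and ack-driven pruning can be sketched as a toy sender/receiver pair. The method names echo the ones mentioned in these notes (on_send, on_nack, on_ack, collect_gap_nacks), but the types and signatures are hypothetical, and RTO timers, coalescing, and the wire format are left out:

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Default)]
pub struct Sender {
    pending: BTreeMap<u64, Vec<u8>>, // unacked packets, keyed by seq
}

impl Sender {
    pub fn on_send(&mut self, seq: u64, payload: Vec<u8>) {
        self.pending.insert(seq, payload);
    }

    /// Packets to resend for the NACKed seqs still held in the window.
    pub fn on_nack(&self, missing: &[u64]) -> Vec<(u64, Vec<u8>)> {
        missing
            .iter()
            .filter_map(|s| self.pending.get(s).map(|p| (*s, p.clone())))
            .collect()
    }

    /// Prune everything below the receiver's next_expected.
    pub fn on_ack(&mut self, ack_seq: u64) {
        self.pending = self.pending.split_off(&ack_seq);
    }

    pub fn pending_len(&self) -> usize {
        self.pending.len()
    }
}

#[derive(Default)]
pub struct Receiver {
    next_expected: u64,
    buffered: BTreeSet<u64>, // seqs that arrived ahead of a gap
}

impl Receiver {
    pub fn on_packet(&mut self, seq: u64) {
        if seq < self.next_expected {
            return; // duplicate
        }
        self.buffered.insert(seq);
        while self.buffered.remove(&self.next_expected) {
            self.next_expected += 1;
        }
    }

    pub fn next_expected(&self) -> u64 {
        self.next_expected
    }

    /// Seqs missing between next_expected and the highest seq seen so far.
    pub fn collect_gap_nacks(&self) -> Vec<u64> {
        match self.buffered.iter().next_back() {
            Some(&highest) => (self.next_expected..highest)
                .filter(|s| !self.buffered.contains(s))
                .collect(),
            None => Vec::new(),
        }
    }
}
```

A lost final packet never opens a gap on the receiver, which is why the notes keep a sender-side timeout backstop alongside NACKs.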

Adaptive RTO. RFC 6298 SRTT/RTTVAR with Karn's algorithm, clamped to [10 ms, 2 s]. Replaces the fixed 50 ms RTO, which caused spurious resends on slow WANs and was sluggish on fast links.
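A minimal sketch of an RFC 6298 estimator with the clamp quoted above. It uses the RFC's α = 1/8, β = 1/4, K = 4, ignores the clock-granularity term G, and covers only the sampling half of Karn's algorithm (retransmitted packets contribute no sample). The type and method names are hypothetical, and times are plain f64 milliseconds for readability:

```rust
const ALPHA: f64 = 1.0 / 8.0;
const BETA: f64 = 1.0 / 4.0;
const K: f64 = 4.0;
const MIN_RTO_MS: f64 = 10.0;
const MAX_RTO_MS: f64 = 2_000.0;

#[derive(Default)]
pub struct RtoEstimator {
    srtt_ms: Option<f64>,
    rttvar_ms: f64,
}

impl RtoEstimator {
    /// Feed one RTT measurement. Samples from retransmitted packets are
    /// ambiguous (which transmission was acked?) and are skipped.
    pub fn on_sample(&mut self, rtt_ms: f64, was_retransmitted: bool) {
        if was_retransmitted {
            return;
        }
        match self.srtt_ms {
            // First measurement: SRTT = R, RTTVAR = R / 2.
            None => {
                self.srtt_ms = Some(rtt_ms);
                self.rttvar_ms = rtt_ms / 2.0;
            }
            // Later measurements: update RTTVAR first, using the old SRTT.
            Some(srtt) => {
                self.rttvar_ms = (1.0 - BETA) * self.rttvar_ms + BETA * (srtt - rtt_ms).abs();
                self.srtt_ms = Some((1.0 - ALPHA) * srtt + ALPHA * rtt_ms);
            }
        }
    }

    /// RTO = SRTT + K * RTTVAR, clamped to [10 ms, 2 s]. None before any sample.
    pub fn rto_ms(&self) -> Option<f64> {
        self.srtt_ms
            .map(|srtt| (srtt + K * self.rttvar_ms).clamp(MIN_RTO_MS, MAX_RTO_MS))
    }
}
```

A first sample of 100 ms yields an RTO of 100 + 4 × 50 = 300 ms; a second identical sample shrinks the variance term and the RTO drops to 250 ms.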

Reno-style congestion window. Slow-start and congestion-avoidance growth, multiplicative decrease on NACK loss, reset-to-floor on timeout. Gates send_on_stream via can_send; no-op on loss-free paths.
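The window dynamics described above can be sketched in a few lines. The constants, the packet-counted f64 window, and the exact decrease rule (halve on NACK loss, floor on timeout) are assumptions in the spirit of TCP Reno, not the crate's tuned values:

```rust
pub struct CongestionWindow {
    cwnd: f64,     // current window, in packets
    ssthresh: f64, // slow-start threshold
    floor: f64,    // minimum window
}

impl CongestionWindow {
    pub fn new(floor: f64, ssthresh: f64) -> Self {
        Self { cwnd: floor, ssthresh, floor }
    }

    /// Gate for the send path: allowed while in-flight packets fit the window.
    pub fn can_send(&self, in_flight: usize) -> bool {
        (in_flight as f64) < self.cwnd
    }

    pub fn on_ack(&mut self) {
        if self.cwnd < self.ssthresh {
            self.cwnd += 1.0; // slow start: +1 per ack
        } else {
            self.cwnd += 1.0 / self.cwnd; // congestion avoidance: ~+1 per RTT
        }
    }

    /// Loss signalled by a NACK: multiplicative decrease.
    pub fn on_nack_loss(&mut self) {
        self.ssthresh = (self.cwnd / 2.0).max(self.floor);
        self.cwnd = self.ssthresh;
    }

    /// Loss signalled by a timeout: reset to the floor.
    pub fn on_timeout(&mut self) {
        self.ssthresh = (self.cwnd / 2.0).max(self.floor);
        self.cwnd = self.floor;
    }

    pub fn cwnd(&self) -> f64 {
        self.cwnd
    }
}
```

On a loss-free path the window only grows, so the gate never binds once it exceeds the flow-control window, which matches the "no-op on loss-free paths" note.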

Graceful close. New MeshNode::close_stream_graceful waits for the reliable layer to drain (every send acked) or a timeout before closing — serve_chunk's hand-rolled ack-wait close becomes a substrate primitive.

In-order contract clarified. The substrate delivers events in arrival order plus seq; the blob-transfer engine reorders by seq itself; nRPC frames its own order and is fire-and-forget. Reliability::Reliable's docstring previously claimed in-order delivery; v0.27 corrects it and pins the contract at the delivery site. A general in-order buffer is deferred (no consumer needs it).


Channel auth — root-anchored token chains, locally-held publish chains

Root-anchored credentials. Bare-token credentials are replaced with TokenChain everywhere; a presented credential is honored only if it roots at one of the channel's token_roots. The subscribe path carries the chain over the wire end to end (subscribe_channel_with_chain).

Locally-held publish chains. The above broke delegated publishers — a node holding a publish grant via owner → org → this_node could only wrap its leaf token from the local cache, whose issuer is the immediate delegator, not the channel owner; the root-anchor check then failed. v0.27 adds MeshNode::set_publish_chain(channel, chain) so a delegated publisher can install the full chain locally; publish_many consults published_chains first and falls back to the cache-derived single-link form for direct-issued grants. Direct-issued publishers (the common case) need no change.

The publish self-check gates a node against itself, so this is correctness for honest delegated publishers rather than a closed attack surface — a deployment that grants publish rights by delegation silently lost the ability to publish post-audit until v0.27.
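Why a leaf-only credential fails the root-anchor check while the full chain passes can be shown with a structural sketch. Signatures, expiry, and scopes are deliberately omitted; `Link`, `roots_at`, and the numeric ids are hypothetical, not the crate's `TokenChain` types:

```rust
#[derive(Clone, Debug, PartialEq)]
pub struct Link {
    pub issuer: u64,
    pub subject: u64,
}

/// A chain is honored only if its first issuer is one of the channel's
/// token roots, each link hands off to the next link's issuer, and the
/// final subject is the node presenting it.
pub fn roots_at(chain: &[Link], token_roots: &[u64], presenter: u64) -> bool {
    let (first, last) = match (chain.first(), chain.last()) {
        (Some(f), Some(l)) => (f, l),
        _ => return false, // an empty chain proves nothing
    };
    let anchored = token_roots.contains(&first.issuer);
    let contiguous = chain.windows(2).all(|w| w[0].subject == w[1].issuer);
    anchored && contiguous && last.subject == presenter
}
```

With owner = 1, org = 2, node = 3 and `token_roots = [1]`, the delegated chain `[1→2, 2→3]` passes, while the cache-derived single link `[2→3]` fails because its issuer is the immediate delegator rather than a root. A direct grant `[1→3]` passes as a single link, which is why direct-issued publishers need no change.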


Capability fold — bulk-query path goes index-driven

v0.25 moved CapabilitySet's typed fields into a canonical HashSet<Tag> source of truth. The fold's bulk-query path didn't get the corresponding rework — composite_query was still cloning the whole CapabilityMembership payload for every candidate so find_nodes_matching could read the NodeId and throw the rest away. v0.27 closes the gap.

Whole-candidate-set clone removed. The bulk-query path returns NodeIds directly; the payload clone is gone.

Index-driven complex queries. query_model / query_tool were full-scan + clone + re-parse-every-tag operations against ~10k-node folds. v0.27 makes the index seed the candidate set; the post-filter walks the index, not the payload.
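An index-seeded query that returns node ids without touching payloads can be sketched with an inverted index from tag to node set. `FoldIndex`, the string tags, and the u64 node ids are hypothetical simplifications of the fold:

```rust
use std::collections::{BTreeSet, HashMap};

#[derive(Default)]
pub struct FoldIndex {
    by_tag: HashMap<String, BTreeSet<u64>>, // tag -> nodes announcing it
}

impl FoldIndex {
    pub fn announce(&mut self, node: u64, tags: &[&str]) {
        for tag in tags {
            self.by_tag.entry(tag.to_string()).or_default().insert(node);
        }
    }

    /// Nodes carrying every required tag. The smallest posting set seeds the
    /// candidates; the rest are membership probes. No payload is cloned.
    /// An empty query returns no nodes in this sketch.
    pub fn find_nodes_matching(&self, required: &[&str]) -> Vec<u64> {
        let mut sets: Vec<&BTreeSet<u64>> = Vec::with_capacity(required.len());
        for tag in required {
            match self.by_tag.get(*tag) {
                Some(set) => sets.push(set),
                None => return Vec::new(), // a tag nobody announces
            }
        }
        sets.sort_by_key(|s| s.len());
        let (seed, rest) = match sets.split_first() {
            Some(parts) => parts,
            None => return Vec::new(),
        };
        seed.iter()
            .copied()
            .filter(|n| rest.iter().all(|s| s.contains(n)))
            .collect()
    }
}
```

Seeding from the smallest set bounds the work by the rarest tag rather than by fold size, which is the shape behind the large query_model / query_tool factors reported here.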

Benchmarks (M1 Max, 10k-node fold):

| Query | Before | After | Factor |
|---|---|---|---|
| query_single_tag | 14.2 ms | 184 µs | ~77× |
| query_complex | 14.2 ms | 364 µs | ~39× |
| query_require_gpu | 29.1 ms | 366 µs | ~79× |
| query_gpu_vendor | 29.5 ms | 614 µs | ~48× |
| query_min_memory | 29.7 ms | 486 µs | ~61× |
| query_model | 108 ms | 88 µs | ~1230× |
| query_tool | 109 ms | 374 µs | ~290× |

The locking surface is unchanged — concurrent queries already parallelize through the dual-RwLock-read structure that v0.22 shipped. The fix is to make each individual query cheaper, not to touch the locks.


MeshOS — snapshot change-gating, structural-view diff

The MeshOS loop runs publish_snapshot() at the end of every reconcile pass (default tick_interval 500 ms). Pre-v0.27 the call unconditionally store...


v0.27.0-beta.1

03 Jun 04:50


Update uv.lock

🐒 Net v0.26 — "Monkey Business"

28 May 10:07


Named after Skid Row's 1991 single — the opening track and lead-off shot from Slave to the Grind, the record that blew up the band's bubblegum-metal reputation and, in the same swing, became the first hard-rock album to debut at number one on the Billboard 200 in the SoundScan era. Their 1989 debut had floated on power ballads — "18 and Life," "I Remember You" — and the label wanted more of the same; the band handed back a heavier, meaner, downtuned record and put "Monkey Business" first, Rachel Bolan and Snake Sabo's swampy, menacing strut, all swagger and trouble grinning in the doorway.

A full-surface security pass, and the eight places code drifted from its own safety protocols

v0.26 is a security hardening release. It is the result of a full-surface review across the parts of the crate where a mistake costs the most: wire-protocol parsing, the crypto primitives, the C-ABI FFI boundary, identity / token / auth, on-disk storage, and the client SDKs.

Most of the classic traps — the off-by-one slice, the unchecked length prefix, the malleable signature, the path-traversal write — already carry an explicit guard and a regression test pinning it. The eight issues that came out of the pass cluster in one place: where a single piece of code diverged from a safety protocol the rest of the codebase already follows. A blob handle that skipped the quiescing dance every other handle does. An inbound length cast the wide way in one binding and the narrow way in another. A token expiry that had a saturating add but no ceiling. The fixes mostly amount to making the outlier match the rule.

A blob handle that didn't play by the handle rules. The crate documents a per-handle quiescing protocol for exactly one hazard: a foreign thread (a Go cgo callback, a Python thread, a Node worker) sitting inside an FFI call while another thread frees the same handle. Every mesh / cortex / redis handle embeds a small guard, gates each operation on it, and on free leaks the handle box rather than deallocating it — so a racing call always lands on valid memory, sees the "freeing" flag, decrements, and bails. The mesh blob-adapter handle was the one that never got the treatment: it carried only the inner pointer, and its free did an unconditional deallocation. A store / fetch / exists racing a free read freed memory; a second free was a double-free. v0.26.0 embeds the guard, gates every operation on it, and makes free leak the box and drop only the inner — the adapter now follows the same recipe as every handle around it. A regression test pins both properties: an operation on a freed handle returns the null-pointer code instead of corrupting memory, and a double-free is a no-op.
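The quiescing recipe can be sketched in safe Rust. This is an illustration of the idea only: the real handles sit behind a C ABI and leak a heap box, whereas here the `Handle` value simply stays alive and only its inner payload is dropped. All names are hypothetical.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Mutex;

#[derive(Debug, PartialEq)]
pub enum CallError {
    NullPointer,
}

pub struct Handle {
    freeing: AtomicBool,
    in_flight: AtomicUsize,
    inner: Mutex<Option<Vec<u8>>>, // stand-in for the native adapter
}

impl Handle {
    pub fn new(data: Vec<u8>) -> Self {
        Self {
            freeing: AtomicBool::new(false),
            in_flight: AtomicUsize::new(0),
            inner: Mutex::new(Some(data)),
        }
    }

    /// A gated operation: register as in-flight, then check the flag,
    /// then touch the inner object, then deregister.
    pub fn byte_len(&self) -> Result<usize, CallError> {
        self.in_flight.fetch_add(1, Ordering::SeqCst);
        let result = self.byte_len_gated();
        self.in_flight.fetch_sub(1, Ordering::SeqCst);
        result
    }

    fn byte_len_gated(&self) -> Result<usize, CallError> {
        if self.freeing.load(Ordering::SeqCst) {
            return Err(CallError::NullPointer);
        }
        let guard = self.inner.lock().unwrap();
        let len = guard.as_ref().map(|d| d.len());
        len.ok_or(CallError::NullPointer)
    }

    /// Returns true if this call performed the free, false on a repeat.
    /// The handle shell itself is never deallocated here; only the inner is.
    pub fn free(&self) -> bool {
        if self.freeing.swap(true, Ordering::SeqCst) {
            return false; // second free is a no-op
        }
        while self.in_flight.load(Ordering::SeqCst) != 0 {
            std::thread::yield_now(); // wait for racing calls to bail out
        }
        self.inner.lock().unwrap().take();
        true
    }
}
```

Because the shell outlives the free, a call that races it always reads valid memory for the flag and counter, which is the property the leak-the-box rule exists to guarantee.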

An inbound length cast the narrow way. Inbound nRPC request bodies and the MeshOS causal-event / snapshot-restore payloads were copied from the native buffer with a 64-bit size cast down to a 32-bit signed int. A length with the high bit set went negative and crashed the copy before the handler's panic recovery could catch it; a length at or past 4 GiB wrapped modulo 2³² and produced a short copy: a truncated body whose framing still claimed the original size, a clean parse-desync primitive. Both are reachable from whatever a peer puts on the wire. One binding file already did this correctly (checking the length against the platform-int maximum and copying through a wide slice), but the inbound trampolines had not been updated, in two separate binding copies. v0.26.0 routes every inbound site through one guarded helper that rejects an over-range length and copies through a wide slice, applied to both copies.
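The arithmetic of the narrow cast, and the shape of a guarded alternative, rendered in Rust for illustration (the affected code lives in the language bindings, and these function names are hypothetical):

```rust
/// The hazard: a 64-bit length truncated to a 32-bit signed int.
pub fn narrow_cast(len: u64) -> i32 {
    len as i32
}

/// The guard: reject anything that does not fit the platform's signed size.
pub fn guarded_len(len: u64) -> Option<usize> {
    if len > isize::MAX as u64 {
        None
    } else {
        Some(len as usize)
    }
}

/// Copy `claimed_len` bytes out of `buf`, refusing over-range or
/// out-of-bounds lengths instead of truncating them.
pub fn copy_inbound(buf: &[u8], claimed_len: u64) -> Option<Vec<u8>> {
    let n = guarded_len(claimed_len)?;
    buf.get(..n).map(|s| s.to_vec())
}
```

`narrow_cast(0x8000_0000)` is negative and `narrow_cast((1 << 32) + 16)` is 16: the two failure modes described above. The guarded path turns both into a rejected request.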

Tokens that could outlive the heat death. A permission token's expiry was a saturating add of issue-time plus requested duration, with no cap on the duration — a caller could mint a token with a TTL of u64::MAX, whose expiry saturated into a timestamp that never arrives. The only way to retire such a token is an advisory revocation floor that has to be distributed out of band and that a given node might never learn to bump. v0.26.0 rejects any TTL past a one-year ceiling at issue time with a typed TtlTooLong error. Delegation only ever copies a parent's expiry, so the bound holds transitively down the whole chain. Long-lived grants now have to be periodically re-issued — which re-checks the issuer's signing key and current policy — and the blast radius of any single leaked token is capped at a year.
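The ceiling check can be sketched as follows. `MAX_TOKEN_TTL_SECS` and `TtlTooLong` are names from these notes; the 365-day value, the function names, and the plain u64 timestamps are assumptions:

```rust
/// Assumption: "one year" taken as 365 days.
pub const MAX_TOKEN_TTL_SECS: u64 = 365 * 24 * 60 * 60;

#[derive(Debug, PartialEq)]
pub enum TokenError {
    TtlTooLong,
}

/// Fallible path: reject an over-long TTL before computing the expiry.
pub fn try_expiry(issued_at: u64, ttl_secs: u64) -> Result<u64, TokenError> {
    if ttl_secs > MAX_TOKEN_TTL_SECS {
        return Err(TokenError::TtlTooLong);
    }
    Ok(issued_at.saturating_add(ttl_secs))
}

/// Soft-clamp path: cap the TTL at the ceiling instead of failing.
pub fn clamped_expiry(issued_at: u64, ttl_secs: u64) -> u64 {
    issued_at.saturating_add(ttl_secs.min(MAX_TOKEN_TTL_SECS))
}
```

Without the ceiling, `issued_at.saturating_add(u64::MAX)` is `u64::MAX`: the saturating add prevents overflow but produces an expiry that never arrives.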

Constructors that skipped the guard. The registry-client, fold-query-client, and channel-registration entry points, plus the blob-adapter constructor, dereferenced the inner mesh / redex node after only a null check, with no free-race guard. A concurrent free that won its race left them reading a dropped pointer. Same class as H1, narrower blast radius — these run before the handle is widely shared. v0.26.0 gates each on the relevant handle's guard; the node-clone accessors now hold the guard across the clone and return an Option, and every caller surfaces a null / error result when the handle is being torn down.

Clock skew with no ceiling. The token cache's clock-skew tolerance — a knob for absorbing NTP and container-clock drift — accepted any value. A large skew symmetrically widens every token's validity window: an expired token stays accepted for that many extra seconds, across the whole cache. The default is strict (zero), so this was misconfiguration-gated rather than on by default, but there was no guardrail. v0.26.0 clamps the tolerance to five minutes, which comfortably covers real drift while keeping a fat-fingered config from turning the expiry check into a rubber stamp.
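A sketch of the clamp and its effect on the expiry check. `MAX_TOKEN_CLOCK_SKEW_SECS`, `with_clock_skew`, and `set_clock_skew` are names from these notes; the struct layout, signatures, and `is_unexpired` helper are hypothetical:

```rust
pub const MAX_TOKEN_CLOCK_SKEW_SECS: u64 = 5 * 60;

pub struct TokenCache {
    clock_skew_secs: u64,
}

impl TokenCache {
    /// Any requested tolerance above five minutes is clamped to it.
    pub fn with_clock_skew(secs: u64) -> Self {
        Self { clock_skew_secs: secs.min(MAX_TOKEN_CLOCK_SKEW_SECS) }
    }

    pub fn set_clock_skew(&mut self, secs: u64) {
        self.clock_skew_secs = secs.min(MAX_TOKEN_CLOCK_SKEW_SECS);
    }

    /// A token is accepted up to `clock_skew_secs` past its stated expiry.
    pub fn is_unexpired(&self, now: u64, expires_at: u64) -> bool {
        now <= expires_at.saturating_add(self.clock_skew_secs)
    }
}
```

A config asking for a full day of skew ends up with 300 s: a token that expired at t = 1000 is still accepted at t = 1300 but rejected at t = 1301.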


Test hygiene

  • Every fix that could carry a regression test does. The H1 fix pins that an operation on a freed handle bails with the null-pointer code and that a double-free is a no-op. The H3 fix pins rejection at and past the TTL ceiling and a valid, non-saturating token at exactly the ceiling. The M2 fix pins the skew clamp on both the constructor and the setter. The L3 fix plants a symlink to an out-of-root secret and asserts that fetch, exists, and stream all refuse it.
  • A follow-up review caught two things the fixes themselves introduced. Bounding the TTL turned the SDK's infallible token-issue helper — which unwraps the fallible path — into a panic on an over-long TTL; it now soft-clamps to the ceiling instead, matching the existing zero-TTL soft-clamp, with its own release / debug / fallible test trio. The new read-path symlink test was gated to the platforms that can plant a unix symlink, and the blob existence probe re-applied its regular-file contract so a directory sitting at a blob slot is not reported as present.
  • The full library test suite passes, including the new regression tests.

Breaking changes

TokenError has a new TtlTooLong variant

Additive, but TokenError is a plain enum — downstream code that matches it exhaustively without a wildcard arm will need a new arm for the variant. The binding error-string maps were updated in lockstep (ttl_too_long).

Token TTL is capped at one year

try_issue returns TtlTooLong for any duration past the one-year ceiling; the infallible issue wrapper panics on it (use try_issue for untrusted input). The SDK's infallible issue_token soft-clamps to the ceiling rather than panicking. Callers that were minting multi-year or never-expiring tokens must re-issue inside the bound or move to a periodic re-issue.

Clock-skew tolerance is capped at five minutes

TokenCache::with_clock_skew / set_clock_skew clamp any larger value to five minutes. A config that set a larger skew silently receives the clamp.

New public constants

MAX_TOKEN_TTL_SECS (one year) and MAX_TOKEN_CLOCK_SKEW_SECS (five minutes) are exported from the identity module for callers that want to check before they call.


How to upgrade

  1. Most consumers — bump the dependency. The fixes are on by default and need no source changes unless you mint tokens with very long TTLs, configure a large clock skew, or match TokenError exhaustively.

  2. Token issuers — check your TTLs. Anything past one year is now rejected on the fallible path and clamped on the SDK's infallible path. If you were relying on a never-expiring token, switch to a periodic re-issue — that is the point of the cap. MAX_TOKEN_TTL_SECS is the ceiling to check against.

  3. Anyone matching TokenError — add the TtlTooLong arm. Exhaustive matches without a wildcard will not compile until you do.

  4. Operators who tuned clock skew — confirm your value. Anything above five minutes is now clamped to it. If you genuinely needed a wider window you were papering over a clock problem; fix the clock instead.

  5. Foreign-language callers sharing handles — no API change, but the race is now safe. Sharing a blob-adapter handle across threads and racing a free against an in-flight call no longer corrupts memory — the racing call bails with the null-pointer code. No code change required.

  6. Wire format is unchanged; v0.25 and v0.26.0 peers handshake cleanly.

⚡ Net v0.25 — "Shock To The System"

28 May 01:44


Named after the lead single from Billy Idol's 1993 album Cyberpunk — the one he cut as a concept record about networks reshaping how people would work, recorded with a Mac LC III in the booth and a Macromedia Director CD-ROM tucked into the jewel case, panned at release for being too-soon and now read as a marker of the moment the network stopped being a thing other people did. Same wire, same nRPC, same capability fold — but every typed service is now an LLM-callable tool, and the capability subsystem stopped paying for what every other discovery layer is paying for.

One surface every agent can call, and a capability hot path that got back to single-digit nanoseconds

The v0.25 release is the result of two pushes against the same mesh-discovery surface from opposite ends. The agent-facing push exposes every typed nRPC service as an LLM tool — serve_tool / list_tools / watch_tools / call_tool in Rust, Node, Python, and Go, plus format translators for OpenAI / Anthropic / Gemini / MCP so the descriptor lowers directly into whichever provider the agent already runs. The substrate-facing push is a perf audit against the capability subsystem after Phase A.5.N moved CapabilitySet's typed-struct fields into a canonical HashSet<Tag>: a per-tag String::clone in Tag::axis_key() plus a Tag::to_string()-keyed sort in the wire serializer had quietly turned a 3.7 ns match_min_memory filter into a 46 µs one. Four targeted fixes recovered the regression; the perf audit doc lands in tree alongside the release.

The release's organizing observation: discovery should be free in the hot path and cheap to author at the edges. The capability fold already aggregates every node's capabilities — agent discovery just walks it. The tag-set source-of-truth pattern is the right architecture, but allocating a String per tag per predicate match isn't its tax to pay.

Where v0.25 lands against the rest of the service-discovery field

In-process capability-filter evaluation in v0.25 sits 3–7 orders of magnitude below the published latencies of the network-coordinated discovery systems the field treats as fast:

| Layer | Operation | Typical latency | vs Net has_* (~30 ns) |
|---|---|---|---|
| Net v0.25 | has_gpu / has_tool / has_model | 20–44 ns | |
| Net v0.25 | match_min_memory (single-field predicate) | 15 ns | 0.5× |
| Net v0.25 | match_complex (6 chained predicates, decodes models) | 3.8 µs | ~130× |
| Net v0.25 | CapabilitySet::to_bytes_compact (full set, postcard) | 2.0 µs | ~70× |
| Consul | DNS lookup, cached | 100–200 µs | 3,300–6,700× |
| Consul | DNS lookup, uncached (server) | 600–700 µs | 20,000–23,000× |
| Consul | client initial query | 1.6–3 ms | 53,000–100,000× |
| etcd | lookup, recommended P99 target | < 10 ms | > 330,000× |
| Kubernetes / CoreDNS | service lookup (ndots:5 default) | 100+ ms | > 3,300,000× |
| mDNS / DNS-SD | best-case local resolution | < 1 ms | > 33,000× |

Caveat (apples-vs-oranges): the v0.25 numbers measure in-process predicate evaluation against capability announcements already gossiped into the local fold. Consul / etcd / Kubernetes DNS are answering "where is service X across the cluster" with a network round-trip and (usually) a consensus quorum read. They aren't doing the same job. The fair comparison is the in-mesh agent scheduling loop: once announcements are in your fold (Net does that propagation via the same gossip path every other capability rides), filtering and dispatching against them is genuinely three to seven orders of magnitude faster than the registries an agent author would otherwise reach for.
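The ratio column in the table can be reproduced by dividing each published latency by the ~30 ns has_* baseline. A minimal sketch of that arithmetic, using only figures from the table; the helper name ratio_vs_baseline is illustrative and not part of any Net API:

```python
import math

BASELINE_NS = 30.0  # approximate has_* cost the table uses as its reference point

def ratio_vs_baseline(latency_ns: float) -> float:
    """How many times slower a latency is than the ~30 ns has_* baseline."""
    return latency_ns / BASELINE_NS

# Latencies taken from the table, converted to nanoseconds.
rows = {
    "Consul DNS lookup, cached (low end)": 100e3,    # 100 µs
    "Consul client initial query (high end)": 3e6,   # 3 ms
    "Kubernetes / CoreDNS lookup (low end)": 100e6,  # 100 ms
}

for name, ns in rows.items():
    r = ratio_vs_baseline(ns)
    print(f"{name}: {r:,.0f}x, about {math.log10(r):.1f} orders of magnitude")
```

The same division applied to the in-process rows gives the small ratios at the top of the table, for example 15 ns / 30 ns = 0.5×.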

External sources for the published latencies in the table: Consul DNS perf thread, Consul DNS perf issue #1535, Consul server resource requirements, etcd recommended practices (OKD), Kubernetes DNS ndots:5 latency, mDNS / DNS-SD discovery.

Below: the wins, grouped by where they fire.


AI tool calling — every typed nRPC service is an LLM-callable tool

NRPC_AI_TOOL_CALLING_AND_AGENT_DX.md (the plan shipping alongside this release) makes the bet that tool calling is what nRPC already does — "send a JSON object to a named handler, await a JSON response, optionally stream chunks" — with three gaps: metadata so a model can decide when/how to call, a server-streaming primitive matching the unary call_service, and a structured event envelope for streaming output. v0.25 closes all three and ships the agent-author surface across every binding.

One identifier, one source of truth. A tool registered as web_search IS the nRPC service at channel nrpc:web_search.requests IS the announcement carrying the ai-tool:web_search capability tag. No separate registry, no mapping table. Plain rpc.serve("x", handler) continues to register a service without the ai-tool:* tag — invisible to list_tools(). The serve_tool / tool({...}) / @tool opt-in is what makes a service agent-discoverable; operators retain control.
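The naming rule above can be sketched as a pair of pure functions. This is an illustration of the convention only, not the shipped binding code: the helper names service_channel, tool_tag, and discoverable_tools are hypothetical, and the set of strings stands in for whatever the real announcement carries.

```python
AI_TOOL_PREFIX = "ai-tool:"

def service_channel(tool_id: str) -> str:
    """Request channel of the nRPC service that backs a tool."""
    return f"nrpc:{tool_id}.requests"

def tool_tag(tool_id: str) -> str:
    """Capability tag that marks a service as agent-discoverable."""
    return f"{AI_TOOL_PREFIX}{tool_id}"

def discoverable_tools(announced_tags: set[str]) -> list[str]:
    """Tool ids a list_tools-style walk would see: only ai-tool:* tags count."""
    return sorted(
        tag[len(AI_TOOL_PREFIX):]
        for tag in announced_tags
        if tag.startswith(AI_TOOL_PREFIX)
    )

# A node that serves web_search as a tool and "x" as a plain service.
tags = {"nrpc:web_search", tool_tag("web_search"), "nrpc:x"}
print(service_channel("web_search"))
print(discoverable_tools(tags))
```

Because both strings are derived from the same tool id, there is nothing to keep in sync: the plain service "x" has no ai-tool:* tag and therefore never shows up in the discoverable list.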

Discovery is capability-fold-native, not RPC-fanout. The capability fold already aggregates ToolCapability instances across every node. list_tools(matcher) walks the fold in-memory and returns ToolDescriptors carrying id + version + node_count + small metadata. Heavy fields (oversized JSON Schemas) fall back to an on-demand tool.metadata.fetch RPC, which serve_tool auto-installs on the host the first time it's called. Subnet visibility, capability auth, region filtering — all inherited from the existing fold + TagMatcher plumbing.
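A rough model of the fold walk described above, assuming the local fold can be viewed as a mapping from node id to that node's announced tool capabilities. The dataclass fields, the INLINE_SCHEMA_LIMIT cutoff, and the needs_metadata_fetch flag are assumptions made for illustration; the release notes do not state the real shapes or thresholds.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class ToolCapability:      # hypothetical shape of one announced tool entry
    tool_id: str
    version: str
    schema_bytes: int      # size of the JSON Schema attached to the tool

@dataclass
class ToolDescriptor:      # hypothetical: id + version + node_count + small metadata
    tool_id: str
    version: str
    node_count: int
    needs_metadata_fetch: bool

INLINE_SCHEMA_LIMIT = 4096  # assumed cutoff; the real threshold is not stated

def list_tools(fold: dict[str, list[ToolCapability]],
               matcher: Callable[[ToolCapability], bool]) -> list[ToolDescriptor]:
    """Walk the local fold in memory; no RPC is issued during discovery."""
    found: dict[tuple[str, str], ToolDescriptor] = {}
    for caps in fold.values():  # one list of capabilities per node id
        for cap in caps:
            if not matcher(cap):
                continue
            key = (cap.tool_id, cap.version)
            desc = found.setdefault(
                key, ToolDescriptor(cap.tool_id, cap.version, 0, False))
            desc.node_count += 1
            # Oversized schemas are left for an on-demand tool.metadata.fetch call.
            desc.needs_metadata_fetch |= cap.schema_bytes > INLINE_SCHEMA_LIMIT
    return sorted(found.values(), key=lambda d: (d.tool_id, d.version))

fold = {
    "node-a": [ToolCapability("web_search", "1.0", 512)],
    "node-b": [ToolCapability("web_search", "1.0", 512),
               ToolCapability("db_query", "2.1", 9000)],
}
tools = list_tools(fold, lambda cap: True)
for t in tools:
    print(t)
```

The matcher argument plays the role of the TagMatcher plumbing: any visibility or region filter would be applied inside the same in-memory loop rather than by fanning out requests.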

Streaming tools share one event envelope. ToolEvent is a tagged JSON enum every streaming handler emits per chunk:

  • start { tool_id, call_id, metadata? } — fires once on open.
  • progress { pct?, message? } — coarse progress for spinners.
  • delta { data } — partial output (model tokens, file bytes, log lines).
  • result { data } — terminal full result; client sees one on success.
  • error { code, message, details? } — terminal failure with structured detail.

Unary tools synthesize a single result envelope under the hood. The convention lets every adapter (OpenAI / Anthropic / Gemini / MCP / Hermes / custom) lower envelopes into the framework's native streaming protocol without per-pair negotiation. Two synthesized error shapes round out the contract: missing_terminal on the streaming caller when the server closed without a result/error chunk, and handler_error on the streaming server when the handler raised mid-stream. Both are part of the T-2 JSON byte-equality fixture so adapters can match on the code reliably.
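The envelope contract can be modeled with plain dictionaries. The sketch below assumes a "type" key as the enum discriminator, which the release notes do not specify, and the function names are hypothetical. It shows a unary call wrapped as a single result envelope and a caller-side guard that synthesizes missing_terminal when a stream ends without a result or error chunk.

```python
import json
from typing import Any, Iterable, Iterator

TERMINAL = {"result", "error"}

def event(kind: str, **fields: Any) -> dict:
    """Build one envelope; the "type" discriminator key is an assumption."""
    return {"type": kind, **{k: v for k, v in fields.items() if v is not None}}

def unary_as_stream(value: Any) -> Iterator[dict]:
    """A unary tool looks like a stream carrying a single result envelope."""
    yield event("result", data=value)

def with_terminal(chunks: Iterable[dict]) -> Iterator[dict]:
    """Caller-side guard: synthesize missing_terminal if the stream just ends."""
    for chunk in chunks:
        yield chunk
        if chunk["type"] in TERMINAL:
            return  # a terminal chunk closes the stream
    yield event("error", code="missing_terminal",
                message="stream closed without a result or error chunk")

# A server that opened, streamed one delta, then closed without a terminal chunk.
truncated = [
    event("start", tool_id="web_search", call_id="c1"),
    event("delta", data="partial"),
]
events = list(with_terminal(truncated))
for ev in events:
    print(json.dumps(ev))
```

With this guard in place an adapter only has to handle two terminal shapes, result and error, regardless of how the underlying stream ended.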

serve_tool is atomic w.r.t. observable mesh state. Either all of (handler registration, capability-fold publish, nrpc:<tool_id> tag, ai-tool:<tool_id> tag, auto-installed tool.metadata.fetch if first) succeed, or none do. Drop on the returned handle reverses all four.
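One common way to get this all-or-nothing behavior is to record an undo action after each successful step and run them in reverse if a later step fails. The sketch below uses Python's contextlib.ExitStack for that. The Mesh class, the fail_on parameter, and the choice to model only the handler and the two tags are simplifications for illustration, not a description of how the Rust implementation is written.

```python
from contextlib import ExitStack
from typing import Callable

class Mesh:
    """Hypothetical stand-in for observable mesh state (handlers + tags)."""
    def __init__(self) -> None:
        self.handlers: dict[str, Callable] = {}
        self.tags: set[str] = set()

    def add_tag(self, tag: str, fail: bool = False) -> None:
        if fail:
            raise RuntimeError(f"publish failed for {tag}")
        self.tags.add(tag)

def serve_tool(mesh: Mesh, tool_id: str, handler: Callable,
               fail_on: str = "") -> ExitStack:
    """All-or-nothing registration; the returned stack acts as the handle."""
    with ExitStack() as undo:
        mesh.handlers[tool_id] = handler
        undo.callback(mesh.handlers.pop, tool_id, None)
        for tag in (f"nrpc:{tool_id}", f"ai-tool:{tool_id}"):
            mesh.add_tag(tag, fail=(tag == fail_on))  # fail_on injects a fault
            undo.callback(mesh.tags.discard, tag)
        # Every step succeeded: detach the undo actions so they do not run here.
        return undo.pop_all()

mesh = Mesh()
handle = serve_tool(mesh, "web_search", lambda req: {"hits": []})
registered = sorted(mesh.tags)
handle.close()  # analogous to dropping the returned handle

try:
    serve_tool(mesh, "db_query", lambda req: {}, fail_on="ai-tool:db_query")
except RuntimeError:
    pass  # the partial registration has already been rolled back

print(registered, mesh.handlers, mesh.tags)
```

If any step raises, the stack unwinds the steps that already ran before the exception reaches the caller, so no observer sees a half-registered tool. Closing the handle later runs the same undo actions.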

Cross-language by construction. The wire is unchanged: call_tool is call_service with the typed wrapper, call_tool_streaming rides the new call_service_streaming substrate primitive (mirror of call_service returning an RpcStream). A Python Hermes agent calling a Go-hosted database tool calling a TypeScript browser tool is transparent over the existing nRPC wire. The T-1 cross-language test pins byte-equality of every format translator output (to_openai_tool / to_anthropic_tool / to_gemini_tool / to_mcp_tool) across Rust / Node / Python / Go for every fixture descriptor.

Surface by language:

Surface Rust Node TS Python Go
serve_tool / call_tool (unary) ✅ (sync + async)
serve_tool_streaming (handler returns Stream<ToolEvent>) ✅ (sync + async-gen)
call_tool_streaming (capability-routed caller) ✅ (sync + async)
list_tools / watch_tools ✅ (polling) ✅ (polling) ✅ (polling)
tool.metadata.fetch (caller + auto-install server)
Format translators × 4 (OpenAI / Anthropic / Gemini / MCP)
missing_terminal + handler_error synthesis
AbortSignal / cancel on watch_tools ✅ (ctx)

Format translators ship in one package per language. net-mesh-tools (pip) and @net-mesh/tools (npm) each carry formats/{openai,anthropic,gemini,mcp} submodules. Each translator is a small pure function from ToolDescriptor → provider tool-array entry, plus a reverse lower_tool_call(call) -> CallSpec for going from a provider's tool_use block back into a typed nRPC call. There is no transitive dependency on any provider SDK: users wire the translator output into their OpenAI / Anthropic / Hermes / framework-of-choice client themselves.
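To make the "small pure function" idea concrete, here is a sketch of two translators and one reverse lowering. The output shapes follow the publicly documented OpenAI function-tool and Anthropic tool formats. The ToolDescriptor and CallSpec fields are assumptions, and these functions are written from scratch for illustration rather than copied from net-mesh-tools.

```python
from dataclasses import dataclass, field

@dataclass
class ToolDescriptor:   # hypothetical fields; the real descriptor may differ
    tool_id: str
    description: str
    input_schema: dict = field(default_factory=dict)

@dataclass
class CallSpec:         # hypothetical: enough to issue a typed nRPC call
    tool_id: str
    call_id: str
    args: dict

def to_openai_tool(d: ToolDescriptor) -> dict:
    """Entry for the tools array of an OpenAI Chat Completions request."""
    return {"type": "function",
            "function": {"name": d.tool_id,
                         "description": d.description,
                         "parameters": d.input_schema}}

def to_anthropic_tool(d: ToolDescriptor) -> dict:
    """Entry for the tools array of an Anthropic Messages request."""
    return {"name": d.tool_id,
            "description": d.description,
            "input_schema": d.input_schema}

def lower_tool_call(block: dict) -> CallSpec:
    """Reverse direction for an Anthropic tool_use content block."""
    if block.get("type") != "tool_use":
        raise ValueError("not a tool_use block")
    return CallSpec(tool_id=block["name"], call_id=block["id"], args=block["input"])

desc = ToolDescriptor(
    "web_search", "Search the web",
    {"type": "object", "properties": {"q": {"type": "string"}}, "required": ["q"]})
spec = lower_tool_call({"type": "tool_use", "id": "toolu_01",
                        "name": "web_search", "input": {"q": "net mesh"}})
print(to_openai_tool(desc)["function"]["name"], spec)
```

Since the functions only build and read dictionaries, they need no provider SDK, and their output can be compared byte-for-byte across languages once serialized with a fixed key order.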

No wire ABI bump for unary tool calls. Streaming tools use the new call_service_streaming substrate primitive; the wire shape of an individual stream is unchanged from call_streaming today. ToolEvent envelopes are JSON-encoded chunks on existing streams. NET_RPC_ABI_VERSION stays at 0x0004.


Capability perf — closing the Phase A.5.N regression cliff

PERF_AUDIT_2026_05_28_CAPABILITY.md (the audit doc shipping alongside this release) compared two M1 Max criterion runs and found that the Phase A.5.N migration — which moved CapabilitySet's typed HardwareCapabilities / Vec<ModelCapability> / etc. fields into a canonical `H...
