
nRPC Performance #380

Closed
LZL0 wants to merge 2 commits into master from nrpc-performance-3


LZL0 (Member) commented Jun 13, 2026

Summary by cubic

Inline-ready unary RPC dispatch and lock-free folds to trim c1 latency and simplify mesh bridges. Small c1 win; throughput focus updated to the shared recv loop.

  • Refactors
    • Unary server fold polls the handler once inline and only spawns on Pending; synchronous handlers emit responses without spawn+wake. Added tests for inline emit, pending→spawn hand-off, and inline-path panic containment.
    • All server folds and the client fold now use DashMap and expose apply_shared(&self)/apply_inbound(&self); bridge code drives folds without Arc<Mutex<...>>. The RedexFold trait remains and delegates.
    • Duplicate-REQUEST refusal now uses atomic entry() checks; CANCEL and self-clean paths use DashMap::remove. Streaming folds also move flow_control and request-chunk senders to DashMap.
    • Audit doc corrected: the c16/c128 ceiling is the single recv loop, not fold/drain. Ack-piggyback and batched receive are the primary levers. Quick A/B: c1/32B 34.31 → 33.73 µs (~−1.6%); c16 unchanged.

Written for commit 1e4a0b7. Summary will update on new commits.


T2.3: the unary fold polls the handler future once inline via
spawn_or_inline and only tokio::spawn's it when Pending — a
synchronous handler emits its RESPONSE without the spawn + wake.
Streaming/client-stream/duplex folds are unchanged (their handler
tasks always park awaiting the pump JoinHandle).
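
The poll-once-then-spawn idea in T2.3 can be sketched with the standard library alone. This is a minimal illustration, not the PR's `spawn_or_inline`: the names `poll_once_or_defer`, `Dispatch`, `NoopWake`, and `YieldOnce` are hypothetical, and instead of calling `tokio::spawn` the sketch hands the still-pending future back to the caller so it stays free of runtime dependencies.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; sufficient for one speculative poll.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Outcome of the single inline poll (hypothetical type).
enum Dispatch<T> {
    /// The handler finished synchronously; the response can be emitted now.
    Inline(T),
    /// The handler parked; the caller would pass this to its runtime's spawn.
    Deferred(Pin<Box<dyn Future<Output = T> + Send>>),
}

/// Poll `fut` exactly once. A `Ready` result is returned inline; a `Pending`
/// future is returned untouched so a runtime can take over polling it.
fn poll_once_or_defer<F, T>(fut: F) -> Dispatch<T>
where
    F: Future<Output = T> + Send + 'static,
{
    let mut fut: Pin<Box<dyn Future<Output = T> + Send>> = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(out) => Dispatch::Inline(out),
        Poll::Pending => Dispatch::Deferred(fut),
    }
}

/// Stand-in for an asynchronous handler: `Pending` on the first poll,
/// `Ready(7)` on the second.
struct YieldOnce {
    yielded: bool,
}

impl Future for YieldOnce {
    type Output = u32;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        if self.yielded {
            Poll::Ready(7)
        } else {
            self.yielded = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}
```

In this sketch, `poll_once_or_defer(async { 42u32 })` yields `Dispatch::Inline(42)` with no task created, while `poll_once_or_defer(YieldOnce { yielded: false })` yields `Dispatch::Deferred`. A runtime that later spawns the deferred future polls it again with its own waker, so the no-op waker used for the first poll is not relied on afterwards. Panic containment on the inline path, which the PR tests, is omitted here.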

T2.1: all four server folds gain an inherent apply_shared(&self)
(RedexFold trait untouched; trait impl delegates), in_flight /
flow_control / chunk-sender maps move Mutex<HashMap> -> DashMap with
atomic entry()-based duplicate-REQUEST refusal, and
RpcClientFold::apply_inbound takes &self — bridge tasks and the
reply-channel dispatcher drive folds with no Arc<Mutex<...>> wrapper.
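
The shape of T2.1 (an inherent `&self` method, a trait impl that delegates, and a single `entry()` call deciding whether a REQUEST is a duplicate) can be sketched as follows. This is an illustration under stated assumptions: `Event`, `Outcome`, `Fold`, and `UnaryFold` are hypothetical names, the trait is assumed to take `&mut self`, and a `Mutex<HashMap>` from the standard library stands in for `DashMap` so the block has no external dependencies. `DashMap::entry` offers the same occupied/vacant split, but locks only one shard rather than the whole map.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical inbound events; the real frame types are not shown in the PR.
enum Event {
    Request { id: u64 },
    Cancel { id: u64 },
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Accepted,
    DuplicateRefused,
    Cancelled,
    Unknown,
}

/// Hypothetical stand-in for the fold trait, assumed to take `&mut self`.
trait Fold {
    fn apply(&mut self, ev: Event) -> Outcome;
}

#[derive(Default)]
struct UnaryFold {
    // Stand-in for the DashMap of in-flight requests. Interior mutability
    // is what lets `apply_shared` take `&self`.
    in_flight: Mutex<HashMap<u64, ()>>,
}

impl UnaryFold {
    /// Shared-reference entry point: callers need no outer mutex around the fold.
    fn apply_shared(&self, ev: Event) -> Outcome {
        let mut map = self.in_flight.lock().unwrap();
        match ev {
            // One `entry()` call both checks for and claims the id, so two
            // racing REQUESTs with the same id cannot both be accepted.
            Event::Request { id } => match map.entry(id) {
                Entry::Occupied(_) => Outcome::DuplicateRefused,
                Entry::Vacant(slot) => {
                    slot.insert(());
                    Outcome::Accepted
                }
            },
            // CANCEL removes the entry if one exists.
            Event::Cancel { id } => match map.remove(&id) {
                Some(()) => Outcome::Cancelled,
                None => Outcome::Unknown,
            },
        }
    }
}

impl Fold for UnaryFold {
    /// The trait signature is untouched; it delegates to the `&self` method.
    fn apply(&mut self, ev: Event) -> Outcome {
        self.apply_shared(ev)
    }
}
```

With this layout a bridge task can hold a plain `Arc<UnaryFold>` and call `apply_shared` directly, while existing callers that go through the trait keep working unchanged.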

Measured (quick A/B, win11 host): c1/32B 34.31 -> 33.73 us (-1.6%,
p=0.00); c16/32B unchanged — consistent with the scaling plan's
finding that the throughput ceiling is the recv loop, not dispatch.
The follow-up audit doc is corrected accordingly (c128 claims struck,
ack-piggyback + batched recv promoted to primary throughput levers).

Tests: 3 new unit tests pin inline emission, pending->spawn hand-off,
and inline-path panic containment; cortex rpc units (78), nRPC
integration (36), SDK mesh_rpc suites, and full lib (4310) all green.

vercel Bot commented Jun 13, 2026


Project | Deployment | Actions | Updated (UTC)
net | Ready | Preview, Comment | Jun 13, 2026 3:55pm


cubic-dev-ai Bot left a comment


No issues found across 3 files



codecov Bot commented Jun 13, 2026


Codecov Report

❌ Patch coverage is 89.58333% with 25 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
net/crates/net/src/adapter/net/cortex/rpc.rs | 89% | 25 Missing ⚠️
