
p2p design improvements and bug fixes#152

Merged
mateeullahmalik merged 1 commit into master from p2pDesignImporvements
Sep 4, 2025
Conversation

@mateeullahmalik
Collaborator

P2P: Make hot path fast & resilient — pooled TCP, dynamic deadlines, chunked BatchStore, smarter health/bans

Summary

This PR hardens and speeds up the P2P stack without changing the wire protocol. The hot path no longer wastes RPCs or times out on heavy payloads, connections are responsibly reused, and node health is managed by a sober background loop (not by request-time heuristics).

Headline changes

  • Robust connection pooling
    • Keyed by bech32@host:port for invariant identity.
    • connWrapper serializes whole-RPC I/O (prevents cross-talk on pooled sockets).
    • One safe redial on stale pooled sockets (EOF / reset / broken pipe).
    • Idle pruner (10m tick / 1h idle) + metrics: adds, hits, misses, evictions, open count.
    • TCP NoDelay (low latency) + server-side TCP keepalive (detect half-open).
  • Dynamic write deadlines
    • Write deadline scaled to encoded size with cushion; outer call timeout widened for heavy ops.
    • Server read timeout keeps the connection open on timeouts (no churn).
  • Chunked BatchStoreData
    • Per-node payload split into ~180 MB chunks (stays well under the 200 MB hard cap).
    • Skips empty batches (no-op RPCs) and uses the long-timeout lane for large chunks.
  • Timeout tuning (realistic, payload-aware)
    • BatchStoreData: 90s, BatchGetValues: 90s, Find*/StoreData: 5–15s.
    • Client write deadlines derived from message size; read bounded by operation timeout.
  • Health & banlist behavior
    • Background checkNodeActivity loop (2m, bounded concurrency) with 3s pings.
    • Only demote to inactive when failures exceed threshold=3 (was 1).
    • Unban immediately on successful ping; routing table updated via health, not hot-path pings.
  • Bootstrap strategy (full network view)
    • Periodic chain sync (10m) updates replication_info and seeds routing table.
    • No bootstrap-time pings; the hot path relies on the full-view in-memory routing table (sized for ≤1000 validators).
  • Message safety
    • Hard 200 MB cap enforced in encode/decode; responders compress batch GETs.
    • Server continues on read timeout; client retries once on stale pooled socket.

Key Changes (by area)

Networking / I/O

  • Client (Network.Call)
    • Whole-RPC lock on pooled connections; one redial on stale socket.
    • Dynamic writeDeadline = base + size/throughputFloor + cushion.
    • Deadlines cleared after every RPC (reusable pooled conns).
  • Server
    • Per-message read deadline 90s, but on timeout we continue (don’t close).
    • Write deadline per response (prevents hung writers).
    • TCP keepalive enabled on accepted sockets.

Batch store

  • Per-node chunking (~180 MB) to respect the 200 MB envelope after gob overhead.
  • Skips empty batches (no more useless RPCs).
  • Heavy chunks run with long-timeout profile; smaller ones complete faster.
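A minimal sketch of the per-node chunker described above; `splitBatch` is a hypothetical name, and the greedy packing shown is one straightforward way to stay under the target, not necessarily the PR's exact algorithm.

```go
package main

import "fmt"

const chunkTarget = 180 << 20 // ~180 MB raw payload, headroom under the 200 MB cap

// splitBatch greedily packs records into chunks whose raw size stays
// under chunkTarget; empty input yields no chunks (no no-op RPCs).
func splitBatch(records [][]byte) [][][]byte {
	var chunks [][][]byte
	var cur [][]byte
	size := 0
	for _, r := range records {
		if size+len(r) > chunkTarget && len(cur) > 0 {
			chunks = append(chunks, cur) // flush the full chunk
			cur, size = nil, 0
		}
		cur = append(cur, r)
		size += len(r)
	}
	if len(cur) > 0 {
		chunks = append(chunks, cur)
	}
	return chunks
}

func main() {
	// three 80 MB records: the first two fit one chunk, the third starts another
	recs := [][]byte{make([]byte, 80<<20), make([]byte, 80<<20), make([]byte, 80<<20)}
	fmt.Println(len(splitBatch(recs))) // prints 2
}
```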

Health & bans

  • Ban threshold 3; health loop (2m) handles ban/unban and Active flip.
  • Hot paths no longer make ban decisions; fewer false negatives and less churn.
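The threshold rule above can be sketched as a tiny state machine. `nodeHealth` and `observePing` are illustrative names; the PR's actual bookkeeping lives in the routing/ban layer.

```go
package main

import "fmt"

const banThreshold = 3 // was 1 before this PR

// nodeHealth tracks consecutive ping failures for one peer.
type nodeHealth struct {
	failures int
	banned   bool
}

// observePing applies the PR's rules: a successful ping unbans
// immediately; demotion happens only when failures exceed the threshold.
func (n *nodeHealth) observePing(ok bool) {
	if ok {
		n.failures = 0
		n.banned = false
		return
	}
	n.failures++
	if n.failures > banThreshold {
		n.banned = true
	}
}

func main() {
	n := &nodeHealth{}
	for i := 0; i < 4; i++ {
		n.observePing(false)
	}
	fmt.Println(n.banned) // prints true: banned only after the 4th consecutive failure
}
```

Because only the 2-minute health loop calls `observePing`, a single slow request on the hot path can no longer ban a healthy peer.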

Bootstrap

  • One-time + periodic (10m) chain sync → upsert replication_info and seed routing.
  • No eager pings; the routing table maintains a full network view for ≤1000 validators.

Performance & Reliability Impact

  • Latency: lower due to TCP_NODELAY, pooled conns, and no-reopen on server timeouts.
  • Throughput: better because heavy writes get appropriate deadlines; chunking avoids cap hits.
  • Stability: resilient to transient resets; background health avoids flapping & over-banning.
  • Resource use: limited concurrent batch sends; idle pruner keeps pool bounded.

Backward Compatibility

  • No protocol changes; existing peers interoperate.
  • Timeouts increased only for heavy ops; light RPCs unchanged or lowered.

Configuration Knobs

  • BatchStoreData timeout: 90s (execTimeouts map).
  • Chunk target: ~180 MB raw payload (headroom under 200 MB cap).
  • Pool pruner: 10m tick, 1h idle; pool capacity 256 (tunable).
  • Health loop: 2m cadence; per-ping timeout 3s; ban threshold 3.

Risks & Mitigations

  • Large WAN variance: dynamic write deadlines + longer outer timeouts for heavy ops.
  • Oversized messages: 200 MB hard cap; chunking at ~180 MB prevents rejects.
  • Banlist sensitivity: threshold bumped to 3; unban on successful ping.

Rollback: revert timeout map and chunker; server read timeout remains safe even when reduced.


Testing Performed

  • Unit: encode/decode size guard; stale-socket classification; chunker sizing.
  • Integration:
    • Batch store with mixed 1–50 KB records up to caps; verified chunks <200 MB.
    • Induced server-side close/reset; ensured single redial recovers.
    • Ban/unban via health loop; confirmed no hot-path bans.

Observability

  • Pool metrics: adds/replacements/hits/misses/evictions/open_current/capacity.
  • Logs: “Stale pooled connection on write/read; redialing” markers; batch chunk sizing & timing.
  • Health: banlist size; Active flips; last_seen updates.

Checklist

  • Connection pooling safe under concurrency
  • One redial on stale pooled socket
  • Dynamic write deadlines
  • Server read timeout keeps conn open
  • BatchStore chunking (~180 MB), no empty batches
  • 200 MB cap enforced
  • Health loop governs ban/unban (threshold=3)
  • Bootstrap refresher seeds routing; no hot-path find-node
  • Timeouts tuned (heavy ops 60–90s)

@mateeullahmalik mateeullahmalik merged commit 2aee34a into master Sep 4, 2025
7 checks passed
@mateeullahmalik mateeullahmalik deleted the p2pDesignImporvements branch September 5, 2025 11:53