Skip to content

Harden fd-daemon client reads with timeout#13

Open
pengshenghai wants to merge 2 commits into
Scottcjn:mainfrom
pengshenghai:codex/fd-daemon-read-timeout
Open

Harden fd-daemon client reads with timeout#13
pengshenghai wants to merge 2 commits into
Scottcjn:mainfrom
pengshenghai:codex/fd-daemon-read-timeout

Conversation

@pengshenghai

Copy link
Copy Markdown

Problem

fd-daemon serves one client at a time. A client can send a valid frame header with a nonzero payload length, then hold the socket open without completing the payload. The daemon blocks inside read_full() and never returns to accept() while that peer stays connected.

Fix

  • Add a small cross-platform fd_sock_set_read_timeout() helper in daemon/fd_listen.h.
  • Apply a 5 second receive timeout to each accepted client socket at the start of serve_client().
  • Leave the existing full-read semantics intact: complete messages are still processed only after all payload bytes arrive; short/timeout reads drop the current client.

Not changed

  • No render behavior, game logic, POV-Ray integration, scene parsing, socket path/port trust model, wallet/payment code, or deployment behavior changed.
  • No external service scanning or remote attack testing was performed.

Verification

Local static validation from the extracted source tree:

python static check:
- fd_sock_set_read_timeout exists in both Windows and POSIX socket branches
- SO_RCVTIMEO is used
- POSIX includes sys/time.h for timeval
- serve_client() calls fd_sock_set_read_timeout(cfd, 5) before the first read_full()
- existing read_full()/fd_sock_read() path remains intact
=> feverdream read timeout static checks ok

whitespace check:
=> no trailing whitespace in daemon/fd_listen.h or daemon/fd-daemon.cpp

I could not run a full daemon build here because this Windows workspace does not have a C++ compiler available, and the repository's daemon build also expects the vendored POV-Ray archives/toolchain.

Bounty / claim context

Issue: #5
Possible bounty context: RustChain Bug Hunter Scottcjn/rustchain-bounties#520
Wallet: RTC269fa5650798c3aa5086a128c025a546e0a41d0b

AI assistance disclosure: this PR was prepared with AI assistance and manually validated against the source files and checks listed above.

@Scottcjn

Copy link
Copy Markdown
Owner

Good defensive direction (a read timeout closes the slow-loris/hang vector). No blockers, but a few things to tighten before merge:

  1. Ignored return valuefd_sock_set_read_timeout(cfd, 5) return is discarded; if setsockopt fails, the socket silently keeps its prior (often infinite) timeout, so the guard becomes a no-op with no signal. Check the return and at least log (or fail closed).
  2. 5s is a whole-session hard boundSO_RCVTIMEO applies per read for the entire client session, so any legitimate client that pauses >5s between protocol chunks (slow/large scene upload, backpressured parent, debugger) now gets rejected where it previously succeeded. Consider making it configurable (matching the existing FD_PORT/path config pattern) or justify 5s against the protocol's real max inter-chunk gap.
  3. Read-loop EOF vs timeout — after SO_RCVTIMEO, a timeout returns -1 with EAGAIN/EWOULDBLOCK/ETIMEDOUT, not EOF. Make sure the read loop distinguishes that from a clean disconnect so a mid-message timeout doesn't leave partial have_scene/payload/session state.
  4. Tests — the new branch (timeout during partial read, post-timeout extra bytes, setsockopt failure) is the production failure path and is currently untested.

(Reviewed with the lab's tri-brain pass — Codex + Grok.)

…EOF distinction

1. Configurable timeout via FD_READ_TIMEOUT env var (default 5s).
2. Check fd_sock_set_read_timeout return value; warn on failure.
3. read_full now returns 0 (EOF), -1 (timeout), or 1 (success) so
   serve_client can distinguish clean disconnect from staled client.
4. All read_full callers in serve_client updated accordingly.
@pengshenghai

Copy link
Copy Markdown
Author

Addressed review feedback (round 2)

Thanks @Scottcjn for the detailed review. Here's what changed:

1. Return value check ✅

d_sock_set_read_timeout() return is now checked. On failure, a warning is logged to stderr but execution continues (failing closed would break existing deployments where the platform's socket defaults already work). The daemon now at least signals the misconfiguration.

2. Configurable timeout ✅

Hardcoded 5s is now configurable via the FD_READ_TIMEOUT environment variable (seconds, default 5). Matches the existing FD_PORT/FD_THREADS config pattern. Setting FD_READ_TIMEOUT=0 disables the timeout entirely (useful for debuggers or large scene uploads over slow links).

3. Timeout vs disconnect distinction ✅

ead_full() return type changed from �ool to long:

  • Returns 1 on success
  • Returns

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants