Skip to content

fix(security): phase 1 hardening — log redaction, transport fixes, input safety#84

Open
iquidus wants to merge 18 commits into
masterfrom
dev-security-phase-1
Open

fix(security): phase 1 hardening — log redaction, transport fixes, input safety#84
iquidus wants to merge 18 commits into
masterfrom
dev-security-phase-1

Conversation

@iquidus

@iquidus iquidus commented Apr 28, 2026

Copy link
Copy Markdown
Contributor

Closes RND-559

  • Redact sensitive payload from authenticated message log (handlers.go:25). Demoted to DEBUG level with only payload_len exposed, preventing key share leakage to log aggregation systems.

  • Fix response body leaks in DKG/reshare commitment broadcasts (transport/client.go:205,241). Previously discarded both response and error with _, _; now closes response bodies and logs errors via structured logger.

  • Add logger support to transport.Client with WithLogger() builder and nil-safe log() accessor. Replaces fmt.Printf usage with structured zap logging.

  • Fix ECDSA attestation input mutation (ecdsa.go:159). Signature slice is now copied before modifying the recovery byte, preventing caller data corruption.

  • Upgrade operator address mismatch on restore from warning to hard error (node.go:540). Prevents starting a node against mismatched persisted state which could cause cryptographic inconsistency.

…put safety

- Redact sensitive payload from authenticated message log (handlers.go:25).
  Demoted to DEBUG level with only payload_len exposed, preventing key
  share leakage to log aggregation systems.

- Fix response body leaks in DKG/reshare commitment broadcasts
  (transport/client.go:205,241). Previously discarded both response and
  error with `_, _`; now closes response bodies and logs errors via
  structured logger.

- Add logger support to transport.Client with WithLogger() builder and
  nil-safe log() accessor. Replaces fmt.Printf usage with structured
  zap logging.

- Fix ECDSA attestation input mutation (ecdsa.go:159). Signature slice
  is now copied before modifying the recovery byte, preventing caller
  data corruption.

- Upgrade operator address mismatch on restore from warning to hard
  error (node.go:540). Prevents starting a node against mismatched
  persisted state which could cause cryptographic inconsistency.
@claude

claude Bot commented Apr 28, 2026

Copy link
Copy Markdown

Claude finished @iquidus's task in 2m 57s —— View job


Code Review — fix(security): phase 1 hardening

  • Gather context and understand PR changes
  • Review pkg/attestation/ecdsa.go — input mutation fix
  • Review pkg/node/handlers.go — log redaction
  • Review pkg/node/node.go — address mismatch hard error
  • Review pkg/transport/client.go — response body leak fixes + structured logging
  • Review pkg/types/messages.go — message type changes
  • Review pkg/util/util.go — utility changes
  • Review client changes (kmsClient, web3signer)
  • Review test coverage
  • Post comprehensive review

Summary

This PR addresses five meaningful security-hardening issues. The fixes are well-motivated and generally well-implemented, with good test coverage. I found one silent regression that needs a fix before merge, one functional issue with the broadcast logger that needs to be wired up, and a handful of minor issues worth addressing.


Issues

🔴 Bug — WithLogger is defined but never called

transport.NewClient() is called at pkg/node/node.go:345 without chaining .WithLogger(n.logger):

// pkg/node/node.go:344-345
n.transport = transport.NewClient(operatorAddress, tps)
// ← missing: .WithLogger(n.logger)

This means every c.log().Sugar().Warnw(...) call added in BroadcastDKGCommitments and BroadcastReshareCommitments silently discards to the nop logger. The new structured log warnings introduced by this PR will never appear in production. The logger infra is correct; it just isn't wired up at the call site.

Fix this →


🟡 Bug — Duplicate "operator_address" key in two Debugw calls

pkg/clients/kmsClient/client.go has the same structured-log key twice in a single call, in two separate places:

// client.go:433-436 (CollectPartialSignatures)
c.logger.Sugar().Debugw("Collected partial signature from operator",
    "operator_index", idx,
    "operator_address", operatorAddr.Hex(),   // ← key used twice
    "operator_address", response.OperatorAddress)
// client.go:954 (collectPartialSignaturesForDecrypt)
c.logger.Sugar().Debugw("Collected partial signature",
    "operator_index", idx,
    "operator_address", operatorAddr.Hex(),   // ← key used twice
    "operator_address", response.OperatorAddress)

Zap includes both fields; most log aggregators silently drop or overwrite the first occurrence. The second "operator_address" (response.OperatorAddress) should be renamed to "response_operator_address" for clarity, consistent with the existing warn log at line 424.

Fix this →


🟡 Functional gap — Broadcast functions return nil on complete failure

Both BroadcastDKGCommitments and BroadcastReshareCommitments continue past per-peer failures and always return nil. If every peer rejects or times out, the caller receives no error while the DKG/reshare protocol silently stalls. The new structured logging helps, but it doesn't surface the failure to the caller.

Consider tracking a failure count and returning an error (or partial error via errors.Join) when zero or fewer-than-threshold peers accepted the broadcast:

// pkg/transport/client.go
successCount := 0
...
if resp.StatusCode == http.StatusOK {
    successCount++
}
...
if successCount == 0 {
    return fmt.Errorf("commitment broadcast failed: no peers accepted")
}

Fix this →


🟡 Debuggability — Non-OK broadcast response body is not read

When a peer returns a non-200 status in BroadcastDKGCommitments/BroadcastReshareCommitments, only the status code is logged. The rejection reason (which is usually in the response body) is discarded:

if resp.StatusCode != http.StatusOK {
    c.log().Sugar().Warnw("DKG commitment rejected",
        "peer", op.OperatorAddress.Hex(), "status", resp.StatusCode)
    // ← no body read; peer's error message lost
}
_ = resp.Body.Close()

Reading a bounded slice of the error body (e.g., io.ReadAll(io.LimitReader(resp.Body, 256))) would significantly improve incident response.

Fix this →


🟢 Minor — fmt.Sscanf with %s for challenge parsing is subtly fragile

parseChallenge in ecdsa.go:202 uses fmt.Sscanf(challenge, "%d-%s", ...). The %s verb stops at any whitespace, so a challenge containing a space before the nonce would silently truncate — the nonce length check would then catch it, but the error message would be "invalid nonce length" rather than "invalid challenge format", which is misleading.

A more explicit split would be:

// pkg/attestation/ecdsa.go
parts := strings.SplitN(challenge, "-", 2)
if len(parts) != 2 { ... }
timestamp, err := strconv.ParseInt(parts[0], 10, 64)
nonce = parts[1]

This is low risk in practice (nonces are hex-encoded), but makes intent clearer.


🟢 Minor — validateAuthenticatedMessage always returns nil for its third return value

pkg/node/handlers.go:43 has signature (...) (*AuthenticatedMessage, *OperatorSetPeer, interface{}, error) and always returns nil for the interface{} slot. All call sites use _, _, to discard it. This dead return slot is pre-existing but could be removed in a future cleanup to reduce confusion.


What's Well Done

ECDSA mutation fix (ecdsa.go:159-163) — copy(signature, request.Attestation) before modifying the recovery byte is exactly right. The TestVerifyDoesNotMutateAttestation and TestECDSAVerifyIdempotent tests directly validate both the fix and its invariant.

Log redaction (handlers.go:49) — Demoting authMsg.Payload to DEBUG and replacing it with payload_len is the correct approach. It preserves debugging utility (you can see traffic volume) without risking key share leakage to log aggregators that operate at INFO level.

Hard error on address mismatch (node.go:536-539) — The upgrade from Warnw to a hard error is the right call for a cryptographic safety check. The error message includes both addresses and a remediation hint. The common.HexToAddress() normalization on both sides before comparison (line 536) prevents EIP-55 case differences from causing false positives. Both the mismatch case and the case-insensitive success case have dedicated tests.

ValidateAppID (util/util.go:152-163) — Clean implementation: single compiled regex, informative errors, and sensible 255-byte cap. Applied consistently in both /secrets and /app/sign handlers.

Response body size limits (kmsClient/client.go, web3signer/client.go) — maxSuccessBodySize + io.LimitReader on the success path, and MaxErrorBodySize on the error path, prevents OOM from malicious large responses.

DeserializeFr range check (types/messages.go:68) — Catching values at or above the field modulus before they reach SetBigInt is important; gnark-crypto would silently reduce them mod p, corrupting share arithmetic.


Test Coverage Assessment

Coverage is strong for the new changes:

  • TestVerifyDoesNotMutateAttestation + TestECDSAVerifyIdempotent — ECDSA mutation fix ✓
  • TestRestoreState_OperatorAddressMismatch + TestRestoreState_OperatorAddressCaseInsensitive — address mismatch hard error ✓
  • TestDeserializeFr — field element range validation ✓
  • TestNodePersistence suite — persistence infrastructure ✓

Gap: No test exercises BroadcastDKGCommitments / BroadcastReshareCommitments with a failing peer to verify the new warning paths fire or that the WithLogger wiring works end-to-end.


iquidus added 17 commits April 30, 2026 15:45
- BroadcastDKGCommitments and BroadcastReshareCommitments now warn when
  a peer returns non-200, consistent with sendCommitmentBroadcast.
  Aids diagnostics for rejected protocol messages.

- WithLogger(nil) is now a no-op instead of overwriting the logger field
  with nil, which would cause log() to allocate a fresh zap.NewNop() on
  every call.
NewClient always initializes c.logger to zap.NewNop(), and WithLogger
guards against nil replacement, so c.logger is never nil after
construction. Remove the dead fallback branch and misleading comment
…match on startup

The RestoreState check compared persisted address strings directly against
EIP-55 checksummed output. If older code wrote all-lowercase hex, the node
would refuse to start despite the address being identical. Now both sides
are normalized via common.HexToAddress before comparison.

Also adds TestECDSAVerifyIdempotent to explicitly document the idempotency
contract of attestation verification, and TestRestoreState_OperatorAddressCaseInsensitive
to guard against case-sensitivity regressions.
…eference

Tests constructing Client struct literals without a logger field would
panic when BroadcastCommitmentsWithProofs hit an error path that logs
via c.log().Sugar().Warnw(). Initialize logger with zap.NewNop() in all
test clients, matching the default set by NewClient().
- DeserializeFr now returns errors instead of silently returning zero on
  malformed data, preventing corrupted shares from entering DKG/reshare
- ValidateAppID enforces max length (255) and character whitelist
  [a-zA-Z0-9._-] to prevent injection via crafted app IDs
- AttestationTime rejects timestamps >5min in the future to prevent
  abuse of historical key version lookups
- All unbounded io.ReadAll calls wrapped with io.LimitReader (64KB for
  web3signer responses, 1MB for error bodies) to prevent OOM from
  malicious peers
…d, observability, tests

- Split web3signer response limit into maxSignResponseSize (64KB) and
  maxListResponseSize (1MB) so ListAllPublicKeys doesn't silently truncate
- Replace inline 1<<20 literal in node.go with named maxErrorBodySize
  constant for consistency with kmsClient
- DeserializeFr(nil) now returns an explicit error instead of (nil, nil)
  to prevent nil dereference in future callers
- Promote maxAttestationTimeSkew to package-level constant for
  visibility and testability
- Add structured log on future-timestamp rejection for observability
- Add unit tests for DeserializeFr error paths (nil, empty, malformed)
- Add integration test for future attestation time rejection
…shared constants

- DeserializeFr now rejects values >= BLS12-381 Fr field order via
  big.Int range check before calling SetBigInt; gnark-crypto's SetString
  silently reduces mod p, which would corrupt DKG/reshare shares
- SerializeFr panics on nil input with a clear message instead of a
  cryptic nil-pointer dereference, matching DeserializeFr's nil contract
- Move attestation time check before expensive attestation crypto and
  on-chain lookups to avoid wasted work on malicious requests
- Apply same future-timestamp guard to /app/sign for consistent behavior
- Consolidate duplicate maxErrorBodySize into types.MaxErrorBodySize
- Split web3signer limits: maxSignResponseSize (64KB) vs
  maxListResponseSize (1MB) to avoid truncating large key lists
- Fix FuzzValidateAppID to match new character/length constraints
- Add field-order boundary test cases to DeserializeFr tests
…nd tests

- Rename maxAttestationTimeSkew to maxAttestationFutureOffset (5min)
  to accurately describe its unidirectional purpose
- Add maxAttestationPastAge (24h) to reject stale attestation timestamps
  that could target older/weaker key versions; applied to both /secrets
  and /app/sign handlers
- AttestationTime=0 (use active version) is exempted from the past check
- Add TestHandleAppSign_AttestationTimeBounds covering future-rejected,
  past-rejected, and recent-accepted cases
- Add negative decimal string test case ("-1") to DeserializeFr tests
- Update existing test to use time.Now() instead of epoch constant 1
- DeserializeFr error messages now use "DeserializeFr: ..." prefix,
  consistent with SerializeFr's "SerializeFr: ..." panic style
- Add comment to MaxErrorBodySize explaining intentional parity with
  MaxExtraDataSize and their independent tunability
- Add comment to maxListResponseSize documenting capacity (~10K keys)
  and failure mode (JSON parse error, not silent truncation)
- Call util.ValidateAppID in both handleSecretsRequest and handleAppSign
  immediately after request parsing, before allowlist/logging/on-chain
  lookups — prevents log injection via control chars and rejects invalid
  app IDs at the HTTP boundary rather than deep in the crypto layer
- Add testSecretsEndpointPastAttestationTime covering the past-age guard
  in /secrets (symmetric with the existing /app/sign test)
- Add round-trip value assertions to DeserializeFr tests — valid cases
  now verify the deserialized element equals the original, catching
  incorrect conversions that would otherwise silently pass
…its, add boundary tests

- Fix bug: makeJSONRPCRequest now uses maxListResponseSize (1MB) instead
  of maxSignResponseSize (64KB), preventing truncation when listing keys
  via JSON-RPC on signers with many keys
- Remove stale //nolint:staticcheck from makeHttpRequest (leftover from
  ioutil.ReadAll migration)
- Extract validateAttestationTime() helper to eliminate duplicated 15-line
  validation block between handleSecretsRequest and handleAppSign
- Add TestValidateAppID_Boundaries: deterministic table test covering
  exact length boundaries (5, 255, 256) and invalid character classes
  (space, newline, slash)
- Add "max valid value (modulus - 1) accepted" test case to
  TestDeserializeFr, completing boundary-value analysis
…strengthen tests

- Change validateAttestationTime to return error (not string) for
  idiomatic Go — enables errors.Is/errors.As if classification is
  needed later
- Add explicit early-return for attestationTime==0 with comment
  documenting it as "use current key version" sentinel, removing the
  implicit > 0 guard that was hard to reason about
- Strengthen "recent attestation time accepted" test to assert the
  specific 500 (missing key share) rather than weak NotEqual(400)
- Add "zero attestation time bypasses time check" test case to
  document and guard the sentinel behavior
- Add TODO on makeJSONRPCRequest noting the 1MB limit applies
  uniformly across signing and listing JSON-RPC calls
…mits, fence-post tests

- Wrap all json.NewDecoder(resp.Body) calls in kmsClient with
  io.LimitReader(resp.Body, maxSuccessBodySize) to prevent OOM from
  malicious operator responses on success paths (GetMasterPublicKey,
  CollectPartialSignatures, requestSecretsFromKMS, collectPartialSigs)
- Parameterize makeJSONRPCRequest with maxBytes: signing calls now use
  maxSignResponseSize (64KB), listing calls use maxListResponseSize (1MB),
  eliminating the asymmetry with the REST API signing path
- Add fence-post attestation time tests: exactly-at-boundary (accepted)
  and one-second-past-boundary (rejected) for both future (+300s) and
  past (-86400s) limits
…ppID minimum

- Fix ReloadKeys and ListAllPublicKeys passing context.Background()
  instead of caller's ctx — requests are now cancellable as intended
- Add client-side util.ValidateAppID to Encrypt, Decrypt,
  CollectPartialSignatures, EncryptForApp, DecryptForApp for early
  feedback before network round-trips
- Remove 5-character AppID minimum (breaking change risk for existing
  short IDs like "kms", "app"); the character whitelist [a-zA-Z0-9._-]
  provides injection protection regardless of length
- Add explicit unicode rejection test case ("app中") to document
  ASCII-only whitelist intent
- Simplify fuzz test invariant: remove isEmpty (subsumed by badChars
  check on empty string via regex non-match)
- Fix test comment: 500 is due to no key version matching historical
  attestation time, not "missing key share"
- Fix maxListResponseSize comment: ~5,000 G2 keys or ~10,000 G1 keys
#85)

Closes RND-560

- DeserializeFr now returns errors instead of silently returning zero on
malformed data, preventing corrupted shares from entering DKG/reshare
- ValidateAppID enforces max length (255) and character whitelist
[a-zA-Z0-9._-] to prevent injection via crafted app IDs
- AttestationTime rejects timestamps >5min in the future to prevent
abuse of historical key version lookups
- All unbounded io.ReadAll calls wrapped with io.LimitReader (64KB for
web3signer responses, 1MB for error bodies) to prevent OOM from
malicious peers
Master changed CollectPartialSignatures to return map[common.Address] and
HandleReceivedShare to accept common.Address directly. Merged with this
branch's security hardening (ValidateAppID, DeserializeFr error checking).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant