Conversation

tac0turtle (Contributor)

Overview


claude bot commented Oct 16, 2025

Claude finished @tac0turtle's task.


Code Review Complete

I've performed a comprehensive code review of the audit fixes PR. Below are my findings organized by the focus areas requested.


1. Code Quality & Architecture

Strengths

  • Clean error handling pattern: introducing common.ErrOversizedItem as an unrecoverable sentinel error lets components halt gracefully instead of live-locking on retries (block/internal/common/errors.go:22); a minimal sketch follows this list
  • Separation of concerns: The split between DA submission (da_submitter.go) and processing (submitter.go) is well-architected
  • Exponential backoff with clamping: reaper.go:100-131 implements a robust backoff strategy with failure tracking
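
For reference, a minimal sketch of the sentinel-error pattern described in the first bullet above; the error name matches the review, but the surrounding wiring is illustrative, not the actual implementation:

package common

import "errors"

// ErrOversizedItem marks an item that can never fit within a DA blob,
// so retrying is pointless and the caller should halt instead of retrying.
var ErrOversizedItem = errors.New("item exceeds maximum DA blob size")

Callers would then branch with errors.Is(err, common.ErrOversizedItem) and route the error to the critical-error channel rather than the retry path.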

⚠️ Concerns

1. Test Coverage is Critically Low (28%)

The Codecov report shows 145 lines of missing coverage, with several files having 0-16% patch coverage:

  • block/internal/submitting/submitter.go: 0% (23 lines missing)
  • block/internal/submitting/da_submitter.go: 0% (12 lines missing)
  • block/internal/reaping/reaper.go: 14.28% (29 lines missing)
  • block/internal/executing/executor.go: 11.76% (15 lines missing)

Critical paths with no test coverage:

  • sendCriticalError() functions across multiple components
  • DA submission halting logic (submitter.go:162-170, 184-192)
  • Executor halt on permanent validation errors (executor.go:284-292, 302-309)
  • Retry logic with exponential backoff

Recommendation: Add unit tests for error handling paths, especially:

  • Mock tests for ErrOversizedItem triggering halt behavior
  • Retry backoff validation tests
  • Critical error channel behavior tests

2. Context Handling in da_submitter.go:401-402

submitCtx, cancel := context.WithTimeout(ctx, submissionTimeout)
defer cancel()

The defer cancel() only runs when the enclosing function returns, but the context is created inside a retry loop. If the loop runs for many attempts, timers and contexts accumulate until the function exits, which leaks resources. Consider calling cancel as soon as each submission attempt completes:

submitCtx, cancel := context.WithTimeout(ctx, submissionTimeout)
res := types.SubmitWithHelpers(submitCtx, s.da, s.logger, marshaled, rs.GasPrice, namespace, options)
cancel() // Call immediately after submission completes
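
Another option is to scope the timeout to a small helper so the deferred cancel fires at the end of each attempt rather than at function exit. A self-contained sketch of that pattern, with an illustrative submit callback standing in for SubmitWithHelpers:

package submitting

import (
    "context"
    "time"
)

// submitOnce scopes the timeout to a single attempt; the deferred cancel
// releases the context's timer as soon as the attempt returns.
func submitOnce(ctx context.Context, timeout time.Duration, submit func(context.Context) error) error {
    attemptCtx, cancel := context.WithTimeout(ctx, timeout)
    defer cancel()
    return submit(attemptCtx)
}

// retrySubmit retries up to maxAttempts; because each timeout lives inside
// submitOnce, no contexts or timers accumulate across iterations.
func retrySubmit(ctx context.Context, maxAttempts int, timeout time.Duration, submit func(context.Context) error) error {
    var err error
    for i := 0; i < maxAttempts; i++ {
        if err = submitOnce(ctx, timeout, submit); err == nil {
            return nil
        }
        // back off / adjust gas price here before the next attempt
    }
    return err
}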

3. Magic Constants Need Documentation

  • defaultMaxBlobSize = 1.5 * 1024 * 1024 appears in executing/executor.go:27, cmd/run_node.go:29, and submitting/da_submitter.go:28. Consider centralizing it in a single shared constant (see the sketch below).
  • MaxRetriesBeforeHalt is referenced in submitter.go:272-274 but defined elsewhere and not immediately visible in the diff.
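
A possible shape for the centralized constant; the package name and location are suggestions only:

// Package dasize would hold DA blob sizing defaults shared by the executor,
// submitter, and CLI so the value is defined exactly once.
package dasize

// DefaultMaxBlobSize is the default maximum DA blob size in bytes (1.5 MiB).
const DefaultMaxBlobSize = 1_572_864 // 1.5 * 1024 * 1024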

2. Error Handling & Resilience

Strengths

1. Robust Halt Mechanisms for Unrecoverable Errors

The PR introduces proper halt logic for critical failures:

  • ErrOversizedItem in submitter.go:162-170, 184-192 triggers immediate shutdown
  • Permanent block validation errors halt executor (executor.go:284-292, 302-309)
  • SetFinal failures with retry exhaustion trigger controlled shutdown (submitter.go:243-248)

2. Improved Retry Logic with Exponential Backoff

  • reaper.go:100-131 implements capped exponential backoff with failure counter
  • da_submitter.go includes sophisticated retry policy with gas price adjustment
  • Proper use of context cancellation checks throughout

3. Critical Error Propagation

The errorCh chan<- error pattern in executor.go:60 and submitter.go:57 enables components to signal critical failures to the node orchestrator:

func (s *Submitter) sendCriticalError(err error) {
    if s.errorCh != nil {
        select {
        case s.errorCh <- err:
        default:
            // Channel full, error already reported
        }
    }
}

⚠️ Concerns

1. Silent Failure in Reaper (reaper.go:197-200)

if err := r.seenStore.Put(r.ctx, key, []byte{1}); err != nil {
    // Log but don't fail on persistence errors
    r.logger.Error().Err(err).Str("txHash", txHash).Msg("failed to persist seen tx")
}

This could lead to transaction re-submission if the process restarts before the seenStore persists. While the comment acknowledges this, consider:

  • Adding a metric to track persistence failures (a sketch follows this list)
  • Implementing a background sync mechanism
  • Or clarifying in documentation that restart may cause duplicate submissions
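
As an illustration of the first suggestion, a hedged sketch of a Prometheus counter; the metric name, namespace, and the assumption that the project uses client_golang are mine, not the codebase's:

package reaping

import "github.com/prometheus/client_golang/prometheus"

// seenStorePutFailures counts failed writes to the seen-transaction store so
// operators can alert on silent persistence failures instead of scraping logs.
var seenStorePutFailures = prometheus.NewCounter(prometheus.CounterOpts{
    Namespace: "reaper",
    Name:      "seen_store_put_failures_total",
    Help:      "Number of failed writes to the seen-transaction store.",
})

func init() {
    prometheus.MustRegister(seenStorePutFailures)
}

The counter would be incremented right next to the existing error log in the Put error branch.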

2. Inconsistent Error Wrapping

Some functions wrap errors with fmt.Errorf("failed to ...: %w", err), but other return paths drop this context:

  • da_submitter.go:359: return err - loses context from marshalItems
  • Consider standardizing error wrapping across all return paths

3. Context Deadline Handling in P2P Handler

p2p_handler.go:59-68 uses a 500ms timeout for GetByHeight:

timeoutCtx, cancel := context.WithTimeout(ctx, 500*time.Millisecond)
header, err := h.headerStore.GetByHeight(timeoutCtx, height)
cancel()
  • The 500ms constant should be configurable, or the choice of value should be documented (a sketch of a config field follows this list)
  • Returning immediately on timeout prevents syncing subsequent heights; on a slow network this can stall sync progress
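
A minimal sketch of making the lookup timeout configurable while preserving the current 500ms default; the field and type names are illustrative, not the project's actual config:

package syncing

import "time"

// P2PHandlerConfig carries tunables for the P2P sync handler.
type P2PHandlerConfig struct {
    // StoreLookupTimeout bounds each header/data store GetByHeight call.
    StoreLookupTimeout time.Duration
}

// storeLookupTimeout falls back to the current hard-coded 500ms when unset,
// so existing deployments keep their behaviour.
func (c P2PHandlerConfig) storeLookupTimeout() time.Duration {
    if c.StoreLookupTimeout <= 0 {
        return 500 * time.Millisecond
    }
    return c.StoreLookupTimeout
}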

3. Security

Strengths

1. DataHash Validation to Prevent Data Tampering

Critical security additions in multiple locations to prevent pairing legitimate headers with malicious data:

In p2p_handler.go:98-108:

// CRITICAL: Validate that data matches the header's DataHash commitment
// This prevents accepting legitimate headers paired with tampered data from different blocks
dataCommitment := data.DACommitment()
if !bytes.Equal(header.DataHash[:], dataCommitment[:]) {
    h.logger.Warn().
        Uint64("height", height).
        Str("header_data_hash", fmt.Sprintf("%x", header.DataHash)).
        Str("actual_data_hash", fmt.Sprintf("%x", dataCommitment)).
        Msg("DataHash mismatch: header and data do not match from P2P, discarding")
    continue
}

Also enforced in:

  • p2p_handler.go:181-191 (ProcessDataRange)
  • da_retriever.go:205-215 (processBlobs)

This is an excellent defense against block replacement attacks.

2. Enhanced Passphrase Security

In pkg/signer/file/local.go:

  • Uses Argon2id for key derivation (lines 417-423) instead of naive key stretching
  • Implements zeroBytes() to wipe sensitive data from memory (lines 442-444)
  • defer zeroBytes(passphrase) throughout ensures passphrases don't linger
  • File permissions set to 0600 for key files (local.go:232, 307)
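
For context, a sketch of the Argon2id derivation and zeroing described above, using golang.org/x/crypto/argon2; the parameters shown are the library's commonly recommended defaults and may differ from what local.go actually uses:

package filesigner

import "golang.org/x/crypto/argon2"

// deriveKey stretches a passphrase into a 32-byte key with Argon2id
// (time=1, memory=64MiB, threads=4).
func deriveKey(passphrase, salt []byte) []byte {
    return argon2.IDKey(passphrase, salt, 1, 64*1024, 4, 32)
}

// zeroBytes overwrites sensitive material in place so it does not linger in memory.
func zeroBytes(b []byte) {
    for i := range b {
        b[i] = 0
    }
}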

3. Timestamp Validation

types/header.go:136-142 and types/data.go:141-146 validate timestamps aren't too far in the future (1 minute clock drift allowance), preventing timestamp manipulation attacks.
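
The check described amounts to roughly the following; a sketch with the review's 1-minute drift allowance, and the names are illustrative:

package types

import (
    "fmt"
    "time"
)

const maxClockDrift = time.Minute

// validateTimestamp rejects block times more than maxClockDrift in the future,
// blocking trivial timestamp manipulation by a proposer.
func validateTimestamp(blockTime time.Time) error {
    if blockTime.After(time.Now().Add(maxClockDrift)) {
        return fmt.Errorf("block time %s is too far in the future", blockTime)
    }
    return nil
}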

⚠️ Concerns

1. Proposer Address Validation

While proposer validation exists (p2p_handler.go:215-221, da_retriever.go:305-310), the comparison uses:

if string(proposerAddr) != string(r.genesis.ProposerAddress)

This converts byte slices to strings for comparison. It is functionally correct, but bytes.Equal() would be more idiomatic and avoids the conversions; other call sites already use it (e.g., p2p_handler.go:216).

2. Passphrase File Handling in keys.go

keys.go:59 uses strings.TrimSpace(string(passphraseBytes)) which could leave the passphrase in memory. Consider:

passphrase := strings.TrimSpace(string(passphraseBytes))
defer zeroBytes([]byte(passphrase)) // Wipe after use

However, this only wipes the copy produced by the []byte conversion; the string's own backing memory cannot be zeroed because Go strings are immutable. The current approach is acceptable, but document this limitation.

3. No Rate Limiting on P2P Gossip

p2p_handler.go processes headers/data from peers without apparent rate limiting. A malicious peer could flood with invalid blocks. Consider adding:

  • Per-peer rate limits (a sketch follows this list)
  • Reputation scoring for peers sending invalid data
  • Exponential backoff for peers triggering validation errors
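
A hedged sketch of the per-peer limit using golang.org/x/time/rate; the peer-ID type, limits, and the choice of library are assumptions, and libp2p's built-in connection gater or gossipsub peer scoring may be a better fit:

package syncing

import (
    "sync"

    "golang.org/x/time/rate"
)

// peerLimiter hands out one token-bucket limiter per peer ID.
type peerLimiter struct {
    mu       sync.Mutex
    limiters map[string]*rate.Limiter
}

func newPeerLimiter() *peerLimiter {
    return &peerLimiter{limiters: make(map[string]*rate.Limiter)}
}

// Allow reports whether a message from peerID should be processed,
// permitting bursts of 20 and a sustained 10 messages/second per peer.
func (p *peerLimiter) Allow(peerID string) bool {
    p.mu.Lock()
    l, ok := p.limiters[peerID]
    if !ok {
        l = rate.NewLimiter(rate.Limit(10), 20)
        p.limiters[peerID] = l
    }
    p.mu.Unlock()
    return l.Allow()
}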

4. Performance & Resource Efficiency

Strengths

1. Concurrent Marshaling with Semaphore

da_submitter.go:526-551 uses a semaphore pattern to limit concurrency:

sem := make(chan struct{}, 32)

This prevents unbounded goroutine spawning while maintaining parallelism.
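
For readers unfamiliar with the pattern, a generic self-contained sketch of a channel semaphore bounding concurrency at 32; the work function is a placeholder for the real marshaling step:

package submitting

import "sync"

// marshalConcurrently runs work() on each item with at most 32 goroutines in flight.
func marshalConcurrently(items [][]byte, work func([]byte) []byte) [][]byte {
    out := make([][]byte, len(items))
    sem := make(chan struct{}, 32) // channel capacity bounds concurrency
    var wg sync.WaitGroup
    for i, item := range items {
        wg.Add(1)
        sem <- struct{}{} // acquire a slot (blocks when 32 workers are busy)
        go func(i int, item []byte) {
            defer wg.Done()
            defer func() { <-sem }() // release the slot
            out[i] = work(item)
        }(i, item)
    }
    wg.Wait()
    return out
}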

2. Efficient Batch Size Limiting

da_submitter.go:488-511 limitBatchBySize() uses a simple linear scan to fit items within the blob size limit, which is adequate for typical batch sizes and keeps the code simpler than a binary search would.
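
The described linear scan is roughly the following shape; this is a sketch, and the real function also has to handle per-item overhead and the single-oversized-item error path:

// limitBatchBySize returns the longest prefix of items whose combined size
// stays within maxBlobSize, scanning linearly and stopping at the first overflow.
func limitBatchBySize(items [][]byte, maxBlobSize int) [][]byte {
    total := 0
    for i, item := range items {
        if total+len(item) > maxBlobSize {
            return items[:i]
        }
        total += len(item)
    }
    return items
}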

3. Lazy Mode Optimization

executor.go:262-268, 276-280 implements lazy block production—only creating blocks when transactions are available, reducing unnecessary work.

⚠️ Concerns

1. Unbounded Retry Loops

da_submitter.go:390-486 has a retry loop bounded by pol.MaxAttempts, but if that value is set very high (or effectively unbounded) and the context is never cancelled, the loop could run indefinitely.

Verify MaxSubmitAttempts has a reasonable default cap in config validation.

2. Potential Goroutine Leak in Submitter

submitter.go:158-174, 180-196 spawn goroutines with TryLock():

if s.headerSubmissionMtx.TryLock() {
    go func() {
        defer s.headerSubmissionMtx.Unlock()
        // ...
    }()
}

Go's defer does run during panic unwinding, so the mutex is released, but an unrecovered panic in one of these goroutines still crashes the entire process. Consider adding panic recovery so a single submission failure cannot take down the node:

go func() {
    defer func() {
        if r := recover(); r != nil {
            s.logger.Error().Interface("panic", r).Msg("panic in header submission")
        }
        s.headerSubmissionMtx.Unlock()
    }()
    // ...
}()

3. In-Memory Cache Growth

da_retriever.go:57-58:

pendingHeaders map[uint64]*types.SignedHeader
pendingData    map[uint64]*types.Data

These maps are unbounded. If headers and data arrive out of sync for extended periods, memory can grow without limit. Consider:

  • Adding a maximum pending-items limit (a sketch follows this list)
  • Implementing TTL-based eviction
  • Monitoring via metrics
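
One possible shape for the first suggestion, a hard cap on pending entries; the receiver, field, and logger names mirror the snippet above, while the cap value and drop behaviour are assumptions:

const maxPendingItems = 1024 // assumed cap; tune to the expected header/data skew

// addPendingHeader stores a header awaiting its matching data, refusing new
// entries once the cap is reached so memory stays bounded under prolonged skew.
func (r *DARetriever) addPendingHeader(height uint64, h *types.SignedHeader) bool {
    if len(r.pendingHeaders) >= maxPendingItems {
        r.logger.Warn().Uint64("height", height).Msg("pending headers cap reached, dropping header")
        return false
    }
    r.pendingHeaders[height] = h
    return true
}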

5. Testing & Reliability

⚠️ Critical Issues

28% Patch Coverage is Unacceptable for Security-Critical Code

Given this PR is titled "audit fixes," the lack of tests for new error handling paths is concerning:

Missing Test Coverage for Critical Paths:

  1. Halt Logic Testing (Priority: HIGH)

    • Test that ErrOversizedItem triggers proper shutdown
    • Test executor halt on permanent validation errors
    • Test submitter halt on SetFinal exhaustion
    • Verify error channels properly propagate failures
  2. Retry and Backoff Logic (Priority: HIGH)

    • Test reaper exponential backoff (reaper.go:100-131)
    • Test DA submission retry with gas price adjustment
    • Test context cancellation during retries
  3. Security Validations (Priority: HIGH)

    • Test DataHash mismatch detection in P2P handler
    • Test DataHash mismatch detection in DA retriever
    • Test timestamp validation (too far in future)
    • Test proposer address validation
  4. Edge Cases (Priority: MEDIUM)

    • Test empty transaction handling
    • Test single oversized item error path
    • Test partial batch submission success
    • Test concurrent access to seen store in reaper

Specific Test Recommendations:

// Example test structure for reaper.go
func TestReaperExponentialBackoff(t *testing.T) {
    // Test that consecutive failures increase backoff exponentially
    // Test that backoff caps at MaxBackoffInterval
    // Test that success resets backoff
}

func TestReaperSeenStoreFallback(t *testing.T) {
    // Test behavior when seenStore.Put fails
    // Verify logging occurs
    // Verify transactions still submitted despite persistence failure
}

func TestSubmitterCriticalErrorHalt(t *testing.T) {
    // Test ErrOversizedItem triggers halt
    // Verify errorCh receives error
    // Verify context cancellation occurs
}
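
As a concrete starting point, a self-contained table-driven test against a hypothetical computeBackoff helper; the real reaper API will differ, so the assertions would need to be adapted to it:

package reaping

import (
    "testing"
    "time"
)

// computeBackoff is a stand-in for the reaper's backoff calculation:
// the base interval doubles per consecutive failure and is clamped to max.
func computeBackoff(base, max time.Duration, failures int) time.Duration {
    d := base
    for i := 0; i < failures; i++ {
        d *= 2
        if d >= max {
            return max
        }
    }
    return d
}

func TestComputeBackoff(t *testing.T) {
    base, max := time.Second, 30*time.Second
    cases := []struct {
        failures int
        want     time.Duration
    }{
        {0, time.Second},
        {1, 2 * time.Second},
        {3, 8 * time.Second},
        {10, 30 * time.Second}, // clamped at the cap
    }
    for _, c := range cases {
        if got := computeBackoff(base, max, c.failures); got != c.want {
            t.Errorf("failures=%d: got %v, want %v", c.failures, got, c.want)
        }
    }
}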

Test Quality Concerns in Existing Tests:

Looking at the changed test files:

  • reaper_test.go: Changes mock expectations but doesn't add tests for new backoff logic
  • submitter_test.go: Only changes mock call counts, no new test cases
  • E2E tests updated but don't specifically validate audit fix behaviors

6. Documentation & Developer Experience

Strengths

1. Excellent inline comments for security-critical code:

  • p2p_handler.go:98-100: Clear explanation of DataHash validation
  • submitter.go:327-334: Documents sequencer height to DA height mapping purpose
  • keys.go:32-37: Strong warnings about key export security

2. Structured error messages with context:

s.logger.Error().Err(err).
    Msg("CRITICAL: Header exceeds DA blob size limit - halting to prevent live lock")

⚠️ Concerns

1. Missing Package-Level Documentation

Files like da_submitter.go, submitter.go, and reaper.go lack package-level comments explaining:

  • Overall responsibility
  • Lifecycle management
  • Threading model
  • Error handling philosophy

2. Undocumented Configuration Changes

  • No documentation on what MaxPendingHeadersAndData does (executor.go:335-346)
  • LazyMode and LazyBlockInterval configuration lack clear usage docs
  • DA retry policy parameters need explanation in config structs

3. Migration Guide Missing

For the passphrase file changes (keys.go, run_node.go), operators need:

  • Instructions on creating passphrase files
  • Security best practices (file permissions, storage)
  • Migration path from old key storage (if applicable)

Summary & Recommendations

Critical Issues (Fix Before Merge)

  1. Add comprehensive tests for error handling paths to bring coverage above 60%
  2. Fix context handling in da_submitter.go retry loop (cancel after each submission)
  3. Add bounded size limits to pendingHeaders/pendingData maps in da_retriever.go
  4. Add panic recovery to goroutines in submitter.go

High Priority Improvements

  1. Add metrics for seenStore persistence failures in reaper
  2. Centralize defaultMaxBlobSize constant
  3. Make P2P timeout (500ms) configurable
  4. Add rate limiting for P2P gossip
  5. Document configuration parameters

Architectural Strengths

  • Excellent halt mechanisms for unrecoverable errors
  • Strong data integrity validation (DataHash checks)
  • Improved cryptographic key management
  • Well-structured error propagation via error channels

Overall Assessment

The audit fixes demonstrate strong architectural improvements for error resilience and security, particularly:

  • Preventing live-lock scenarios with proper halts
  • Defending against data tampering attacks
  • Enhanced key security with Argon2

However, the critically low test coverage (28%) is a major concern for production readiness. The error handling paths introduced—which are the primary purpose of this audit fix PR—are largely untested.

Recommendation: Do not merge until test coverage reaches at least 60% with specific focus on the new error handling, retry logic, and security validation paths.



github-actions bot commented Oct 16, 2025

The latest Buf updates on your PR. Results from workflow CI and Release / buf-check (pull_request).

Build | Format | Lint | Breaking | Updated (UTC)
✅ passed | ⏩ skipped | ✅ passed | ✅ passed | Oct 20, 2025, 2:56 PM


codecov bot commented Oct 20, 2025

Codecov Report

❌ Patch coverage is 27.86070% with 145 lines in your changes missing coverage. Please review.
✅ Project coverage is 60.46%. Comparing base (0dba4e7) to head (a717df6).

Files with missing lines | Patch % | Lines
block/internal/reaping/reaper.go | 13.88% | 30 Missing and 1 partial ⚠️
block/internal/submitting/submitter.go | 0.00% | 23 Missing ⚠️
block/internal/executing/executor.go | 11.76% | 15 Missing ⚠️
block/internal/syncing/p2p_handler.go | 16.66% | 14 Missing and 1 partial ⚠️
types/data.go | 16.66% | 10 Missing and 5 partials ⚠️
block/internal/submitting/da_submitter.go | 0.00% | 12 Missing ⚠️
pkg/cmd/keys.go | 61.53% | 8 Missing and 2 partials ⚠️
pkg/cmd/run_node.go | 33.33% | 7 Missing and 3 partials ⚠️
block/internal/syncing/da_retriever.go | 12.50% | 6 Missing and 1 partial ⚠️
types/header.go | 22.22% | 5 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2764      +/-   ##
==========================================
- Coverage   61.26%   60.46%   -0.81%     
==========================================
  Files          81       81              
  Lines        8589     8744     +155     
==========================================
+ Hits         5262     5287      +25     
- Misses       2830     2943     +113     
- Partials      497      514      +17     
Flag | Coverage Δ
combined | 60.46% <27.86%> (-0.81%) ⬇️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@tac0turtle tac0turtle changed the title chore: audit fixes chore!: audit fixes Oct 20, 2025
@tac0turtle tac0turtle marked this pull request as ready for review October 20, 2025 14:53