End-to-end multi-node cluster: Phase 1 LAN testnet #42

@jeremymanning

Description

The whitepaper's staged release plan requires Phase 1: "3–5 Machine LAN Testnet" on physical machines (not nested VMs). This validates the core cluster formation, job execution, preemption, and failure recovery in a real network environment.

Requirements (from whitepaper Phase 1)

  • Physical machines — not nested VMs — on a real network
  • At least one ARM machine
  • Peer discovery succeeds without manual IP configuration (mDNS within 2 seconds)
  • A node failure mid-job is detected by missed heartbeat; job reschedules from checkpoint
  • Resource yield occurs within 1 second of simulated keyboard event
  • No cross-node data leakage
  • Kill conditions: any cross-node sandbox breach, host OOM, or data loss from simulated node failure
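The missed-heartbeat requirement above could be checked with something like the following minimal sketch. The class name, the 0.5-second heartbeat interval, and the three-missed-beats threshold are illustrative assumptions; neither the whitepaper excerpt nor this issue specifies them.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Hypothetical failure detector: a node is failed after N missed beats."""

    interval_s: float = 0.5  # assumed heartbeat period, not from the whitepaper
    missed_limit: int = 3    # assumed number of missed beats before failure
    last_seen: dict = field(default_factory=dict)

    def record(self, node_id: str, now: float) -> None:
        # Store the time of the most recent heartbeat from this node.
        self.last_seen[node_id] = now

    def failed_nodes(self, now: float) -> list:
        # A node has failed once it has been silent for longer than
        # interval_s * missed_limit seconds.
        deadline = self.interval_s * self.missed_limit
        return sorted(
            node for node, seen in self.last_seen.items() if now - seen > deadline
        )
```

Timestamps are passed in rather than read from the clock so the detector can be exercised deterministically; on real hardware the caller would supply `time.monotonic()`.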

Success Criteria

  • 3+ physical machines form cluster via mDNS in < 5 seconds
  • R=3 job executes correctly across nodes
  • Node failure → job rescheduled from checkpoint → correct result
  • Preemption within 1 second of keyboard activity
  • Zero cross-node data leakage
  • No host residue after job completion
  • At least one ARM node participating
  • Evidence artifact published with all test results
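One way the evidence artifact could be produced is to record the measured values and evaluate them against the criteria above. This is a sketch only: the field names, the `Phase1Results` type, and the JSON layout are hypothetical, and only the thresholds (3+ machines, < 5 seconds, within 1 second, zero leakage, no residue, at least one ARM node) come from this issue.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Phase1Results:
    """Hypothetical container for the measurements of one test run."""

    physical_machines: int
    arm_nodes: int
    cluster_formation_s: float
    r3_job_correct: bool
    recovery_result_correct: bool
    preemption_latency_s: float
    cross_node_leaks: int
    residual_items: int


def evaluate(r: Phase1Results) -> dict:
    # One boolean per success criterion listed in the issue.
    return {
        "cluster of 3+ machines formed in < 5 s": (
            r.physical_machines >= 3 and r.cluster_formation_s < 5.0
        ),
        "R=3 job correct": r.r3_job_correct,
        "rescheduled job correct": r.recovery_result_correct,
        "preemption within 1 s": r.preemption_latency_s <= 1.0,
        "zero cross-node leakage": r.cross_node_leaks == 0,
        "no host residue": r.residual_items == 0,
        "at least one ARM node": r.arm_nodes >= 1,
    }


def build_artifact(r: Phase1Results) -> str:
    # Serialize raw measurements, per-criterion verdicts, and overall verdict.
    checks = evaluate(r)
    report = {"results": asdict(r), "checks": checks, "passed": all(checks.values())}
    return json.dumps(report, indent=2)
```

Keeping the raw measurements next to the pass/fail verdicts lets a reader of the artifact re-check each criterion independently.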

Testing (Principle V)

This IS the test — it must run on real hardware:

  • 3–5 diverse physical machines (x86 + ARM)
  • Real LAN (not virtual network)
  • Real workloads (not toy assertions)
  • Deliberate failure injection (kill node, disconnect cable)
  • Keyboard event → measure preemption latency
  • Inspect each host after tests for residual files/processes
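The preemption-latency measurement in the list above could be structured roughly as follows. This is a sketch under stated assumptions: `inject_event` and `has_yielded` are hypothetical callbacks that the test harness would have to supply (for example, a simulated key press and a check that the job has released its resources), and the timeout and polling interval are arbitrary.

```python
import time
from typing import Callable, Optional


def measure_yield_latency(
    inject_event: Callable[[], None],
    has_yielded: Callable[[], bool],
    timeout_s: float = 5.0,
    poll_s: float = 0.001,
) -> Optional[float]:
    """Return seconds from event injection to observed yield, or None on timeout."""
    start = time.monotonic()  # monotonic clock is unaffected by wall-clock changes
    inject_event()
    while time.monotonic() - start < timeout_s:
        if has_yielded():
            return time.monotonic() - start
        time.sleep(poll_s)  # polling interval bounds the measurement resolution
    return None
```

A run would pass the 1-second criterion when the returned value is not `None` and is at most 1.0. The polling interval adds up to `poll_s` of measurement error, so it should stay well below the threshold being tested.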
