Description
Per spec T144c and whitepaper Phase 2 requirement: "Over a 72-hour continuous run, 80% of submitted test jobs complete correctly with 30% simulated node churn."
Requirements
- Build a configurable churn simulator that kills and rejoins random nodes at a configurable rate
- Integrate with multi-node test harness
- Track job completion rates under churn
- Validate checkpoint/resume works across node failures
- Run for configurable duration (target: 72 hours for Phase 2)
- Report completion rate, data loss events, and recovery metrics
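The simulator requirement above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the `ChurnSimulator` class, its fields, and the interpretation of "30% churn" (each interval, every downed node rejoins and a fresh random 30% of nodes is killed) are all assumptions, since the spec text quoted here does not define the churn model. The real harness would replace the returned id lists with actual kill and rejoin calls.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ChurnSimulator:
    """Hypothetical churn driver: random node kill/rejoin at a configurable rate."""

    node_ids: list
    churn_rate: float = 0.30  # assumed meaning: fraction of nodes killed per interval
    seed: int = 0             # fixed seed so a failing run can be replayed
    down: set = field(default_factory=set)

    def __post_init__(self):
        self._rng = random.Random(self.seed)

    def step(self):
        """Advance one churn interval.

        Returns (killed, rejoined): nodes in `rejoined` were down during the
        previous interval and come back first; nodes in `killed` are then
        taken down for the coming interval. A node may appear in both lists.
        """
        rejoined = sorted(self.down)
        kill_count = round(self.churn_rate * len(self.node_ids))
        killed = sorted(self._rng.sample(self.node_ids, kill_count))
        self.down = set(killed)
        return killed, rejoined


sim = ChurnSimulator(node_ids=[f"node-{i:02d}" for i in range(20)])
first_killed, first_rejoined = sim.step()
second_killed, second_rejoined = sim.step()
```

With 20 nodes and the default rate, each call to `step()` kills 6 nodes, and the second call reports the first interval's 6 nodes as rejoined. The interval length and total duration (72 hours for Phase 2) would be controlled by whatever loop calls `step()`.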
Success Criteria
Testing (Principle V)
- Run on a testbed of 20 or more nodes with real hardware
- At a 30% churn rate, verify that at least 80% of jobs complete correctly
- Verify zero data loss during churn
- Measure checkpoint/resume latency under various churn rates
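The pass/fail check implied by the items above can be written down as a small sketch. The `RunReport` type, its field names, and `meets_phase2_criteria` are hypothetical; the thresholds come from the quoted Phase 2 requirement, and treating "80%" as "at least 80%" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class RunReport:
    """Hypothetical summary of one churn run."""

    jobs_submitted: int
    jobs_completed_correctly: int
    data_loss_events: int

    @property
    def completion_rate(self) -> float:
        # An empty run counts as 0% rather than dividing by zero.
        if self.jobs_submitted == 0:
            return 0.0
        return self.jobs_completed_correctly / self.jobs_submitted


def meets_phase2_criteria(report: RunReport, min_completion: float = 0.80) -> bool:
    """True when the completion target is met and no data loss was recorded."""
    return report.completion_rate >= min_completion and report.data_loss_events == 0


passing = meets_phase2_criteria(RunReport(1000, 812, 0))
lossy = meets_phase2_criteria(RunReport(1000, 812, 1))
too_few = meets_phase2_criteria(RunReport(1000, 799, 0))
```

Keeping data loss as a separate hard condition means a run with a high completion rate still fails if any loss event is recorded, which matches the zero-data-loss item above.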