Description
The scheduler module has structural definitions (job/task/workflow state machines, manifest validation, priority scoring), but the runtime broker logic is not wired for real multi-node operation. The following pieces are missing:
- ClassAd-style bilateral matchmaking (task requirements ↔ agent capabilities)
- Lease issuance, renewal via heartbeat, and expiry handling
- Speculative execution and lineage tracking
- R=3 replica placement with disjoint-AS enforcement
- Checkpoint commit flow through the data plane
- Regional broker election and failover
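As a rough illustration of the bilateral matchmaking item above, the sketch below checks a match in both directions: the task's requirements against the agent's capabilities, and the agent's own constraints against the task. All type and field names (`AgentAd`, `TaskAd`, `max_task_runtime_secs`, and so on) are hypothetical and do not come from the existing scheduler module. Real ClassAds evaluate arbitrary expressions, while this sketch hard-codes a few attributes.

```rust
use std::collections::HashSet;

/// Hypothetical capability profile advertised by an agent.
#[derive(Debug, Clone)]
pub struct AgentAd {
    pub id: String,
    pub capabilities: HashSet<String>, // e.g. "gpu", "avx2"
    pub memory_mb: u64,
    /// Agent-side requirement: longest task this agent will accept.
    pub max_task_runtime_secs: u64,
}

/// Hypothetical requirements advertised by a task.
#[derive(Debug, Clone)]
pub struct TaskAd {
    pub id: String,
    pub required_capabilities: HashSet<String>,
    pub min_memory_mb: u64,
    pub est_runtime_secs: u64,
}

/// Task side: does the agent offer everything the task requires?
pub fn task_accepts(task: &TaskAd, agent: &AgentAd) -> bool {
    task.required_capabilities.is_subset(&agent.capabilities)
        && agent.memory_mb >= task.min_memory_mb
}

/// Agent side: is the task something this agent is willing to run?
pub fn agent_accepts(agent: &AgentAd, task: &TaskAd) -> bool {
    task.est_runtime_secs <= agent.max_task_runtime_secs
}

/// A match is bilateral: both sides must accept the other.
pub fn matches(task: &TaskAd, agent: &AgentAd) -> bool {
    task_accepts(task, agent) && agent_accepts(agent, task)
}

/// All agents that bilaterally match the task, in input order.
pub fn candidates<'a>(task: &TaskAd, agents: &'a [AgentAd]) -> Vec<&'a AgentAd> {
    agents.iter().filter(|a| matches(task, a)).collect()
}
```

A broker would presumably rank the returned candidates with the existing priority scoring before issuing a lease. That step is left out here.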
Requirements
- Broker matches tasks to agents based on capability profiles
- Leases issued with configurable TTL, renewed on heartbeat
- Expired leases trigger rescheduling
- R=3 replicas placed on disjoint autonomous systems
- Speculative execution for latency-sensitive tasks
- Checkpoint flow: sandbox → CID store → erasure coding → placement
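The lease requirements above (configurable TTL, renewal on heartbeat, rescheduling on expiry) could be sketched roughly as follows. `LeaseTable` and its methods are hypothetical names, not existing scheduler APIs. The current time is passed in explicitly so the logic can be tested without sleeping; a real broker would more likely read a clock and persist leases.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical lease binding one task to one agent until `expires_at`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Lease {
    pub task_id: String,
    pub agent_id: String,
    pub expires_at: Instant,
}

/// In-memory lease table keyed by task id (sketch only, no persistence).
pub struct LeaseTable {
    ttl: Duration,
    leases: HashMap<String, Lease>,
}

impl LeaseTable {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, leases: HashMap::new() }
    }

    /// Issue (or overwrite) the lease for `task_id`, valid for one TTL.
    pub fn issue(&mut self, task_id: &str, agent_id: &str, now: Instant) -> Lease {
        let lease = Lease {
            task_id: task_id.to_string(),
            agent_id: agent_id.to_string(),
            expires_at: now + self.ttl,
        };
        self.leases.insert(task_id.to_string(), lease.clone());
        lease
    }

    /// Renew on heartbeat. Returns false if the lease is unknown,
    /// held by another agent, or already expired.
    pub fn heartbeat(&mut self, task_id: &str, agent_id: &str, now: Instant) -> bool {
        let ttl = self.ttl;
        match self.leases.get_mut(task_id) {
            Some(lease) if lease.agent_id == agent_id && now < lease.expires_at => {
                lease.expires_at = now + ttl;
                true
            }
            _ => false,
        }
    }

    /// Remove expired leases and return their task ids (sorted),
    /// which the caller would hand back to the scheduler.
    pub fn reap_expired(&mut self, now: Instant) -> Vec<String> {
        let mut expired: Vec<String> = self
            .leases
            .iter()
            .filter(|(_, lease)| lease.expires_at <= now)
            .map(|(task_id, _)| task_id.clone())
            .collect();
        for task_id in &expired {
            self.leases.remove(task_id);
        }
        expired.sort();
        expired
    }
}
```

In this sketch a late heartbeat does not revive an expired lease. That is one possible policy, chosen so that a task is never rescheduled while its old holder still believes it owns the lease.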
Success Criteria
- cargo test passes
Testing (Principle V)
- Multi-node cluster → submit job → verify broker matches to capable node
- Kill executor mid-task → verify rescheduling from checkpoint
- Submit job requiring GPU → verify matched only to GPU nodes
- Verify R=3 placement uses nodes in disjoint autonomous systems
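For the R=3 placement check, a test helper along these lines might be enough. It is only a sketch: `Node`, `place_replicas`, and `is_as_disjoint` are hypothetical names, and the greedy first-fit selection stands in for whatever placement policy the broker ends up using. The AS numbers in any example data should be treated as placeholders.

```rust
use std::collections::HashSet;

/// Hypothetical view of a storage node: its id and the autonomous
/// system (AS) number it is reachable through.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Node {
    pub id: String,
    pub asn: u32,
}

/// Replication factor from the requirements (R=3).
pub const REPLICATION_FACTOR: usize = 3;

/// Greedy first-fit: walk candidates in order and keep the first node
/// seen for each AS. Returns None if fewer than `r` distinct ASes exist.
pub fn place_replicas(candidates: &[Node], r: usize) -> Option<Vec<Node>> {
    let mut seen_asns = HashSet::new();
    let mut chosen = Vec::with_capacity(r);
    for node in candidates {
        if chosen.len() == r {
            break;
        }
        // `insert` returns true only when this AS has not been used yet.
        if seen_asns.insert(node.asn) {
            chosen.push(node.clone());
        }
    }
    if chosen.len() == r {
        Some(chosen)
    } else {
        None
    }
}

/// Check a test can assert on: no two replicas share an AS.
pub fn is_as_disjoint(placement: &[Node]) -> bool {
    let asns: HashSet<u32> = placement.iter().map(|n| n.asn).collect();
    asns.len() == placement.len()
}
```

Returning `None` when fewer than three distinct ASes are available makes the failure explicit, so a test can distinguish "placement refused" from "placement silently degraded".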