
Distributed mesh LLM: ensemble-of-experts inference engine #54

@jeremymanning

Description


This is the largest remaining feature — the distributed mesh LLM described in the whitepaper and issue #27. The mesh LLM is an inter-model Mixture-of-Experts system where GPU donor nodes each run a small language model, a distributed router selects K-of-N experts per token, and the system self-prompts to improve the cluster.

This issue supersedes #27 and provides the detailed implementation breakdown.

Architecture (from whitepaper)

  • Each GPU donor runs a complete small model (LLaMA-3-8B at 4-bit quantization, ~4-6GB VRAM)
  • Distributed router selects K-of-N expert nodes per output token
  • Each expert returns its top-256 (token_id, logit) pairs (~1.5KB; see the sketch below), a 99%+ bandwidth reduction versus dense logits
  • Router aggregates sparse logit distributions to produce next token
  • At K=4, 100ms latency: ~3.2 tokens/second (adequate for autonomous agents, not interactive chat)
  • All nodes standardize on the LLaMA-3 tokenizer (128,256-token vocabulary)
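
A minimal sketch of the payload arithmetic behind the ~1.5KB figure above. The struct and names are illustrative, and the encoding (4-byte token id plus 2-byte half-precision logit) is an assumption chosen to be consistent with the stated numbers:

```rust
/// Illustrative wire entry for one (token_id, logit) pair. A u32 id is
/// assumed because the 128,256-token LLaMA-3 vocabulary overflows u16;
/// the logit is assumed to travel as raw f16 bits.
#[derive(Clone, Copy)]
struct SparseLogit {
    token_id: u32,       // index into the shared vocabulary
    logit_f16_bits: u16, // IEEE-754 half-precision logit
}

const TOP_K_LOGITS: usize = 256;
const VOCAB_SIZE: usize = 128_256;

fn main() {
    // 256 entries x (4-byte id + 2-byte logit) = 1,536 bytes (~1.5KB).
    let sparse_bytes = TOP_K_LOGITS * (4 + 2);
    // A dense f32 distribution would be 128,256 x 4 = 513,024 bytes.
    let dense_bytes = VOCAB_SIZE * 4;
    // 1,536 / 513,024 is roughly 0.3% of the dense size, i.e. the 99%+
    // bandwidth reduction claimed above.
    println!(
        "sparse: {sparse_bytes} B, dense: {dense_bytes} B ({:.2}% of dense)",
        100.0 * sparse_bytes as f64 / dense_bytes as f64
    );
}
```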

Three uses (from #27)

  1. Resource: continually growing/improving language model, free to everyone
  2. Self-improvement: guides development and improvements of the network itself
  3. Security: carries out regular security audits and spot checks

Fractal scaling (from #27)

  • Max intelligence: ALL nodes → one "super" LLM
  • Intermediate: resources allocated by problem complexity
  • Minimum: single modest-hardware node as simple model

Components (from spec Phase 9, T111-T119)

  1. Router (src/agent/mesh_llm/router.rs): K-of-N expert selection per token, LLaMA-3 tokenizer
  2. Expert node (src/agent/mesh_llm/expert.rs): registration, health tracking, capacity reporting
  3. Aggregator (src/agent/mesh_llm/aggregator.rs): sparse logit aggregation, weighted average, sampling (sketched after this list)
  4. Self-prompting loop (src/agent/mesh_llm/self_prompt.rs): autonomous agent generating improvement tasks
  5. Agent subsetting (src/agent/mesh_llm/subset.rs): independent parallel agent subsets for concurrent tasks
  6. Safety system (src/agent/mesh_llm/safety.rs): action tier classification, governance kill switch
  7. gRPC service (proto/mesh_llm.proto): RegisterExpert, GetRouterStatus, SubmitSelfTask, HaltMesh
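
To make component 3 concrete, here is a minimal aggregation sketch (a hypothetical shape for aggregator.rs, not the actual implementation; treating tokens outside an expert's top-256 as contributing nothing is an assumption, and greedy decoding stands in for the sampling step):

```rust
use std::collections::HashMap;

/// Weighted merge of sparse logit distributions from K experts, then a
/// greedy pick. `experts` pairs each expert's router-assigned weight
/// with its top-256 (token_id, logit) list.
fn aggregate(experts: &[(f32, Vec<(u32, f32)>)]) -> Option<u32> {
    let mut combined: HashMap<u32, f32> = HashMap::new();
    for (weight, logits) in experts {
        for &(token_id, logit) in logits {
            *combined.entry(token_id).or_insert(0.0) += weight * logit;
        }
    }
    // Normalizing by the total weight would not change the argmax, so
    // the weighted sums are compared directly.
    combined
        .into_iter()
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(token_id, _)| token_id)
}
```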

Action tiers (from whitepaper)

Tier          Examples                                   Approval
Read-only     Analyze metrics, generate reports          None
Suggest       Draft config changes, governance motions   Human review
Sandbox-test  A/B experiment on 1% of traffic            Automated validation
Deploy-minor  Update non-critical config                 2-of-3 governance quorum
Deploy-major  Change scheduler algorithm                 Full governance vote + 24h review
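
A hedged sketch of how this table might map onto component 6's classifier; the enum and method are illustrative, with ordering derived so escalation checks can compare tiers:

```rust
/// Action tiers from the table above, ordered least to most privileged.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum ActionTier {
    ReadOnly,
    Suggest,
    SandboxTest,
    DeployMinor,
    DeployMajor,
}

impl ActionTier {
    /// Approval required before an action at this tier may execute.
    fn required_approval(self) -> &'static str {
        match self {
            ActionTier::ReadOnly => "none",
            ActionTier::Suggest => "human review",
            ActionTier::SandboxTest => "automated validation",
            ActionTier::DeployMinor => "2-of-3 governance quorum",
            ActionTier::DeployMajor => "full governance vote + 24h review",
        }
    }
}
```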

Phased rollout

Phase  Nodes        Capability
0-1    0-500        Centralized model; read-only + suggest only
2      ~280-1,000   Distributed ensemble; sandbox-test after 30-day stability
3      ~1,000       3-7 parallel domain streams; deploy-minor
4      ~5,000+      37+ parallel streams; deploy-major
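
One possible encoding of the capability gate implied by this table (illustrative, reusing the ActionTier enum sketched above). Note that per the Notes section, phase transitions are enacted by governance vote, so the current phase is taken as stored state rather than derived from node count:

```rust
/// Highest action tier the mesh may exercise in each rollout phase.
fn max_allowed_tier(phase: u8) -> ActionTier {
    match phase {
        0 | 1 => ActionTier::Suggest, // centralized; read-only + suggest only
        2 => ActionTier::SandboxTest, // after 30-day stability
        3 => ActionTier::DeployMinor,
        _ => ActionTier::DeployMajor, // phase 4 and beyond
    }
}
```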

Requirements

  • Router model with K-of-N expert selection (see the routing sketch after this list)
  • Sparse logit aggregation (top-256 logits per expert)
  • Expert node registration and health monitoring
  • Self-prompting autonomous agent loop (1-24 hour cycle)
  • Action tier classification with safety enforcement
  • Governance kill switch (cannot be overridden by mesh itself)
  • gRPC service for mesh management
  • Support for heterogeneous GPU hardware (different model sizes/fine-tunes, same tokenizer)
  • Graceful degradation below 280 nodes (fall back to centralized model)
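
A sketch combining the K-of-N selection and graceful-degradation requirements above. The 280-node threshold comes from the list; ranking healthy experts by reported spare capacity is a placeholder policy, since the actual router is expected to be a learned model:

```rust
struct Expert {
    id: u64,
    healthy: bool,       // from health monitoring
    spare_capacity: f32, // from the expert's capacity report
}

enum Route {
    Experts(Vec<u64>),   // dispatch this token to these K experts
    CentralizedFallback, // mesh below minimum size; use centralized model
}

fn route_token(pool: &[Expert], k: usize) -> Route {
    let mut healthy: Vec<&Expert> = pool.iter().filter(|e| e.healthy).collect();
    if healthy.len() < 280 {
        // Requirement: graceful degradation below 280 nodes.
        return Route::CentralizedFallback;
    }
    healthy.sort_by(|a, b| b.spare_capacity.total_cmp(&a.spare_capacity));
    Route::Experts(healthy.iter().take(k).map(|e| e.id).collect())
}
```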

Success Criteria

  • Router selects K-of-N experts and dispatches in parallel
  • Sparse logit aggregation produces coherent text
  • Expert registration and health tracking functional
  • Self-prompting loop generates actionable improvement tasks
  • Action tier classification correctly gates operations
  • Governance kill switch immediately halts all inference
  • gRPC service exposes all management operations
  • 3.2+ tokens/second at K=4, 100ms inter-node latency
  • Integration test: multi-node token generation via sparse aggregation

Testing (Principle V)

  • Deploy 4+ GPU nodes with LLaMA-3-8B (4-bit) → verify token generation
  • Measure tokens/second at various K values and latencies
  • Test kill switch → verify immediate halt
  • Test self-prompting loop → verify actionable output
  • Test action tier escalation → verify governance gating
  • Test with heterogeneous models (different sizes, same tokenizer)
  • Test graceful degradation with fewer than 280 nodes
  • Bandwidth measurement: verify <2KB per expert per token
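
The bandwidth item could begin as a pure size check before any on-wire measurement; this assumes the 6-byte entry encoding sketched under Architecture and ignores gRPC framing overhead:

```rust
#[test]
fn per_expert_payload_stays_under_2kb() {
    let entry_bytes = 4 + 2;         // u32 token id + f16 logit bits
    let payload = 256 * entry_bytes; // top-256 pairs = 1,536 bytes
    assert!(payload < 2048, "per-expert per-token payload must stay under 2KB");
}
```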

Notes

This is a major undertaking that should be broken into sub-tasks during planning. The phased rollout means Phase 0-1 (centralized model, read-only) can ship first, with distributed ensemble features enabled at each phase transition via governance vote.

References:

  • Whitepaper: §Mesh LLM: Distributed Self-Improvement
  • Issue #27 (Explore and implement distributed LLM): parallel_mesh_of_diffusers_whitepaper.pdf
  • research/09-mesh-llm.md
  • research/10-prior-art-distributed-inference.md
