Single source of truth for all actionable work. ROADMAP.md = vision. This file = what to do next.
The codebase has coherence debt. Features are designed but not wired end-to-end. Trio mode exists in docs and config flags but never completed a real session. Compaction exists but crashes when it matters most. Context budgets exist but aren't enforced per role. The pattern: design → partial implement → patch holes → next feature.
New rule: every blocking item must have a verification step. "Done" means: ran a real session with local models, checked metrics, checked logs, confirmed the feature actually fired. Not "code compiles and tests pass."
When working with Claude Code on these items:
- One item at a time. Don't let it "also fix" adjacent things.
- Start with a failing test or reproduction. Show the broken state first.
- Verify with `metrics.jsonl` + `nanobot.log` after every change.
- Read the existing code before writing. Half these bugs are "feature exists but isn't called."
- B3-B11 are green. I7/LCM E2E verified. L1 concept router 80% accurate (2x orchestrator). QW1-QW4 shipped. Blockers remaining: B5 (experiments), B12 (config debt, proposal needed). Next priority: B12 proposal → I9 (tiered routing) → I8 (SearXNG).
Anti-patterns (from 2026-02-23 review):
- Stop adding router parsing fallbacks: 4 parsers mask model quality. Use constrained decoding instead.
- Stop adding config fields without removing others: 90+ fields = untestable. Feature-gate experiments, don't add runtime toggles.
- Stop building Phase 2-3 code during Phase 0: write designs in docs, not code in src/.
- Stop using `#![allow(dead_code)]` at module level: apply per-item only.
- Stop mixing concerns in TurnContext: decompose 28-field struct into sub-structs with explicit dependencies.
- B11: Heartbeat as foundational liveness service ⚡ - Priority. The current heartbeat is a glorified cron: shell commands + optional HEARTBEAT.md check. It needs to become the central liveness and health service that all modes (text/voice), channels (CLI/Telegram/WhatsApp), and configurations (local trio/cloud) depend on.

  Current state (verified 2026-02-21): Runs one hardcoded command (`qmd update -c sessions`), LLM callback always `None` (never wired), `#![allow(dead_code)]` on the module. Zero awareness of providers, endpoints, channels, or self-health.

  Health probes needed:

  | Probe | What | Frequency | Action on failure |
  |---|---|---|---|
  | Provider health | Ping each configured provider (`/v1/models` or equivalent) | Every tick | Mark provider unavailable, trigger fallback chain |
  | LCM compactor | Ping `compactionEndpoint.url` + verify model loaded | Every tick | Set `lcm.available = false`, skip compaction gracefully, warn user |
  | Trio models | Verify router/specialist models loaded in LM Studio | Every tick (local only) | Degrade to inline mode, log which role is missing |
  | Search backend | Ping SearXNG / check Brave API key validity | Every 5 ticks | Disable `web_search` tool, surface in `/status` |
  | Channel liveness | Telegram/WhatsApp/Email connection state | Per-channel interval | Reconnect with backoff, surface in `/status` |
  | Self-health | Session size, memory pressure, disk space | Every tick | Trigger compaction, warn user, auto-rotate session |

  Critical gap - LCM compaction has no pre-flight health check: When qwen3-0.6b is unreachable, Level 1+2 silently fail → Level 3 deterministic fallback fires (works, but lossy). No warning, no status surfaced. If the LLM hangs (not down, just slow), the compaction spawn blocks indefinitely and `in_flight` never resets, blocking ALL future compaction for the session.

  Design principles:
  - Channel-agnostic: same health loop whether CLI, Telegram, or voice mode
  - Mode-agnostic: works for local trio, cloud, and hybrid configurations
  - Graceful degradation: failures disable features, never crash. Compactor down → LCM pauses. Router gone → deterministic routing. Provider down → queue + retry.
  - Observable: `/status` REPL command shows all probe states. Status injectable into context (connects to N6).
  - No LLM required: health probes are HTTP pings and process checks, not agent tasks. Existing Layer 2 (HEARTBEAT.md → agent) stays as optional add-on.

  Implementation sketch:
  - `HealthRegistry`: register probes at startup based on config (if `lcm.enabled`, register LCM probe; if trio, register trio probes; etc.)
  - Each probe: `name`, `check() → Result<(), String>`, `interval`, `on_failure` callback
  - `HeartbeatService` runs the registry on each tick, stores probe states in `SystemState`
  - `/status` command reads probe states
  - Provider/router/compaction code checks probe state before attempting calls
  - Timeout guard on compaction spawn (default 30s): kill task, reset `in_flight`, log error

  Compounds with: N6 (status injection), I8 (SearXNG health), I4 (multi-provider fallback), I7 (LCM compactor availability), I9 (tiered routing needs health-aware escalation).
  Ref: `src/heartbeat/service.rs`, `src/agent/agent_loop.rs` (compaction spawn), `src/agent/system_state.rs`
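The B11 implementation sketch (registry of probes, run on each tick, state read by callers before they attempt a call) can be illustrated with a minimal, std-only sketch. This is not the code in `src/heartbeat/`; the field names `every_n_ticks` and `failure_threshold` and the `ProbeState` enum are hypothetical, and the `on_failure` callback is left out for brevity. The 3-failure threshold used in the usage note mirrors the degradation threshold mentioned for `LcmCompactionProbe` in the Done log.

```rust
use std::collections::HashMap;

/// Probe state as seen by `/status` and by callers (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProbeState {
    Healthy,
    Degraded,
}

/// One health probe: a name, how often it runs, and a check function.
pub struct Probe {
    pub name: String,
    pub every_n_ticks: u64,
    pub check: Box<dyn Fn() -> Result<(), String>>,
}

/// Registry that runs due probes on each tick and counts consecutive failures.
pub struct HealthRegistry {
    probes: Vec<Probe>,
    failures: HashMap<String, u32>,
    failure_threshold: u32,
    ticks: u64,
}

impl HealthRegistry {
    pub fn new(failure_threshold: u32) -> Self {
        Self { probes: Vec::new(), failures: HashMap::new(), failure_threshold, ticks: 0 }
    }

    pub fn register(&mut self, probe: Probe) {
        self.failures.insert(probe.name.clone(), 0);
        self.probes.push(probe);
    }

    /// Run every probe whose interval divides the current tick count.
    pub fn tick(&mut self) {
        self.ticks += 1;
        for probe in &self.probes {
            if self.ticks % probe.every_n_ticks.max(1) != 0 {
                continue;
            }
            let count = self.failures.entry(probe.name.clone()).or_insert(0);
            match (probe.check)() {
                Ok(()) => *count = 0,   // any success resets the streak
                Err(_) => *count += 1,  // consecutive failure
            }
        }
    }

    /// Callers check this before attempting a call; `None` means unknown probe.
    pub fn state(&self, name: &str) -> Option<ProbeState> {
        self.failures.get(name).map(|&n| {
            if n >= self.failure_threshold { ProbeState::Degraded } else { ProbeState::Healthy }
        })
    }
}
```

With `HealthRegistry::new(3)`, a probe that fails on three consecutive ticks flips to `Degraded`, and the compaction path would skip its call instead of blocking on a dead endpoint.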
- B3: Update default local trio ✅ - Trio configured: Main `gemma-3n-e4b-it` (server.rs:18), Router `nvidia_orchestrator-8b`, Specialist `ministral-3-8b-instruct-2512` (both in `config.json` trio section + B10 auto-detect as fallback). `TrioConfig::default()` has empty strings but runtime always populated via explicit config or auto-detect.
- B4: Multi-model config schema ✅ - Obsolete as scoped. `TrioConfig` already provides per-role model/port/ctx_tokens/temperature/endpoint for router and specialist. LM Studio JIT-loads models on demand, so there is no need for nanobot to spawn separate llama-server instances. The `local.rlm` slot became the specialist role; `local.memory` uses `memory.model` config.
- B5: RLM model evaluation - Systematic experiments to find best RLM model per VRAM tier. Critical for "3 impossible things". See experiment plan below. Routing benchmarks started in `experiments/lcm-routing/` (orchestrator_bench.py, test_bench.py).
- B8: Trio mode activation & role-scoped context ⚡ ✅ - All 5 steps complete. Metrics + circuit breaker (commit `0f80ad9`). Auto-activation + auto-detect as B10 (commit `3774742`). E2E verified: local session → `delegation_mode=Trio` in log → Main emits natural language → Router preflight intercepts → Specialist executes tool. Ref: `src/agent/router.rs`
  - Phase 1: parking_lot::Mutex (18 files), LazyLock regexes (20+ sites)
  - Phase 2: Credential redaction (custom Debug impls), removed dead `needs_user_continuation`
  - Phase 3: ToolError retryability + `execute_with_retry()` with exponential backoff
  - Phase 4: File splits: `tool_runner.rs` tests extracted, `cli.rs` → `src/cli/` module (6 files), `agent_loop.rs` → 3 files via `#[path]`, `commands.rs` → 7 files via `#[path]`. All 4117 tests pass.
  - Phase 5: Shell injection adversarial tests (15 evasion patterns), LM Studio `--jinja` probe, MLX in-process provider wiring: `MlxProvider` connected to all 4 entry points (REPL, gateway, voice, rebuild). Config: `inferenceEngine: "mlx"`. Perplexity gate auto-enabled when MLX active. 4 E2E tests (chat, perplexity, train, closed-loop). 4157 tests pass.
- [~] B12: Configuration debt - eliminate hardcoded magic values (Approach E implemented: Phases 1-3 complete, Phase 4 deferred)
  - ✅ Phase 1: ModelCapabilities registry (commit `22107ad`) - Eliminated 7+ model name sniffing sites across `tool_runner.rs`, `compaction.rs`, `agent_core.rs`, `thread_repair.rs`, `subagent.rs`. Capability flags (`supports_tool_calling`, `supports_thinking`, `max_reliable_output`, etc.) replace `model.contains("nanbeige")` string matches. Config-overridable via `modelCapabilities` map. File: `src/agent/model_capabilities.rs` (348 lines, 24 tests).
  - ✅ Phase 2-3: Module-local configs (commit `22107ad`) - CircuitBreaker, Subagent, Session, Compaction, and Hygiene tuning moved from hardcoded `const` values to schema-backed structs with `#[serde(default)]`. File: `src/config/schema.rs` (+215 lines). Result: ~50 hardcoded values are now configurable without touching module source. Existing `config.json` files unchanged (all new fields use `#[serde(default)]`).
  - Phase 4: Deferred - Lower priority; items below are I/O-coupled domain knowledge where config-driven tuning adds less value than Phases 1-3:
    - HeartbeatConfig (health probe intervals and failure thresholds)
    - PipelineTuning (step iteration limits for I9 tiered routing)
    - ProvenanceConfig (audit log extension fields)
    - Rationale: Phases 1-3 addressed 83% of the abstraction debt (model sniffing + numeric knobs). Phase 4 items can be added incrementally as those features mature.
    - Status as of: 2026-02-21, commit `22107ad`
    - Design: docs/plans/b12-config-debt-elimination.md (implemented per Approach E)
    - Compounds with: B11 (heartbeat needs config-driven probes → Phase 4), I9 (tiered routing needs configurable thresholds → Phase 4), N1 (hardware auto-detection feeds profile selection).
    - Ref: `src/agent/model_capabilities.rs`, `src/config/schema.rs`
- I0: Trio pipeline actions - Router can only emit ONE action per turn. Multi-step tasks (research + synthesize) fail because the router picks one tool and stops. Need pipeline-as-first-class router output + shared scratchpad between trio roles. Superseded by I9 for the routing layer; I0 remains relevant for the execution/scratchpad side. Ref: `thoughts/shared/plans/2026-02-20-trio-pipeline-architecture.md`
- I9: Tiered routing with orchestrator escalation ⚡ - Priority. The L1 concept router (all-MiniLM-L6-v2, centroid classification) proved 80% accurate at 5ms/0 VRAM vs orchestrator-8b's 43% at 637ms/6GB. But rigid template matching (L2) caps at ~7 predefined multi-step patterns. Real workflows need 10-100+ steps with dynamic re-planning, conditional branching, and error recovery; templates can't express this.

  Architecture: Three-tier routing with orchestrator escalation:

  | Tier | Engine | Latency | When | Traffic % |
  |---|---|---|---|---|
  | T1: Concept Router | Embedding centroid (CPU) | ~5ms | Unambiguous single-action queries | ~70% |
  | T2: Template Expander | Embedding → template match | ~10ms | Known multi-step patterns (L2 templates) | ~15% |
  | T3: Orchestrator | Reasoning LLM (nemotron-orchestrator-8b or nanbeige) | ~600ms | Complex/novel/failing workflows, low-confidence T1 | ~15% |

  Escalation triggers (T1→T3): (a) Cosine similarity margin <0.4 (low confidence). (b) No template match for detected multi-step intent. (c) Step failure mid-execution (re-plan). (d) User query references prior context (pragmatic ambiguity).

  T3 orchestrator responsibilities: Dynamic step decomposition (not limited to templates). Mid-workflow re-planning when steps fail or return unexpected results. Conditional branching (if build fails → fix errors → retry). State tracking across 10-100+ steps via scratchpad. Budget/token cost monitoring.

  Key insight from L1 experiments: The 6 concept router failures were all pragmatic (hedging, vagueness, context-dependent), exactly the cases where an LLM reasoning model adds value. The concept router handles the ~70% easy cases at zero cost; the orchestrator handles the ~15% hard cases where reasoning matters. This is cheaper than running the orchestrator on 100% of traffic.

  Implementation path:
  1) Wire concept router into `router_preflight()` as fast path.
  2) Add confidence threshold: below 0.4, escalate to LLM orchestrator.
  3) Add step executor that consumes `Vec<RouterDecision>` from either T2 templates or T3 orchestrator.
  4) Add shared scratchpad for multi-step state.
  5) Add re-planning hook: when a step fails, send context + failure to T3 for new plan.

  Compounds with: I0 (pipeline execution), L1/L2 experiments (`experiments/lcm-routing/`), B5 (model evaluation).
  Ref: `experiments/lcm-routing/results/L1_analysis.md`, `experiments/lcm-routing/multi_step_templates.json`
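The I9 escalation triggers can be written down as a small pure function, which also makes them easy to unit-test before wiring anything into `router_preflight()`. The sketch below is an illustration only. It assumes "margin" means the gap between the best and second-best centroid similarity (the item does not define it), and `Signals`, `Tier`, `top_margin`, and `pick_tier` are hypothetical names, not existing code.

```rust
/// Routing tier chosen for a query (hypothetical enum mirroring T1/T2/T3).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    ConceptRouter,
    TemplateExpander,
    Orchestrator,
}

/// Inputs to the escalation decision; one flag per trigger (a)-(d).
pub struct Signals {
    pub margin: f32,
    pub multi_step: bool,
    pub template_matched: bool,
    pub step_failed: bool,
    pub references_prior_context: bool,
}

pub const MARGIN_THRESHOLD: f32 = 0.4;

/// Cosine similarity of two equal-length vectors; 0.0 if either is all zeros.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Best centroid index plus margin over the runner-up (assumed definition).
pub fn top_margin(query: &[f32], centroids: &[Vec<f32>]) -> Option<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = centroids
        .iter()
        .enumerate()
        .map(|(i, c)| (i, cosine(query, c)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1)); // highest similarity first
    match scored.as_slice() {
        [] => None,
        [only] => Some((only.0, only.1)),
        [best, second, ..] => Some((best.0, best.1 - second.1)),
    }
}

/// Apply triggers (a)-(d): anything uncertain or failing goes to T3.
pub fn pick_tier(s: &Signals) -> Tier {
    if s.step_failed || s.references_prior_context || s.margin < MARGIN_THRESHOLD {
        return Tier::Orchestrator;
    }
    if s.multi_step {
        return if s.template_matched { Tier::TemplateExpander } else { Tier::Orchestrator };
    }
    Tier::ConceptRouter
}
```

Keeping the decision separate from the embedding call means the threshold can be tuned against the 30-query L1 set without loading a model.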
- I1: Local role/protocol crashes - Fix `system` role crash, alternation crash, orphan tool messages. Thread repair pipeline exists but needs hardening. Ref: `docs/plans/local-trio-strategy-2026-02-18.md`, `docs/plans/local-model-reliability-tdd.md`. QW1 (2026-02-23): Tool guard per-tool limits shipped: read tools get limit 2 (tightened from 5, 2026-02-28), write tools keep default 3. Multi-file read blockage resolved.
- I2: Non-blocking compaction ✅ - Absorbed into I7 (matryoshka compaction). Per-cluster parallel summarization replaces the three-tier approach.
- I3: Context Gate - Replace dumb char-limit truncation with `ContentGate`: pass raw / structural briefing / drill-down. Zero agent-facing API changes. Partial progress: `admit_with_specialist()` in `context_gate.rs` (commit `3580c38`) provides the structural briefing path via specialist LLM. Ref: `docs/plans/context-gate.md`, `docs/plans/context-protocol.md`
- I4: Multi-provider refactor - Break up `SwappableCore` god struct, extensible provider registry, fallback chains. Ref: `docs/plans/multi-provider-refactor.md`, `docs/plans/nanobot_architecture_review.md`
- I5: Dynamic model router - Prompt complexity scoring (light/standard/heavy), auto-downgrade simple messages to cheaper models. Ref: `docs/plans/dynamic-model-router.md`
- I6: Context Hygiene Hooks ✅ - Implemented as `anti_drift.rs` (851 lines, 25 tests). PreCompletion: pollution scoring, turn eviction, repetitive-attempt collapse, format anchor re-injection. PostCompletion: thinking tag stripping, babble collapse. Ref: commit `56dedce`, `src/agent/anti_drift.rs`
- I8: SearXNG search backend - Replace Brave Search (API key required, rate-limited) with SearXNG (free, local, unlimited). 3 touchpoints:
  1) `schema.rs`: add `provider: String` ("brave"|"searxng", default "searxng") + `searxng_url: String` (default "http://localhost:8888") to `WebSearchConfig`.
  2) `web.rs`: add SearXNG path in `execute()`: `GET {url}/search?q={query}&format=json`, parse `results[].title/url/content`. No API key needed.
  3) `registry.rs` + `tool_wiring.rs`: extend `ToolConfig` to carry `search_provider` + `searxng_url`.

  Fallback: if SearXNG unreachable and Brave key set, use Brave. If neither, helpful error: "Run `docker run -d -p 8888:8080 searxng/searxng` or set a Brave API key."
  Onboard integration: `cmd_onboard()` prints Docker one-liner. Optional `nanobot onboard --search` auto-pulls+starts+configures SearXNG. SearXNG container tested working 2026-02-20: `docker run -d -p 8888:8080 --name searxng searxng/searxng` + enable JSON format in settings.yml.
  QW2 (2026-02-23): Recovery hints added to web_search errors (status-specific: 401/422/429/5xx).
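The I8 fallback order and request URL are simple enough to pin down in a std-only sketch before touching `web.rs`. Everything named here (`SearchBackend`, `select_backend`, `searxng_search_url`, `percent_encode`) is hypothetical and only illustrates the rules stated in the item: SearXNG first, Brave if a key is set, otherwise the helpful error. The reachability flag is assumed to come from a health probe or a prior ping.

```rust
/// Which backend a web_search call should use (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SearchBackend {
    SearxNg { base_url: String },
    Brave { api_key: String },
}

/// Fallback chain from I8: SearXNG if reachable, else Brave if a key is set,
/// else an actionable error message.
pub fn select_backend(
    searxng_url: &str,
    searxng_reachable: bool,
    brave_api_key: Option<&str>,
) -> Result<SearchBackend, String> {
    if searxng_reachable {
        return Ok(SearchBackend::SearxNg { base_url: searxng_url.to_string() });
    }
    match brave_api_key {
        Some(key) if !key.is_empty() => Ok(SearchBackend::Brave { api_key: key.to_string() }),
        _ => Err("Run `docker run -d -p 8888:8080 searxng/searxng` or set a Brave API key."
            .to_string()),
    }
}

/// Minimal percent-encoding for a query string: unreserved bytes pass through,
/// everything else becomes %XX.
pub fn percent_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for b in input.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'~' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Build `GET {url}/search?q={query}&format=json`.
pub fn searxng_search_url(base_url: &str, query: &str) -> String {
    format!(
        "{}/search?q={}&format=json",
        base_url.trim_end_matches('/'),
        percent_encode(query)
    )
}
```

In the real tool an HTTP client would issue the request and parse `results[].title/url/content`; the sketch stops at URL construction so it stays dependency-free.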
- I7: Lossless Context Management (LCM) (supersedes matryoshka design) ✅ - DAG-based lossless compaction per Ehrlich & Blackman (2026). Immutable store (session JSONL) + Summary DAG with pointers to originals + active context assembly.

  Implemented: `src/agent/lcm.rs` (~1100 lines, 17 tests): `SummaryDag`, `LcmEngine` (ingest/compact/expand), three-level escalation (preserve_details → bullet_points → deterministic truncate), dual-threshold control loop (τ_soft 50% / τ_hard 85%). `LcmSchemaConfig` in config schema. Wired into `agent_loop.rs`. `lcm_expand` tool registered when LCM enabled.

  E2E verified (2026-02-21): Real E2E test against nemotron-nano-12b on LM Studio: 12 messages through `process_direct` → compaction triggered at τ_soft → Level 2 summary created → DAG node with lossless source IDs → `expand()` retrieves originals. 6 invariants checked (store lossless, active shrinks, DAG populated, source IDs resolve, Summary entries present, expand works).

  Benchmark across 4 models: qwen3-0.6b best compressor (83.2% compression, 3.4s), nemotron-nano-12b fastest (81.4%, 2.8s). Bigger models (gemma-3n-e4b 54.6%, qwen3-1.7b 72.8%) produce more verbose summaries, which is worse for compaction.

  Remaining: Performance profiling under sustained load. Verify `lcm_expand` actually invoked by LLM during conversation. Persist DAG across session rotations.
  Compounds with: I6 (anti-drift cleans within summaries). B9 (pre-flight truncation as safety net). I3 (ContentGate decides raw vs summary).
  Ref: `src/agent/lcm.rs`, `src/config/schema.rs:1219`
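The two control mechanisms named in I7, the dual-threshold loop and the three-level escalation, can be sketched as pure functions. This is a guess at the shape, not a copy of `lcm.rs`: it assumes that crossing τ_soft schedules background compaction and crossing τ_hard forces a blocking one, and that a level "fails" when the summarizer errors or its output exceeds the budget. `CompactionAction`, `compaction_action`, and `compact_with_escalation` are hypothetical names.

```rust
/// What the control loop should do at the current fill level (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CompactionAction {
    Idle,
    Background,
    Blocking,
}

/// Dual-threshold decision: compare context fill ratio against tau_soft/tau_hard.
pub fn compaction_action(
    used_tokens: usize,
    ctx_tokens: usize,
    tau_soft: f64,
    tau_hard: f64,
) -> CompactionAction {
    if ctx_tokens == 0 {
        return CompactionAction::Blocking; // no room at all
    }
    let fill = used_tokens as f64 / ctx_tokens as f64;
    if fill >= tau_hard {
        CompactionAction::Blocking
    } else if fill >= tau_soft {
        CompactionAction::Background
    } else {
        CompactionAction::Idle
    }
}

/// Try each LLM summarizer in order (e.g. preserve_details, then bullet_points).
/// If all fail or overshoot the budget, fall back to deterministic truncation,
/// which always succeeds. Returns (level used, text), levels counted from 1.
pub fn compact_with_escalation(
    text: &str,
    budget_chars: usize,
    summarizers: &[fn(&str) -> Result<String, String>],
) -> (usize, String) {
    for (i, summarize) in summarizers.iter().enumerate() {
        if let Ok(summary) = summarize(text) {
            if summary.chars().count() <= budget_chars {
                return (i + 1, summary);
            }
        }
    }
    let truncated: String = text.chars().take(budget_chars).collect();
    (summarizers.len() + 1, truncated)
}
```

With the thresholds from the item (0.50 and 0.85) and an 8192-token window, 4096 used tokens triggers background compaction and 7000 forces a blocking one.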
- [~] I10: `/clear` and `/new` REPL commands - Manual context reset for local models with small context windows (4K-8K). (Gateway `/clear` shipped 2026-02-23: clears working memory + history via `gateway_commands.rs`.) Essential for trio mode where the Main model accumulates full conversation history while Router and Specialist are already stateless (ephemeral per-turn message arrays).

  `/clear`: reset working context, keep session:
  - If LCM enabled: compact all `ctx.messages` into a single summary node. Model starts "fresh" but can `lcm_expand` to retrieve originals.
  - If LCM disabled: truncate messages, carry forward last 2 as context seed.
  - Reset working memory section of CONTEXT.md.
  - Emit `--- context cleared (N messages compacted) ---` in REPL.
  - Session JSONL continues: no data lost from the audit trail.

  `/new [name]`: fresh session entirely:
  - Start a new session file (new JSONL, clean slate).
  - Optional `name` parameter, otherwise auto-generate.
  - Existing session stays on disk, accessible via `/sessions`.

  Scope: Only Main model context needs reset. Router and Specialist are already ephemeral (verified in `router.rs`: `tool_messages`/`router_messages` built fresh each turn, `specialist_messages` built fresh each dispatch). No per-role action needed.
  Compounds with: I7/LCM (compaction on clear), B11 (health status after clear), N6 (status injection reset).
  Ref: `src/agent/router.rs` (trio context isolation), `src/agent/agent_loop.rs` (LCM wiring), `src/session/manager.rs` (session lifecycle)
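For the LCM-disabled branch of `/clear`, the behavior described above reduces to dropping everything except the last two messages and reporting how many were removed. A minimal sketch, assuming a plain `Vec` of messages; the `Message` struct and both function names are hypothetical stand-ins for whatever `ctx.messages` actually holds.

```rust
/// Simplified chat message (hypothetical; the real type carries more fields).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Drop all but the last `keep_last` messages in place and return how many
/// were removed. With `keep_last = 2` this is the "context seed" rule.
pub fn clear_context(messages: &mut Vec<Message>, keep_last: usize) -> usize {
    let dropped = messages.len().saturating_sub(keep_last);
    messages.drain(..dropped);
    dropped
}

/// The REPL banner line from the I10 description.
pub fn clear_banner(n: usize) -> String {
    format!("--- context cleared ({} messages compacted) ---", n)
}
```

The session JSONL is untouched by this function, which matches the requirement that the audit trail keeps every message.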
- I11: Extract iteration state machine from agent_loop.rs - agent_loop.rs is ~4,455 lines. The 5-phase state machine (Preparing/PreCall/Calling/Processing/Executing) is untestable in isolation. Create `src/agent/iteration.rs` with `IterationPhase`, `IterationOutcome`, `StepResult` enums and 5 `step_*` free functions taking `(&TurnContext, &AgentLoopShared)`. Gateway runner and `process_message()` stay in agent_loop.rs. Ref: REVIEW-2026-02-23.md §1.2
- I12: Feature-gate dead Phase 2-3 code - `lora_bridge.rs` (737 lines), `step_voter.rs`, `confidence_gate.rs`, `knowledge_store.rs` (681 lines) are dead code. Move behind `#[cfg(feature = "experimental")]`. Define Phase 0 completion criteria: trio >95% accuracy, zero tool failures, session recall works, `/local` hot-swap 100%. Ref: REVIEW-2026-02-23.md §2.3
- I13: Integration test harness - Zero integration tests. Create harness running `process_message` with mock LLM provider. Test: trio end-to-end (router → specialist → main), context overflow → compaction → persistence, tool failure → recovery hint. Ref: REVIEW-2026-02-23.md §2.4
- I14: Trio decision chain live visibility - Trio mode has ~3s silence (router 900ms + specialist 800ms + main 1200ms). Emit status through `text_delta_tx` in dim text: `[router] analyzing...`, `[specialist] processing...`, `[main] synthesizing...`. Ref: REVIEW-2026-02-23.md §3.1, `src/agent/router.rs`, `agent_loop.rs`
- N1: Auto hardware detection - Detect VRAM/RAM/CPU, auto-assign tier (Potato/Sweet/Power/Beast), select quant level. `nanobot doctor` command.
- N2: `nanobot setup` - Interactive first-run: detect hardware, download models, generate optimal config.
- N3: Streaming rewrite - Incremental markdown renderer, line-by-line syntax highlighting, no full-response rerender. Ref: `docs/plans/streaming-rewrite.md`
- N4: Full-duplex REPL - ESC+ESC instant cancel, backtick injection prompt, priority message channel. Ref: `docs/plans/full-duplex-repl.md`
- [~] N5: Thinking toggle - `/think` command works in REPL + gateway channels. Remaining: Ctrl+T toggle for REPL. Ref: `docs/plans/thinking-toggle.md`
- N6: Status injection - Auto-inject background worker status into context each turn. (Spacebot idea)
- N8: Narration stress test - Validate narration compliance across local models. Ref: `docs/plans/narration-stress-test.md`
- P1.1: File-backed volumes - `MappedVolume` struct with mmap + line-offset index
- P1.2: Chunk index - 4K-char chunks, simhash signatures, `ctx_search`
- P1.3: Semantic index - Optional e5-small embeddings, vector similarity
- P1.4: Proof - Needle-in-haystack at 1M tokens, 95%+ recall, <60s
- P2.0: Calibration run - Measure per-step `p` on 1K-10K steps using winning RLM model from E3
- P2.1: MAKER voting - `first_to_ahead_by_k`, red-flagging, output token cap
- P2.2: MAD decomposition - Atomic step definitions per domain
- P2.3: Process tree - Persistent execution tree, checkpoint/resume
- P2.4: RLM completion - `ctx_summarize`, recursive depth, smart short-circuit. Ref: `docs/plans/rlm-completion-proposal.md`, `docs/plans/adaptive_rlm_design.md`
- P2.5: Swarm architecture - Workers spawn Workers, budget propagation. Ref: `docs/plans/swarm-architecture.md`
- P2.6: Event log pipeline - Append-only JSONL, pipeline runner. Ref: `docs/plans/event-log-pipeline.md`
- P2.7: Proof - Towers of Hanoi 20 disks, 1M+ steps, zero errors, local only
- P3.1: Trace logger - Structured JSONL per process
- P3.2: Skill crystallization - Auto-create skills from repeated successes
- P3.3: Budget calibration - Per-task-type stats in SQLite
- P3.4: LoRA distillation - Export traces → Zero pipeline → hot-swap LoRA
Reduce assumptions one at a time. No coding until we know what works.
| Role | Model | Size | Why |
|---|---|---|---|
| Main | gemma-3n-e4b-it | ~4B effective | Fast, good chat, small footprint |
| Orchestrator | nvidia_orchestrator-8b | 8B | 10/10 routing accuracy (proven in experiments/) |
| Specialist | ministral-3-8b-instruct-2512 | 8B | Strong tool-calling, instruction following |
- Nemotron Orchestrator: 10/10 routing (vs NanBeige 6/10). Purpose-built. Proven.
- NanBeige 3B: Good with `<think>\n</think>\n\n` prefill, but weak as router.
- Main + Orchestrator work well together in practice.
- Sequential self-routing would add latency vs parallel separation. Keep roles split.
- Router single-action bottleneck: `request_strict_router_decision()` returns ONE `RouterDecision`. Multi-step tasks (fetch 2 URLs + synthesize) cannot be expressed. The router picks one tool and the pipeline stalls. See I9 for solution.
- Deterministic fallback too narrow: Fixed in B3.1. `router_fallback.rs` now has 9 patterns (research+URL, plain URL/HN, latest news, read, write, edit, list, search, exec) + default ask_user. All guarded by `has_tool()`.
- L1 Concept router validated (2026-02-21): all-MiniLM-L6-v2 centroid classification: 24/30 (80%) vs orchestrator-8b 13/30 (43%). 5ms vs 637ms. 0 VRAM vs 6GB. 100% on non-ambiguous queries. 5/5 multilingual. Failures are all pragmatic/vague, exactly the cases where LLM reasoning adds value. Data: `experiments/lcm-routing/`.
- L2 Multi-step templates built: 7 templates (research_and_summarize, read_and_analyze, fetch_and_compare, search_and_update, check_and_report, plan_and_implement, verify_and_fix). Max 4 steps. Limitation: rigid patterns can't scale to 10-100+ step workflows or handle failures/branching. Orchestrator model needed for dynamic planning. See I9.
- Specialist has no tools: `dispatch_specialist()` sends a single-shot chat with no tool access. Can synthesize given context but cannot fetch/execute. Update (commit `3580c38`): Specialist now also used for content gate admission via `admit_with_specialist()`, which generates structural briefings for the context gate.
- Trio never tested end-to-end: Resolved. B8 Done entry confirms E2E verification: delegation_mode=Trio in log, Main→Router→Specialist flow completed real tasks through LM Studio. Further hardened by trio E2E test runner (commit `acbc738`) with failure classification and adaptive retries.
- 2026-02-21 diagnostic: Trio mode didn't activate. NanBeige ran as Inline main with full tool schemas. 21 metrics entries show `tool_calls_requested: 1, tool_calls_executed: 0`: the model generated tool calls (proving it had tool schemas) that were blocked as duplicates. Compaction crashed twice (n_keep 12620 >= n_ctx 8192). Fixed: B8 (metrics + circuit breaker) and B9 (tool guard replay + compaction overflow) shipped. Death spiral no longer occurs. Remaining: wire trio activation so NanBeige runs in Trio mode, not Inline.
- System prompt is ~15-20K tokens even before conversation starts. Opus first call: `prompt_tokens: 21705`. A 3B model with 8K context has zero room. Even with 32K context, 15K of prompt leaves only 17K for conversation, and most of that prompt is AGENTS.md/SOUL.md/TOOLS.md that small models can't follow anyway.
- Metrics broken for local models: Fixed in B8 (commit `0f80ad9`). Token counts now captured from llama.cpp `usage` field.
Test each candidate model in each role independently.
| Model | As Main | As Orchestrator | As Specialist | As RLM |
|---|---|---|---|---|
| gemma-3n-e4b-it | ? | ? | ? | ? |
| nvidia_orchestrator-8b | ✅ 10/10 routing | ✅ proven | ? | ? |
| ministral-3-8b-instruct-2512 | ? | ? | ? | ? |
| nanbeige4.1-3b | ? | 6/10 | ? | ? |
Test bench per role:
- Main: 10 conversation tasks (chat quality, coherence, narration compliance)
- Orchestrator: 10 routing cases (existing test suite from experiments/)
- Specialist: 10 tool-calling tasks (file ops, exec, multi-step)
- RLM: 5 delegation loops (multi-step file edit, research, build cycle)
Critical for "3 impossible things" β must work across hardware tiers.
| Tier | VRAM | Trio Budget | Candidate Combos |
|---|---|---|---|
| Potato | 4-6 GB | ~4B total | 1 model does all? |
| Sweet | 8-12 GB | ~12B total | 2 small models |
| Power | 16-24 GB | ~24B total | Full trio (current target) |
| Beast | 48+ GB | Unlimited | Bigger specialists |
The key unknown. Test candidates on delegation loop benchmarks:
- Multi-step file edit (read → plan → edit → verify)
- Web research synthesis (search → fetch → summarize)
- Build cycle (edit → compile → fix errors → retry)
Metrics: completion rate, token cost, latency, error recovery.
Once E1-E3 identify winners, run full nanobot session with the new trio. Compare against current setup on real tasks.
- E1 first: know what each model can do in each role
- E3 next: find the RLM (biggest unknown)
- E2 then: scale findings across VRAM tiers
- E4 last: validate the winning combo end-to-end
Captured from spacebot. Ideas only, no code.
| Idea | Status | Mapped to |
|---|---|---|
| Non-blocking compaction | ✅ Absorbed into I7 (matryoshka) | Phase 0 |
| Status injection | Backlog N6 | Phase 0 |
| Message coalescing | Backlog N7 | Phase 0 |
| Branch concept (context-fork) | Not started | Phase 2 (related to swarm) |
| Prompt complexity routing | Backlog I5 | Phase 0 |
| Memory bulletin (Cortex) | Not started | Phase 3 (related to memory) |
Continuous Voice Mode (Realtime Module)β VAD-based hands-free voice pipeline with streaming LLMβTTS output and barge-in support.InputModeenum (Continuous/PushToTalk) insession.rs. Purenext_state()state machine (VoiceAgentState: Listening/Processing/Speaking) withVoiceActiondispatch.LlmProcessortrait abstracts LLM calls (productionAgentLoopProcessorwrapsprocess_direct_streaming(),MockLlmProcessor/SlowMockLlmProcessorfor tests).run_voice_event_loop()β event-driven loop: receivesRealtimeEvent, runs state machine, spawns LLM tasks withCancellationTokenfor barge-in.drive_llm_to_tts()bridges LLM deltas throughSentenceAccumulatorto TTS.start_continuous_capture()β AudioCapture bridge withtry_send()backpressure (drop-oldest for real-time). CLI--modeflag (continuous/ptt). 27 new tests (13 voice_agent state machine + LLM + barge-in, 13 session config + capture, 1 CLI parsing), all green. 5 files changed, +875 lines. (2026-03-03,src/realtime/voice_agent.rs,src/realtime/session.rs,src/cli.rs)B11: Heartbeat as foundational liveness serviceβHealthRegistrywith pluggableHealthProbetrait, config-driven probe registration viabuild_registry(). First probe:LcmCompactionProbe(GET /health, 5s timeout, 60s interval, 3-failure degradation threshold). Critical fix: 30s timeout guard on both compaction spawns βin_flightalways resets even on timeout/hang. Pre-flight check skips LCM compaction when endpoint degraded. Wired into HeartbeatService (Layer 0), AgentLoop, CLI, REPL./statusshows probe health with color indicators. 25+ tests (health module + TrioEndpointProbe + timeout guard), 1526 total green. (2026-02-21, commits3bb1161,6c71866,1454240,src/heartbeat/health.rs,src/agent/agent_loop.rs)I7: Lossless Context Management (LCM)β DAG-based lossless compaction.LcmEnginewith three-level escalation (LLM preserve_details β bullet_points β deterministic truncate). Dual-threshold control loop (Ο_soft/Ο_hard).lcm_expandtool for lossless retrieval. 
E2E verified against 4 local models: qwen3-0.6b best compressor (83.2%, 3.4s), nemotron-nano-12b fastest (81.4%, 2.8s). 17 tests (4 mock E2E + 1 real E2E + 1 benchmark + 4 config + 9 unit). 1526 total green. (2026-02-21,src/agent/lcm.rs, commits0697bd4,9893d91,bde583f,72b94c8)B3: Update default local trioβ Trio configured: Maingemma-3n-e4b-it, Routernvidia_orchestrator-8b, Specialistministral-3-8b-instruct-2512. Explicit config + B10 auto-detect. (2026-02-21)B3.1: Smarter deterministic fallbackβrouter_fallback.rs: 9 deterministic patterns + default ask_user (was 2). Patterns: research+URLβspawn researcher, plain URL/HNβweb_fetch, latest newsβspawn, read/show+pathβread_file, write/create+pathβwrite_file, edit/modify+pathβedit_file, list/lsβlist_dir, run/execute/cargoβexec, searchβweb_search. All guarded byhas_tool(). 19 tests. (2026-02-21,src/agent/router_fallback.rs)B4: Multi-model config schemaβ Closed as obsolete. TrioConfig provides per-role model/port/endpoint. LM Studio JIT-loads models; no separate server spawning needed. (2026-02-21)B8: Trio mode activation & role-scoped contextβ All 5 steps complete. Metrics + circuit breaker (commit0f80ad9). Auto-activation + auto-detect as B10 (commit3774742). E2E verified: delegation_mode=Trio in log, Main emits natural language, Router preflight intercepts, Specialist executes tool. (2026-02-21,src/agent/router.rs)Session indexer + REPL /sessions commandβ Bridge between raw JSONL sessions (230 files, 116MB) and searchable SESSION_.md memory files.session_indexer.rs: pureextract_session_content()+index_sessions()orchestrator (extracts user+assistant messages, skips tool results, caps at 50 messages, truncates to 500 chars each). REPL:/sessionscommand with list/export/purge/archive/index subcommands (/ssalias). CLI:nanobot sessions index. Fixedprocess::exit(1)insessions_cmd.rsfor REPL safety. Updatedrecalltool description. E2E verified: 149 sessions indexed (6β155 SESSION_.md), idempotent re-run, grep finds content. 
17 new tests, 1395 total green. (2026-02-21,src/agent/session_indexer.rs)B10: Auto-detect trio models from LM Studioβpick_trio_models()scans available LMS models at startup for "orchestrator"/"router" (router) and "function-calling"/"instruct"/"ministral" (specialist) patterns. Only fills empty config slots β explicit config always wins. Fuzzy main-model exclusion handles org prefixes and unresolved GGUF hints. Wired into REPL startup before auto-activation. 13 tests including e2e flow and real LMS model list. (2026-02-21, commit3774742)B9: Compaction safety guard + tool guard death spiralβ Tool guard replays cached results instead of injecting error messages small models can't parse. Compaction respects summarizer model's actual context window viacompaction_model_context_sizeconfig + pre-flight truncation (0.7 safety margin). Circuit breaker threshold 3β2. E2E verified against NanBeige on LM Studio. (2026-02-21, commit0f7f365)B8: Metrics accuracy + tool loop circuit breakerβ Fixed local model metrics capture (prompt_tokens,completion_tokens,elapsed_ms). Added circuit breaker for consecutive all-blocked tool call rounds. (2026-02-21, commit0f80ad9)B7: Provider retry withβ Replaced 3 hand-rolled retry loops withbackonbackoncrate. Sharedis_retryable_provider_error()predicate. Added retry to streaming path. (2026-02-21, commit640bdc9)B6: SLM provider observabilityβ 8 silent failure paths now logged.#[instrument]spans onchat()/chat_stream(). Promotedllm_call_failedtowarn!. (2026-02-21, commit0b6bc5f)Fix: Audit log hash chain race conditionβrecord()had a TOCTOU bug: seq allocation + prev_hash read were not serialized under the file lock. Two concurrent executors (tool_runner + inline) both read seq 940 and wrote seq 941 with the same prev_hash, forking the chain at entry 942. Fix: acquire file lock first, re-read authoritative seq + prev_hash from file under lock, then compute hash and write. 12/12 audit tests pass. 
(2026-02-21, commit835cf6d,src/agent/audit.rs)B1: 132 compiler warningsβ 0 warnings (2026-02-20)B2: 2 test failuresβ 1429 pass, 0 fail (2026-02-21)Fix: Subprocess stdin stealβ.stdin(Stdio::null())on all 4 spawn sites in shell.rs + worker_tools.rs (2026-02-20)Fix: Esc-mashing freezes REPLβ drain_stdin() after cancel (2026-02-20, commit 57ec883)Fix stale comment in(2026-02-17)ensure_compaction_modelRaise tool result truncation threshold(2026-02-17)Document multi-session CONTEXT.md race(2026-02-17)Input box disappears during streaming(2026-02-17)Agent interruption too slow(2026-02-17)Subagent improvements (wait, output files, budget, compaction)(2026-02-18)Tool runner infinite loop fix(2026-02-18)Specialist content gate, daily notes reader, auto-LCM for localβadmit_with_specialist()in context_gate.rs: specialist LLM generates structural briefing for content gate admission.read_recent_daily_notes()in memory.rs: reads last N daily notes for context. Auto-LCM:cli.rsenables LCM automatically for local-mode sessions. TOOLS.md/IDENTITY.md workspace template auto-creation. (2026-02-21, commit3580c38)Trio resilience: top_p, health probes, circuit breaker gatingβtop_pparameter added toLLMProvidertrait + all provider implementations (OpenAI-compat, Anthropic).TrioEndpointProbehealth probes for router/specialist endpoints (4 new health tests). Circuit breaker gating onrouter_preflight()anddispatch_specialist()β skips trio roles when endpoint degraded.HealthRegistryinjected intoTurnContextfor runtime health checks./statusimprovements. (2026-02-21, commit1454240)Trio E2E test runner with failure classificationβscripts/test_trio_e2e.sh: automated E2E test harness for trio mode. Failure classification (INFRA_DOWN / JIT_LOADING / TIMEOUT / MODEL_QUALITY / BUG) with adaptive retry delays. Preflight canary validates endpoint liveness before test runs. Summary reports with pass/fail counts. Auto-repair protocol for transient failures. 
(2026-02-21, commit `acbc738`)
- QW1: Per-tool guard limits. Replaced blanket `max_same_tool_call_per_turn: 1` with per-tool-category limits. Read-only tools (`read_file`, `list_dir`, `recall`, `read_skill`, `web_search`, `web_fetch`) get limit 2 (originally 5, tightened 2026-02-28 to block redundant identical reads; cache replay handles blocked calls). All other tools use default 3 (raised from 1). `ToolGuard` now has `tool_limits: HashMap<String, u32>` populated from `READ_TOOLS` constant. 7 tests (added `test_read_tool_different_args_unlimited`), 1679+ total green. (2026-02-23, updated 2026-02-28, `src/agent/tool_guard.rs`, `src/config/schema.rs`)
- QW2: Recovery hints on tool errors. Added actionable `Hint:` suffixes to error messages in filesystem, web, and shell tools. Filesystem: file-not-found → "use list_dir", permission-denied → "check permissions", old_text-not-found → "use read_file for exact text". Web: Brave 401/403 → "check API key", 422 → "subscription may be inactive", 429 → "rate limited, wait", 5xx → "service error". Shell: timeout → "try simpler command", exec error → "verify command in PATH". 9 new tests. (2026-02-23, `src/agent/tools/filesystem.rs`, `src/agent/tools/web.rs`, `src/agent/tools/shell.rs`)
- Strategic review. Comprehensive review saved as `REVIEW-2026-02-23.md`. 5 architectural recommendations, 5 strategic recommendations, 5 UX recommendations, 5 quick wins, 5 anti-patterns. Recommended 4-week execution order. (2026-02-23)
- QW3: Gateway slash commands + trio auto-detection. 10 slash commands (`/start`, `/help`, `/status`, `/clear`, `/agents`, `/kill`, `/think`, `/long`, `/context`, `/memory`) intercepted before LLM processing in gateway mode via `gateway_commands.rs`. Anti-coalesce guard prevents commands from being batched with adjacent messages. Telegram bot-mention suffix stripping (`/status@my_bot` → `/status`). Trio auto-detection added to `cmd_gateway()` (was REPL-only). Fixed `is_local` hardcoded to `false` in gateway.
(2026-02-23, `src/agent/gateway_commands.rs`, `src/agent/agent_loop.rs`, `src/cli.rs`)
- QW4: Exec tool working directory + tool call dedup. Three-layer fix for exec tool defaulting to workspace instead of process cwd: (1) `tool_wiring.rs` sets `exec_working_dir` to process cwd, (2) `registry.rs` fallback chain: config → process cwd → workspace, (3) `agent_loop.rs` injects `working_dir` into exec calls when LLM omits it. Three-layer tool call dedup for local models emitting duplicate calls: (1) batch dedup in `step_execute_tools()`, (2) initial batch dedup in `run_tool_loop()`, (3) cross-iteration dedup via `seen_calls` in `analyze_via_scratch_pad()`. System prompt improved: "Rules" section with "ALWAYS use tools, NEVER guess" + "File Operations" section with pwd-first guidance. `/m` command fixed for remote LM Studio (queries API instead of scanning `~/models/`). (2026-02-23, `src/agent/agent_loop.rs`, `src/agent/tool_wiring.rs`, `src/agent/tools/registry.rs`, `src/agent/context.rs`, `src/repl/commands.rs`)
- N7: Message coalescing. Already implemented: 400ms coalesce window in `agent_loop.rs:run()` batches rapid messages from same session (Telegram/WhatsApp). Anti-coalesce guard added for slash commands. (Pre-existing, noted 2026-02-23)
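
For reference when reproducing N7 and the QW3 anti-coalesce guard in a failing test, here is a minimal sketch of the batching rule as a pure function. It is not the code in `agent_loop.rs:run()`. The `Inbound` struct, the `coalesce` and `is_slash_command` functions, and the decision to measure the 400ms window from the first message of a batch are all assumptions made for illustration.

```rust
/// Hypothetical inbound message; the field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct Inbound {
    pub session: String,
    pub at_ms: u64,
    pub text: String,
}

/// Window length taken from the N7 note above.
pub const COALESCE_WINDOW_MS: u64 = 400;

/// Assumed guard: anything starting with '/' is treated as a slash command.
pub fn is_slash_command(text: &str) -> bool {
    text.trim_start().starts_with('/')
}

/// Groups messages (assumed sorted by `at_ms`) into batches.
/// A message joins the most recent batch of its session only if that batch
/// is not a slash command and was opened at most COALESCE_WINDOW_MS earlier.
/// Slash commands always form a batch of their own.
pub fn coalesce(messages: &[Inbound]) -> Vec<Vec<Inbound>> {
    let mut batches: Vec<Vec<Inbound>> = Vec::new();
    for msg in messages {
        let target: Option<usize> = if is_slash_command(&msg.text) {
            None
        } else {
            batches
                .iter()
                .rposition(|b| b[0].session == msg.session)
                .filter(|&i| {
                    let first = &batches[i][0];
                    !is_slash_command(&first.text)
                        && msg.at_ms.saturating_sub(first.at_ms) <= COALESCE_WINDOW_MS
                })
        };
        match target {
            Some(i) => batches[i].push(msg.clone()),
            None => batches.push(vec![msg.clone()]),
        }
    }
    batches
}
```

Because the function takes timestamps as data instead of reading a clock, a regression test can pin the window boundary and the slash-command case without sleeping. The real loop is timer-driven, so any such test only checks the batching rule, not the scheduling.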