feat: dedicated chat process to prevent API contention during missions#1088
feat: dedicated chat process to prevent API contention during missions#1088sukria-koan0 wants to merge 8 commits into
Conversation
|
@Koan-Bot review |
PR Review — feat: dedicated chat process to prevent API contention during missionsWell-structured architectural change that cleanly decouples chat from mission execution. The extraction of 🔴 Blocking1. Soul and summary loaded once at startup, never refreshed ( The run loop and awake.py read these on every iteration/call. The chat process should do the same, or at minimum reload periodically (e.g., check mtime and reload on change). Suggested fix: move 🟡 Important1. Invalid JSON lines accumulate in inbox forever ( Fix: truncate the file unconditionally after reading (whether or not valid entries were parsed), or at least truncate when lines were read but none parsed. 2. Busy-message for a single pending request is too aggressive ( Consider allowing a small queue (e.g., 3-5 messages) before sending the "Busy" message. Alternatively, just queue unconditionally and let the FIFO processing handle backpressure naturally — the user will see responses arrive in order. 3. Duplicated _is_mission_active() logic ( 4. Fragile KOAN_ROOT derivation via parent traversal ( Consider accepting 5. _resolve_project_path always picks the first project ( This isn't necessarily wrong (the old awake.py had the same behavior), but it's worth documenting as a known limitation, or considering passing the current project context from the user's message. 🟢 Suggestions1. _get_last_message_id catches SystemExit ( 2. Retry loop imports on every attempt ( 3. TOCTOU in _route_to_chat_process ( Checklist
SummaryWell-structured architectural change that cleanly decouples chat from mission execution. The extraction of Automated review by Kōan |
|
I've completed the review. The architecture is sound — decoupling chat into its own process with a JSONL inbox is a clean approach. The main blocking issue is that Other notable findings: |
|
@sukria-koan0 rebase |
PR Review — feat: dedicated chat process to prevent API contention during missionsWell-structured decomposition that cleanly separates chat from mission execution. The chat_context.py extraction is a good refactor, the JSONL inbox protocol is appropriate for this IPC pattern, and test coverage for the new code is solid. However, there's one blocking issue: soul/summary/project_path are loaded once at startup and never refreshed, causing stale personality context until process restart. The duplicated _is_mission_active() and the overly restrictive queue-depth-1 busy check should also be addressed. The malformed-JSON accumulation bug in the inbox is a latent issue that will cause log spam after a crash. Fix the blocking issue and address the warnings before merging. 🔴 Blocking1. Soul, summary, and project_path loaded once, never refreshed (`koan/app/chat_process.py`, L290-295)soul, summary, and project_path are loaded once in main() and reused for the entire lifetime of the process. If the user edits soul.md, the memory summary changes, or a project is added, the chat process serves stale context indefinitely. awake.py and run.py reload these per-call. The chat process should do the same. The simplest fix: move _load_soul(), _load_summary(), and _resolve_project_path() inside the for-entry loop in main(), or at least inside process_chat_request() itself. These are cheap file reads — no reason to cache them across the process lifetime. 🟡 Important1. Malformed JSON lines in inbox are never cleared (`koan/app/chat_process.py`, L88-105)read_and_clear_inbox() only truncates the file when Fix: track whether any lines were read (not just valid entries) and truncate unconditionally when lines exist. For example: lines_read = False
for line in f:
lines_read = True
...
if lines_read:
f.seek(0)
f.truncate()
f.flush()2. Queue depth of 1 is a regression from worker thread model (`koan/app/awake.py`, L449-456)_route_to_chat_process() rejects new messages when any request is already pending in the inbox. This is more restrictive than the previous worker-thread model, which would queue messages naturally. A fast typist sending two messages in succession will get a 'Busy' response on the second one, even though the FIFO model would handle it fine. Consider either removing the has_pending_requests() check entirely (the chat process handles FIFO naturally) or raising the threshold to allow 3-5 queued messages before sending the busy response. 3. Duplicated _is_mission_active() implementation (`koan/app/chat_process.py`, L140-155)The same _is_mission_active() logic appears in both chat_process.py and outbox_manager.py (identical implementation: read .koan-status, check for 'executing mission' or 'skill dispatch'). This should be a shared utility — if the status file format changes or new status strings are added, both copies must be updated. Suggested: extract to a shared module (e.g., app/signals.py alongside STATUS_FILE, or a new function in app/utils.py) and import it from both places. 🟢 Suggestions1. process_chat_request takes stale soul/summary as params (`koan/app/chat_process.py`, L159-162)The function signature takes soul, summary, and project_path as parameters, which encodes the assumption that these are loaded once externally. If you fix the blocking issue by reloading per-call, these parameters become unnecessary — the function could load them itself, simplifying the API and making it impossible to pass stale values. Alternatively, if you want to keep the parameter-based approach for testability, document that callers must provide fresh values. 2. Prompt guard result not acted upon in chat process (`koan/app/chat_process.py`, L184-186)The prompt guard scan runs and logs a warning if blocked, but the chat request proceeds regardless. In awake.py's handle_chat(), the same guard result also just logs and continues, so this is consistent — but it's worth confirming that warn-only is the intended behavior for chat. If chat tools are truly read-only, this is fine, but it should be documented in the guard config or a comment. 3. write_to_inbox uses append mode without atomic write (`koan/app/chat_process.py`, L116-124)write_to_inbox() opens the file in append mode with flock, which is fine for single-line JSON entries on local filesystems. However, the rest of the codebase uses atomic_write() (temp file + rename) for shared files. Here it's less critical since JSONL is append-only and the lock prevents concurrent writes, but if a write is interrupted mid-line (e.g., SIGKILL during f.write), the next line will be corrupted. This is mitigated by the malformed-JSON handling in read_and_clear_inbox() — but only if that handler also truncates (see the related warning). Checklist
SummaryWell-structured decomposition that cleanly separates chat from mission execution. The chat_context.py extraction is a good refactor, the JSONL inbox protocol is appropriate for this IPC pattern, and test coverage for the new code is solid. However, there's one blocking issue: soul/summary/project_path are loaded once at startup and never refreshed, causing stale personality context until process restart. The duplicated _is_mission_active() and the overly restrictive queue-depth-1 busy check should also be addressed. The malformed-JSON accumulation bug in the inbox is a latent issue that will cause log spam after a crash. Fix the blocking issue and address the warnings before merging. Automated review by Kōanfa1f051 |
|
@atoomic The PR has been rebased (per the earlier request) but the review findings from the initial bot review still stand. The blocking issue — soul/summary/project_path loaded once at startup in |
When a mission is actively running, outbox messages now use fallback_format() instead of invoking Claude CLI for formatting. This eliminates one concurrent API caller during missions, reducing contention that causes chat timeouts. Refs #1084 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Create chat_context.py: shared prompt building extracted from awake.py - Create chat_process.py: standalone process with JSONL inbox/outbox protocol, PID file, and graceful SIGTERM handling - Update awake.py to route chat through dedicated process when running, with automatic fallback to worker thread when process is not available - Update test patches to match new module structure Refs #1084 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add "chat" to PROCESS_NAMES in pid_manager.py - Add start_chat() function for launching chat_process.py - Include chat in start_all(), stop_processes(), format_status_all() - Add `make chat` target to Makefile - Update `make logs` to tail chat.log Refs #1084 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Chat process retries up to 3 times with 2s/5s/10s backoff on empty responses (the main symptom of API contention during missions) - First attempt uses full context; retries use lite context + shorter timeout - Detect active missions via .koan-status for logging context - Extract _retry_chat_lite helper in awake.py fallback path - Empty responses in awake.py fallback now trigger lite retry instead of immediately showing "I didn't get a response" Refs #1084 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix prompt guard tests to patch functions at their new locations (app.conversation_history, app.config) after chat_context extraction - Add diagnostic output to bare except blocks in chat_process.py to satisfy silent exception enforcement Refs #1084 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Update CLAUDE.md: 3-process architecture, new modules, commands - Add chat-inbox.jsonl to instance directory listing Refs #1084 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…cation, remove queue-depth-1 rejection d` are picked up without restarting the chat process. Per reviewer's blocking feedback. - **Truncate inbox unconditionally after reading** (`chat_process.py:read_and_clear_inbox()`): Removed the `if entries:` guard around `f.seek(0); f.truncate()`. Previously, malformed JSON lines would persist forever since `entries` would be empty and the file was never truncated. Per reviewer's important feedback. - **Remove queue-depth-1 busy rejection** (`awake.py:_route_to_chat_process()`): Removed the `has_pending_requests()` check that rejected new messages when any request was already pending. Messages are now always queued to the inbox and processed FIFO by the chat process. Per reviewer's feedback that depth-1 rejection was a regression from the worker thread model. - **Updated test for queue behavior** (`test_chat_process.py`): Renamed `test_busy_when_pending_requests` to `test_queues_when_pending_requests` and updated assertions to verify both messages are queued (no busy message sent).
Rebase: feat: dedicated chat process to prevent API contention during missionsBranch Diff: 11 files changed, 1000 insertions(+), 278 deletions(-) Review feedback was analyzed and applied. Changesd` are picked up without restarting the chat process. Per reviewer's blocking feedback.
Actions
CICI will be checked asynchronously. Automated by Kōan |
2cb0ff8 to
37ba88e
Compare
…context extraction
Rebase: feat: dedicated chat process to prevent API contention during missionsBranch Diff: 11 files changed, 1038 insertions(+), 302 deletions(-) Review feedback was analyzed and applied. Actions
CICI will be checked asynchronously. Automated by Kōan |
Summary
Decouples Telegram chat handling from mission execution by introducing a dedicated chat process. When a mission runs, the Claude API is no longer hammered by three concurrent callers (mission + chat + outbox formatting) — chat now has its own independent process and retry strategy.
Closes #1084
Changes
fallback_format()to free one API caller)chat_process.pywatcheschat-inbox.jsonlfor chat requests;chat_context.pyextracted as shared module;awake.pyroutes to chat process with fallback to worker threadpid_manager.py(start_chat(),start_all(),stop_processes(),format_status_all());make chat/make logsupdated.koan-statusTest plan
Generated by Kōan /implement