Summary
Gate worker processes started via Gate.serve() register a session with the server but are never terminated when abandoned. Over many editor/agent sessions and projects they accumulate as idle multi-GB Julia processes. On my machine I found four orphaned serve REPLs (one project had two) plus a leaked /tmp/kaimon_test_runner child — ~20 GB of RSS held by processes that had been idle for hours to over a day.
Root cause
- close_session! (src/session.jl) only sets session.state = CLOSED; it never signals the OS process, so the worker keeps running.
- The only stale-session reaper (_reap_stale_sessions! in src/tui/view.jl) is driven from the TUI render loop and operates on MCP client sessions (STANDALONE_SESSIONS), not on the gate worker processes (ConnectionManager REPLConnections with .pid). A headless -m Kaimon server never ticks it.
- Idle can't be measured from last_seen: health pings flow through _req_send_recv, which refreshes last_seen every cycle. There was no activity timestamp independent of pings.
Net effect: nothing ever sends a signal to an abandoned gate, even though REPLConnection already carries everything needed (pid, project_path, status, allow_restart).
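To illustrate the last_seen point, here is a toy model of an abandoned gate that keeps receiving health pings. Only the field name last_seen comes from the code base; the PingedGate type, the ping! and apparent_idle helpers, and the 30-second ping interval are assumptions made for this sketch.

```julia
# Hypothetical stand-in for a gate connection; only `last_seen` is a real field name.
mutable struct PingedGate
    last_seen::Float64   # seconds, refreshed by every health ping
end

# A health ping refreshes last_seen, as described for _req_send_recv.
ping!(g::PingedGate, now::Float64) = (g.last_seen = now; g)

# Simulate a gate that receives no tool calls for `total` seconds but is
# pinged every `interval` seconds, then report how idle it appears to be
# when idleness is derived from last_seen.
function apparent_idle(total::Float64, interval::Float64)
    g = PingedGate(0.0)
    t = 0.0
    while t + interval <= total
        t += interval
        ping!(g, t)
    end
    return total - g.last_seen
end
```

With an assumed 30-second ping interval, apparent_idle(86_400.0, 30.0) stays below 30 seconds even though the gate did nothing for a full day, so no threshold measured in hours could ever trigger on last_seen.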
Proposed fix
Branch: 1-Bart-1:fix/reap-orphaned-serve-repls · commit 7a8ec78
Reap idle gates from the ConnectionManager health loop (runs in both TUI and headless modes):
- Add a ping-independent last_tool_call timestamp, stamped via _note_tool_activity! at every real tool call / eval.
- _should_reap_idle_gate flags a :connected, restartable, non-paused gate idle beyond a threshold (evaluating / stalled / debug-paused / allow_restart=false extension gates are skipped).
- Such gates are shut down gracefully via send_shutdown! and pruned through the existing removal path.
- Threshold is a new gate_idle_reap_seconds preference, default 0 (disabled) so behaviour is opt-in.
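The decision logic above can be sketched roughly as follows. This is a simplified illustration, not the code on the branch: GateSketch is a hypothetical stand-in for REPLConnection, the paused field is an assumed way to represent debug-paused gates, and the function names drop the leading underscore to make clear they are not the real helpers.

```julia
# Hypothetical stand-in for REPLConnection. pid, status and allow_restart are
# named in this issue; last_tool_call is the proposed field; paused is assumed.
mutable struct GateSketch
    pid::Int
    status::Symbol            # e.g. :connected, :evaluating, :stalled
    allow_restart::Bool       # false for extension gates
    paused::Bool              # assumed flag for debug-paused gates
    last_tool_call::Float64   # seconds, stamped on real tool calls only
end

# Stamp activity on a real tool call / eval (never on health pings).
note_tool_activity!(g::GateSketch, now::Float64 = time()) = (g.last_tool_call = now; g)

# Return true only for a connected, restartable, non-paused gate whose last
# real activity is older than the threshold. A threshold of 0 disables reaping.
function should_reap_idle_gate(g::GateSketch, threshold_seconds::Real, now::Float64 = time())
    threshold_seconds > 0 || return false
    g.status === :connected || return false   # skips evaluating / stalled gates
    g.allow_restart || return false
    g.paused && return false
    return now - g.last_tool_call > threshold_seconds
end

# Example: a gate whose last tool call was at t = 0, checked at t = 10_000 s.
gate = GateSketch(1234, :connected, true, false, 0.0)
reap_disabled = should_reap_idle_gate(gate, 0, 10_000.0)      # threshold 0: never reap
reap_one_hour = should_reap_idle_gate(gate, 3600, 10_000.0)   # idle longer than 1 h
```

In this sketch the health loop would call should_reap_idle_gate once per cycle for each connection and hand any flagged gate to the graceful shutdown path.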
Happy to open this as a PR.