I don't know if this is helpful, but below is Claude Code's own report of some buggy behaviour I encountered.
Kaimon MCP — bug report
Date: 2026-06-23
Context: Driving an iterative Julia loop via the kaimon MCP from an automated coding agent — "edit a source file → re-evaluate in the session → inspect the result", with many small edits to a single package source file and a re-evaluation after each batch.
Identifiers below are placeholders: MyPkg = the package under development (a path-dev-ed Julia package with its own Project.toml), at <project-root>. Session keys and PIDs are reproduced as observed (they are random/local and not sensitive).
Environment
- OS: macOS (Darwin 23.5), Apple Silicon (arm64-darwin).
- Julia: 1.11, juliaup-managed.
- Kaimon package: ~/.julia/packages/Kaimon/6N7iB/ (version hash 6N7iB). Stacktraces point into ~/.julia/packages/Kaimon/6N7iB/src/gate.jl.
- Project under test: a path-dev-ed Julia package (MyPkg) with a few registered deps; nothing exotic.
- Concurrent unrelated session (stable throughout): key 0f5b58b7, agent-spawned for a different project, PID 45258, up ~7.5h. It never misbehaved; only the MyPkg session(s) did. Useful as a control.
TL;DR — five distinct symptoms
- Wrong working directory in a fresh session's REPL: pwd() was an unrelated local project directory, not the project_path passed to start_session.
- Revise never hot-reloaded edits to package source in the persistent session, even after an explicit Revise.revise(); newly-added top-level definitions stayed undefined. Only a full process restart picked changes up.
- manage_repl("restart") timed out and was a no-op: afterwards the process kept its old PID and accumulating uptime, and ran stale code.
- Session registry race: the set of live sessions and the project's session key flipped non-deterministically between consecutive MCP calls (two keys appeared/disappeared; a shut-down session reappeared).
- start_session phantom "already running": it returned a session key that ping and ex simultaneously reported as nonexistent, and it therefore declined to create a new session, leaving the project with no usable REPL.
Net effect: the persistent-REPL workflow became unusable. Fell back to spawning a one-shot julia --project=<project-root> -e '…' from a plain shell, which worked 100% reliably thereafter.
What worked (to isolate the fault)
- start_session: the first call returned a usable key.
- run_tests: always worked and always reflected current on-disk code. (It spawns a fresh subprocess, sidestepping both the stale-Revise and registry issues.)
- ex with absolute paths: worked whenever the session was actually live.
- manage_repl("restart"): worked the first time; hung the second time.
- One-shot julia --project=<project-root> -e '…' from a plain shell: worked every time. This confirms the on-disk edits were valid and complete; the fault is in the persistent session / Revise / registry layer, not the code.
Issue 1 — Fresh session's REPL working directory is not the project directory
Repro:
start_session(project_path="<project-root>")
→ "Session started. Session key: c97c8a8e"
ex(ses=c97c8a8e, e='include(<a project-relative path>)')
Result (verbatim, path redacted):
ERROR: SystemError: opening file
"<unrelated-project-dir>/<the relative path>": No such file or directory
Stacktrace:
[1] include(mapexpr::Function, mod::Module, _path::String) @ Base ./Base.jl:307
...
[4] _eval_with_capture(expr::Expr) @ Kaimon.Gate ~/.julia/packages/Kaimon/6N7iB/src/gate.jl:1211
[5] gate_eval(code::String; _mod::Module, display_code::String) @ Kaimon.Gate .../gate.jl:920
[6] (::Kaimon.Gate.var"#_do_async_eval#handle_message##0")() @ .../gate.jl:1571
[7] (::Kaimon.Gate.var"#handle_message##5#handle_message##6"{...})() @ .../gate.jl:1610
Observation: the new session resolved a project-relative path against an unrelated local project directory, i.e. its pwd() was not the project_path passed to start_session. The REPL apparently inherited a cwd from the MCP server process (or a global/previous default) instead of cd-ing into the activated project.
Expected: a session activated for project_path = X should have pwd() == X (or the cwd behaviour should be documented), so project-relative paths work.
Workaround: used an absolute project root constant and joinpath everywhere.
Severity: low (easy workaround) but surprising; a footgun for any relative-path code.
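The workaround can be sketched as below. This is a minimal illustration, not Kaimon behaviour: PROJECT_ROOT and in_project are hypothetical names, and it assumes the session has the intended project activated, so that Base.active_project() points at its Project.toml.

```julia
# Hypothetical helper names; assumes the intended project is the active one.
# Base.active_project() returns the path of the active Project.toml.
const PROJECT_ROOT = dirname(abspath(Base.active_project()))

# Build absolute paths from the project root instead of relying on pwd().
in_project(parts::AbstractString...) = joinpath(PROJECT_ROOT, parts...)

# Example (path is illustrative only): include a file by its project-relative
# path regardless of the session's current working directory.
# include(in_project("scripts", "run.jl"))
```

Anchoring on the active project rather than a hard-coded string keeps the helper valid if the checkout moves, but it still depends on the session having activated the right environment.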
Issue 2 — Revise does not hot-reload edits to package source in the persistent session
Repro: Edited one package source file repeatedly — changed function bodies, and added three brand-new top-level definitions (one function + two consts; call them newfn, NEW_A, NEW_B). Re-ran the workflow via ex.
Result: the produced output reflected the old code. Confirmed at the artifact level — the generated output still contained the pre-edit form of a changed function, after that code had been replaced on disk.
Direct confirmation (verbatim):
ex(ses=…, e='using Revise; Revise.revise(); using MyPkg;
(isdefined(MyPkg,:newfn), isdefined(MyPkg,:NEW_A), isdefined(MyPkg,:NEW_B))')
→ (false, false, false)
Even after an explicit Revise.revise(), none of the newly-added top-level symbols existed in the module. The session banner reported "Revise active."
Cross-check: run_tests (fresh subprocess) and the one-shot julia --project did see all the new code — so the edits were on disk and valid; only the long-lived session's module was stale, and Revise's incremental update never fired.
Hypotheses for maintainer:
- The package may be loaded such that Revise isn't tracking it (loaded from a precompiled image / not dev-ed into the active environment / file watcher not started). Revise.revise() being a no-op suggests Revise has no tracked files for this package.
- Worth checking at session start: is the package Pkg.develop-ed in the active env? Is Revise's FS watcher attached to its src/? Do Revise.pkgdatas / Revise.watched_files contain it? Any queued revision errors (Revise.queue_errors)?
Severity: high for an interactive dev loop — it silently produces stale results (no error), the worst failure mode.
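A rough way to test the tracking hypothesis from inside the session is sketched below. tracking_report is a hypothetical helper, not part of Revise or Kaimon; Revise.pkgdatas is a Revise internal and may change between versions, so the sketch guards every access to it. Revise is looked up among the already-loaded modules, so the snippet also runs where Revise is not installed.

```julia
# Sketch of a tracking check; `tracking_report` is a hypothetical helper.
# Revise.pkgdatas is internal to Revise and may change between versions.
function tracking_report(mod::Module)
    id  = Base.PkgId(mod)
    src = pathof(mod)                      # nothing for non-package modules
    # Installed (non-dev) packages live under <depot>/packages/.
    in_depot = src !== nothing &&
        any(d -> startswith(src, joinpath(d, "packages")), DEPOT_PATH)
    # Look Revise up among loaded modules so this runs without it installed.
    revise = nothing
    for (pkgid, m) in Base.loaded_modules
        pkgid.name == "Revise" && (revise = m)
    end
    tracked = if revise === nothing || !isdefined(revise, :pkgdatas)
        missing                            # Revise not loaded: cannot tell
    else
        haskey(getfield(revise, :pkgdatas), id)
    end
    return (; source = src, in_depot, revise_loaded = revise !== nothing, tracked)
end
```

Calling tracking_report(MyPkg) in the stale session and getting tracked == false, or in_depot == true for a package that is supposed to be dev-ed, would be consistent with the hypotheses above. It would not by itself prove the cause.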
Issue 3 — manage_repl("restart") timed out and did not actually restart
First restart (worked):
manage_repl(command="restart", session=c97c8a8e)
→ "Session c97c8a8e restarted. Fresh Julia state. Revise active."
After this, fresh code was loaded — a working restart was the only thing that picked up edits.
Second restart (hung):
manage_repl(command="restart", session=c97c8a8e)
→ "The operation timed out."
Then ping showed (verbatim, names redacted):
● 438477fc <MyPkg> (connected, up 1h 37m, PID 63715, free)
● 0f5b58b7 <other-proj> (connected, up 7h 33m, PID 45258, agent-spawned)
Observation: the MyPkg process showed uptime 1h 37m and an unchanged PID (63715) — i.e. it was not actually restarted (a real restart would show ~0m uptime / a new PID). Yet the call returned a timeout, not success or failure. Note also the key visible to ping (438477fc) differed from the key I had restarted (c97c8a8e) — see Issue 4.
Severity: high — restart was the only reliable way to pick up edits (given Issue 2), and it became unavailable.
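The PID comparison used in the observation can be automated on the client side. The sketch below parses one line of the ping listing in the format quoted above; that format is taken from the observed output, not from a documented spec, and parse_ping_line and really_restarted are hypothetical names.

```julia
# Hypothetical helper: parse one line of the `ping` listing as quoted above.
# The line format is inferred from observed output, not from a documented spec.
function parse_ping_line(line::AbstractString)
    m = match(r"([0-9a-f]{8})\b.*?\bPID (\d+)", line)
    m === nothing && return nothing
    return (key = String(m.captures[1]), pid = parse(Int, m.captures[2]))
end

# A restart that really happened should yield a different PID.
really_restarted(before, after) = before.pid != after.pid
```

Capturing the ping line before and after manage_repl("restart") and comparing the two records gives a check that does not depend on the restart call's own return message.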
Issue 4 — Session registry race / non-deterministic key reassignment
The set of "available sessions" and the project's session key changed between consecutive calls, with no session activity from me in between. Verbatim sequence (names redacted, keys/PIDs as observed):
1) ping
→ ● 438477fc <MyPkg> (up 1h 37m, PID 63715) ● 0f5b58b7 <other-proj>
2) manage_repl(shutdown, session=438477fc)
→ "Session 438477fc shut down."
3) start_session(project_path="<project-root>")
→ "Session already running for this project. Session key: c97c8a8e" ← c97c8a8e?!
4) manage_repl(shutdown, session=c97c8a8e)
→ ERROR: "No session matched 'c97c8a8e'.
Available: 0f5b58b7 (<other-proj>), 438477fc (<MyPkg>)" ← 438477fc is BACK
5) ping
→ "Sessions: 1 connected / 1 total"
● 0f5b58b7 <other-proj> only ← both MyPkg sessions gone now
6) start_session(project_path="<project-root>")
→ "Session already running for this project. Session key: c97c8a8e" ← claims c97c8a8e again
7) ex(ses=c97c8a8e, …)
→ ERROR: "No session matched 'c97c8a8e'. Available: 0f5b58b7 (<other-proj>)"
8) ping
→ ● 0f5b58b7 <other-proj> only
Two keys (c97c8a8e, 438477fc) appear and disappear inconsistently; 438477fc reappeared after being shut down (step 2 → step 4); ping, start_session, and the ex/shutdown "Available:" listings disagreed with each other within seconds.
Hypothesis: the key↔process registry has a race / stale-entry bug — possibly restart registers a new key/process without atomically retiring the old one, and timeouts (Issue 3) leave half-committed registry state. Different endpoints (ping vs start_session vs the ex/shutdown "Available:" list) appear to read different or differently-cached views of the registry.
Severity: high — makes the session unaddressable; you can't reliably target a key.
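To illustrate the hypothesis, here is a minimal sketch of a registry in which the key swap on restart happens inside one critical section and every reader goes through the same lock. All names (Session, Registry, swap_session!, live_keys) are invented for the example; nothing here is taken from Kaimon's source.

```julia
# Illustrative registry; none of these names are Kaimon's actual internals.
struct Session
    key::String
    pid::Int
    project::String
end

struct Registry
    lock::ReentrantLock
    sessions::Dict{String,Session}
end
Registry() = Registry(ReentrantLock(), Dict{String,Session}())

# Retire the old key and register the new one inside one critical section,
# so no reader can observe both keys or neither.
function swap_session!(reg::Registry, old_key::AbstractString, new::Session)
    lock(reg.lock) do
        delete!(reg.sessions, old_key)
        reg.sessions[new.key] = new
    end
    return new.key
end

# Every endpoint reads through the same locked snapshot.
function live_keys(reg::Registry)
    lock(reg.lock) do
        sort!(collect(keys(reg.sessions)))
    end
end
```

If ping, start_session and the "Available:" listing all read via something like live_keys, they could not disagree about which keys exist at a given instant, which is the property that appeared to be violated in the sequence above.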
Issue 5 — start_session phantom "already running" (won't create a usable session)
In steps 3 and 6 above, start_session returned:
"Session already running for this project. Session key: c97c8a8e"
…but at those same moments ping reported no MyPkg session (steps 5/8) and ex(ses=c97c8a8e) failed with "No session matched 'c97c8a8e'" (step 7). So:
- start_session's "already running" check consulted a stale registry entry and returned a key that is not actually live/connected.
- Because it believed a session existed, it did not spin up a new one, leaving the project with no usable REPL at all. The only escape was abandoning the MCP for the rest of the session.
Expected: start_session should verify the candidate is actually alive (ping the gate / check the PID) before reporting "already running"; if dead/unreachable, reap the stale entry and start a fresh process.
Severity: high — this is the terminal failure that forced the fallback.
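The expected behaviour can be sketched as a lookup that verifies liveness before answering. This is an assumption-laden illustration: RegistryEntry, pid_alive and find_or_reap! are hypothetical names, the registry is modelled as a plain Dict keyed by project, and the PID check is POSIX-only (it matches the macOS environment of this report).

```julia
# Illustrative only: RegistryEntry and find_or_reap! are hypothetical names.
struct RegistryEntry
    key::String
    pid::Int
end

# POSIX-only check: kill(pid, 0) delivers no signal and returns 0 when the
# process exists and may be signalled by the caller.
pid_alive(pid::Integer) = ccall(:kill, Cint, (Cint, Cint), pid, 0) == 0

# Return the key of a live session for `project`, or reap the stale entry and
# return nothing so the caller knows to start a fresh process.
function find_or_reap!(registry::Dict{String,RegistryEntry}, project::AbstractString;
                       alive = pid_alive)
    entry = get(registry, project, nothing)
    entry === nothing && return nothing
    alive(entry.pid) && return entry.key
    delete!(registry, project)             # stale: the process is gone
    return nothing
end
```

A PID check alone can be fooled by PID reuse, so pinging the gate itself would be the stronger test; the alive keyword is there so that such a check can be substituted.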
Consolidated hypotheses & suggested diagnostics
In rough priority:
- Registry liveness reconciliation (Issues 3–5). Reconcile the session registry against live PIDs on every ping / start_session / ex; reap dead/phantom entries; make restart atomic (retire the old key/PID together with registering the new); on a restart timeout, return the actual resulting state and the new key rather than a bare timeout. Have start_session health-check the "already running" candidate before returning it.
- Revise tracking (Issue 2). Verify Revise is actually attached to the dev package's src/ at session start. Useful introspection to capture: Revise.pkgdatas, Revise.watched_files, length(Revise.revision_queue), Revise.queue_errors. Check whether the package is dev-ed vs loaded from a precompiled image, and whether the FS watcher thread is alive. A no-op Revise.revise() strongly suggests nothing is being tracked.
- Working directory (Issue 1). cd the gate process into project_path (or expose/document the cwd) at session creation.
Artifacts that would help (could not be retrieved reliably once the session died):
- The kaimon server log around the timeout (there is a server_log MCP tool).
- An internal session-registry dump at the moments of disagreement.
- kaimon/gate version: package hash 6N7iB; relevant frames src/gate.jl:1211 (_eval_with_capture), :920 (gate_eval), :1571, :1610.
Reproduction recipe (suspected):
- start_session for a path-dev-ed Julia package; check pwd() in the session (Issue 1).
- Edit a src/ file (add a new top-level symbol); via ex, run Revise.revise(); isdefined(Mod, :newsym) → expect false (Issue 2).
- manage_repl("restart") repeatedly / under load until one times out; ping and compare PID / uptime / key (Issues 3–4).
- After a timed-out restart, start_session again and observe the phantom "already running" plus a failing ex against that key (Issue 5).