Skip to content

feat(conversations): server-authoritative resumable chat turns + tab rewrite#302

Open
Ovaculos wants to merge 21 commits into
NimbleBrainInc:mainfrom
Ovaculos:feat/conversations-tab-rework
Open

feat(conversations): server-authoritative resumable chat turns + tab rewrite#302
Ovaculos wants to merge 21 commits into
NimbleBrainInc:mainfrom
Ovaculos:feat/conversations-tab-rework

Conversation

@Ovaculos
Copy link
Copy Markdown
Contributor

Summary

Rewrite the conversation tab around a server-authoritative resumable turn model (issue #254). A chat turn now runs to completion on the server regardless of the client's connection; the browser is a viewer that attaches to a per-conversation event log, replays the in-flight turn, and tails live events.

Closes #254. Includes the #253 follow-up (auto-title prompt) — upstream landed an XML-containment + sanitizer approach in #261 that is preserved here.

What this delivers

  • RunBus (src/runtime/run-bus.ts) — per-conversation in-memory log of buffered events with monotonic seq, ownership of the turn's AbortController, and a grace window for late re-attach. 500k per-run event cap with terminal error on overflow.
  • Detached startTurnPOST /v1/chat/start returns {conversationId} immediately; the turn keeps running across disconnect / refresh / tab switch. Stop button → POST /v1/conversations/:id/cancel.
  • Per-conversation SSE viewerGET /v1/conversations/:id/events replays from a client-supplied ?afterSeq=N then tails. Reconnect/backoff handled in conversation-stream.ts.
  • Per-conversation slice store (web/src/hooks/chat-store.ts) — useSyncExternalStore-backed module singleton, LRU cap, hasPendingTail for resume reconcile, probeConversation for sessionStorage-restored streaming dots.
  • Stage-1 conversation persistence at {workDir}/conversations/{id}.jsonl, user-scoped ownership, check:conversation-paths enforces it.
  • Conversations bundle UI: background-refresh list, in-place title update via a new synapse/conversation-title postMessage channel (no full list refetch on title resolve).

Known limitation: RunBus is single-process

Under platform.replicas > 1, a viewer landing on a different pod than the one that started the turn sees isActive:false and the live frames don't fan out cross-pod. Sticky routing on Mcp-Session-Id (existing prereq) mitigates the active-tab case; a pod restart or cross-pod viewer still drops resume mid-turn. A clustered Redis-backed RunBus is deferred — tracked in src/runtime/run-bus.ts. serve warns at boot when sessionStore.type === "redis" so the gap is visible.

Other fixes folded into this PR

  • Stop button now emits cancelled frame to live viewers (was: just aborted, viewer stuck isStreaming:true).
  • startTurn serializes create to prevent double-create race when client supplies an id.
  • ConversationCorruptedError → 422 in handleChatStart (was: 500).
  • Log sinks (structured-log-sink, workspace-log-sink) wrap writes in try/catch + warn-once-per-episode (was: throw into the event-emit path and crash detached turns).
  • Auto-title generates and live-updates via new conversation.title SSE event (with wsId scoping). List iframe receives via dedicated postMessage; chat panel header updates via existing chat-store hook.
  • Resume race: partial on-disk turn now completes from RunBus grace replay (no flicker). Three-way client reconcile (isActive / activeSeq>0 / GC'd).
  • Stuck streaming after >30s reconnect: server-authoritative isActive:false on subscribed frame drives chat-store reset.

Tests & hygiene

  • Runtime.start({}) now hard-errors under NODE_ENV=test when workDir is missing. Stops integration tests from silently writing echo-model conversations into the developer's real ~/.nimblebrain. Fixed 8 leakers + added test/helpers/test-workdir.ts.
  • scripts/check-code-style.ts skips node_modules/ and src/bundles/ per its doc comment (was: scanned vendored .d.ts after build:bundles, surfacing 1625 false-positive violations).
  • 20+ new tests covering chat-store slice semantics, partial-tail-completes-from-grace, complete-tail-no-flicker, RunBus overflow cap, etc.

Test plan

  • bun run verify — full CI parity, all green
  • Manual: start a turn, refresh mid-stream — assistant message reconstructs from disk + RunBus replay, no token loss
  • Manual: start a turn, switch to a different conversation mid-stream, switch back — no bleed, response still streaming on origin
  • Manual: start a turn, close the tab, reopen — title pops in live when auto-title resolves
  • Manual: Stop button — cancellation event arrives, streaming dot clears across all open tabs
  • Verify sessionStore.type: "redis" startup warning fires when configured

🤖 Generated with Claude Code

Ovaculos and others added 21 commits May 25, 2026 10:25
Decouple a chat turn's lifecycle from the originating HTTP request. A turn
now runs to completion on the server regardless of the client connection;
the client is a viewer that attaches to a replayable per-conversation event
log (RunBus), replays the in-flight turn, then tails live events. Refresh /
conversation-switch / disconnect never lose or duplicate work — they detach
and re-attach. Only the Stop button (RunBus.cancel) aborts generation.

- RunBus: in-memory, replayable, monotonic-seq turn log with grace-window
  retention for late re-attach; threads its own AbortSignal into the engine.
- runtime.startTurn: detached turn driver; seeds user.message into the log;
  reports isTurnActive / activeConversationIds / turnSeq / getTurnReplay.
- Per-conversation SSE: ConversationEventManager + /v1/conversations/:id/events
  route with subscribed{isActive} frame + buffered replay; /v1/chat/start and
  cancel endpoints.
- Broadcast conversation.title + data.changed on the global SSE when a title
  generates, so viewers update without a turn-stream connection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…imbleBrainInc#253)

The single-string transcript prompt made the fast model pattern-match a
conversation to continue and emit the start of the assistant's response as
the title (worst on creative/long answers). Use real role turns
(user → assistant → user-instruction); the trailing user turn is an
unambiguous instruction, not text to continue.

Closes NimbleBrainInc#253

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the single shared useChat with a per-conversation slice store. Each
conversation owns its own state; a turn's stream writes only into its origin
slice, so switching conversations mid-turn never bleeds the response into the
destination chat (NimbleBrainInc#254) and switching back shows it still arriving.

- chat-store: slice map (LRU-capped) backing useChat via useSyncExternalStore;
  sendTurn → /v1/chat/start then subscribe via conversation-stream (replay +
  live tail). Resume reflects server isActive (indicator/Stop survive reload),
  trims the stale in-flight turn from disk, and drops grace-buffer replay of a
  finished turn (no duplicate). AbortSignal threaded for cleanup only.
- Streaming dots: store streamingIds → hostContext (SlotRenderer) so the list
  shows live per-row activity; persisted per-tab in sessionStorage and re-probed
  on reload (dots survive refresh, self-heal when finished).
- Reload restore: last-viewed conversation reopened from sessionStorage.
- Live title: consume conversation.title SSE → slice title; ChatPanel header
  prefers the generated title. MessageList lands the view at the bottom.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Per-row streaming indicator: read host-pushed streamingConversationIds
  (useHostContext) and render a pulsing dot on conversations with an in-flight
  turn.
- No more list flicker on updates: data.changed refreshes run in the
  background (no skeleton swap) and only for conversation changes — other apps'
  data.changed are ignored. Skeleton stays for initial load + view switches.
- New conversations appear immediately: ConversationIndex.flushPending() picks
  up a just-created file on the read path instead of racing the fs.watch
  debounce.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The per-conversation viewer replaced the old streaming path, leaving the
client-side pieces unreferenced:

- client.ts: streamChat / streamChatMultipart / consumeSSEStream (callers
  now use startChatTurn + connectConversationStream).
- conversation-sse.ts and useConversationEvents.ts (the old cross-tab SSE
  hook) — zero importers.

The server `/v1/chat/stream` + handleChatStream stay (still a valid REST
surface, exercised by integration tests). Also comment the one ported
`as unknown as` cast for the done-payload `files` field.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…prompt change

The capturing-model test helper skipped auto-title model calls by matching
the old prompt string ("Generate a 3-6 word title"). The NimbleBrainInc#253 rework changed
the title system prompt, so the guard stopped recognizing title calls and they
polluted the captured chat system prompt. Match the new prompt instead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Detached turns emit events after the originating request ends; if the workdir
is gone (test teardown) or a write otherwise fails (disk full, perms), the
unguarded appendFileSync threw out of emit() as an unhandled error. Wrap the
write best-effort — a failed log line must never crash the caller.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
RunBus.cancel() flips the run to terminal synchronously, after which publish()
is a no-op. The engine's post-abort `cancelled` publish (startTurn's catch)
therefore never reached the SSE feed (onTurnEvent), so the Stop button aborted
generation but left the UI stuck streaming until a reload. done/error work
because they publish before ending while the run is still active.

Publish `cancelled` in cancelTurn BEFORE RunBus.cancel ends the run. The
engine's later cancelled publish becomes a harmless no-op. Adds a regression
test on the live onTurnEvent path (the attach/onEnd path the old tests used
did not cover this).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
startTurn → store.load throws ConversationCorruptedError for a pre-migration
(ownerless) conversation on the resume path, but handleChatStart let it fall
through to a raw 500. handleChat and handleChatCancel both already map it to
422 via conversationCorruptedResponse — match them. Adds an HTTP test
asserting /v1/chat/start on an ownerless conversation returns 422.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two concurrent starts with the same not-yet-existing conversationId could both
load()->null then both create({id}) — and create is a truncating writeFile with
no exists-guard, so the loser would clobber the winner's in-progress file.
runBus.begin serializes the run, but it ran AFTER create, too late to help.

For a provided id, reserve the run (begin) BEFORE touching storage: the loser
now throws RunInProgressError before it can create. On any load/create failure
we evict to release the reservation so a failed start can't leave the id stuck
"running". Adds a regression test (delayed load forces the window; asserts
create runs exactly once).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
connectConversationStream reconnects with backoff up to 30s; RunBus GCs a
terminal turn 30s after it ends. If a viewer is disconnected past that window
while the turn finishes, the reconnect's subscribed{isActive:false} carries no
replayed terminal frame — and the client only reconciled on the first (resume)
subscribed, so the slice stayed isStreaming=true until a manual reload.

Reconcile on any subscribed frame: if the server reports no active turn while
the slice still thinks it's streaming, clear the streaming state (the server is
authoritative). A terminal frame still within grace replays right after and
finalizes the content; if GC'd, the partial is kept and a reload fetches the
final — either way the spinner no longer hangs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…icker)

loadConversation reads disk then subscribes. A short turn ending in that window
left a partial snapshot, and onSubscribed hit the drop branch — discarding the
grace-buffer replay that held the full `done`, so the final response only showed
after a reload.

Mark the gap precisely instead of always recombining (which would flicker every
recently-finished conversation on open): the reader flags an assistant turn with
no run.done/run.error as `pending`. On resume the client now branches on it:

- live turn, or pending tail + retained run (activeSeq>0) → trim + apply replay
  (rebuilds the full turn, no duplicate);
- pending tail + run GC'd → refetch the now-complete transcript;
- complete tail → ignore the grace replay (no dup, no flicker — the common
  open-a-recent-conversation path is untouched).

A trailing user message (no assistant yet) also counts as pending. probe slices
have no messages, so they're unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
conversation-stream viewers track the RunBus seq (id: line) for replay/resume.
The legacy /v1/chat and /v1/chat/stream endpoints fan out via
broadcastToConversation, which is seq-less — those frames apply live but don't
advance lastSeq and have no resume. The web shell is RunBus-only; only an
external caller hitting the legacy endpoints while a web tab watches the same
conversation would mix the two. Document the boundary on both sides.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The earlier hardening (commit 9ddfd2e) silenced log-sink throws to keep emit()
safe, but a real disk-full / perms incident produced zero operator signal.
Surface the first write failure of an episode with console.warn, then suppress
until a subsequent success re-arms — first failure visible, sustained outage
doesn't spam, intermittent recovery re-warns when it breaks again.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A viewer landing on a different pod than the one running the turn sees
isActive:false and silently drops resume — RunBus is single-process and isn't
yet clustered. Sticky routing on Mcp-Session-Id (prereq NimbleBrainInc#2) mitigates for the
active tab; a pod restart or any cross-pod viewer still loses the in-flight
turn. Two changes so this isn't a silent regression:

- AGENTS.md: add a "Known limitation under replicas > 1" note under the four
  prereqs, pointing at run-bus.ts and the deferred clustered RunBus.
- serve.ts: emit a loud startup warn when sessionStore.type === "redis" (the
  one strong signal of multi-replica intent) so operators see the gap at boot.

Not a hard error — sticky routing is sufficient for many cases, and operators
may accept the limit until the clustered RunBus lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…owth

A runaway or adversarial event producer — a model stream looping
pathologically, a chatty tool spamming progress events — could push
log.events past safe memory limits before the grace-window GC fired.
Tens of thousands of token deltas held in-process for the grace window
on a multi-tenant deployment is real heap pressure.

Cap at 500k events per run with discard-as-error semantics: on overflow,
abort the turn's signal, append a synthetic terminal `error` event with
`error: "buffer_overflow"` so viewers see the cause, end the run as
`error`, and warn once. Subsequent publishes drop via the standard
`status !== "running"` guard.

500k sits ~10x over the worst legit run we can measure (extended-thinking
turn with chatty tool progress: ~30-50k events). Hitting it requires
the kind of pathological producer that is exactly what this cap exists
to terminate cleanly.

Configurable via the RunBus constructor for tests; production leaves it
at the default.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`Runtime.start({})` defaulted `workDir` to `~/.nimblebrain`, so any
integration test that forgot the field silently wrote echo-model
conversations, test workspaces, and bundle data straight into the
developer's real workdir. They then showed up in the conversations tab
on `bun run dev` — confusing diagnosis that looked like a title-pipeline
bug but was really 268 of 282 convs being test fixtures with
`workspaceId: "ws_test"`.

`resolveWorkDir` now throws under `NODE_ENV=test` (bun:test sets this
automatically) when `workDir` is missing, with a clear example fix.
Catches future regressions of this category at construction time.

Eight test files were leaking:
- chat-stream-concurrent
- appcontext-wiring (3 Runtime.starts)
- dependency-checking (6 starts; share file-scope testDir)
- conversation-integration
- remote-integration (3 starts)
- runtime (7 starts; one needed its own workDir to avoid skill bleed
  from earlier same-file tests writing to the shared global skill dir)
- security/workspace-isolation
- usage-integration

New helper `test/helpers/test-workdir.ts` exposes `makeTestWorkDir(label)`
that returns `{workDir, cleanup}` for tests that want per-test isolation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The runtime fired `emitConversationsChanged()` three times on a
first-turn chat: when isNew (sidebar row appears), at .finally (terminal
state), and again when the title resolved. Each broadcast triggered the
conversations-list iframe to refetch the whole list.

The title-resolve refetch is now redundant. The `conversation.title`
SSE event already exists (the chat panel header consumes it for live
title display); the list iframe just didn't have a channel for it.
Add one: the web shell forwards `conversation.title` via postMessage to
matching iframes, and the conversations bundle listens with a raw
`window.addEventListener` (the synapse SDK doesn't know this envelope,
which is fine — its own listener ignores unrecognized methods, no
double-handling). The iframe patches the matching row's title in-place,
no list refetch.

Drop the now-redundant `emitConversationsChanged()` from the
title-resolve handler. Broadcast count drops from 3 to 2 on first turn;
subsequent turns unchanged at 1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…dle src

Two unrelated cleanups:

- `web/src/api/client.ts`: comments still referenced the deleted
  `streamChat` / `streamChatMultipart` helpers (uploadResource header
  pattern note + startChatTurn docblock). Rot-flagged only; no
  behavior change.

- `scripts/check-code-style.ts`: the doc comment says "bundles are out
  of scope" but the glob walked `src/**/*.ts`, which after
  `bun run build:bundles` pulls in `src/bundles/*/ui/node_modules/`
  and surfaces 1625 false-positive inline-type-import violations in
  vendored `.d.ts` files. Skip `node_modules/` and `src/bundles/`
  per the documented scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tab-rework

# Conflicts:
#	src/api/routes/chat.ts
#	src/conversation/auto-title.ts
#	web/src/api/client.ts
#	web/src/api/conversation-sse.ts
#	web/src/components/AppWithChat.tsx
#	web/src/components/ChatPanel.tsx
#	web/src/context/ChatContext.tsx
#	web/src/hooks/useChat.ts
#	web/test/inlineError.test.tsx
#	web/test/streamingState.test.tsx
…tab-rework

Conflicts resolved in:
- web/test/inlineError.test.tsx
- web/test/streamingState.test.tsx

Adopted upstream's realClient snapshot pattern from web/test/setup.ts
(safer than `await import("../src/api/client")` per the setup.ts comment)
while preserving the branch's server-authoritative mocks
(startChatTurn / startChatTurnMultipart / cancelChatTurn +
connectConversationStream).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Mid-turn conversation switch bleeds streaming response into destination chat

1 participant