fix(engine): bound tool results in model context on replay; 1h cache TTL by mgoldsborough · Pull Request #362 · NimbleBrainInc/nimblebrain

mgoldsborough · 2026-06-02T04:30:29Z

Problem

A small number of long-lived conversations were driving the overwhelming majority of token cost. Root cause, traced end to end:

Tool results are capped at MAX_TOOL_RESULT_CHARS (50K) only inside the live engine loop (engine.ts).
The full output is persisted to tool.done.output, and the history reconstructor replays that full value verbatim (event-reconstructor.ts) with no cap.
So on every subsequent run, the model is re-fed the entire payload, which is more than it saw on the live turn. Long threads balloon toward the 1M context ceiling, and with prompt caching the giant prefix gets re-written into the cache on every turn (cache writes were ~80% of spend, and the write:read ratio was ~2:1, the reverse of what caching should produce).

tool.done.output was a single field serving two masters: the UI/record (wants the full payload) and model replay (wants a bound).

Fix

Separate the two concerns:

output (full) — UI/display + conversation record. Unchanged.
modelOutput (bounded) — exactly what the model saw on the live turn. New optional field on ToolDoneEvent, persisted only when it differs from output.
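As a rough illustration of the split, the sketch below shows one way the optional field could be persisted only when it differs from output. The event shape is simplified and the makeToolDoneEvent helper is hypothetical; neither is taken from the actual codebase.

```typescript
// Hypothetical, simplified event shape; the real ToolDoneEvent carries more fields.
interface ToolDoneEvent {
  type: "tool.done";
  output: string; // full payload for the UI and the conversation record
  modelOutput?: string; // bounded text the model saw; omitted when equal to output
}

// Hypothetical helper: build the event, storing modelOutput only when
// bounding actually changed the text.
function makeToolDoneEvent(output: string, boundedForModel: string): ToolDoneEvent {
  const event: ToolDoneEvent = { type: "tool.done", output };
  if (boundedForModel !== output) {
    event.modelOutput = boundedForModel;
  }
  return event;
}
```

Small results therefore add no extra bytes to the persisted record, and only oversized results carry a second, bounded copy.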

A single shared boundToolResultForModel() (in content-helpers.ts, next to extractTextForModel) produces the bound. The engine and the reconstructor both call it, so the model's live view and replayed view are byte-identical. Legacy events without modelOutput fall back to bounding output at read time, so existing conversations are fixed too. Because the bound is pure and deterministic, the replayed prompt prefix stays stable and cacheable. It trims on line boundaries (never mid-record) and preserves the inline-UI pointer behavior for new events. Legacy UI-tool events without a persisted modelOutput are line-trimmed on replay: still bounded, and strictly better than the prior full-payload replay.
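A minimal sketch of the bounding behavior described above, assuming a plain-string input and a made-up truncation marker. The real function's signature, marker wording, and inline-UI pointer handling are not shown in this PR text, so treat every detail here as illustrative only.

```typescript
// Limit named in this PR; the value is the 50K cap it describes.
const MAX_TOOL_RESULT_CHARS = 50_000;

// Illustrative sketch only: the signature and marker text are assumptions.
function boundToolResultForModel(
  text: string,
  limit: number = MAX_TOOL_RESULT_CHARS,
): string {
  // limit <= 0 means unbounded; under-limit text passes through untouched.
  if (limit <= 0 || text.length <= limit) return text;

  // Prefer cutting at the last newline at or before the limit so no
  // line (record) is split in the middle.
  const cut = text.lastIndexOf("\n", limit);
  // Fallback: a single huge line has no usable newline, so hard-slice it.
  const kept = cut > 0 ? text.slice(0, cut) : text.slice(0, limit);

  // Pin number formatting to en-US so the marker bytes do not depend on
  // the host locale (keeps the replayed prefix byte-stable).
  const fmt = new Intl.NumberFormat("en-US");
  const marker =
    `\n[truncated: showing ${fmt.format(kept.length)} of ${fmt.format(text.length)} chars]`;
  return kept + marker;
}
```

Because the marker is appended after trimming, the returned string can slightly exceed the limit, which is in line with the commit note that the limit is a soft target. The function reads no clock, locale, or global state, so the same input always yields the same bytes.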

Framed precisely: this is a replay-fidelity fix. Before this change, replay gave the model more than the live run did. Fixing that also eliminates the runaway context growth.

Also: 1h prompt-cache TTL

Both Anthropic cache breakpoints move from the 5-minute default to ttl: "1h". Agentic turns pause far longer than 5 minutes (user steps away; automation waits on I/O), so the prefix constantly lapses and the next turn re-writes the whole thing at the write rate. 1h keeps the prefix alive across those gaps, converting full re-writes into cheap reads.
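For reference, a sketch of what a single shared TTL constant might look like. The cache_control shape follows the Anthropic Messages API; the CACHE_CONTROL_EPHEMERAL name comes from this PR's commit titles, while the TextBlock type and withCacheBreakpoint helper are hypothetical stand-ins.

```typescript
// Shape of the Anthropic cache_control field; ttl falls back to 5 minutes when omitted.
type CacheControl = { type: "ephemeral"; ttl?: "5m" | "1h" };

// Single source for the TTL so every breakpoint stays in sync.
const CACHE_CONTROL_EPHEMERAL: CacheControl = { type: "ephemeral", ttl: "1h" };

// Hypothetical minimal text block type for illustration.
interface TextBlock {
  type: "text";
  text: string;
  cache_control?: CacheControl;
}

// Hypothetical helper: mark a block as a cache breakpoint.
function withCacheBreakpoint(block: TextBlock): TextBlock {
  return { ...block, cache_control: CACHE_CONTROL_EPHEMERAL };
}

const systemBlock = withCacheBreakpoint({
  type: "text",
  text: "You are a helpful agent.",
});
```

Routing every breakpoint through one constant means a future TTL change is a one-line edit rather than a hunt across call sites.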

Tests

boundToolResultForModel: under-limit passthrough, line-boundary trim, single-huge-line hard-slice fallback, inline-UI pointer, determinism, limit<=0 unbounded.
Reconstructor replay: large legacy result bounded; modelOutput replayed verbatim; small result unchanged; determinism; UI metadata still carries full output.
Updated existing cache-control assertions (now ttl: "1h") and the prompt-injection truncation-notice assertion (new marker wording). Security property (injection beyond the bound is dropped) preserved.

tsc --noEmit clean, biome clean on src/, full unit suite green except one unrelated pre-existing missing-dep (dompurify) in an automations-UI test this PR doesn't touch.

Rollout note

The model bound takes effect immediately for all conversations (new events store modelOutput; legacy events are bounded on read). The 1h TTL is fleet-wide. No tenant config change required. Expected to cut the dominant cache-write cost substantially and stop context from ballooning on long threads.
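The read-side rule in this note (stored modelOutput for new events, bound-on-read for legacy ones) could look roughly like the sketch below. The StoredToolDone type and textForReplay helper are hypothetical, and the bounding function is passed in as a parameter to keep the example self-contained.

```typescript
// Hypothetical, simplified view of a persisted tool.done event.
interface StoredToolDone {
  output: string;
  modelOutput?: string;
}

type Bound = (text: string) => string;

// Hypothetical helper: choose the text the model sees on replay.
function textForReplay(event: StoredToolDone, bound: Bound): string {
  // New events: replay exactly what was persisted for the model.
  if (event.modelOutput !== undefined) return event.modelOutput;
  // Legacy events (or results that never needed bounding): bound the
  // full output at read time.
  return bound(event.output);
}
```

A new event whose result was under the limit stores no modelOutput, but bounding its output on read is a no-op, so both paths yield the same text.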

Large tool results were capped at 50K chars only inside the live engine loop; the full output was persisted to tool.done and replayed verbatim by the history reconstructor on every subsequent run. Long conversations therefore re-fed the model the entire payload each turn, ballooning context toward the 1M ceiling and dominating cost via repeated cache writes.

Separate the two concerns the tool.done `output` field was conflating:

- `output` (full): UI/display and the conversation record. Unchanged.
- `modelOutput` (bounded): what the model actually saw. New optional field, persisted only when it differs from `output`.

A single shared boundToolResultForModel() produces the bound; the engine and the history reconstructor both call it, so the model's live view and its replayed view of a result are byte-identical. Legacy events without `modelOutput` fall back to bounding `output` at read time, fixing existing conversations too. The bound is pure and deterministic, keeping the replayed prompt prefix stable and cacheable.

Also raise the Anthropic prompt-cache TTL from the 5-minute default to 1h so the cached prefix survives the multi-minute gaps between agentic turns, converting full prefix re-writes into cheap cache reads.

Tests: unit coverage for boundToolResultForModel (line-boundary trim, UI pointer, determinism, limit<=0) and reconstructor replay (legacy bound, modelOutput verbatim, small unchanged, determinism). Updated cache-control and truncation-notice assertions to the new behavior.

mgoldsborough added 5 commits June 1, 2026 18:30

style: biome format the tool-result destructuring (line width)

dfe17e7

refactor(engine): pin marker number formatting to en-US for byte-stab…

e294d8b

…le cache prefix

refactor(engine): reuse CACHE_CONTROL_EPHEMERAL for system message (s…

056294b

…ingle TTL source)

docs(engine): note boundToolResultForModel limit is a soft target

453d20e

mgoldsborough added the qa-reviewed label (QA review completed with no critical issues) Jun 2, 2026

mgoldsborough merged commit 053b5c2 into main Jun 2, 2026
5 checks passed

mgoldsborough deleted the fix/tool-result-replay-bound branch June 2, 2026 07:10
