Summary
TokenJuice's compaction savings dashboard (tokenjuice.savings_stats) prices the tokens it saves using the configured default model (config.default_model), not the model the tool result is actually being compressed for on that turn. Attribute savings to the live per-turn model so the cost figure is accurate for sessions that use model overrides, per-agent models, or team lead/worker model splits.
Problem / Context
The content router compacts a tool result inside the agent loop and records the savings via savings::record(...), which calls cost_saved_usd(model, ...) with a process-global attribution model installed once at startup (tokenjuice::savings::configure from config.default_model).
The active per-turn model is available deeper in the harness (run_turn_engine(..., model: &str, ...)), but it is not threaded down to the compaction call sites (agent/harness/session/agent_tool_exec.rs, agent/harness/engine/tools.rs), and AgentToolExecCtx does not currently carry it. Threading it through both call sites + the turn engine was deferred to keep the initial savings feature small.
Impact today: cost-saved is correct when a session uses the default model, but skewed when:
- a per-turn model override is in effect,
- agents run on different models (lead vs worker, per-agent model),
- the result is destined for a cheaper/more-expensive tier than the default.
Token counts (the dominant metric) are unaffected — only the USD figure and the byModel breakdown attribution are.
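To make the skew concrete, here is a minimal sketch that prices the same saved-token count against two models. The model ids, the per-million-token rates, and the `price_per_mtok_usd` helper are hypothetical placeholders, and the `cost_saved_usd` shown here is only an assumed shape; the real function in `savings.rs` may use a different pricing table and signature.

```rust
/// Hypothetical price lookup: USD per one million input tokens.
/// Model ids and rates are placeholders, not real pricing.
fn price_per_mtok_usd(model: &str) -> Option<f64> {
    match model {
        "default-large" => Some(3.0),
        "worker-small" => Some(0.25),
        _ => None,
    }
}

/// Assumed shape of the cost calculation: saved input tokens priced at
/// the given model's input rate. Unknown models contribute 0.0 here.
fn cost_saved_usd(model: &str, tokens_saved: u64) -> f64 {
    let price = price_per_mtok_usd(model).unwrap_or(0.0);
    tokens_saved as f64 / 1_000_000.0 * price
}
```

With these placeholder rates, 2,000,000 tokens saved on a worker turn would be reported as $6.00 under default-model attribution, versus $0.50 at the worker's own rate. The token count is identical in both cases; only the dollar figure moves.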
Scope (optional)
In scope:
- Thread the active model id from run_turn_engine to the tool-execution compaction call sites (extend AgentToolExecCtx and the compact_output / compact_tool_output signatures with an optional model).
- Pass it into savings::record so cost_saved_usd and the byModel bucket use the per-turn model; fall back to the configured default when unknown.
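The two in-scope items could look roughly like the sketch below. It is a simplified stand-in, not the actual code: `Savings`, `ToolExecCtx`, their fields, and the `compact_output` parameters are hypothetical and only illustrate passing an optional per-turn model down to the recorder with a fallback to the default.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the savings recorder. Field and method
/// names are illustrative and do not mirror the real savings.rs.
struct Savings {
    default_model: String,
    /// Saved-token totals keyed by model id (the byModel breakdown).
    by_model: HashMap<String, u64>,
}

impl Savings {
    fn new(default_model: &str) -> Self {
        Self {
            default_model: default_model.to_string(),
            by_model: HashMap::new(),
        }
    }

    /// Record savings against the per-turn model when one is known,
    /// otherwise against the configured default.
    fn record(&mut self, model: Option<&str>, tokens_saved: u64) {
        let key = model.unwrap_or(self.default_model.as_str()).to_string();
        *self.by_model.entry(key).or_insert(0) += tokens_saved;
    }
}

/// Hypothetical execution context carrying the optional per-turn model.
struct ToolExecCtx<'a> {
    model: Option<&'a str>,
}

/// Assumed call-site shape: compaction reports its savings using
/// whatever model the context carries.
fn compact_output(
    ctx: &ToolExecCtx<'_>,
    savings: &mut Savings,
    tokens_before: u64,
    tokens_after: u64,
) {
    savings.record(ctx.model, tokens_before.saturating_sub(tokens_after));
}
```

Keeping the model as an `Option` means call sites that cannot yet supply it pass `None` and retain today's behavior, so the two call sites can be migrated one at a time.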
Out of scope:
- Output-token cost modeling and re-send amplification (a tool result re-enters context on every subsequent turn). The current cost model counts a single input occurrence, which is the conservative estimate.
Acceptance criteria
- When the per-turn model is unknown, attribution falls back to config.default_model (current behavior), no panics.
- tokenjuice.savings_stats byModel reflects the real mix of models across a multi-model session.
Related
- Follow-up to the TokenJuice content-router / savings work (PR in flight on branch `feat/tokenjuice-content-router`).
- Code: `src/openhuman/tokenjuice/savings.rs`, `src/openhuman/tokenjuice/compress.rs`, `src/openhuman/agent/harness/session/agent_tool_exec.rs`, `src/openhuman/agent/harness/engine/tools.rs`, `src/openhuman/agent/harness/engine/core.rs`.