Attribute TokenJuice savings cost to the per-turn model, not the configured default #4122

Description

@senamakel

Summary

TokenJuice's compaction savings dashboard (tokenjuice.savings_stats) prices the tokens it saves using the configured default model (config.default_model), not the model the tool result is actually being compressed for on that turn. Attribute savings to the live per-turn model so the cost figure is accurate for sessions that use model overrides, per-agent models, or team lead/worker model splits.

Problem / Context

The content router compacts a tool result inside the agent loop and records the savings via savings::record(...), which calls cost_saved_usd(model, ...) with a process-global attribution model installed once at startup (tokenjuice::savings::configure from config.default_model).
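A minimal sketch of why a process-global attribution model cannot follow per-turn changes, assuming the global is set once and read on every record. The `ATTRIBUTION_MODEL` static, the `attribution_model` helper, and the use of `OnceLock` are assumptions for illustration; the actual `tokenjuice::savings::configure` may be implemented differently.

```rust
use std::sync::OnceLock;

// Hypothetical process-global slot for the attribution model.
static ATTRIBUTION_MODEL: OnceLock<String> = OnceLock::new();

/// Installs the attribution model. With `OnceLock`, the first call wins
/// and later calls are silently ignored.
fn configure(default_model: &str) {
    let _ = ATTRIBUTION_MODEL.set(default_model.to_owned());
}

/// Returns the installed model, or "unknown" if `configure` never ran.
/// Every caller sees the same value regardless of which turn is active.
fn attribution_model() -> &'static str {
    ATTRIBUTION_MODEL
        .get()
        .map(String::as_str)
        .unwrap_or("unknown")
}
```

Under this assumption, a turn that runs on a different model has no way to influence what `attribution_model()` returns, which is the behavior this issue proposes to change.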

The active per-turn model is available deeper in the harness (run_turn_engine(..., model: &str, ...)), but it is not threaded down to the compaction call sites (agent/harness/session/agent_tool_exec.rs, agent/harness/engine/tools.rs), and AgentToolExecCtx does not currently carry it. Threading it through both call sites + the turn engine was deferred to keep the initial savings feature small.

Impact today: cost-saved is correct when a session uses the default model, but skewed when:

  • a per-turn model override is in effect,
  • agents run on different models (lead vs worker, per-agent model),
  • the result is destined for a cheaper/more-expensive tier than the default.

Token counts (the dominant metric) are unaffected — only the USD figure and the byModel breakdown attribution are.
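To make the skew concrete, here is a small sketch that prices the same token savings under two models. The model names, the prices, and the formula inside `cost_saved_usd` are placeholders chosen for illustration; they are not real pricing and not the actual implementation.

```rust
/// Hypothetical input price in USD per million tokens.
/// Both model names and both prices are made up for this example.
fn input_price_per_mtok(model: &str) -> Option<f64> {
    match model {
        "default-large" => Some(10.0),
        "worker-small" => Some(1.0),
        _ => None,
    }
}

/// Assumed shape of the cost calculation: tokens saved multiplied by the
/// model's input price. Unknown models are priced at zero in this sketch.
fn cost_saved_usd(model: &str, tokens_saved: u64) -> f64 {
    let price = input_price_per_mtok(model).unwrap_or(0.0);
    tokens_saved as f64 * price / 1_000_000.0
}
```

With these placeholder prices, 200,000 saved tokens are worth 2.00 USD when priced at `default-large` but 0.20 USD when priced at `worker-small`, so a worker turn attributed to the default would overstate the saving tenfold while the token count stays the same.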

Scope (optional)

In scope:

  • Thread the active model id from run_turn_engine to the tool-execution compaction call sites (extend AgentToolExecCtx and the compact_output / compact_tool_output signatures with an optional model).
  • Pass it into savings::record so cost_saved_usd and the byModel bucket use the per-turn model; fall back to the configured default when unknown.
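The in-scope change could look roughly like the following sketch: an optional model on the execution context, passed to the recorder, with a fallback to the configured default. `ToolExecCtx`, `SavingsLedger`, and the simplified `compact_output` signature are hypothetical stand-ins for `AgentToolExecCtx`, the savings module, and the real compaction functions, whose actual shapes are not shown in this issue.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `AgentToolExecCtx`, extended with the turn's model.
struct ToolExecCtx {
    model: Option<String>,
}

/// Hypothetical stand-in for the savings recorder and its byModel buckets.
struct SavingsLedger {
    default_model: String,
    tokens_by_model: HashMap<String, u64>,
}

impl SavingsLedger {
    fn new(default_model: &str) -> Self {
        Self {
            default_model: default_model.to_owned(),
            tokens_by_model: HashMap::new(),
        }
    }

    /// Picks the per-turn model when present and non-empty,
    /// otherwise falls back to the configured default.
    fn resolve_model<'a>(&'a self, turn_model: Option<&'a str>) -> &'a str {
        match turn_model.map(str::trim) {
            Some(m) if !m.is_empty() => m,
            _ => &self.default_model,
        }
    }

    /// Adds the saved tokens to the bucket of the resolved model.
    fn record(&mut self, turn_model: Option<&str>, tokens_saved: u64) {
        let model = self.resolve_model(turn_model).to_owned();
        *self.tokens_by_model.entry(model).or_insert(0) += tokens_saved;
    }
}

/// Simplified compaction call site: only the bookkeeping is modeled here.
fn compact_output(
    ctx: &ToolExecCtx,
    ledger: &mut SavingsLedger,
    tokens_before: u64,
    tokens_after: u64,
) {
    // saturating_sub avoids underflow if compaction did not shrink the result.
    let saved = tokens_before.saturating_sub(tokens_after);
    ledger.record(ctx.model.as_deref(), saved);
}
```

Keeping the model as an `Option` means call sites that cannot supply it yet keep the current default-model behavior, so the two call sites can be migrated independently.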

Out of scope:

  • Output-token cost modeling and re-send amplification (a tool result re-enters context on every subsequent turn). The current cost model counts a single input occurrence per saved token, which is the conservative estimate.

Acceptance criteria

  • Per-turn attribution: savings recorded during a turn are priced with that turn's model, not the global default.
  • Graceful fallback: when the per-turn model is unavailable, attribution falls back to config.default_model (current behavior) and does not panic.
  • byModel breakdown: the byModel field of tokenjuice.savings_stats reflects the real mix of models across a multi-model session.
  • Diff coverage ≥ 80%: the implementing PR meets the changed-lines coverage gate (Vitest + cargo-llvm-cov, enforced by .github/workflows/pr-ci.yml).

Related

  • Follow-up to the TokenJuice content-router / savings work (PR in flight on branch `feat/tokenjuice-content-router`).
  • Code: `src/openhuman/tokenjuice/savings.rs`, `src/openhuman/tokenjuice/compress.rs`, `src/openhuman/agent/harness/session/agent_tool_exec.rs`, `src/openhuman/agent/harness/engine/tools.rs`, `src/openhuman/agent/harness/engine/core.rs`.
