history: per-message cost, tokens, and latency tracking (on self-control)#54

Closed
akrentsel wants to merge 3 commits into exoclaw-self-control from feature/cost-tracking-self-control

akrentsel commented Jun 12, 2026

Collaborator

Rebuild of #16 on top of exoclaw-self-control, so cost tracking lands cleanly under the self-modification work.

Why a rebuild

The original #16 commit was accidentally authored against a stale pre-#52 tree, so its diff silently reverted main's AgentCore provider and conversation-list pagination. Both that branch and exoclaw-self-control share base dea9fdb, so the pure feature diff applied here with zero conflicts and no reverts. #16 should be closed in favor of this PR.

What it does

Optional UsageRecord on every EventData::Messages event: model id, raw token counts (prompt / completion / cached / cache-creation / reasoning), USD cost, TTFT + wall-clock duration. Cost is policy, computed in userspace, never by the trusted substrate. See docs/cost-tracking-design.md for the full design, including the layering and per-provider math.
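As a rough illustration of the record described above, here is a std-only sketch. It is not the schema from this PR: `cost_usd`, `ttft_ms`, `duration_ms`, `prompt_cached_tokens`, and `completion_reasoning_tokens` appear in this description, while the remaining field names, the `MessageEvent` type, and `total_cost_usd` are hypothetical. The real record also carries serde attributes (per the commit message), which are omitted here to avoid a dependency.

```rust
/// Hypothetical shape of a per-message usage record. Every field is optional
/// so that providers which omit a counter, and legacy events, still fit.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct UsageRecord {
    pub model: Option<String>,
    pub prompt_tokens: Option<u64>,                // assumed name
    pub completion_tokens: Option<u64>,            // assumed name
    pub prompt_cached_tokens: Option<u64>,
    pub prompt_cache_creation_tokens: Option<u64>, // assumed name
    pub completion_reasoning_tokens: Option<u64>,
    pub cost_usd: Option<f64>,
    pub ttft_ms: Option<u64>,
    pub duration_ms: Option<u64>,
}

/// Simplified stand-in for a message event (hypothetical type). The record is
/// boxed so that events without usage stay small.
#[derive(Debug, Clone, PartialEq)]
pub struct MessageEvent {
    pub role: String,
    pub text: String,
    pub usage: Option<Box<UsageRecord>>,
}

/// Sum the cost over a conversation, skipping events that have no usage
/// record (for example user messages) or no computed cost.
pub fn total_cost_usd(events: &[MessageEvent]) -> f64 {
    events
        .iter()
        .filter_map(|e| e.usage.as_ref())
        .filter_map(|u| u.cost_usd)
        .sum()
}
```

A user-message event would carry `usage: None`, matching the `usage: null` observed in the live verification.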

All three review decisions from #16 are included: boundary-aware prefix lookup, Bedrock treated as inclusive (additive Bedrock-Claude left as TODO), pricing source threaded through clap with EXO_LITELLM_PRICES_* env fallbacks.
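One way to read "boundary-aware prefix lookup" is sketched below. This is an assumption about the behavior, not the code in `crates/cost`: an exact match wins, otherwise the longest table key that prefixes the model id and is immediately followed by a separator character is used. The `Rates` type, the `lookup` name, and the separator set (`-`, `:`, `@`) are hypothetical.

```rust
use std::collections::HashMap;

/// Per-token USD rates for one model (hypothetical field names).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Rates {
    pub input: f64,
    pub output: f64,
}

/// Characters assumed to separate a base model id from a revision suffix.
fn is_boundary(c: char) -> bool {
    matches!(c, '-' | ':' | '@')
}

/// Resolve `model` to a rate entry: exact match first, then the longest key
/// that is a prefix of `model` and ends exactly at a boundary character.
pub fn lookup<'a>(table: &'a HashMap<String, Rates>, model: &str) -> Option<&'a Rates> {
    if let Some(rates) = table.get(model) {
        return Some(rates);
    }
    table
        .iter()
        .filter(|(key, _)| {
            model
                .strip_prefix(key.as_str())
                .and_then(|rest| rest.chars().next())
                .map_or(false, is_boundary)
        })
        .max_by_key(|(key, _)| key.len())
        .map(|(_, rates)| rates)
}
```

With keys `gpt-4`, `gpt-4o`, and `gpt-4o-mini` in the table, `gpt-4o-2024-08-06` resolves to `gpt-4o` in this sketch: `gpt-4` is rejected because the next character (`o`) is not a separator, so the dated id cannot slide onto the shorter neighbor's rate.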

Live verification (gpt-4o, both userspaces)

  • Rust Basic harness: stored cost_usd matches rate table exactly; duration_ms recorded; user-message events carry usage: null.
  • Exoclaw / TS harness: same, including completion_reasoning_tokens; TS loader consumed the price cache written by the Rust CLI (shared cache path convention).
  • Prompt-cache math: second turns reported prompt_cached_tokens > 0 on both paths, and the stored cost matches the inclusive formula (prompt − cached)·in + cached·cache_read + completion·out to the last digit (misclassifying the provider as additive would have over-billed ~3×).
  • Provider-echoed dated model ids (gpt-4o-2024-08-06) resolve via the boundary-aware prefix lookup on both implementations.

Not live-tested (unit-tested only): Anthropic additive path and cache-creation tokens — no direct Anthropic key in the test environment.
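The difference between the two billing conventions can be sketched as follows. The inclusive branch follows the formula quoted in the verification notes. The additive branch is an assumption based on Anthropic-style usage reporting, where the prompt count already excludes cached tokens. All type and function names are hypothetical, and the rates in the usage note are illustrative rather than taken from any price table.

```rust
/// USD per token for each billable bucket (hypothetical type).
#[derive(Debug, Clone, Copy)]
pub struct CacheRates {
    pub input: f64,
    pub cache_read: f64,
    pub cache_write: f64,
    pub output: f64,
}

/// Raw token counts as reported by the provider (hypothetical type).
#[derive(Debug, Clone, Copy, Default)]
pub struct Tokens {
    pub prompt: u64,
    pub cached: u64,
    pub cache_creation: u64,
    pub completion: u64,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Billing {
    /// `prompt` already includes the cached tokens; subtract before pricing.
    Inclusive,
    /// `prompt` excludes cached tokens; each bucket is priced as reported.
    Additive,
}

pub fn cost_usd(t: Tokens, r: CacheRates, billing: Billing) -> f64 {
    let full_rate_prompt = match billing {
        Billing::Inclusive => t.prompt.saturating_sub(t.cached),
        Billing::Additive => t.prompt,
    };
    // Assumption: cache_creation is zero for inclusive providers, so the
    // cache_write term only matters on the additive path.
    full_rate_prompt as f64 * r.input
        + t.cached as f64 * r.cache_read
        + t.cache_creation as f64 * r.cache_write
        + t.completion as f64 * r.output
}
```

For example, with 1000 prompt tokens of which 800 are cached, 100 completion tokens, and illustrative rates of 2.5, 1.25, and 10 USD per million tokens (input, cache read, output), the inclusive cost is (200·2.5 + 800·1.25 + 100·10)/10⁶ = 0.0025 USD. Pricing the same counts as additive gives (1000·2.5 + 800·1.25 + 100·10)/10⁶ = 0.0045 USD, which shows how a misclassified provider over-bills.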

Note on docker warm sandboxes

While testing, confirmed that docker warm-sandbox reuse works on this branch across REPL restarts (state persists, no duplicate containers). The standalone fix/docker-warm-sandbox branch (prototype of closed #43) is superseded by exoclaw-self-control's own find_running_docker_warm_sandbox (from "Add Exoclaw sandbox transparency controls") and can be deleted. One nit inherited from that implementation is fixed here in its own commit: the docker listing shelled out without the admin-command timeout the prototype had, while running under the warm_sandboxes mutex — a hung docker daemon would have blocked every sandbox operation in the process. It now goes through run_container_admin_command like the Apple path. Re-verified live after the change.
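The hazard described above (an unbounded subprocess call made while holding a mutex) is usually addressed by bounding the call with a deadline. The sketch below shows one std-only way to do that. It is not `run_container_admin_command`, whose signature is not shown in this PR; `run_with_timeout` is a hypothetical helper, and it returns only the exit status rather than captured output to keep the polling loop free of pipe-buffer deadlocks.

```rust
use std::io;
use std::process::{Command, ExitStatus, Stdio};
use std::thread;
use std::time::{Duration, Instant};

/// Run `cmd` to completion, killing it if it exceeds `timeout`.
/// Returns `ErrorKind::TimedOut` when the deadline passes first.
pub fn run_with_timeout(cmd: &mut Command, timeout: Duration) -> io::Result<ExitStatus> {
    let mut child = cmd.stdin(Stdio::null()).spawn()?;
    let deadline = Instant::now() + timeout;
    loop {
        // try_wait does not block: Some(status) once the child has exited.
        if let Some(status) = child.try_wait()? {
            return Ok(status);
        }
        if Instant::now() >= deadline {
            // Best effort: kill, then reap so no zombie process is left.
            let _ = child.kill();
            let _ = child.wait();
            return Err(io::Error::new(
                io::ErrorKind::TimedOut,
                "admin command timed out",
            ));
        }
        thread::sleep(Duration::from_millis(10));
    }
}
```

A caller holding a lock can then treat the `TimedOut` error as "daemon unavailable" and release the lock, instead of blocking every other sandbox operation indefinitely.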

Known follow-ups (also listed in the design doc): TS path records no ttft_ms/duration_ms; azure_ai provider classified additive (hosts non-Anthropic models too); /usage REPL surface (#29) to be re-ported on top of this.

🤖 Generated with Claude Code

akrentsel and others added 3 commits June 12, 2026 05:46
Add an optional UsageRecord to every EventData::Messages event: model id,
raw token counts (prompt / completion / cached / cache-creation /
reasoning), USD cost, and TTFT + wall-clock duration. Fields are Option +
skip_serializing_if and the record is boxed; legacy events still parse.

Cost is policy, computed in userspace, never by the trusted substrate:

- crates/cost: a standalone library with the price-table data model, a
  self-contained LiteLLM loader (explicit path/url, on-disk cache,
  degrade-to-empty), and per-provider math. Lookup is boundary-aware so
  dated revisions resolve without sliding a model onto a shorter
  neighbor's rate. Anthropic-family bills additively; everything else
  (including Bedrock, a TODO) is inclusive.
- exoharness stays minimal: it holds the UsageRecord schema and persists
  it verbatim, with no pricing code or dependency.
- Basic executor fills cost from a table loaded once at startup and
  injected via the CLI (--pricing-path / --pricing-url, env as fallback).
- The TypeScript harness (exoclaw) has its own self-contained cost port
  (@exo/model-runtime/cost) that owns its data loading (env override, own
  cache, own fetch), so per-message cost works there with no dependency
  on the Rust loader or the trusted layer.

RLM is left unwired for now: its multi-call turn has different
per-message accounting and is a separate follow-up.

Rebuilt from feature/message-cost-tracking on top of exoclaw-self-control
(both share base dea9fdb), replacing the original commit that was
accidentally authored against a stale pre-#52 tree.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

find_running_docker_warm_sandbox shelled out to docker ps with a raw
Command and no timeout, while its caller ensure_warm_sandbox_ready holds
the warm_sandboxes mutex — a hung docker daemon would block every sandbox
operation in the process indefinitely. Route it through
run_container_admin_command with WARM_SANDBOX_CLEANUP_TIMEOUT, matching
the Apple Container sibling.

Verified live: docker warm-sandbox reuse across REPL restarts still works
(state persists, no duplicate containers).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The motivating demo for per-message cost tracking composed with exoclaw
self-control: the agent runs a tool-heavy repo-health-report task, reads
its own usage records via list_conversation_events, diagnoses waste,
modifies its own prompts/harness/bindings, rebuilds via the guardian, and
re-runs to prove the saving. Includes the run protocol, candidate
self-modifications by ambition tier, success gates (cost, quality,
honesty, survival), and rails.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@akrentsel

Collaborator Author

sorry for this – got created as a PR accidentally, not real. this was reviewed and merged into main in #56

@akrentsel akrentsel closed this Jun 14, 2026