
Commit 8def9f2

M3 persist aggregate call origin metadata
1 parent b66e268 commit 8def9f2

16 files changed

Lines changed: 321 additions & 189 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@
 ## Unreleased
 
 - Remove low-value call/thread anchor diagnostics from the experimental call investigator to avoid an extra source-log scan per context load.
+- Persist call-origin metadata as categorical aggregate fields during indexing so normal dashboard payloads do not reopen source JSONL logs to infer user-vs-Codex initiation.
 
 ## 0.5.0 - 2026-06-10
 

docs/architecture.md

Lines changed: 3 additions & 2 deletions
@@ -4,7 +4,8 @@ Codex Usage Tracker is a local sidecar app. It reads aggregate token counters fr
 
 ## Boundaries
 
-- `parser.py` converts local JSONL events into aggregate `UsageEvent` records. It must not persist prompts, assistant text, tool output, or transcript snippets.
+- `parser.py` converts local JSONL events into aggregate `UsageEvent` records. It also attaches metadata-only call-origin categories such as user message, tool result, post-compaction, and agent continuation. It must not persist prompts, assistant text, tool output, or transcript snippets.
+- `call_origin.py` owns the pure call-origin classifier and migrated-row fallback. It must not open source JSONL files; source-log reads belong in parser refresh or explicit context loading only.
 - `schema.py` owns persisted `usage_events` columns. Add columns there before changing SQLite migrations or export behavior.
 - `store.py` owns SQLite setup, refresh, rebuild, and query access. Keep filesystem scanning, database writes, SQL prefilters, counts, limits, and offsets here.
 - `reports.py` is the application-service layer for summaries, expensive-call reports, recommendations, pricing coverage, and filtered query payloads. CLI and MCP should call this layer instead of duplicating report assembly.
@@ -28,7 +29,7 @@ Codex Usage Tracker is a local sidecar app. It reads aggregate token counters fr
 4. Add dashboard-only interactions in `plugin_data/dashboard/dashboard.js` and keep URL state in `dashboard_state.js`.
 5. Keep all examples, screenshots, mocks, and tests synthetic. Never derive fixtures from real logs.
 6. When editing skill instructions, update both the source `skills/...` file and the bundled `src/codex_usage_tracker/plugin_data/skills/...` copy. `scripts/check_release.py` verifies that installable plugin assets stay complete and synced.
-7. When adding fields derived from `cwd`, Git metadata, or source paths, decide how they behave in `normal`, `redacted`, and `strict` privacy modes before exposing them in dashboard, JSON, CSV, MCP, or support-bundle output.
+7. When adding fields derived from `cwd`, Git metadata, source paths, or log-event metadata, decide how they behave in `normal`, `redacted`, and `strict` privacy modes before exposing them in dashboard, JSON, CSV, MCP, or support-bundle output.
 
 ## Validation
 

docs/call-drilldown-performance-checklist.md

Lines changed: 25 additions & 9 deletions
@@ -38,9 +38,9 @@ Milestone 0 inspection ran on `perf/call-drilldown-performance-hardening` after
 
 Suspected hot paths confirmed by source inspection:
 
-- `src/codex_usage_tracker/dashboard.py` calls `annotate_rows_with_call_origin(...)` inside `dashboard_payload`.
-- `src/codex_usage_tracker/call_origin.py` groups rows by `source_file` and opens each JSONL file to infer call origin.
-- `src/codex_usage_tracker/server.py` serves `/api/usage` by calling `dashboard_payload`, so live refresh inherits the source-log scan.
+- M3 removed the `dashboard_payload` source-log call-origin scan. Call origin is now persisted as aggregate categorical metadata during parser refresh, with a cheap fallback for migrated rows.
+- M3 converted `src/codex_usage_tracker/call_origin.py` to pure classifiers that do not open source JSONL files.
+- `src/codex_usage_tracker/server.py` serves `/api/usage` by calling `dashboard_payload`; after M3, this no longer inherits call-origin source-log reads.
 - M2 removed `_read_call_anchors(...)` from `load_call_context`, so explicit context loading no longer performs the extra anchor scan.
 - M2 removed all dashboard reads of `payload.call_anchors` and `payload.thread_anchors`.
 - `src/codex_usage_tracker/plugin_data/dashboard/dashboard_data.js` builds helper indexes, but adjacent-call lookup and render paths still need a focused large-history review.
@@ -59,7 +59,7 @@ Already implemented before this branch:
 - [x] M0.1 contain calls-table horizontal overflow inside the table card.
 - [x] M1 validate and package the call investigator dashboard asset in CI, docs, and release checks.
 - [x] M2 remove low-value call/thread anchor diagnostics and their extra context source scan.
-- [ ] M3 persist aggregate call-origin metadata during indexing so dashboard payloads do not scan source logs.
+- [x] M3 persist aggregate call-origin metadata during indexing so dashboard payloads do not scan source logs.
 - [ ] M4 persist cheap performance-critical dashboard query helper fields where feasible.
 - [ ] M5 add optional timing diagnostics to `/api/usage` and `/api/context`.
 - [ ] M6 make explicit context loading single-pass where practical.
@@ -102,10 +102,21 @@ Full branch closeout should also run the release validation listed in `docs/deve
 - `docs/development.md`
 - `src/codex_usage_tracker/plugin_data/dashboard/dashboard.css`
 - `src/codex_usage_tracker/context.py`
+- `src/codex_usage_tracker/call_origin.py`
+- `src/codex_usage_tracker/dashboard.py`
+- `src/codex_usage_tracker/models.py`
+- `src/codex_usage_tracker/parser.py`
+- `src/codex_usage_tracker/schema.py`
+- `src/codex_usage_tracker/store.py`
 - `src/codex_usage_tracker/plugin_data/dashboard/dashboard.js`
 - `src/codex_usage_tracker/plugin_data/dashboard/dashboard_call_investigator.js`
+- `docs/privacy.md`
 - `tests/test_privacy.py`
+- `tests/test_call_origin.py`
+- `tests/test_parser.py`
+- `tests/test_schema.py`
 - `tests/test_store_dashboard_mcp.py`
+- `tests/test_store_migrations.py`
 
 ## Tests Run
 
@@ -126,27 +137,32 @@ Full branch closeout should also run the release validation listed in `docs/deve
 - `python -m pytest tests/test_privacy.py -q`
 - `python -m pytest tests/test_store_dashboard_mcp.py -q`
 - `python scripts/check_release.py`
+- M3 persisted call-origin metadata:
+- `python -m pytest tests/test_call_origin.py tests/test_parser.py::test_parser_ignores_known_non_token_context_compaction_event tests/test_parser.py::test_parser_persists_call_origin_from_metadata_segments tests/test_store_dashboard_mcp.py::test_dashboard_payload_uses_persisted_call_origin_without_source_scan -q` failed before implementation because the pure classifier API was missing.
+- `python -m pytest tests/test_call_origin.py tests/test_parser.py::test_parser_ignores_known_non_token_context_compaction_event tests/test_parser.py::test_parser_persists_call_origin_from_metadata_segments tests/test_schema.py tests/test_store_migrations.py::test_init_db_migrates_legacy_aggregate_table_without_data_loss tests/test_store_migrations.py::test_csv_export_keeps_current_columns_after_legacy_migration tests/test_store_dashboard_mcp.py::test_dashboard_payload_uses_persisted_call_origin_without_source_scan -q`
+- `python -m pytest tests/test_parser.py tests/test_call_origin.py tests/test_store_migrations.py tests/test_privacy.py tests/test_store_dashboard_mcp.py -q`
+- `python scripts/check_release.py`
 
 ## Benchmarks Run
 
-- None yet. Benchmarks start after implementation milestones change measurable behavior.
+- None yet. M3 removed a source-log scan path and added regression tests; benchmark coverage starts in M8.
 
 ## Known Remaining Slow Paths
 
-- Normal `dashboard_payload` currently runs source-file call-origin annotation.
-- Live `/api/usage` currently calls `dashboard_payload` and inherits that work.
+- Normal `dashboard_payload` no longer runs source-file call-origin annotation.
+- Live `/api/usage` still calls `dashboard_payload`, but after M3 it should not open source JSONL files for call-origin metadata.
 - Context loading still does selected-turn evidence and serialized-evidence work; Milestone 6 must verify whether that can be reduced to one source-file pass.
 - Large-history live dashboard still ships broad payloads before the SQLite-backed API slice work.
 
 ## Privacy Notes
 
 - Milestone 0 made no product behavior changes.
 - The branch must keep all test data synthetic and must not persist raw transcript content.
-- Persisted call-origin work must store only categorical labels, reasons, and confidence values.
+- Persisted call-origin stores only categorical labels, reasons, and confidence values. Parser tests and privacy tests cover this with synthetic secret-bearing message/tool/compaction payloads.
 
 ## Merge Blockers
 
-- `dashboard_payload` and `/api/usage` must stop opening source JSONL files.
+- `dashboard_payload` and `/api/usage` must stop opening source JSONL files. M3 covers the call-origin path; future milestones must preserve that invariant as APIs are split.
 - The call investigator asset must be syntax-checked in CI and release validation.
 - Raw call/thread anchors are removed; keep regression tests proving `call_anchors` and `thread_anchors` stay out of context payloads.
 - Focused privacy tests must prove no raw prompts, assistant messages, tool output, replacement history, or raw JSONL fragments are persisted by default.
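
The "must stop opening source JSONL files" invariant in the merge blockers can be checked mechanically. The following is a minimal sketch of one way such a regression guard might look, not the project's actual test: `payload_rows` and `assert_no_file_opens` are hypothetical stand-ins introduced only for illustration, and the row data is synthetic.

```python
from pathlib import Path
from unittest import mock


def payload_rows(rows):
    """Hypothetical stand-in for a payload builder that must not touch source logs."""
    return [dict(row, call_initiator=row.get("call_initiator") or "unknown") for row in rows]


def assert_no_file_opens(func, *args):
    """Run func while builtins.open and Path.open raise on any call."""
    boom = AssertionError("source log was opened")
    # Patch both entry points: Path.open does not go through builtins.open.
    with mock.patch("builtins.open", side_effect=boom), \
            mock.patch.object(Path, "open", side_effect=boom):
        return func(*args)


# Synthetic row; the path is never read.
rows = [{"source_file": "/synthetic/session.jsonl", "line_number": 3, "call_initiator": "user"}]
result = assert_no_file_opens(payload_rows, rows)
```

Patching the file-opening entry points, rather than asserting on timing, makes the guard fail loudly the moment a later change reintroduces a source-log read on this path.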

docs/privacy.md

Lines changed: 3 additions & 0 deletions
@@ -10,6 +10,7 @@ The local SQLite database is stored at `~/.codex-usage-tracker/usage.sqlite3` by
 - model, reasoning effort, context window
 - token counts and derived efficiency ratios
 - subagent source, role, nickname, parent session id, and parent thread name when present
+- call-origin category, reason, and confidence labels derived from event metadata during indexing
 - pricing, credit, allowance, recommendation, and project metadata derived from aggregate fields
 
 ## Not Stored
@@ -25,6 +26,8 @@ The parser intentionally does not store:
 
 Those fields are not written to SQLite, CSV exports, generated dashboard HTML, or synthetic screenshots.
 
+Call-origin metadata is heuristic and confidence-labeled. It stores categories such as `user`, `codex`, or `unknown` plus a reason such as `user_message`, `tool_result`, `post_compaction`, or `agent_continuation`. It does not store the message text, tool output, compaction replacement text, or raw JSONL fragment that produced the category.
+
 ## On-Demand Context
 
 `usage_call_context`, `codex-usage-tracker context`, and the `serve-dashboard` context endpoint read a single source JSONL file only when explicitly requested. Returned context is redacted for common secret patterns and capped in size by default for CLI/MCP requests. The call investigator uses the same endpoint at runtime and requests full redacted evidence for the selected call when the local context API is enabled; that still does not persist raw context into SQLite, CSV, support bundles, or generated dashboard HTML.

src/codex_usage_tracker/call_origin.py

Lines changed: 50 additions & 93 deletions
@@ -2,93 +2,35 @@
 
 from __future__ import annotations
 
-import json
-from collections import defaultdict
+from collections.abc import Iterable, Mapping
 from dataclasses import dataclass
-from pathlib import Path
 from typing import Any
 
 
 @dataclass(frozen=True)
-class _EventFlags:
+class CallOriginFlags:
+    """Metadata-only signals observed before one token_count callback."""
+
     user_message: bool = False
     compaction: bool = False
     tool_result: bool = False
     codex_activity: bool = False
 
+    @property
+    def has_signal(self) -> bool:
+        return (
+            self.user_message
+            or self.compaction
+            or self.tool_result
+            or self.codex_activity
+        )
 
-def annotate_rows_with_call_origin(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
-    """Annotate dashboard rows with derived call-level initiator metadata.
 
-    The persisted ``thread_source`` field is session-level. A normal user-created
-    thread can still contain many Codex-initiated model calls after tool results,
-    agent continuations, or compactions. This helper reads only source JSONL event
-    metadata around token-count lines. It does not copy prompt, assistant, or tool
-    text into the returned rows.
-    """
+def event_flags_from_envelope(envelope: object) -> CallOriginFlags:
+    """Return categorical call-origin flags without reading raw text fields."""
 
-    annotated = [dict(row) for row in rows]
-    rows_by_file: dict[str, dict[int, list[dict[str, Any]]]] = defaultdict(
-        lambda: defaultdict(list)
-    )
-    for row in annotated:
-        source_file = row.get("source_file")
-        line_number = _positive_int(row.get("line_number"))
-        if isinstance(source_file, str) and source_file and line_number is not None:
-            rows_by_file[source_file][line_number].append(row)
-        else:
-            row.update(_fallback_origin(row, reason="missing_source"))
-
-    for source_file, rows_by_line in rows_by_file.items():
-        annotations = _classify_source_file(Path(source_file), set(rows_by_line))
-        for line_number, line_rows in rows_by_line.items():
-            annotation = annotations.get(line_number)
-            for row in line_rows:
-                row.update(annotation or _fallback_origin(row, reason="source_unavailable"))
-    return annotated
-
-
-def _classify_source_file(path: Path, target_lines: set[int]) -> dict[int, dict[str, str]]:
-    if not target_lines or not path.exists():
-        return {}
-    max_line = max(target_lines)
-    annotations: dict[int, dict[str, str]] = {}
-    segment: list[_EventFlags] = []
-    try:
-        with path.open(encoding="utf-8") as handle:
-            for line_number, line in enumerate(handle, start=1):
-                if line_number > max_line:
-                    break
-                try:
-                    envelope = json.loads(line)
-                except json.JSONDecodeError:
-                    continue
-                if _is_token_count(envelope):
-                    if line_number in target_lines:
-                        annotations[line_number] = _classify_segment(segment)
-                    segment = []
-                    continue
-                segment.append(_event_flags(envelope))
-    except OSError:
-        return {}
-    return annotations
-
-
-def _classify_segment(segment: list[_EventFlags]) -> dict[str, str]:
-    if any(event.user_message for event in segment):
-        return _origin("user", "user_message", "high")
-    if any(event.compaction for event in segment):
-        return _origin("codex", "post_compaction", "high")
-    if any(event.tool_result for event in segment):
-        return _origin("codex", "tool_result", "high")
-    if any(event.codex_activity for event in segment):
-        return _origin("codex", "agent_continuation", "medium")
-    return _origin("unknown", "no_signal", "low")
-
-
-def _event_flags(envelope: object) -> _EventFlags:
     if not isinstance(envelope, dict):
-        return _EventFlags()
+        return CallOriginFlags()
     payload = envelope.get("payload")
     if not isinstance(payload, dict):
         payload = {}
@@ -116,34 +58,57 @@ def _event_flags(envelope: object) -> _EventFlags:
         and payload_type in {"message", "reasoning", "function_call", "tool_search_call"}
         and role != "user"
     )
-    return _EventFlags(
+    return CallOriginFlags(
         user_message=user_message,
         compaction=compaction,
         tool_result=tool_result,
         codex_activity=codex_activity,
     )
 
 
-def _is_token_count(envelope: object) -> bool:
-    if not isinstance(envelope, dict):
-        return False
-    payload = envelope.get("payload")
-    return (
-        envelope.get("type") == "event_msg"
-        and isinstance(payload, dict)
-        and payload.get("type") == "token_count"
-    )
+def classify_call_origin(segment: Iterable[CallOriginFlags]) -> dict[str, str]:
+    """Classify who most likely initiated a model call from metadata-only signals."""
+
+    flags = list(segment)
+    if any(event.user_message for event in flags):
+        return _origin("user", "user_message", "high")
+    if any(event.compaction for event in flags):
+        return _origin("codex", "post_compaction", "high")
+    if any(event.tool_result for event in flags):
+        return _origin("codex", "tool_result", "high")
+    if any(event.codex_activity for event in flags):
+        return _origin("codex", "agent_continuation", "medium")
+    return _origin("unknown", "no_signal", "low")
+
 
+def fallback_call_origin(row: Mapping[str, Any]) -> dict[str, str]:
+    """Return cheap categorical origin for migrated rows missing persisted metadata."""
 
-def _fallback_origin(row: dict[str, Any], *, reason: str) -> dict[str, str]:
     if (
         row.get("model") == "codex-auto-review"
         or row.get("thread_source") == "subagent"
         or row.get("subagent_type")
         or row.get("parent_session_id")
     ):
         return _origin("codex", "thread_source", "medium")
-    return _origin("unknown", reason, "low")
+    return _origin("unknown", "missing_origin", "low")
+
+
+def ensure_call_origin(row: Mapping[str, Any]) -> dict[str, Any]:
+    """Copy a row and fill missing persisted origin fields without source-log reads."""
+
+    copied = dict(row)
+    if (
+        isinstance(copied.get("call_initiator"), str)
+        and copied["call_initiator"]
+        and isinstance(copied.get("call_initiator_reason"), str)
+        and copied["call_initiator_reason"]
+        and isinstance(copied.get("call_initiator_confidence"), str)
+        and copied["call_initiator_confidence"]
+    ):
+        return copied
+    copied.update(fallback_call_origin(copied))
+    return copied
 
 
 def _origin(initiator: str, reason: str, confidence: str) -> dict[str, str]:
@@ -152,11 +117,3 @@ def _origin(initiator: str, reason: str, confidence: str) -> dict[str, str]:
         "call_initiator_reason": reason,
         "call_initiator_confidence": confidence,
     }
-
-
-def _positive_int(value: object) -> int | None:
-    try:
-        parsed = int(value)  # type: ignore[arg-type]
-    except (TypeError, ValueError):
-        return None
-    return parsed if parsed > 0 else None
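
The precedence order in `classify_call_origin` (user message, then compaction, then tool result, then other Codex activity) is easier to see in isolation. Below is a standalone, table-driven restatement of that order for experimentation. It is a sketch under assumptions, not the module itself: `_PRECEDENCE` is a hypothetical name, the dataclass is re-declared locally so the snippet runs without the package, and the chained `if` statements from the diff are replaced by a loop with the same outcomes.

```python
from __future__ import annotations

from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class CallOriginFlags:
    """Local copy of the four metadata-only signals shown in the diff."""

    user_message: bool = False
    compaction: bool = False
    tool_result: bool = False
    codex_activity: bool = False


# (flag attribute, initiator, reason, confidence), checked in priority order.
# _PRECEDENCE is an illustrative name, not part of the project.
_PRECEDENCE = (
    ("user_message", "user", "user_message", "high"),
    ("compaction", "codex", "post_compaction", "high"),
    ("tool_result", "codex", "tool_result", "high"),
    ("codex_activity", "codex", "agent_continuation", "medium"),
)


def classify_call_origin(segment: Iterable[CallOriginFlags]) -> dict[str, str]:
    """Return the first matching category; any flag in the segment counts."""
    flags = list(segment)
    for attribute, initiator, reason, confidence in _PRECEDENCE:
        if any(getattr(event, attribute) for event in flags):
            return {
                "call_initiator": initiator,
                "call_initiator_reason": reason,
                "call_initiator_confidence": confidence,
            }
    return {
        "call_initiator": "unknown",
        "call_initiator_reason": "no_signal",
        "call_initiator_confidence": "low",
    }


# A user message outranks a tool result seen in the same segment.
mixed = classify_call_origin(
    [CallOriginFlags(tool_result=True), CallOriginFlags(user_message=True)]
)
empty = classify_call_origin([])
```

Here `mixed["call_initiator_reason"]` is `"user_message"` and `empty["call_initiator_reason"]` is `"no_signal"`, matching the order of the `if` chain in the diff.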

src/codex_usage_tracker/dashboard.py

Lines changed: 5 additions & 4 deletions
@@ -17,7 +17,7 @@
     load_allowance_config,
     summarize_allowance_usage,
 )
-from codex_usage_tracker.call_origin import annotate_rows_with_call_origin
+from codex_usage_tracker.call_origin import ensure_call_origin
 from codex_usage_tracker.i18n import dashboard_i18n_payload, language_direction, translations_for
 from codex_usage_tracker.paths import (
     DEFAULT_ALLOWANCE_PATH,
@@ -68,15 +68,16 @@ def dashboard_payload(
     privacy_mode = validate_privacy_mode(privacy_mode)
     normalized_offset = _normalize_offset(offset)
     rows = annotate_thread_attachments(
-        annotate_rows_with_call_origin(
-            query_dashboard_events(
+        [
+            ensure_call_origin(row)
+            for row in query_dashboard_events(
                 db_path=db_path,
                 limit=limit,
                 offset=normalized_offset,
                 since=since,
                 include_archived=include_archived,
             )
-        )
+        ]
     )
     pricing = load_pricing_config(pricing_path)
     allowance = load_allowance_config(allowance_path, rate_card_path=rate_card_path)
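
With this change each row passes through `ensure_call_origin`, which only inspects columns already on the row. The sketch below restates that fill-in behavior in a self-contained form so the three cases (fully persisted, migrated subagent row, migrated plain row) can be tried directly. It is a compact paraphrase under assumptions: `ORIGIN_FIELDS` is a hypothetical helper constant, the field checks are written as a loop rather than the explicit conjunction in the diff, and all rows are synthetic.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any

# Hypothetical helper constant; the three names come from the diff.
ORIGIN_FIELDS = ("call_initiator", "call_initiator_reason", "call_initiator_confidence")


def fallback_call_origin(row: Mapping[str, Any]) -> dict[str, str]:
    """Guess an origin from already-persisted aggregate columns only."""
    looks_like_codex = (
        row.get("model") == "codex-auto-review"
        or row.get("thread_source") == "subagent"
        or bool(row.get("subagent_type"))
        or bool(row.get("parent_session_id"))
    )
    values = ("codex", "thread_source", "medium") if looks_like_codex else (
        "unknown", "missing_origin", "low"
    )
    return dict(zip(ORIGIN_FIELDS, values))


def ensure_call_origin(row: Mapping[str, Any]) -> dict[str, Any]:
    """Copy the row; fill origin fields only when any of them is missing or empty."""
    copied = dict(row)
    complete = all(isinstance(copied.get(name), str) and copied[name] for name in ORIGIN_FIELDS)
    if not complete:
        copied.update(fallback_call_origin(copied))
    return copied


# Synthetic rows only.
persisted = {
    "model": "synthetic-model",
    "call_initiator": "user",
    "call_initiator_reason": "user_message",
    "call_initiator_confidence": "high",
}
migrated_subagent = {"model": "synthetic-model", "thread_source": "subagent", "call_initiator": None}
migrated_plain = {"model": "synthetic-model"}

filled_persisted = ensure_call_origin(persisted)
filled_subagent = ensure_call_origin(migrated_subagent)
filled_plain = ensure_call_origin(migrated_plain)
```

A persisted row comes back unchanged, while migrated rows receive a lower-confidence label. Because the function copies before updating, the input mappings are left untouched.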

src/codex_usage_tracker/models.py

Lines changed: 3 additions & 0 deletions
@@ -32,6 +32,9 @@ class UsageEvent:
     effort: str | None
     current_date: str | None
     timezone: str | None
+    call_initiator: str | None
+    call_initiator_reason: str | None
+    call_initiator_confidence: str | None
     thread_source: str | None
     subagent_type: str | None
     agent_role: str | None
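
The three new optional fields imply that rows indexed before this commit carry no origin values, which is why the migrated-row fallback exists. The `schema.py` and `store.py` changes are not shown in this excerpt, so the following is only a sketch of how an additive SQLite migration of this kind might behave. The `usage_events` table name and the three column names come from the diff; the minimal legacy table layout and the `add_missing_origin_columns` helper are assumptions made for illustration.

```python
import sqlite3

ORIGIN_COLUMNS = ("call_initiator", "call_initiator_reason", "call_initiator_confidence")


def add_missing_origin_columns(conn: sqlite3.Connection) -> list[str]:
    """Add nullable TEXT origin columns that the table does not have yet."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute("PRAGMA table_info(usage_events)")}
    added = []
    for column in ORIGIN_COLUMNS:
        if column not in existing:
            # Column names come from a fixed tuple, never from user input.
            conn.execute(f"ALTER TABLE usage_events ADD COLUMN {column} TEXT")
            added.append(column)
    return added


conn = sqlite3.connect(":memory:")
# Hypothetical, heavily reduced legacy layout with one synthetic row.
conn.execute("CREATE TABLE usage_events (id INTEGER PRIMARY KEY, model TEXT)")
conn.execute("INSERT INTO usage_events (model) VALUES ('synthetic-model')")

first_run = add_missing_origin_columns(conn)
second_run = add_missing_origin_columns(conn)  # idempotent: nothing left to add
legacy_row = conn.execute(
    "SELECT call_initiator, call_initiator_reason, call_initiator_confidence FROM usage_events"
).fetchone()
```

After the migration the legacy row reads back as `NULL` in all three columns, which is the case the `str | None` annotations and the row-level fallback are meant to cover.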
