Skip to content

feat: per-runtime telemetry adapters + per-agent token attribution#68

Merged
josstei merged 3 commits into
refactor/mcp-handlers-complexityfrom
feat/telemetry-adapter
Apr 27, 2026
Merged

feat: per-runtime telemetry adapters + per-agent token attribution#68
josstei merged 3 commits into
refactor/mcp-handlers-complexityfrom
feat/telemetry-adapter

Conversation

@josstei

@josstei josstei commented Apr 27, 2026

Copy link
Copy Markdown
Owner

Summary

Implements Phase 4 of the gemini-runtime-hardening plan: per-runtime telemetry adapters and per-agent token attribution in transition_phase. Closes finding F2 (token_usage was zero across 5 invocations in the captured Gemini test run).

Stacks on PR #67 (Phase 2: phase.kind discriminator). Once #67 merges, this PR's base will move automatically.

What changed

TelemetryAdapter contract (src/platforms/shared/adapters/telemetry-adapter-{types,factory}.js):

  • Canonical Usage shape { input, output, cached } matches the existing transition_phase token_usage parameter — adapter outputs flow into the session-state aggregator without translation.
  • defineTelemetryAdapter({ runtime, extractUsage, isAvailable }) enforces typed runtime, function signatures, and defensive ZERO_USAGE fallback when underlying extractUsage returns malformed shape.
  • isTelemetryUsage predicate, TELEMETRY_USAGE_FIELDS and TELEMETRY_RUNTIMES exposed as frozen constants.

Per-runtime adapters (src/platforms/{claude,codex,gemini,qwen}/telemetry-adapter.js):

Runtime Status Source
Claude Real Anthropic SDK Usage envelope (input_tokens, output_tokens, cache_read_input_tokens + cache_creation_input_tokens)
Codex Real OpenAI-style prompt_tokens/completion_tokens with input_tokens/output_tokens fallback for older CLI versions; cached_tokens
Gemini Stub isAvailable: false — Gemini CLI's invoke_agent envelope returns only textual output. Follow-up candidates documented inline.
Qwen Stub Same architecture as Gemini, same TODO.

Each runtime-config.js now exposes its adapter via a canonical telemetry field so the orchestrator has one resolution surface.

Generator allowlist (src/generator/payload-builder.js):

  • RUNTIME_PAYLOAD_FILES constant (runtime-config.js, telemetry-adapter.js) replaces the single hardcoded path. Adding a future per-runtime surface (e.g., a tracing-adapter.js) is one line.
  • Required so runtime-config.js's require('./telemetry-adapter') resolves in detached payloads (claude/, plugins/maestro/) — without this, generated mirrors would crash at load time.

Per-agent attribution in transition_phase (src/mcp/handlers/session-state-tools.js):

  • New optional agent_name parameter (string for solo phases or array for multi-agent batches). Falls back to phase.agents from session state, then to 'unknown' when no attribution is available.
  • Multi-agent phases split usage equally with Math.floor. The total_input/output/cached fields preserve exact totals so the per-agent sum may underattribute by up to (n−1) tokens — acceptable for budget tracking.
  • Defensive init for legacy state shapes: missing token_usage block, missing by_agent field, or by_agent mutated into an array all fall back to empty {} without throwing.
  • Number(...) || 0 coercion protects against malformed counts in legacy YAML.

Tool schema (src/mcp/tool-packs/session/index.js):

  • agent_name added with oneOf: [string, array] so clients reading tools/list discover the parameter.

Orchestration steps (src/references/orchestration-steps.md):

  • Step 25 now instructs the orchestrator to load the runtime's telemetry adapter via runtime-config.telemetry, call extractUsage(invocationResult), and pass the result as token_usage plus agent_name. Adapters reporting isAvailable: false (Gemini/Qwen) signal the orchestrator to OMIT token_usage rather than recording zeros.

Test plan

  • Telemetry contract: 23 unit tests in tests/unit/telemetry-adapter-factory.test.js — frozen constants, factory validation, isTelemetryUsage edge cases, malformed-extractUsage fallback, isAvailable error containment.
  • Runtime adapters: 18 unit tests in tests/unit/runtime-telemetry-adapters.test.js — Claude full extraction, Codex prompt_tokens preferred over input_tokens, Gemini/Qwen always-zero stubs, runtime-config telemetry export.
  • Generator allowlist: payload-builder tests updated to assert RUNTIME_PAYLOAD_FILES is frozen and contains both runtime-config.js and telemetry-adapter.js.
  • Per-agent attribution integration: 10 tests in tests/integration/telemetry-attribution.test.js — single string, array split, phase.agents fallback, empty-array regression guard, "unknown" fallback, missing by_agent, missing token_usage, array by_agent, omitted token_usage, Math.floor underattribution.
  • Full suite: 1230 passing (was 1177 on Phase 2 base).
  • just check-layers clean.
  • just check zero drift.

Verification limit (mention for reviewers)

The Claude/Codex extraction shapes are plan-assumptions, not live-verified against captured SDK responses. The contract structure is what this PR ships; empirical validation that real invocations produce non-zero token_usage requires a follow-up integration run. The Phase 1 fixture pattern (tests/fixtures/runtime-contracts/...) could host real Anthropic/Codex usage envelopes for adapter tests — proposed as a Phase 4 follow-up.

Findings addressed

  • F2token_usage: { total_input: 0, total_output: 0, total_cached: 0 } across 5 invocations in the original test run. Claude and Codex now extract real usage. Gemini and Qwen explicitly report unavailable so the orchestrator omits token_usage rather than recording false zeros.

Cross-runtime coverage

Runtime Telemetry adapter runtime-config exposure
Claude Real (Anthropic SDK Usage)
Codex Real (OpenAI-style + fallback)
Gemini Stub (isAvailable: false)
Qwen Stub (isAvailable: false)

Base automatically changed from feat/phase-kind-discriminator to refactor/mcp-handlers-complexity April 27, 2026 02:18
josstei added 3 commits April 26, 2026 22:20
… shape

Adds canonical Usage shape (input, output, cached) matching the
existing transition_phase token_usage parameter so adapter outputs
flow into the session-state aggregator without translation.

Adds defineTelemetryAdapter factory with strict spec validation
(runtime in TELEMETRY_RUNTIMES, function signatures) and defensive
ZERO_USAGE fallback when underlying extractUsage returns malformed
shape.
Adds real telemetry adapters for Claude (Anthropic SDK usage envelope)
and Codex (OpenAI-style prompt_tokens/completion_tokens with
input_tokens/output_tokens fallback for older CLI versions). Adds stub
adapters for Gemini and Qwen reporting isAvailable=false until a real
telemetry source is identified.

Each runtime-config.js now exposes its telemetry adapter via a
canonical `telemetry` field so the orchestrator has one resolution
surface.

Extends generator allowlist with RUNTIME_PAYLOAD_FILES so
runtime-specific surfaces beyond runtime-config.js (now telemetry, in
the future tracing or others) flow into detached payloads alongside
the config file. Without this, runtime-config's
require('./telemetry-adapter') would fail to resolve in mirrors.
…t attribution

Extends the token_usage accumulator to attribute spend per agent. Adds
optional agent_name parameter to transition_phase (string for solo
phases or array for multi-agent batches). Falls back to phase.agents
from session state, then to "unknown" when no attribution is
available. Multi-agent phases split usage equally with Math.floor; the
total_input/output/cached fields preserve exact totals so the
per-agent sum may underattribute by up to (n-1) tokens — acceptable
for budget tracking.

Defensive init for legacy state shapes: missing token_usage block,
missing by_agent field, or by_agent that mutated into an array all
fall back to empty {} without throwing. Numeric coercion via
Number(...) || 0 protects against malformed counts in legacy YAML.

Adds agent_name to the transition_phase input schema with oneOf
{string, array} so clients reading tools/list discover the parameter.
Updates orchestration-steps step 25 with telemetry guidance: load the
runtime's telemetry adapter from runtime-config.telemetry, call
extractUsage on the invocation result, omit token_usage entirely when
isAvailable returns false (Gemini/Qwen stub case).

Adds an `phaseAgents` option to the prepareSession test helper for
multi-agent fixtures via direct state mutation, since create_session
input only supports a single agent per phase.
@josstei josstei force-pushed the feat/telemetry-adapter branch from c5fe4d0 to e5ce06f Compare April 27, 2026 02:20
@josstei josstei merged commit 2110955 into refactor/mcp-handlers-complexity Apr 27, 2026
2 checks passed
@josstei josstei deleted the feat/telemetry-adapter branch April 27, 2026 02:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant