feat: per-runtime telemetry adapters + per-agent token attribution#68
Merged
josstei merged 3 commits intoApr 27, 2026
Merged
Conversation
Base automatically changed from
feat/phase-kind-discriminator
to
refactor/mcp-handlers-complexity
April 27, 2026 02:18
… shape Adds canonical Usage shape (input, output, cached) matching the existing transition_phase token_usage parameter so adapter outputs flow into the session-state aggregator without translation. Adds defineTelemetryAdapter factory with strict spec validation (runtime in TELEMETRY_RUNTIMES, function signatures) and defensive ZERO_USAGE fallback when underlying extractUsage returns malformed shape.
Adds real telemetry adapters for Claude (Anthropic SDK usage envelope)
and Codex (OpenAI-style prompt_tokens/completion_tokens with
input_tokens/output_tokens fallback for older CLI versions). Adds stub
adapters for Gemini and Qwen reporting isAvailable=false until a real
telemetry source is identified.
Each runtime-config.js now exposes its telemetry adapter via a
canonical `telemetry` field so the orchestrator has one resolution
surface.
Extends generator allowlist with RUNTIME_PAYLOAD_FILES so
runtime-specific surfaces beyond runtime-config.js (now telemetry, in
the future tracing or others) flow into detached payloads alongside
the config file. Without this, runtime-config's
require('./telemetry-adapter') would fail to resolve in mirrors.
…t attribution
Extends the token_usage accumulator to attribute spend per agent. Adds
optional agent_name parameter to transition_phase (string for solo
phases or array for multi-agent batches). Falls back to phase.agents
from session state, then to "unknown" when no attribution is
available. Multi-agent phases split usage equally with Math.floor; the
total_input/output/cached fields preserve exact totals so the
per-agent sum may underattribute by up to (n-1) tokens — acceptable
for budget tracking.
Defensive init for legacy state shapes: missing token_usage block,
missing by_agent field, or by_agent that mutated into an array all
fall back to empty {} without throwing. Numeric coercion via
Number(...) || 0 protects against malformed counts in legacy YAML.
Adds agent_name to the transition_phase input schema with oneOf
{string, array} so clients reading tools/list discover the parameter.
Updates orchestration-steps step 25 with telemetry guidance: load the
runtime's telemetry adapter from runtime-config.telemetry, call
extractUsage on the invocation result, omit token_usage entirely when
isAvailable returns false (Gemini/Qwen stub case).
Adds an `phaseAgents` option to the prepareSession test helper for
multi-agent fixtures via direct state mutation, since create_session
input only supports a single agent per phase.
c5fe4d0 to
e5ce06f
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Implements Phase 4 of the gemini-runtime-hardening plan: per-runtime telemetry adapters and per-agent token attribution in
transition_phase. Closes finding F2 (token_usagewas zero across 5 invocations in the captured Gemini test run).Stacks on PR #67 (Phase 2:
phase.kinddiscriminator). Once #67 merges, this PR's base will move automatically.What changed
TelemetryAdapter contract (
src/platforms/shared/adapters/telemetry-adapter-{types,factory}.js):{ input, output, cached }matches the existingtransition_phasetoken_usageparameter — adapter outputs flow into the session-state aggregator without translation.defineTelemetryAdapter({ runtime, extractUsage, isAvailable })enforces typed runtime, function signatures, and defensiveZERO_USAGEfallback when underlyingextractUsagereturns malformed shape.isTelemetryUsagepredicate,TELEMETRY_USAGE_FIELDSandTELEMETRY_RUNTIMESexposed as frozen constants.Per-runtime adapters (
src/platforms/{claude,codex,gemini,qwen}/telemetry-adapter.js):isAvailable: false— Gemini CLI's invoke_agent envelope returns only textual output. Follow-up candidates documented inline.Each
runtime-config.jsnow exposes its adapter via a canonicaltelemetryfield so the orchestrator has one resolution surface.Generator allowlist (
src/generator/payload-builder.js):RUNTIME_PAYLOAD_FILESconstant (runtime-config.js,telemetry-adapter.js) replaces the single hardcoded path. Adding a future per-runtime surface (e.g., atracing-adapter.js) is one line.runtime-config.js'srequire('./telemetry-adapter')resolves in detached payloads (claude/, plugins/maestro/) — without this, generated mirrors would crash at load time.Per-agent attribution in
transition_phase(src/mcp/handlers/session-state-tools.js):agent_nameparameter (string for solo phases or array for multi-agent batches). Falls back tophase.agentsfrom session state, then to'unknown'when no attribution is available.Math.floor. Thetotal_input/output/cachedfields preserve exact totals so the per-agent sum may underattribute by up to (n−1) tokens — acceptable for budget tracking.token_usageblock, missingby_agentfield, orby_agentmutated into an array all fall back to empty{}without throwing.Number(...) || 0coercion protects against malformed counts in legacy YAML.Tool schema (
src/mcp/tool-packs/session/index.js):agent_nameadded withoneOf: [string, array]so clients readingtools/listdiscover the parameter.Orchestration steps (
src/references/orchestration-steps.md):runtime-config.telemetry, callextractUsage(invocationResult), and pass the result astoken_usageplusagent_name. Adapters reportingisAvailable: false(Gemini/Qwen) signal the orchestrator to OMITtoken_usagerather than recording zeros.Test plan
tests/unit/telemetry-adapter-factory.test.js— frozen constants, factory validation,isTelemetryUsageedge cases, malformed-extractUsage fallback, isAvailable error containment.tests/unit/runtime-telemetry-adapters.test.js— Claude full extraction, Codex prompt_tokens preferred over input_tokens, Gemini/Qwen always-zero stubs, runtime-config telemetry export.RUNTIME_PAYLOAD_FILESis frozen and contains bothruntime-config.jsandtelemetry-adapter.js.tests/integration/telemetry-attribution.test.js— single string, array split, phase.agents fallback, empty-array regression guard, "unknown" fallback, missingby_agent, missingtoken_usage, arrayby_agent, omittedtoken_usage, Math.floor underattribution.just check-layersclean.just checkzero drift.Verification limit (mention for reviewers)
The Claude/Codex extraction shapes are plan-assumptions, not live-verified against captured SDK responses. The contract structure is what this PR ships; empirical validation that real invocations produce non-zero
token_usagerequires a follow-up integration run. The Phase 1 fixture pattern (tests/fixtures/runtime-contracts/...) could host real Anthropic/Codex usage envelopes for adapter tests — proposed as a Phase 4 follow-up.Findings addressed
token_usage: { total_input: 0, total_output: 0, total_cached: 0 }across 5 invocations in the original test run. Claude and Codex now extract real usage. Gemini and Qwen explicitly report unavailable so the orchestrator omitstoken_usagerather than recording false zeros.Cross-runtime coverage
isAvailable: false)isAvailable: false)