Summary
Add a tokf telemetry sync subcommand that reads invocation events from the local SQLite database and replays any that were not successfully exported to the OTLP backend in real time.
Motivation
The OTel exporter introduced in #85 uses a best-effort model: it waits at most 200 ms for the OTLP flush before giving up (see ADR-0001). Under a slow or temporarily unavailable endpoint, the last invocation's metrics may not reach the backend.
However, every invocation is always written to SQLite first, and the synced_to_otel_at column (added in #85) tracks which rows have been successfully exported. A sync command can replay those rows at any time — from a cron job, a CI post-step, or manually.
Design
Schema (already in place from #85)
-- events table already has:
synced_to_otel_at TEXT -- NULL = not yet synced
Command
tokf telemetry sync [--since <ISO8601>] [--dry-run] [--limit N]
- Queries WHERE synced_to_otel_at IS NULL (or --since override)
- Builds raw OTLP HTTP export payloads with correct start_time_unix_nano / time_unix_nano from the stored timestamp column
- POSTs directly to the configured OTLP endpoint (no SdkMeterProvider overhead, no background thread)
- On success: updates synced_to_otel_at = strftime('%Y-%m-%dT%H:%M:%SZ','now')
- --dry-run: prints what would be synced without sending
- --limit N: sync at most N events (for incremental rollouts)
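The row selection described above could be sketched as follows. This is only an illustration: the SyncOptions type, the build_select function, and the id column are hypothetical names not taken from the schema in #85, and it reads "--since override" as replacing the IS NULL filter rather than adding to it. Comparing timestamps as strings assumes every row stores UTC in the same fixed-width ISO 8601 form.

```rust
/// Hypothetical options mirroring the proposed CLI flags.
struct SyncOptions {
    since: Option<String>, // --since <ISO8601>
    limit: Option<u32>,    // --limit N
}

/// Builds the SELECT statement and its positional parameters.
/// Assumes an `events` table with `timestamp` and `synced_to_otel_at`
/// columns; the `id` column is an assumption made for this sketch.
fn build_select(opts: &SyncOptions) -> (String, Vec<String>) {
    let mut sql = String::from("SELECT id, timestamp FROM events WHERE ");
    let mut params = Vec::new();
    match &opts.since {
        // --since replaces the "not yet synced" filter (one possible reading).
        Some(since) => {
            sql.push_str("timestamp >= ?");
            params.push(since.clone());
        }
        None => sql.push_str("synced_to_otel_at IS NULL"),
    }
    // Oldest first, so an interrupted run leaves a contiguous tail to retry.
    sql.push_str(" ORDER BY timestamp");
    if let Some(n) = opts.limit {
        sql.push_str(&format!(" LIMIT {n}"));
    }
    (sql, params)
}
```

The --since value is passed as a bound parameter rather than interpolated, while the limit is a parsed integer and can be formatted in directly.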
Temporality
Uses Delta temporality (matching the real-time exporter). Each event is a single-invocation delta, so replaying them as historical deltas is semantically correct.
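For a replayed delta to land at its original time, the stored timestamp has to be turned into the nanosecond value that OTLP data points carry. A minimal, dependency-free sketch is shown below. It assumes the timestamp column holds UTC strings in exactly the YYYY-MM-DDTHH:MM:SSZ shape used for synced_to_otel_at, which the issue does not confirm; to_unix_nano and its helpers are hypothetical names, and a real implementation might use a date-time crate instead.

```rust
/// Days between 1970-01-01 and the given proleptic Gregorian date
/// (Howard Hinnant's days-from-civil algorithm).
fn days_from_civil(year: i64, month: i64, day: i64) -> i64 {
    let y = if month <= 2 { year - 1 } else { year }; // treat the year as starting in March
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400);
    let mp = (month + 9) % 12; // March = 0, February = 11
    let doy = (153 * mp + 2) / 5 + day - 1;
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    era * 146_097 + doe - 719_468
}

/// Parses a fixed-width run of ASCII digits starting at byte `start`.
fn field(ts: &str, start: usize, len: usize) -> Option<i64> {
    let s = ts.get(start..start + len)?;
    if s.bytes().all(|c| c.is_ascii_digit()) {
        s.parse().ok()
    } else {
        None
    }
}

/// Converts `YYYY-MM-DDTHH:MM:SSZ` to nanoseconds since the Unix epoch.
/// Returns None for any other shape or for timestamps before 1970.
/// Day-of-month is only range-checked (1..=31), not checked per month.
fn to_unix_nano(ts: &str) -> Option<u64> {
    let b = ts.as_bytes();
    let shape_ok = b.len() == 20
        && b[4] == b'-' && b[7] == b'-' && b[10] == b'T'
        && b[13] == b':' && b[16] == b':' && b[19] == b'Z';
    if !shape_ok {
        return None;
    }
    let (y, mo, d) = (field(ts, 0, 4)?, field(ts, 5, 2)?, field(ts, 8, 2)?);
    let (h, mi, s) = (field(ts, 11, 2)?, field(ts, 14, 2)?, field(ts, 17, 2)?);
    if !(1..=12).contains(&mo) || !(1..=31).contains(&d) || h > 23 || mi > 59 || s > 59 {
        return None;
    }
    let secs = days_from_civil(y, mo, d) * 86_400 + h * 3_600 + mi * 60 + s;
    u64::try_from(secs).ok()?.checked_mul(1_000_000_000)
}
```

Rows whose timestamp fails to parse would need a policy of their own (skip and report, or fail the run); the sketch only signals the problem by returning None.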
Backend compatibility
- Datadog, Grafana Mimir, New Relic, Honeycomb: accept historical OTLP with timestamps. ✓
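- Prometheus Pushgateway: rejects pushes that carry timestamps, since Prometheus is pull-based and stamps samples at scrape time, so historical data cannot be replayed. ✗ (document limitation)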
Implementation notes
- No SdkMeterProvider: build the ExportMetricsServiceRequest protobuf directly using the opentelemetry-proto crate (or the prost-generated types already pulled in by opentelemetry-otlp)
- Sync is idempotent: re-running after a partial failure is safe
- Should respect the same OTEL_EXPORTER_OTLP_* env vars as the real-time exporter
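As an illustration of the environment-variable point, the sketch below resolves the metrics endpoint the way the OpenTelemetry OTLP/HTTP exporter specification describes: a signal-specific OTEL_EXPORTER_OTLP_METRICS_ENDPOINT is used as-is, otherwise /v1/metrics is appended to OTEL_EXPORTER_OTLP_ENDPOINT, with http://localhost:4318 as the default base. Whether the real-time exporter from #85 resolves its endpoint in exactly this way is an assumption, and both function names are hypothetical.

```rust
/// Resolves the OTLP/HTTP metrics URL from the two endpoint settings.
/// Pure function, so it can be unit-tested without touching the environment.
fn resolve_metrics_endpoint(metrics: Option<&str>, base: Option<&str>) -> String {
    // A signal-specific endpoint wins and is used verbatim.
    if let Some(url) = metrics.filter(|s| !s.is_empty()) {
        return url.to_string();
    }
    // Otherwise append the metrics path to the generic base URL.
    let base = base
        .filter(|s| !s.is_empty())
        .unwrap_or("http://localhost:4318");
    format!("{}/v1/metrics", base.trim_end_matches('/'))
}

/// Thin wrapper that reads the standard variables from the process environment.
fn endpoint_from_env() -> String {
    let metrics = std::env::var("OTEL_EXPORTER_OTLP_METRICS_ENDPOINT").ok();
    let base = std::env::var("OTEL_EXPORTER_OTLP_ENDPOINT").ok();
    resolve_metrics_endpoint(metrics.as_deref(), base.as_deref())
}
```

Other OTEL_EXPORTER_OTLP_* settings such as headers and timeout would need the same signal-specific-then-generic lookup; they are left out here to keep the sketch short.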
Acceptance Criteria
- tokf telemetry sync replays all synced_to_otel_at IS NULL events
- synced_to_otel_at updated in DB on successful export
- --dry-run flag prints events without sending
- --limit N limits batch size
- Exit code 0 on success, 1 if any event failed to export
- Works with existing otel-http feature; documents gRPC limitation
- Unit tests for payload construction; integration test with local OTel Collector
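The partial-failure and exit-code criteria can be sketched as a small replay loop. The names SyncReport and replay are hypothetical, and the export step is abstracted as a closure so the logic stays independent of the HTTP client. Only ids whose export succeeded are returned for marking, which is what makes a re-run after a partial failure safe.

```rust
/// Outcome of one sync run (hypothetical type for this sketch).
struct SyncReport {
    exported: usize,
    failed: usize,
}

impl SyncReport {
    /// 0 when every event was exported, 1 if any export failed.
    fn exit_code(&self) -> i32 {
        if self.failed == 0 { 0 } else { 1 }
    }
}

/// Sends each event through `export` and collects the ids that succeeded.
/// The caller would then set synced_to_otel_at for exactly those ids;
/// failed ids keep NULL and are picked up by the next run.
fn replay<F>(ids: &[i64], mut export: F) -> (Vec<i64>, SyncReport)
where
    F: FnMut(i64) -> Result<(), String>,
{
    let mut synced = Vec::new();
    let mut failed = 0;
    for &id in ids {
        match export(id) {
            Ok(()) => synced.push(id),
            Err(_) => failed += 1,
        }
    }
    let report = SyncReport { exported: synced.len(), failed };
    (synced, report)
}
```

With --dry-run, the same loop could be driven by a closure that prints the event and returns Ok(()) while the marking step is skipped.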
Relation
Depends on: #85 (schema and synced_to_otel_at column already added)
Related: #87 (tokf telemetry status), #88 (docs)
Referenced in: ADR-0001, consequence "Option C remains open"