feat!(telemetry): OpenTelemetry (OTLP) metrics exporter for token compression tracking #85

@mpecan

Summary

Add an optional OpenTelemetry (OTel) exporter so that enterprises can feed tokf's token compression metrics into their existing observability stacks (Datadog, Grafana, Jaeger, Prometheus, etc.) via the standard OTLP protocol, without building custom integrations.

Motivation

Issue #20 proposed an organization server for centralized tracking. An OTel exporter is a lighter-weight, standards-based alternative that does not require self-hosting a dedicated server. Most enterprises already run an OTel Collector or a compatible backend. By emitting metrics via OTLP, tokf becomes a first-class citizen in existing observability pipelines — teams get dashboards, alerts, and cost attribution "for free."

This also positions tokf within the emerging OTel GenAI semantic conventions (semconv v1.39.0, status: Development), which define standard metrics like gen_ai.client.token.usage and gen_ai.client.operation.duration. While tokf is not an LLM client itself, it operates on LLM-bound context and can align with these conventions where applicable, extending them with tokf-specific attributes for compression effectiveness.

Key use cases:

  • Cost management — understanding how much context compression saves across builds, pipelines, and teams
  • Performance monitoring — tracking filter effectiveness over time, detecting regressions when build output formats change
  • Capacity planning — aggregating token usage across an organization to forecast LLM API costs
  • Compliance & auditing — providing an immutable telemetry trail of token processing for enterprise governance

Background: OTel Rust Ecosystem

The Rust OTel SDK is mature enough for production use:

  • opentelemetry (API crate) — traits and no-op implementation for instrumentation
  • opentelemetry_sdk — real SDK with Metrics SDK, Tracing SDK
  • opentelemetry-otlp — OTLP exporter supporting both gRPC (tonic) and HTTP (reqwest), for metrics, traces, and logs
  • opentelemetry-prometheus — Prometheus metrics exporter

OTLP is the recommended export protocol and is natively supported by major backends such as Datadog, Grafana Cloud, New Relic, Honeycomb, and Jaeger (traces), as well as by self-hosted OTel Collectors.

Proposed Design

Feature Flag

Gate the entire feature behind a Cargo feature flag to keep the default binary lean:

[features]
default = []
otel = ["otel-http"]  # default to HTTP for simplicity
otel-http = ["opentelemetry", "opentelemetry_sdk", "opentelemetry-otlp/http-proto", "opentelemetry-otlp/reqwest-blocking-client"]
otel-grpc = ["opentelemetry", "opentelemetry_sdk", "opentelemetry-otlp/grpc-tonic"]

HTTP is recommended as the default — it's simpler, firewall-friendly, and sufficient for CLI tools that export metrics on each invocation rather than as a long-running service.

Metrics to Export

Following the OTel GenAI Semantic Conventions (v1.39.0, Development status) where applicable, and extending with tokf-specific metrics:

Standard GenAI-aligned metrics (reused)

| Metric | Instrument | Unit | Description |
|---|---|---|---|
| gen_ai.client.token.usage | Histogram | {token} | Token counts with gen_ai.token.type = input (pre-filter) and output (post-filter) |

tokf-specific metrics (custom namespace)

| Metric | Instrument | Unit | Description |
|---|---|---|---|
| tokf.filter.input_lines | Counter | {line} | Total lines received before filtering |
| tokf.filter.output_lines | Counter | {line} | Total lines emitted after filtering |
| tokf.filter.lines_removed | Counter | {line} | Lines removed by the filter |
| tokf.compression.ratio | Gauge | 1 | Ratio of output to input (0.0–1.0); lower = more compression |
| tokf.tokens.saved | Counter | {token} | Cumulative tokens saved across invocations |
| tokf.filter.duration | Histogram | s | Time spent in the filter pipeline |
| tokf.filter.invocations | Counter | {invocation} | Number of filter invocations |
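
As a rough illustration of how the derived values in the table relate to the raw counts, the sketch below computes lines removed, tokens saved, and the compression ratio for one invocation. `FilterStats` and its methods are hypothetical names, not existing tokf types. The sketch assumes the ratio is taken over tokens (the table does not say whether tokens or lines are meant), clamps it to the documented 0.0–1.0 range, and reports no ratio for empty input.

```rust
/// Raw measurements for a single filter invocation.
/// Hypothetical type for illustration; not part of tokf.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct FilterStats {
    pub input_lines: u64,
    pub output_lines: u64,
    pub input_tokens: u64,
    pub output_tokens: u64,
}

impl FilterStats {
    /// Value to add to tokf.filter.lines_removed.
    pub fn lines_removed(&self) -> u64 {
        // saturating_sub guards against a filter that emits more than it received.
        self.input_lines.saturating_sub(self.output_lines)
    }

    /// Value to add to tokf.tokens.saved.
    pub fn tokens_saved(&self) -> u64 {
        self.input_tokens.saturating_sub(self.output_tokens)
    }

    /// Value for tokf.compression.ratio (output / input; lower = more compression).
    /// Assumption: computed over tokens, clamped to 1.0, and None for empty input.
    pub fn compression_ratio(&self) -> Option<f64> {
        if self.input_tokens == 0 {
            return None;
        }
        let ratio = self.output_tokens as f64 / self.input_tokens as f64;
        Some(ratio.min(1.0))
    }
}
```

Returning `None` for empty input keeps a gauge from reporting a misleading 0.0 ("perfect compression") when nothing was processed; whether tokf should instead skip the data point or report 1.0 is a design choice the proposal leaves open.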

Attributes (dimensions)

| Attribute | Type | Description | Example |
|---|---|---|---|
| tokf.filter.name | string | The filter that was applied | cargo/build, jest/run |
| tokf.command | string | The wrapped command | cargo build, npx jest |
| tokf.exit_code | int | Exit code of the wrapped command | 0, 1 |
| tokf.version | string | tokf version | 0.1.8 |
| tokf.pipeline | string | User-supplied pipeline/job identifier | my-ci-job |
| service.name | string | Defaults to tokf, configurable | tokf |

Configuration

OTel export should be entirely opt-in, configurable via ~/.tokf/config.toml, CLI flags, and/or environment variables.

Config file

[telemetry.otlp]
enabled = true
endpoint = "http://localhost:4318"      # OTel Collector HTTP endpoint (gRPC uses 4317)
protocol = "http"                        # "grpc" | "http"; "grpc" needs the otel-grpc feature
# headers = { "api-key" = "secret" }    # optional auth headers

[telemetry.otlp.resource]
service_name = "tokf"
# deployment.environment = "production"

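Environment variables and CLI flag (override config)

Standard OTel env vars should be respected, alongside a tokf CLI flag and tokf-specific variables:

# Enable OTLP export for this invocation (CLI flag, not an env var)
tokf --otel-export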

# Configure endpoint (defaults to localhost:4317 for gRPC, localhost:4318 for HTTP)
OTEL_EXPORTER_OTLP_ENDPOINT=https://otel-collector.internal:4317

# Protocol selection
OTEL_EXPORTER_OTLP_PROTOCOL=grpc   # or http/protobuf

# Custom headers (for vendor auth)
OTEL_EXPORTER_OTLP_HEADERS=x-api-key=secret123

# Resource attributes
OTEL_RESOURCE_ATTRIBUTES=service.name=tokf,deployment.environment=production

# Master toggle
TOKF_TELEMETRY_ENABLED=true

# tokf-specific
TOKF_OTEL_PIPELINE=my-ci-job
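
OTEL_EXPORTER_OTLP_HEADERS and OTEL_RESOURCE_ATTRIBUTES both carry comma-separated key=value pairs. The OTLP exporter and SDK may well read these themselves, so tokf might never need its own parser; the sketch below only illustrates the format. `parse_kv_list` is a hypothetical helper, and it deliberately skips the percent-decoding that the OTel specification describes for these values.

```rust
/// Parse a "k1=v1,k2=v2" list as used by OTEL_EXPORTER_OTLP_HEADERS and
/// OTEL_RESOURCE_ATTRIBUTES. Hypothetical helper for illustration only.
///
/// Entries without '=' or with an empty key are skipped rather than treated
/// as errors, so a malformed variable cannot break the wrapped command.
/// Assumption: no percent-decoding is performed.
pub fn parse_kv_list(raw: &str) -> Vec<(String, String)> {
    raw.split(',')
        .filter_map(|pair| {
            // Split on the first '=' only, so values such as base64
            // credentials keep their trailing '=' padding.
            let (key, value) = pair.split_once('=')?;
            let key = key.trim();
            if key.is_empty() {
                return None;
            }
            Some((key.to_string(), value.trim().to_string()))
        })
        .collect()
}
```

Splitting on the first `=` matters for vendor auth headers: a value like `Basic abc==` would be truncated by a naive split on every `=`.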

Default: disabled

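When no config is set, --otel-export is not passed, and TOKF_TELEMETRY_ENABLED is not true, tokf behaves exactly as it does today: no OTel code paths are initialized and no network calls are made. In builds without the otel feature, the OTel dependencies are not compiled in at all.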
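
One possible way to combine the three configuration sources is sketched below. All type and function names (`FileConfig`, `OtlpSettings`, `resolve`) are hypothetical. The sketch assumes that any one of the CLI flag, TOKF_TELEMETRY_ENABLED=true, or `enabled = true` in the config file turns export on (the proposal does not pin down how these toggles combine), that env vars take precedence over the file, and that HTTP is the fallback protocol.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Protocol {
    Grpc,
    Http,
}

/// Effective settings after merging all sources (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct OtlpSettings {
    pub endpoint: String,
    pub protocol: Protocol,
}

/// Values read from the [telemetry.otlp] config section (hypothetical type).
#[derive(Debug, Clone, Default)]
pub struct FileConfig {
    pub enabled: bool,
    pub endpoint: Option<String>,
    pub protocol: Option<Protocol>,
}

/// Returns None when telemetry is disabled, so callers can skip all OTel setup.
/// `env` abstracts environment lookup, which keeps the function testable.
pub fn resolve<F>(file: &FileConfig, cli_flag: bool, env: F) -> Option<OtlpSettings>
where
    F: Fn(&str) -> Option<String>,
{
    // Assumption: any single opt-in source is enough to enable export.
    let env_enabled = env("TOKF_TELEMETRY_ENABLED").as_deref() == Some("true");
    if !(cli_flag || env_enabled || file.enabled) {
        return None;
    }

    // Env var overrides the file; unknown values fall back to file, then HTTP.
    let protocol = match env("OTEL_EXPORTER_OTLP_PROTOCOL").as_deref() {
        Some("grpc") => Protocol::Grpc,
        Some("http/protobuf") => Protocol::Http,
        _ => file.protocol.unwrap_or(Protocol::Http),
    };

    // Default ports follow the proposal: 4317 for gRPC, 4318 for HTTP.
    let default_endpoint = match protocol {
        Protocol::Grpc => "http://localhost:4317",
        Protocol::Http => "http://localhost:4318",
    };
    let endpoint = env("OTEL_EXPORTER_OTLP_ENDPOINT")
        .or_else(|| file.endpoint.clone())
        .unwrap_or_else(|| default_endpoint.to_string());

    Some(OtlpSettings { endpoint, protocol })
}
```

In a real binary the `env` argument would typically be `|key| std::env::var(key).ok()`; returning `None` early is what keeps the disabled path free of any exporter initialization.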

Architecture

  • Add a telemetry module with a trait TelemetryReporter and two implementations:
    • NoopReporter — used when OTel is disabled (compiles to nothing when otel feature is off)
    • OtelReporter — initializes the OTel SdkMeterProvider with an OTLP MetricExporter and records metrics
  • The reporter is initialized once at startup based on config and passed to the filter execution path
  • Metrics are recorded after each filter invocation
  • The MeterProvider is shut down gracefully on process exit to flush pending metrics
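
The reporter abstraction above could look roughly like the following. This is a dependency-free sketch, not the proposed implementation: `FilterEvent`, `InMemoryReporter`, and `run_filter` are hypothetical, the in-memory reporter stands in for OtelReporter (which would forward to the OTel SDK instead), and the "filter" and whitespace-based token count are placeholders for tokf's real logic.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// What the filter path hands to the reporter after each invocation.
#[derive(Debug, Clone, PartialEq)]
pub struct FilterEvent {
    pub filter_name: String,
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub duration: Duration,
}

pub trait TelemetryReporter {
    fn record(&self, event: &FilterEvent);
    /// Flush pending data on process exit; nothing to do by default.
    fn shutdown(&self) {}
}

/// Used when telemetry is disabled: every call is an empty function.
pub struct NoopReporter;

impl TelemetryReporter for NoopReporter {
    fn record(&self, _event: &FilterEvent) {}
}

/// Stand-in for OtelReporter that keeps events in memory, e.g. for tests.
#[derive(Default)]
pub struct InMemoryReporter {
    events: Mutex<Vec<FilterEvent>>,
}

impl InMemoryReporter {
    pub fn recorded(&self) -> Vec<FilterEvent> {
        self.events.lock().unwrap().clone()
    }
}

impl TelemetryReporter for InMemoryReporter {
    fn record(&self, event: &FilterEvent) {
        self.events.lock().unwrap().push(event.clone());
    }
}

/// Placeholder filter path: drops "Compiling ..." lines, then reports once.
/// Token counts are approximated by whitespace-separated words (assumption).
pub fn run_filter(reporter: &dyn TelemetryReporter, filter_name: &str, input: &str) -> String {
    let started = Instant::now();
    let output = input
        .lines()
        .filter(|line| !line.starts_with("Compiling"))
        .collect::<Vec<_>>()
        .join("\n");
    reporter.record(&FilterEvent {
        filter_name: filter_name.to_string(),
        input_tokens: input.split_whitespace().count() as u64,
        output_tokens: output.split_whitespace().count() as u64,
        duration: started.elapsed(),
    });
    output
}
```

Because the filter path only sees `&dyn TelemetryReporter`, it does not need to know whether the otel feature is compiled in, and the in-memory variant gives the integration test in the acceptance criteria something to assert against.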

Graceful Degradation

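If the OTel endpoint is unreachable, tokf must never block or slow down the wrapped command. The OTel SDK's periodic exporter handles this (failed exports are logged and retried). Because tokf is a short-lived process, the flush at exit is the one point where an unreachable endpoint could still delay completion, so tokf should set a short export timeout (e.g. 5s) and move on.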
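
The "short timeout and move on" behavior can be sketched generically with a worker thread and a bounded wait. `flush_with_timeout` is a hypothetical helper and the `export` closure stands in for whatever call actually flushes the meter provider; a real implementation would more likely rely on the exporter's own timeout setting, so treat this as an illustration of the policy rather than of the OTel API.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Run `export` on a background thread and wait at most `timeout` for it.
/// Returns true only if the export finished successfully in time.
/// Failures and timeouts are logged to stderr and otherwise ignored, so a
/// dead endpoint can never change the wrapped command's outcome.
pub fn flush_with_timeout<F>(export: F, timeout: Duration) -> bool
where
    F: FnOnce() -> Result<(), String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone after a timeout; ignore that.
        let _ = tx.send(export());
    });

    match rx.recv_timeout(timeout) {
        Ok(Ok(())) => true,
        Ok(Err(reason)) => {
            eprintln!("tokf: telemetry export failed: {reason}");
            false
        }
        // Timed out, or the worker panicked and dropped the sender.
        // The worker thread is left detached and ends with the process.
        Err(_) => {
            eprintln!("tokf: telemetry export did not finish within {timeout:?}");
            false
        }
    }
}
```

With a 5s timeout this bounds the worst-case delay at exit to 5s; the return value is informational only and should not influence tokf's exit code.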

Rust Implementation

Key crates:

  • opentelemetry (v0.31+) — API traits
  • opentelemetry_sdk (v0.31+) — SDK implementation with MeterProvider
  • opentelemetry-otlp (v0.31+) — OTLP exporter via HTTP (http-proto feature) or gRPC (grpc-tonic feature)

Homebrew Distribution Strategy

Homebrew doesn't support Cargo feature flags at install time — each formula produces a single binary. The recommended approach is to build the Homebrew formula with OTel enabled by default, since telemetry export is already gated at runtime behind explicit flags (--otel-export / OTEL_EXPORTER_OTLP_ENDPOINT / TOKF_TELEMETRY_ENABLED).

Recommended: OTel-enabled formula (runtime opt-in)

class Tokf < Formula
  desc "Token context compression for LLM pipelines"
  homepage "https://github.com/mpecan/tokf"
  # ...

  depends_on "rust" => :build

  def install
    system "cargo", "install", *std_cargo_args, "--features", "otel"
  end
end

Users who don't set any OTel environment variables or pass --otel-export will see zero behavioral difference — the OTel code paths are never activated. The only cost is a slightly larger binary (expected ~2-5 MB from the OTel + HTTP client dependency tree).

Alternative: Separate formulae (if binary size is a concern)

If benchmarking shows an unacceptable binary size increase, offer two formulae:

# tokf.rb — lean, no OTel
class Tokf < Formula
  def install
    system "cargo", "install", *std_cargo_args
  end
end

# tokf-otel.rb — with OTel support
class TokfOtel < Formula
  conflicts_with "tokf", because: "both install a `tokf` binary"
  def install
    system "cargo", "install", *std_cargo_args, "--features", "otel"
  end
end

# Users choose at install time:
brew install tokf          # lean
brew install tokf-otel     # with telemetry support

Decision criteria

Measure the binary size delta before deciding:

# Without OTel
cargo build --release && ls -lh target/release/tokf

# With OTel
cargo build --release --features otel && ls -lh target/release/tokf

If the delta is under ~5 MB, the single OTel-enabled formula is the pragmatic choice (this is the pattern ripgrep follows with PCRE2 in its Homebrew formula). If larger, consider the separate formulae approach.

cargo install users

Users installing via cargo install can always choose:

# Lean install
cargo install tokf

# With OTel
cargo install tokf --features otel

# With OTel + gRPC
cargo install tokf --features otel,otel-grpc

Integration Examples

Datadog via OTel Collector

# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
exporters:
  datadog:
    api:
      key: ${DD_API_KEY}
service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [datadog]

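Prometheus (native OTLP receiver)

Prometheus can ingest OTLP over HTTP directly when its OTLP receiver is enabled (--web.enable-otlp-receiver in Prometheus 3.x); the Pushgateway does not accept OTLP:

OTEL_EXPORTER_OTLP_ENDPOINT=http://prometheus:9090/api/v1/otlp tokf --otel-export cargo build 2>&1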

Grafana Cloud

OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp-gateway-prod.grafana.net/otlp \
OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic $(echo -n 'instance_id:api_key' | base64)" \
tokf --otel-export cargo build 2>&1

Relationship to #20

This issue offers a complementary approach to the org server proposed in #20.

Both can coexist — the OTel exporter ships metrics to whatever backend the org uses, while the org server (if built) could itself accept OTLP input from tokf instances.

Acceptance Criteria

  • otel Cargo feature flag compiles in OTel dependencies only when enabled
  • TelemetryReporter trait with NoopReporter and OtelReporter implementations
  • OtelReporter initializes SdkMeterProvider with OTLP exporter (gRPC and HTTP)
  • All tokf-specific metrics recorded after each filter invocation
  • Standard gen_ai.client.token.usage metric emitted with correct attributes
  • Standard OTel env vars (OTEL_EXPORTER_OTLP_ENDPOINT, etc.) are respected
  • ~/.tokf/config.toml [telemetry.otlp] section parsed and applied
  • Master toggle TOKF_TELEMETRY_ENABLED controls opt-in
  • Default behavior (no config) = no OTel, no network calls, no extra deps
  • OTel endpoint failures do not block or slow the wrapped command
  • Graceful shutdown flushes pending metrics
  • Homebrew formula builds with OTel enabled (runtime opt-in via --otel-export)
  • Binary size delta with OTel is documented and acceptable
  • Integration test: verify metrics emitted to an in-memory exporter with correct attributes
  • README section documenting OTel setup with example Collector + Grafana/Prometheus config

Note: This issue supersedes #82 which covered the same feature.
