feat: Add Prometheus and OpenTelemetry exporters for middleware metrics #87

@guyernest

Description

Problem

The existing MetricsMiddleware collects metrics but lacks integration with standard observability platforms:

  • No Prometheus exporter for scraping
  • No OpenTelemetry integration for distributed tracing
  • No histogram/percentile metrics (only counters)
  • Manual metrics retrieval required

Production deployments need:

  • Automatic metrics export to monitoring systems
  • Distributed tracing correlation
  • Standard metric formats (Prometheus, OTLP)
  • Histograms for latency distribution

Proposed Solution

Add optional exporters that integrate with MetricsMiddleware and MiddlewareContext:

1. Prometheus Exporter

#[cfg(feature = "prometheus")]
pub struct PrometheusExporter {
    config: PrometheusConfig,
    registry: prometheus::Registry,
}

#[cfg(feature = "prometheus")]
pub struct PrometheusConfig {
    /// HTTP endpoint for /metrics
    pub endpoint: String,
    
    /// Port for metrics server
    pub port: u16,
    
    /// Metric prefix
    pub prefix: String,
    
    /// Include default process metrics
    pub include_process_metrics: bool,
    
    /// Histogram buckets for latency (seconds)
    pub latency_buckets: Vec<f64>,
}

impl Default for PrometheusConfig {
    fn default() -> Self {
        Self {
            endpoint: "/metrics".to_string(),
            port: 9090,
            prefix: "mcp".to_string(),
            include_process_metrics: true,
            latency_buckets: vec![0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0],
        }
    }
}
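To make the bucket semantics concrete, here is a minimal stdlib-only sketch of the cumulative histogram the exporter would maintain from `latency_buckets` (the `prometheus` crate provides this via `HistogramVec`; the `LatencyHistogram` type here is illustrative, not part of the proposed API):

```rust
/// Prometheus-style cumulative histogram: `counts[i]` is the number of
/// observations less than or equal to `buckets[i]`; the implicit `+Inf`
/// bucket equals `count`.
struct LatencyHistogram {
    buckets: Vec<f64>, // upper bounds, ascending (PrometheusConfig::latency_buckets)
    counts: Vec<u64>,  // cumulative per-bucket counts
    sum: f64,          // sum of all observed values
    count: u64,        // total number of observations
}

impl LatencyHistogram {
    fn new(buckets: Vec<f64>) -> Self {
        let n = buckets.len();
        Self { buckets, counts: vec![0; n], sum: 0.0, count: 0 }
    }

    /// Record one latency observation in seconds.
    fn observe(&mut self, v: f64) {
        // Cumulative semantics: increment every bucket whose bound covers v.
        for (i, le) in self.buckets.iter().enumerate() {
            if v <= *le {
                self.counts[i] += 1;
            }
        }
        self.sum += v;
        self.count += 1;
    }
}
```

With the default buckets, an observation of 0.004s increments every bucket from `le="0.005"` upward, which is why the exposed `_bucket` series in the example below are monotonically non-decreasing.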

Exposed Prometheus metrics:

# Request metrics
mcp_requests_total{method="tools/call",status="success"} 1523
mcp_requests_total{method="tools/call",status="error"} 47
mcp_request_duration_seconds_bucket{method="tools/call",le="0.005"} 892
mcp_request_duration_seconds_bucket{method="tools/call",le="0.01"} 1401
mcp_request_duration_seconds_bucket{method="tools/call",le="0.05"} 1520
mcp_request_duration_seconds_bucket{method="tools/call",le="+Inf"} 1570
mcp_request_duration_seconds_sum{method="tools/call"} 12.456
mcp_request_duration_seconds_count{method="tools/call"} 1570

# Transport metrics
mcp_transport_bytes_sent_total{transport="http"} 1048576
mcp_transport_bytes_received_total{transport="http"} 2097152
mcp_transport_connection_errors_total{transport="http"} 3

# Middleware metrics (from CircuitBreaker, RateLimit, etc.)
mcp_circuit_breaker_state{scope="server",key="https://api.example.com"} 0
mcp_circuit_breaker_failures_total{scope="server",key="https://api.example.com"} 5
mcp_rate_limit_requests_allowed_total{scope="method",key="tools/call"} 9523
mcp_rate_limit_requests_rejected_total{scope="method",key="tools/call"} 47
mcp_rate_limit_current_usage{scope="method",key="tools/call"} 0.85

# Compression metrics (from Issue #86)
mcp_compression_bytes_saved_total{algorithm="gzip"} 5242880
mcp_compression_ratio{algorithm="gzip"} 3.2

# Retry metrics (from Issue #84)
mcp_retries_total{method="tools/call",attempt="1"} 23
mcp_retries_total{method="tools/call",attempt="2"} 8
mcp_retries_total{method="tools/call",attempt="3"} 2
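Given the histogram series above, latency percentiles and error rates fall out of standard PromQL (queries shown as a sketch; metric names assume the default `mcp` prefix):

```promql
# 95th-percentile request latency over the last 5 minutes, per method
histogram_quantile(
  0.95,
  sum by (method, le) (rate(mcp_request_duration_seconds_bucket[5m]))
)

# Overall error rate
sum(rate(mcp_requests_total{status="error"}[5m]))
  / sum(rate(mcp_requests_total[5m]))
```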

2. OpenTelemetry Exporter

#[cfg(feature = "opentelemetry")]
pub struct OpenTelemetryExporter {
    config: OtelConfig,
    tracer: opentelemetry::global::BoxedTracer,
    meter: opentelemetry::metrics::Meter,
}

#[cfg(feature = "opentelemetry")]
pub struct OtelConfig {
    /// Service name
    pub service_name: String,
    
    /// OTLP endpoint
    pub endpoint: String,
    
    /// Export interval
    pub export_interval: Duration,
    
    /// Trace sampling ratio (0.0 - 1.0)
    pub sample_ratio: f64,
    
    /// Export metrics
    pub export_metrics: bool,
    
    /// Export traces
    pub export_traces: bool,
}

impl Default for OtelConfig {
    fn default() -> Self {
        Self {
            service_name: "mcp-client".to_string(),
            endpoint: "http://localhost:4317".to_string(),
            export_interval: Duration::from_secs(10),
            sample_ratio: 0.1,  // 10% sampling
            export_metrics: true,
            export_traces: true,
        }
    }
}
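One way `sample_ratio` could translate into a head-sampling decision is the deterministic trace-ID scheme the OpenTelemetry SDK uses for its `TraceIdRatioBased` sampler: hash-free, and every participant looking at the same trace ID reaches the same decision. This is an illustrative sketch, not the opentelemetry crate's actual API:

```rust
/// Deterministic ratio sampling: keep a trace when the upper 64 bits of its
/// 128-bit trace ID fall below `ratio * u64::MAX`. Because the decision is a
/// pure function of the trace ID, all services in a distributed trace agree
/// without coordination. (Sketch only; the opentelemetry crate ships this
/// behavior as its TraceIdRatioBased sampler.)
fn should_sample(trace_id: u128, ratio: f64) -> bool {
    if ratio >= 1.0 {
        return true;
    }
    if ratio <= 0.0 {
        return false;
    }
    let upper = (trace_id >> 64) as u64;
    let threshold = (ratio * u64::MAX as f64) as u64;
    upper < threshold
}
```

With the default `sample_ratio: 0.1`, roughly 10% of uniformly distributed trace IDs pass the threshold.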

OpenTelemetry traces:

// Automatic span creation for requests
Span {
    name: "mcp.request",
    attributes: {
        "mcp.method": "tools/call",
        "mcp.request_id": "req-123",
        "mcp.transport": "http",
        "mcp.server_url": "https://api.example.com",
    },
    events: [
        Event { name: "middleware.auth.start", timestamp: ... },
        Event { name: "middleware.auth.complete", timestamp: ... },
        Event { name: "transport.send", timestamp: ... },
        Event { name: "transport.receive", timestamp: ... },
        Event { name: "middleware.retry.attempt", attributes: { "attempt": 2 } },
    ],
    duration: 0.045s,
    status: Ok,
}

Use Cases

1. Prometheus integration

#[cfg(feature = "prometheus")]
use pmcp::observability::PrometheusExporter;

// Start Prometheus metrics server
let prometheus = PrometheusExporter::builder()
    .port(9090)
    .prefix("my_mcp_client")
    .include_process_metrics(true)
    .build()?;

prometheus.start().await?;

// Integrate with MetricsMiddleware
let metrics = MetricsMiddleware::builder()
    .exporter(prometheus.clone())
    .export_interval(Duration::from_secs(15))
    .build();

let client = Client::builder()
    .transport(transport)
    .middleware(metrics)
    .build()?;

// Metrics automatically exported to http://localhost:9090/metrics
// Ready for Prometheus scraping

2. OpenTelemetry integration

#[cfg(feature = "opentelemetry")]
use pmcp::observability::OpenTelemetryExporter;

// Configure OTLP exporter
let otel = OpenTelemetryExporter::builder()
    .service_name("mcp-client")
    .endpoint("http://jaeger:4317")  // Or Tempo, Honeycomb, etc.
    .sample_ratio(0.1)  // 10% sampling
    .build()?;

// Integrate with middleware
let metrics = MetricsMiddleware::builder()
    .exporter(otel.clone())
    .build();

let client = Client::builder()
    .transport(transport)
    .middleware(metrics)
    .build()?;

// Traces and metrics automatically exported
// View in Jaeger, Grafana Tempo, Honeycomb, etc.

3. Combined exporters

// Export to both Prometheus and OpenTelemetry
let prometheus = PrometheusExporter::new(PrometheusConfig::default())?;
let otel = OpenTelemetryExporter::new(OtelConfig::default())?;

let metrics = MetricsMiddleware::builder()
    .exporter(prometheus)
    .exporter(otel)
    .build();

// Metrics available via:
// - Prometheus scraping (pull model)
// - OTLP push (push model)
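Accepting repeated `.exporter(...)` calls suggests a fan-out over a shared exporter trait. The sketch below shows one possible shape; `MetricsExporter`, `MetricsSnapshot`, and `FanOut` are hypothetical names, not existing pmcp types:

```rust
use std::collections::HashMap;

/// Hypothetical snapshot handed to every exporter on each export tick.
#[derive(Clone, Default)]
pub struct MetricsSnapshot {
    pub counters: HashMap<String, u64>,
}

/// Hypothetical trait that both PrometheusExporter and OpenTelemetryExporter
/// would implement.
pub trait MetricsExporter: Send + Sync {
    fn export(&self, snapshot: &MetricsSnapshot);
}

/// Fans one snapshot out to every registered exporter, so the pull-model
/// (Prometheus) and push-model (OTLP) backends see identical data.
pub struct FanOut {
    exporters: Vec<Box<dyn MetricsExporter>>,
}

impl FanOut {
    pub fn new() -> Self {
        Self { exporters: Vec::new() }
    }

    pub fn exporter(mut self, e: Box<dyn MetricsExporter>) -> Self {
        self.exporters.push(e);
        self
    }

    pub fn export(&self, snapshot: &MetricsSnapshot) {
        for e in &self.exporters {
            e.export(snapshot);
        }
    }
}
```

A design note: fanning out a snapshot (rather than having each exporter read live counters) keeps the two backends consistent with each other at every export interval.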

4. Custom metrics from middleware

// Middleware can record custom metrics via MiddlewareContext
#[async_trait]
impl AdvancedMiddleware for MyCustomMiddleware {
    async fn on_request_with_context(
        &self,
        request: &mut JSONRPCRequest,
        context: &MiddlewareContext,
    ) -> Result<()> {
        // Custom metric automatically exported
        context.record_metric("custom_validation_checks".to_string(), 1.0);
        
        // Custom histogram; serialize params explicitly rather than
        // relying on a Display impl the params type may not have
        let payload_bytes = serde_json::to_vec(&request.params)
            .map(|b| b.len())
            .unwrap_or(0);
        context.record_histogram(
            "payload_size_bytes".to_string(),
            payload_bytes as f64,
        );
        
        Ok(())
    }
}

// These metrics appear in Prometheus:
// mcp_custom_validation_checks_total 1523
// mcp_payload_size_bytes_bucket{le="1024"} 892
// mcp_payload_size_bytes_sum 15728640
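The comment above implies a mapping from the name passed to `record_metric` to an exposed Prometheus name (configured prefix, sanitized characters, `_total` suffix for counters). A sketch of that mapping — the exact convention is an assumption of this proposal:

```rust
/// Map a custom counter name recorded via `context.record_metric` to the
/// Prometheus name it would be exposed under. Illustrative only; the real
/// exporter would decide the sanitization and suffix rules.
fn prometheus_counter_name(prefix: &str, name: &str) -> String {
    // Prometheus metric names are limited to [a-zA-Z0-9_:]; replace
    // anything else with an underscore.
    let sanitized: String = name
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() || c == '_' { c } else { '_' })
        .collect();
    let mut out = format!("{prefix}_{sanitized}");
    // Counters conventionally carry a `_total` suffix.
    if !out.ends_with("_total") {
        out.push_str("_total");
    }
    out
}
```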

Implementation Plan

Phase 1: Prometheus exporter

  1. Add prometheus feature flag and dependencies
  2. Implement PrometheusExporter with registry
  3. Create HTTP server for /metrics endpoint
  4. Integrate with MetricsMiddleware
  5. Add standard metrics (counters, histograms, gauges)
  6. Add middleware-specific metrics (circuit breaker, rate limit, compression)

Phase 2: OpenTelemetry exporter

  1. Add opentelemetry feature flag and dependencies
  2. Implement OpenTelemetryExporter with OTLP
  3. Add automatic span creation for requests
  4. Add span events for middleware lifecycle
  5. Add metric export via OTLP
  6. Add trace context propagation

Phase 3: MiddlewareContext enhancements

  1. Add record_histogram() method
  2. Add record_gauge() method
  3. Add span context propagation
  4. Add automatic correlation ID injection

Phase 4: Integration and testing

  1. Integration tests with Prometheus server
  2. Integration tests with Jaeger/Tempo
  3. Examples: examples/37_prometheus.rs, examples/38_opentelemetry.rs
  4. Documentation in pmcp-book
  5. Grafana dashboard templates

Benefits

  • Standard integration: Works with existing monitoring infrastructure
  • Production-ready: Histograms, percentiles, distributed tracing
  • Zero-config: Sensible defaults, opt-in features
  • Flexible: Support both pull (Prometheus) and push (OTLP) models
  • Comprehensive: All middleware metrics automatically exported

Feature Flags

[dependencies]
pmcp = { version = "1.8", features = ["prometheus"] }
# or
pmcp = { version = "1.8", features = ["opentelemetry"] }
# or both
pmcp = { version = "1.8", features = ["prometheus", "opentelemetry"] }

References

Related Issues

  • #84 — retry middleware (source of the retry metrics above)
  • #86 — compression middleware (source of the compression metrics above)
Priority: Medium
Complexity: Medium
Dependencies: None (integrates with existing MetricsMiddleware)
