Skip to content

fix GaugeVec reset gap in handle_prometheus_metrics #337

@gimballock

Description

@gimballock

Context

handle_prometheus_metrics in stratum-apps/src/monitoring/http_server.rs calls .reset() on all GaugeVec metrics before repopulating from the snapshot cache:

// Reset per-channel metrics before repopulating
if let Some(ref metric) = state.metrics.sv2_client_channel_hashrate { metric.reset(); }
// ...

This creates a brief window on every /metrics request where all label series are absent. It is the root reason poll_until_metric_gte is needed in integration tests — without polling, a test can hit /metrics during this gap and see the metric missing even though a share was already accepted.

Goal

Eliminate the reset gap so GaugeVec metrics can be asserted directly (without polling) after the snapshot has refreshed.

Suggested approach

Instead of .reset() + repopulate-all, compute the new label set from the snapshot and:

  1. .set() all current label combinations
  2. .remove() only label combinations that are no longer present (stale channels)

This preserves the cardinality-bounding behavior (stale channels are still cleaned up) while eliminating the gap where all labels temporarily disappear.

Related to PR #281.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    Status

    Todo 📝

    Status

    Todo 📝

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions