prometheusremotewrite: End-to-end Prometheus Native Histogram Support #16120

Reimirno · 2024-10-31T21:02:31Z

TLDR

The Telegraf prometheusremotewrite data format parser should parse a Prometheus native histogram into one single Telegraf metric (instead of multiple Telegraf metrics), and its serializer should be able to serialize it back to a Prometheus native histogram. Design at the end.

Use Case

We are on a Prometheus stack and are planning to use Telegraf on the data ingestion path (for some aggregation). This is a simplified view of our design.

Pods ---(get scraped)---> Agent (Prometheus Agent/Grafana Agent) ---(remote write)---> Telegraf ---(remote write)---> TSDB ....

Some of our metrics are native histograms, a new histogram model introduced by Prometheus. Rather than being emitted as several series (_sum, _count, and many _bucket series with le labels), a native histogram is encoded as a protobuf struct and emitted as a single time series. This not only guarantees atomicity, and thus resolves the write-batching problem present in classic histograms, but also offers better resolution and query accuracy at a lower cost.
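To make the "better resolution" point concrete, here is a minimal sketch of how bucket boundaries follow from the schema in the standard exponential layout of native histograms: positive bucket `index` covers the range (base^(index-1), base^index] with base = 2^(2^-schema). The function names are hypothetical and exist only for illustration.

```python
import math


def bucket_upper_bound(schema: int, index: int) -> float:
    """Upper bound of the positive bucket at `index` for an exponential schema.

    The growth factor between buckets is base = 2 ** (2 ** -schema), so the
    upper bound of bucket `index` is base ** index.
    """
    return 2.0 ** (index * 2.0 ** -schema)


def bucket_index(schema: int, value: float) -> int:
    """Index of the positive bucket that contains `value` (requires value > 0).

    Floating-point rounding can misplace values that sit extremely close to a
    bucket boundary; a production implementation would guard against that.
    """
    return math.ceil(math.log2(value) * 2 ** schema)


# A higher schema means finer buckets: schema 0 doubles per bucket,
# schema 1 grows by sqrt(2) per bucket.
for schema in (0, 1, 2):
    bounds = [round(bucket_upper_bound(schema, i), 4) for i in range(1, 5)]
    print(schema, bounds)
```

Because the boundaries are derived from the schema and the index, a native histogram only needs to carry the schema plus sparse bucket counts, which is part of why it fits in a single series.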

I PoC-ed a simple Telegraf ingest and output (aggregation logic not added yet) and put it in our ingestion path. Native histogram metrics are only available in the protobuf exposition format, so the prometheusremotewrite data format seems the right choice. The important configs are:

[[inputs.http_listener_v2]]
      alias = "prom-ingest"
      service_address = ":9201"
      paths = ["/receive"]
      methods = ["GET", "OPTIONS", "POST", "PUT"]
      data_format = "prometheusremotewrite"

[[outputs.http]]
      alias = "prom-write"
      url = "%(write_url)s"
      timeout = "10s"
      data_format = "prometheusremotewrite"

      [outputs.http.headers]
         Content-Type = "application/x-protobuf"
         Content-Encoding = "snappy"
         X-Prometheus-Remote-Write-Version = "2.0.0"

Expected behavior

http_listener_v2 with the prometheusremotewrite format: a native histogram should be ingested and parsed into one single Telegraf metric, preserving its atomicity.
http output with the prometheusremotewrite format: if a native histogram is ingested, a native histogram should be written out.

How exactly a native histogram metric should be parsed into one single Telegraf metric (data representation) is worth a design, so that it is:

not difficult to write aggregators (Starlark etc.) for
(even better) amenable to existing processors
(even better) reusable for OpenMetrics exponential histogram support
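As a starting point for discussing the data representation, the sketch below flattens one native histogram into the field set of a single metric. It is only an illustration under stated assumptions, not the design of any existing PR: the field names (`positive_bucket_<index>` and so on) and the dict-based histogram are hypothetical. The span and delta decoding follows the remote write encoding, where each span is an (offset, length) pair and each delta is the count difference to the previous bucket.

```python
def decode_buckets(spans, deltas):
    """Expand (offset, length) spans and delta-encoded counts into {index: count}.

    The first span's offset is the starting bucket index; each later offset is
    the gap (number of skipped buckets) after the previous span.
    """
    buckets = {}
    index = 0
    count = 0
    delta_iter = iter(deltas)
    for offset, length in spans:
        index += offset
        for _ in range(length):
            count += next(delta_iter)
            buckets[index] = count
            index += 1
    return buckets


def to_fields(hist):
    """Flatten a histogram dict into one flat field map (hypothetical layout)."""
    fields = {
        "schema": hist["schema"],
        "count": hist["count"],
        "sum": hist["sum"],
        "zero_threshold": hist["zero_threshold"],
        "zero_count": hist["zero_count"],
    }
    for side in ("positive", "negative"):
        buckets = decode_buckets(hist[side + "_spans"], hist[side + "_deltas"])
        for index, count in buckets.items():
            # Absolute (non-delta) counts are easier for aggregators to sum.
            fields[f"{side}_bucket_{index}"] = count
    return fields


# Illustrative input only; the values are made up.
example = {
    "schema": 0, "count": 12, "sum": 40.5,
    "zero_threshold": 0.001, "zero_count": 1,
    "positive_spans": [(0, 2), (2, 1)], "positive_deltas": [3, -1, 4],
    "negative_spans": [], "negative_deltas": [],
}
fields = to_fields(example)
```

Storing absolute counts per bucket index keeps everything in one metric, so atomicity is preserved, and an aggregator can merge two histograms with the same schema by summing fields key by key.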

Actual behavior

Currently, support for ingesting native histograms is implemented in this PR: #14952
This causes the parser to break down a native histogram into many Telegraf metrics (_sum, _count, and many _bucket), as if it were a classic histogram. When written out by the http output, it is serialized into several separate Prometheus metrics instead of one native histogram. This means all the benefits of native histograms (atomicity, reduced cardinality, better performance) are lost.
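The fan-out described above can be illustrated with a simplified sketch. This is not the actual Telegraf parser code; the function name and tuple layout are hypothetical, and negative and zero buckets are ignored for brevity. It shows how one histogram with sparse positive buckets turns into one series per bucket plus _sum and _count.

```python
def explode_classic(name, schema, buckets, total_count, total_sum):
    """Turn {bucket_index: count} into classic-style (name, tags, value) tuples.

    Classic buckets are cumulative, so counts are accumulated in index order
    and each bucket is labeled with its upper bound as `le`.
    """
    metrics = []
    cumulative = 0
    for index in sorted(buckets):
        cumulative += buckets[index]
        upper_bound = 2.0 ** (index * 2.0 ** -schema)
        metrics.append((name + "_bucket", {"le": str(upper_bound)}, cumulative))
    metrics.append((name + "_bucket", {"le": "+Inf"}, total_count))
    metrics.append((name + "_sum", {}, total_sum))
    metrics.append((name + "_count", {}, total_count))
    return metrics


# Three populated buckets already become six separate metrics.
exploded = explode_classic("latency", 0, {0: 3, 1: 2, 4: 6}, 11, 40.5)
for metric in exploded:
    print(metric)
```

Each of these tuples can land in a different write batch downstream, which is exactly the atomicity problem that a single native histogram sample avoids.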

Additional info

Proposal:
We need to change how the prometheusremotewrite parser handles a Prometheus native histogram. It should parse it into one single Telegraf metric.
We need to change the prometheusremotewrite serializer so that it converts such a Telegraf metric back to a Prometheus native histogram.
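For the serializer direction, here is a rough sketch of rebuilding the span and delta encoding from a single flat metric. It assumes a hypothetical field layout in which keys such as `positive_bucket_<index>` hold absolute bucket counts; neither that layout nor the function names come from Telegraf or from the linked PR.

```python
def encode_buckets(buckets):
    """Convert {bucket_index: count} into (spans, deltas).

    A new span starts whenever there is a gap between populated indexes.
    Real encoders may instead bridge small gaps with zero-count buckets.
    """
    spans, deltas = [], []
    prev_index = None
    prev_count = 0
    for index in sorted(buckets):
        if prev_index is None:
            spans.append([index, 1])  # first span: offset is the start index
        elif index == prev_index + 1:
            spans[-1][1] += 1  # contiguous bucket: extend the current span
        else:
            spans.append([index - prev_index - 1, 1])  # offset is the gap size
        deltas.append(buckets[index] - prev_count)
        prev_index, prev_count = index, buckets[index]
    return [tuple(span) for span in spans], deltas


def from_fields(fields):
    """Rebuild a histogram dict from the hypothetical flat field layout."""
    keys = ("schema", "count", "sum", "zero_threshold", "zero_count")
    hist = {key: fields[key] for key in keys}
    for side in ("positive", "negative"):
        prefix = side + "_bucket_"
        buckets = {
            int(key[len(prefix):]): value
            for key, value in fields.items()
            if key.startswith(prefix)
        }
        hist[side + "_spans"], hist[side + "_deltas"] = encode_buckets(buckets)
    return hist


# Illustrative input only; the values are made up.
hist = from_fields({
    "schema": 0, "count": 12, "sum": 40.5,
    "zero_threshold": 0.001, "zero_count": 1,
    "positive_bucket_0": 3, "positive_bucket_1": 2, "positive_bucket_4": 6,
})
```

The important property is that parse followed by serialize is lossless: the rebuilt spans and deltas describe the same buckets that were ingested, so the output is still one native histogram.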

A high-level design:


Reimirno added the feature request Requests for new plugin and for new features to existing plugins label Oct 31, 2024

Reimirno changed the title ~~End-to-end Prometheus Native Histogram Support~~ prometheusremotewrite: End-to-end Prometheus Native Histogram Support Oct 31, 2024

Reimirno linked a pull request Oct 31, 2024 that will close this issue

feat(parser.prometheusremotewrite, serializer.prometheusremotewrite): Native histogram support end-to-end #16121

Open

1 task
