
The H2 log algorithm for the poll time histogram is too granular for prometheus export #6960

Open
drbrain opened this issue Nov 7, 2024 · 5 comments
Labels
A-tokio Area: The main tokio crate C-feature-request Category: A feature request. M-metrics Module: tokio/runtime/metrics

Comments

@drbrain

drbrain commented Nov 7, 2024

In #6896/#6897 the old log algorithm for recording poll time metrics was replaced by the H2 log algorithm. I have not been able to configure this histogram to produce a small number of buckets that still covers a reasonable range for export to Prometheus.

With the legacy algorithm I had this configuration:

// `rt` is a tokio::runtime::Builder; Duration is std::time::Duration.
rt.enable_metrics_poll_count_histogram()
    .metrics_poll_count_histogram_scale(tokio::runtime::HistogramScale::Log)
    .metrics_poll_count_histogram_resolution(Duration::from_micros(10))
    .metrics_poll_count_histogram_buckets(7);

This gave me the following bucket bounds when exported to Prometheus:

0.000016384
0.000032768
0.000065536
0.000131072
0.000262144
0.000524288
0.001048576
0.002097152
0.004194304
+Inf

Distribution of poll times in these buckets was reasonable for the daily traffic pattern for our application.
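
For reference, these bounds are consistent with the legacy algorithm rounding the requested 10µs resolution up to the next power of two (16,384ns = 16.384µs) and doubling each successive bound. A minimal sketch that reproduces the exported list (plain Rust arithmetic, not a tokio API):

fn main() {
    // 10µs rounds up to 2^14 ns = 16.384µs; each successive bound doubles.
    let mut bound_ns: u64 = 10_000u64.next_power_of_two(); // 16_384
    for _ in 0..9 {
        println!("{:.9}", bound_ns as f64 / 1e9);
        bound_ns *= 2;
    }
    println!("+Inf");
}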

With the new algorithm I'm unable to replicate the same range (~10µs–~4ms) with as few buckets. The best I've been able to configure is 17 buckets, which is too much cardinality for Prometheus when combined with the total number of labels a full deployment of the application uses.

To fix this I'd like the legacy algorithm to be un-deprecated and renamed appropriately.

The alternatives are:

  • Use linear buckets. These will have a poor distribution for the poll times my application sees.
  • Don't record poll count metrics to Prometheus, which means I won't be able to see this data.
@drbrain drbrain added A-tokio Area: The main tokio crate C-feature-request Category: A feature request. labels Nov 7, 2024
@drbrain
Author

drbrain commented Nov 7, 2024

I suppose a third alternative would be to allow arbitrary buckets that are determined outside tokio. A purely hypothetical sketch of what that could look like is below (this is not an existing or proposed tokio API; the method name is invented for illustration):
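
use std::time::Duration;

fn main() {
    // Hypothetical only: NOT an existing tokio API. The idea is that the
    // application supplies its own bucket upper bounds instead of deriving
    // them from a scale/resolution/bucket-count triple.
    let bounds: Vec<Duration> = (0..9)
        .map(|k| Duration::from_nanos(16_384u64 << k)) // ~16µs up to ~4ms
        .collect();
    // e.g. rt.metrics_poll_count_histogram_custom_buckets(bounds); // invented name
    println!("{bounds:?}");
}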

@Darksonn Darksonn added the M-metrics Module: tokio/runtime/metrics label Nov 8, 2024
@Darksonn
Contributor

Darksonn commented Nov 8, 2024

cc @hds @rcoh

@rcoh
Contributor

rcoh commented Nov 8, 2024

Rather than a label per bucket, you could re-export the histogram by computing summaries on the client and exporting those.

That's how metrics.rs treats histograms in any case.

I'll explore whether there is a way to give the H2 histogram an equivalent setting to the old one or, if not, expose a non-deprecated way to get the old behavior.
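
A sketch of that client-side summary idea: compute quantiles from the bucket counts and export only those. The bucket bounds and counts here are placeholder data standing in for whatever the runtime metrics API returns; the interpolation mirrors Prometheus's histogram_quantile.

fn quantile(bounds_ns: &[u64], counts: &[u64], q: f64) -> f64 {
    let total: u64 = counts.iter().sum();
    if total == 0 {
        return 0.0;
    }
    let rank = (q * total as f64).ceil() as u64;
    let mut seen = 0u64;
    for (i, &count) in counts.iter().enumerate() {
        seen += count;
        if count > 0 && seen >= rank {
            // Interpolate linearly within the bucket that contains `rank`.
            let lower = if i == 0 { 0 } else { bounds_ns[i - 1] };
            let upper = bounds_ns[i];
            let into = (rank - (seen - count)) as f64 / count as f64;
            return lower as f64 + (upper - lower) as f64 * into;
        }
    }
    *bounds_ns.last().unwrap() as f64
}

fn main() {
    let bounds = [16_384, 32_768, 65_536, 131_072]; // bucket upper bounds, ns
    let counts = [10, 50, 30, 10]; // observations per bucket (placeholder data)
    println!("p50 ≈ {:.0}ns", quantile(&bounds, &counts, 0.50));
    println!("p99 ≈ {:.0}ns", quantile(&bounds, &counts, 0.99));
}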

rcoh added a commit to rcoh/tokio that referenced this issue Nov 8, 2024
Based on feedback in tokio-rs#6960, customers want more flexibility in histogram configuration. The main change here is allowing precision for an H2 histogram to go all the way down to zero; there is no fundamental reason this wasn't possible before, but typically H2 histograms stop at p=2.

I've also added another optimization that truncates the number of buckets based on the requested max_value. This can save space by truncating in the middle of a bucket group.

Along the way and in the process of adding more tests, I found a couple of bugs which are fixed here:
1. The `max_value` in LogHistogramBuilder wasn't precisely respected
2. The `max_value` computed by `LogHistogram` was actually incorrect for some n/p combinations. The new version precisely considers bucket layout instead.
@wo401555661

That is a smart idea.

@rcoh
Contributor

rcoh commented Nov 9, 2024

I'm working on #6963 which, among other things, will enable setting p=0; there is no inherent reason to restrict it to p>=2. Here's what the buckets will look like in that case:

bucket 0: 0ns..16.384µs (size: 16.384µs)
bucket 1: 16.384µs..32.768µs (size: 16.384µs)
bucket 2: 32.768µs..65.536µs (size: 32.768µs)
bucket 3: 65.536µs..131.072µs (size: 65.536µs)
bucket 4: 131.072µs..262.144µs (size: 131.072µs)
bucket 5: 262.144µs..524.288µs (size: 262.144µs)
bucket 6: 524.288µs..1.048576ms (size: 524.288µs)
bucket 7: 1.048576ms..2.097152ms (size: 1.048576ms)
bucket 8: 2.097152ms..4.194304ms (size: 2.097152ms)
bucket 9: 4.194304ms..18446744073.709551615s (size: 18446744073.705357311s)
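
Once that lands, a configuration in the spirit of the original report might look roughly like the following. This is a sketch assuming the LogHistogram builder API introduced in #6896 (min_value/max_value/precision_exact and the poll-time histogram configuration hooks); exact names and signatures may differ:

use std::time::Duration;
use tokio::runtime::{Builder, HistogramConfiguration, LogHistogram};

fn main() {
    // Assumed API, see above: p = 0 yields one bucket per power of two,
    // matching the layout shown (~16µs up to ~4ms, 10 buckets).
    let histogram = LogHistogram::builder()
        .min_value(Duration::from_micros(10))
        .max_value(Duration::from_millis(4))
        .precision_exact(0)
        .build();

    let _rt = Builder::new_multi_thread()
        .enable_metrics_poll_time_histogram()
        .metrics_poll_time_histogram_configuration(HistogramConfiguration::log(histogram))
        .build()
        .unwrap();
}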

rcoh added a commit to rcoh/tokio that referenced this issue Nov 25, 2024