SOLR-17628: Add query quantiles metrics to prometheus endpoint #3164

Open
wants to merge 9 commits into
base: main

@jkmuriithi jkmuriithi commented Feb 6, 2025

https://issues.apache.org/jira/browse/SOLR-17628

Description

Modify the implementation of SolrPrometheusFormatter.exportTimer to export a Prometheus summary containing quantile information instead of a single Prometheus gauge. Rename the Timer-based metrics solr_metrics_core_average_request_time and solr_metrics_core_average_searcher_warmup_time to reflect this change. Remove the solr_metrics_core_requests_time Counter metric.

Solution

Prior to this change, Dropwizard Timer metrics (used for core request handlers and searchers) were exported in Prometheus format as single gauges representing the mean of all observations. This PR replaces the existing mean gauge metrics with a summary that includes quantile metrics, the count (number) of observations, and the sum of all observations.
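To illustrate the shape of the new output, here is a minimal Python sketch (not the PR's Java code) that renders one summary series in the Prometheus text format from a list of observations. The function names and the nearest-rank quantile rule are assumptions made for the example; the quantile set is taken from the sample output in this description, and the actual quantile values in Solr come from the Dropwizard snapshot, which may compute them differently.

```python
import math

# Quantiles taken from the sample output in this PR description.
QUANTILES = (0.5, 0.75, 0.99, 0.999)


def format_labels(labels):
    """Render a dict of labels as name="value" pairs in insertion order."""
    return ",".join(f'{k}="{v}"' for k, v in labels.items())


def nearest_rank(sorted_values, q):
    """Nearest-rank quantile (an assumption, not necessarily Dropwizard's rule)."""
    if not sorted_values:
        return 0.0
    rank = max(1, math.ceil(q * len(sorted_values)))
    return float(sorted_values[rank - 1])


def export_summary(name, labels, observations):
    """Return Prometheus text-format lines for one summary series."""
    values = sorted(observations)
    lines = []
    for q in QUANTILES:
        # Each quantile is its own sample, distinguished by a "quantile" label.
        with_q = dict(labels, quantile=str(q))
        lines.append(f"{name}{{{format_labels(with_q)}}} {nearest_rank(values, q)}")
    base = format_labels(labels)
    # The summary also carries the number of observations and their total.
    lines.append(f"{name}_count{{{base}}} {len(values)}")
    lines.append(f"{name}_sum{{{base}}} {float(sum(values))}")
    return lines
```

Called with an empty observation list, this produces all-zero quantiles, a count of 0, and a sum of 0.0, which is the same shape as the idle handlers in the samples here.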

Sample old output:

# TYPE solr_metrics_core_average_request_time gauge
solr_metrics_core_average_request_time{category="ADMIN",collection="example-collection",core="core_example-collection_shard1_replica_n1",handler="/admin/file",replica="replica_n1",shard="shard1"} 0.0
solr_metrics_core_average_request_time{category="ADMIN",collection="example-collection",core="core_example-collection_shard1_replica_n1",handler="/admin/luke",replica="replica_n1",shard="shard1"} 0.0

Sample new output:

# TYPE solr_metrics_core_request_time summary
solr_metrics_core_request_time{category="ADMIN",collection="example-collection",core="core_example-collection_shard1_replica_n1",handler="/admin/file",replica="replica_n1",shard="shard1",quantile="0.5"} 0.0
solr_metrics_core_request_time{category="ADMIN",collection="example-collection",core="core_example-collection_shard1_replica_n1",handler="/admin/file",replica="replica_n1",shard="shard1",quantile="0.75"} 0.0
solr_metrics_core_request_time{category="ADMIN",collection="example-collection",core="core_example-collection_shard1_replica_n1",handler="/admin/file",replica="replica_n1",shard="shard1",quantile="0.99"} 0.0
solr_metrics_core_request_time{category="ADMIN",collection="example-collection",core="core_example-collection_shard1_replica_n1",handler="/admin/file",replica="replica_n1",shard="shard1",quantile="0.999"} 0.0
solr_metrics_core_request_time_count{category="ADMIN",collection="example-collection",core="core_example-collection_shard1_replica_n1",handler="/admin/file",replica="replica_n1",shard="shard1"} 0
solr_metrics_core_request_time_sum{category="ADMIN",collection="example-collection",core="core_example-collection_shard1_replica_n1",handler="/admin/file",replica="replica_n1",shard="shard1"} 0.0
solr_metrics_core_request_time{category="ADMIN",collection="example-collection",core="core_example-collection_shard1_replica_n1",handler="/admin/luke",replica="replica_n1",shard="shard1",quantile="0.5"} 0.0
solr_metrics_core_request_time{category="ADMIN",collection="example-collection",core="core_example-collection_shard1_replica_n1",handler="/admin/luke",replica="replica_n1",shard="shard1",quantile="0.75"} 0.0
solr_metrics_core_request_time{category="ADMIN",collection="example-collection",core="core_example-collection_shard1_replica_n1",handler="/admin/luke",replica="replica_n1",shard="shard1",quantile="0.99"} 0.0
solr_metrics_core_request_time{category="ADMIN",collection="example-collection",core="core_example-collection_shard1_replica_n1",handler="/admin/luke",replica="replica_n1",shard="shard1",quantile="0.999"} 0.0
solr_metrics_core_request_time_count{category="ADMIN",collection="example-collection",core="core_example-collection_shard1_replica_n1",handler="/admin/luke",replica="replica_n1",shard="shard1"} 0
solr_metrics_core_request_time_sum{category="ADMIN",collection="example-collection",core="core_example-collection_shard1_replica_n1",handler="/admin/luke",replica="replica_n1",shard="shard1"} 0.0
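Since the mean gauge is removed, consumers that still want an average can derive one from the `_sum` and `_count` samples. The following is a rough consumer-side Python sketch, not part of this PR: `mean_by_series` is a hypothetical helper, and the regex handles only simple lines like those in the sample above. It computes a lifetime average (sum divided by count), which is not necessarily the same value the old gauge reported; in PromQL the more usual idiom is a ratio of `rate()` over the two series.

```python
import re

# Matches lines such as: metric_name{label="value",...} 1.5
SAMPLE_LINE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$'
)


def mean_by_series(text, metric):
    """Map each label set to sum/count for the given summary metric."""
    sums, counts = {}, {}
    for line in text.splitlines():
        line = line.strip()
        # Ignore blank lines and "# TYPE" / "# HELP" comments.
        if not line or line.startswith("#"):
            continue
        m = SAMPLE_LINE.match(line)
        if not m:
            continue
        if m["name"] == metric + "_sum":
            sums[m["labels"]] = float(m["value"])
        elif m["name"] == metric + "_count":
            counts[m["labels"]] = float(m["value"])
    means = {}
    for labels, total in sums.items():
        count = counts.get(labels, 0.0)
        # Skip series with no observations to avoid dividing by zero.
        if count > 0:
            means[labels] = total / count
    return means
```

Series with a count of 0, such as the two handlers in the sample output, are left out of the result rather than reported as 0.0.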

Tests

I updated MetricsHandlerTest and SolrPrometheusFormatterTest to align with the changes to exportTimer. ./gradlew test passes on my local machine.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide

Contributor

@mlbiscoc mlbiscoc left a comment

Thanks @jkmuriithi for doing this!

@dsmiley, maybe you can help review this and, hopefully, agree that it is worth adding? I created the Jira issue because I think this is an important missing piece of the Prometheus metrics. This PR fixes my poor implementation of timer export (it should be a summary, not a gauge of the average) and also adds support for creating a summary metric type.

The Prometheus exporter doesn't seem to support histograms or summaries, so this gets ahead of that curve.
