From c9e9f2025b061d1d962bdcab3edaa18f6b71dbd7 Mon Sep 17 00:00:00 2001
From: kbatuigas <36839689+kbatuigas@users.noreply.github.com>
Date: Wed, 11 Dec 2024 14:29:24 -0500
Subject: [PATCH] Call out counter reset on broker restart plus minor page edits

---
 modules/manage/partials/monitor-health.adoc    | 12 +++++++++++-
 modules/manage/partials/monitor-redpanda.adoc  |  8 ++------
 modules/shared/partials/metrics-usage-tip.adoc |  2 +-
 3 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/modules/manage/partials/monitor-health.adoc b/modules/manage/partials/monitor-health.adoc
index 0633aaae2..4821057fb 100644
--- a/modules/manage/partials/monitor-health.adoc
+++ b/modules/manage/partials/monitor-health.adoc
@@ -2,7 +2,17 @@
 
 This section provides guidelines and example queries using Redpanda's public metrics to optimize your system's performance and monitor its health.
 
-TIP: To help detect and mitigate anomalous system behaviors, capture baseline metrics of your healthy system at different stages (at start-up, under high load, in steady state) so you can set thresholds and alerts according to those baselines.
+To help detect and mitigate anomalous system behaviors, capture baseline metrics of your healthy system at different stages (at start-up, under high load, in steady state) so you can set thresholds and alerts according to those baselines.
+
+[TIP]
+====
+For counter type metrics, a broker restart causes the count in tools such as Prometheus and Grafana to reset to zero. Redpanda recommends wrapping counter metrics in a rate query to account for broker restarts, for example:
+
+[,promql]
+----
+rate(redpanda_kafka_records_produced_total[5m])
+----
+====
 
 === Redpanda architecture
 
diff --git a/modules/manage/partials/monitor-redpanda.adoc b/modules/manage/partials/monitor-redpanda.adoc
index f5347f97a..f273c01e1 100644
--- a/modules/manage/partials/monitor-redpanda.adoc
+++ b/modules/manage/partials/monitor-redpanda.adoc
@@ -1,8 +1,6 @@
-Redpanda exports metrics through Prometheus endpoints for you to monitor system health and optimize system performance.
+Redpanda exports metrics through two Prometheus endpoints on the Admin API port (default: 9644) for you to monitor system health and optimize system performance.
 
-A Redpanda broker exports public metrics from the xref:reference:public-metrics-reference.adoc[`/public_metrics`] endpoint through the Admin API port (default: 9644).
-
-Before v22.2, a Redpanda broker provided metrics only through the xref:reference:internal-metrics-reference.adoc[`/metrics`] endpoint through the Admin API port. While Redpanda still provides this endpoint, it includes many internal metrics that are unnecessary for a typical Redpanda user to monitor. Consequently, the `/public_metrics` endpoint was added to provide a smaller set of important metrics that can be queried and ingested more quickly and inexpensively. The `/metrics` endpoint is now referred to as the 'internal metrics' endpoint, and Redpanda recommends that you use it for development, testing, and analysis.
+The xref:reference:internal-metrics-reference.adoc[`/metrics`] is a legacy endpoint that includes many internal metrics that are unnecessary for a typical Redpanda user to monitor. The `/metrics` endpoint is also referred to as the 'internal metrics' endpoint, and Redpanda recommends that you use it for development, testing, and analysis. Alternatively, the xref:reference:public-metrics-reference.adoc[`/public_metrics`] endpoint provides a smaller set of important metrics that can be queried and ingested more quickly and inexpensively.
 
 include::shared:partial$metrics-usage-tip.adoc[]
 
@@ -25,8 +23,6 @@ This topic covers the following about monitoring Redpanda metrics:
 
 https://prometheus.io/[Prometheus^] is a system monitoring and alerting tool. It collects and stores metrics as time-series data identified by a metric name and key/value pairs.
 
-NOTE: Redpanda Data recommends creating monitoring dashboards with `/public_metrics`.
-
 ifdef::env-kubernetes[]
 To configure Prometheus to monitor Redpanda metrics in Kubernetes, you can use the https://prometheus-operator.dev/[Prometheus Operator^]:
 
diff --git a/modules/shared/partials/metrics-usage-tip.adoc b/modules/shared/partials/metrics-usage-tip.adoc
index 807e0cba3..4edf0b2be 100644
--- a/modules/shared/partials/metrics-usage-tip.adoc
+++ b/modules/shared/partials/metrics-usage-tip.adoc
@@ -1,6 +1,6 @@
 [TIP]
 ====
-Use xref:reference:public-metrics-reference.adoc[/public_metrics] for your primary dashboards for system health.
+Use xref:reference:public-metrics-reference.adoc[/public_metrics] for your primary dashboards for monitoring system health.
 
 Use xref:reference:internal-metrics-reference.adoc[/metrics] for detailed analysis and debugging.
 ====