redpanda-data · kbatuigas · Dec 16, 2024 · Dec 11, 2024 · Dec 13, 2024 · Dec 16, 2024
@@ -2,7 +2,17 @@
 
 This section provides guidelines and example queries using Redpanda's public metrics to optimize your system's performance and monitor its health.
 
-TIP: To help detect and mitigate anomalous system behaviors, capture baseline metrics of your healthy system at different stages (at start-up, under high load, in steady state) so you can set thresholds and alerts according to those baselines.
+To help detect and mitigate anomalous system behaviors, capture baseline metrics of your healthy system at different stages (at start-up, under high load, in steady state) so you can set thresholds and alerts according to those baselines.
+
+[TIP]
+====
+For counter-type metrics, a broker restart resets the count to zero, which appears as a sudden drop in tools like Prometheus and Grafana. Redpanda recommends wrapping counter metrics in a `rate()` query to account for broker restarts, for example:
+
+[,promql]
+----
+rate(redpanda_kafka_records_produced_total[5m])
+----
+====
 
 === Redpanda architecture
 

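To illustrate why the tip above recommends a rate query for counters, here is a minimal Python sketch of reset-aware counter arithmetic. It is a simplified approximation of what a rate query does, not Prometheus's actual implementation (which also extrapolates to the edges of the time window). The function names and the sample values are hypothetical.

```python
from typing import List, Tuple

# One scraped sample: (unix timestamp in seconds, counter value).
Sample = Tuple[float, float]


def counter_increase(samples: List[Sample]) -> float:
    """Total increase across the samples, treating any drop as a reset to zero."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        if cur >= prev:
            total += cur - prev
        else:
            # A counter never decreases on its own, so a lower value means the
            # broker restarted and counting began again from zero.
            total += cur
    return total


def counter_rate(samples: List[Sample]) -> float:
    """Average per-second increase over the sampled window."""
    if len(samples) < 2:
        return 0.0
    elapsed = samples[-1][0] - samples[0][0]
    return counter_increase(samples) / elapsed if elapsed > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical scrapes 15 seconds apart; the broker restarts before the third.
    scrapes = [(0.0, 100.0), (15.0, 160.0), (30.0, 20.0), (45.0, 80.0)]
    print("naive difference:", scrapes[-1][1] - scrapes[0][1])
    print("reset-aware increase:", counter_increase(scrapes))
    print("reset-aware rate per second:", counter_rate(scrapes))
```

Subtracting the first raw value from the last gives a negative number across the restart, while the reset-aware version still reports the records counted on both sides of it.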
@@ -1,11 +1,9 @@
-Redpanda exports metrics through Prometheus endpoints for you to monitor system health and optimize system performance.
-
-A Redpanda broker exports public metrics from the xref:reference:public-metrics-reference.adoc[`/public_metrics`] endpoint through the Admin API port (default: 9644).
-
-Before v22.2, a Redpanda broker provided metrics only through the xref:reference:internal-metrics-reference.adoc[`/metrics`] endpoint through the Admin API port. While Redpanda still provides this endpoint, it includes many internal metrics that are unnecessary for a typical Redpanda user to monitor. Consequently, the `/public_metrics` endpoint was added to provide a smaller set of important metrics that can be queried and ingested more quickly and inexpensively. The `/metrics` endpoint is now referred to as the 'internal metrics' endpoint, and Redpanda recommends that you use it for development, testing, and analysis.
+Redpanda exports metrics through two endpoints on the Admin API port (default: 9644) so that you can monitor system health and optimize system performance.
 
 include::shared:partial$metrics-usage-tip.adoc[]
 
+The xref:reference:internal-metrics-reference.adoc[`/metrics`] endpoint is a legacy endpoint that includes many internal metrics that a typical Redpanda user does not need to monitor. It is also referred to as the 'internal metrics' endpoint, and Redpanda recommends that you use it for development, testing, and analysis. The xref:reference:public-metrics-reference.adoc[`/public_metrics`] endpoint provides a smaller set of important metrics that can be queried and ingested more quickly and inexpensively.
+
 [NOTE]
 ====
 To maximize monitoring performance by minimizing the cardinality of data, some metrics are exported when their underlying features are in use, and are not exported when not in use. For example, a metric for consumer groups, xref:reference:public-metrics-reference.adoc#redpanda_kafka_consumer_group_committed_offset[`redpanda_kafka_consumer_group_committed_offset`], is not exported when no groups are registered.
@@ -25,8 +23,6 @@ This topic covers the following about monitoring Redpanda metrics:
 
 https://prometheus.io/[Prometheus^] is a system monitoring and alerting tool. It collects and stores metrics as time-series data identified by a metric name and key/value pairs.
 
-NOTE: Redpanda Data recommends creating monitoring dashboards with `/public_metrics`.
-
 ifdef::env-kubernetes[]
 
 To configure Prometheus to monitor Redpanda metrics in Kubernetes, you can use the https://prometheus-operator.dev/[Prometheus Operator^]:

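As a quick way to inspect what the `/public_metrics` endpoint returns without a full Prometheus setup, the following Python sketch fetches the endpoint and sums one metric across its label sets. It assumes a broker reachable on `localhost` with the default Admin API port 9644 and no TLS or authentication. The helper names are hypothetical, and the regular expression handles only the common cases of the Prometheus text exposition format.

```python
import re
from urllib.request import urlopen

# Matches "name{labels} value [timestamp]"; the label block and timestamp are optional.
SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(\{.*\})?\s+(\S+)(?:\s+-?\d+)?$')


def sum_metric(exposition_text: str, metric_name: str) -> float:
    """Sum every sample of metric_name across all of its label sets."""
    total = 0.0
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match and match.group(1) == metric_name:
            total += float(match.group(3))
    return total


def fetch_public_metrics(host: str = "localhost", port: int = 9644) -> str:
    """Fetch the raw metrics text from a broker (assumes plain HTTP, no auth)."""
    with urlopen(f"http://{host}:{port}/public_metrics", timeout=5) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    text = fetch_public_metrics()
    print(sum_metric(text, "redpanda_kafka_records_produced_total"))
```

A one-off read like this only shows the current counter value. For dashboards and alerts, let Prometheus scrape the endpoint and query the result with `rate()`.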
@@ -1,6 +1,6 @@
 [TIP]
 ====
-Use xref:reference:public-metrics-reference.adoc[/public_metrics] for your primary dashboards for system health.
+Use xref:reference:public-metrics-reference.adoc[/public_metrics] for your primary dashboards for monitoring system health.
 
 Use xref:reference:internal-metrics-reference.adoc[/metrics] for detailed analysis and debugging.
 ====