Broker restart resets counters to 0 #913

Merged
12 changes: 11 additions & 1 deletion modules/manage/partials/monitor-health.adoc
@@ -2,7 +2,17 @@

This section provides guidelines and example queries using Redpanda's public metrics to optimize your system's performance and monitor its health.

TIP: To help detect and mitigate anomalous system behaviors, capture baseline metrics of your healthy system at different stages (at start-up, under high load, in steady state) so you can set thresholds and alerts according to those baselines.
To help detect and mitigate anomalous system behaviors, capture baseline metrics of your healthy system at different stages (at start-up, under high load, in steady state) so you can set thresholds and alerts according to those baselines.

[TIP]
====
For counter type metrics, a broker restart causes the count to reset to zero in tools like Prometheus and Grafana. Redpanda recommends wrapping counter metrics in a rate query to account for broker restarts, for example:

[,promql]
----
rate(redpanda_kafka_records_produced_total[5m])
----
====
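
The effect of a rate query on a restarted counter can be illustrated with a minimal sketch. This is not Prometheus's implementation (the real `rate()` also extrapolates to the window boundaries); it only shows the reset-handling idea. The function names and sample values are hypothetical.

```python
def increase_with_resets(samples):
    """Sum the growth across consecutive samples, treating any drop as a reset."""
    total = 0.0
    for prev, curr in zip(samples, samples[1:]):
        if curr >= prev:
            # Normal case: the counter only grew between scrapes.
            total += curr - prev
        else:
            # The counter dropped, so assume a restart reset it to zero
            # and count the whole current value as new growth.
            total += curr
    return total


def per_second_rate(samples, window_seconds):
    """Average per-second growth over the window, ignoring resets."""
    return increase_with_resets(samples) / window_seconds


# Hypothetical counter values scraped every 60 seconds; the broker
# restarts between the third and fourth sample.
samples = [100, 160, 220, 30, 90]
print(increase_with_resets(samples))  # 60 + 60 + 30 + 60
print(per_second_rate(samples, 240))
```

Plotting the raw samples would show a sudden drop from 220 to 30, while the reset-aware calculation keeps reporting steady growth, which is why the tip suggests wrapping counters in a rate query.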

=== Redpanda architecture

10 changes: 3 additions & 7 deletions modules/manage/partials/monitor-redpanda.adoc
@@ -1,11 +1,9 @@
Redpanda exports metrics through Prometheus endpoints for you to monitor system health and optimize system performance.

A Redpanda broker exports public metrics from the xref:reference:public-metrics-reference.adoc[`/public_metrics`] endpoint through the Admin API port (default: 9644).

Before v22.2, a Redpanda broker provided metrics only through the xref:reference:internal-metrics-reference.adoc[`/metrics`] endpoint through the Admin API port. While Redpanda still provides this endpoint, it includes many internal metrics that are unnecessary for a typical Redpanda user to monitor. Consequently, the `/public_metrics` endpoint was added to provide a smaller set of important metrics that can be queried and ingested more quickly and inexpensively. The `/metrics` endpoint is now referred to as the 'internal metrics' endpoint, and Redpanda recommends that you use it for development, testing, and analysis.
Redpanda exports metrics through two endpoints on the Admin API port (default: 9644) for you to monitor system health and optimize system performance.

include::shared:partial$metrics-usage-tip.adoc[]

The xref:reference:internal-metrics-reference.adoc[`/metrics`] endpoint is a legacy endpoint that includes many internal metrics that are unnecessary for a typical Redpanda user to monitor. The `/metrics` endpoint is also referred to as the 'internal metrics' endpoint, and Redpanda recommends that you use it for development, testing, and analysis. Alternatively, the xref:reference:public-metrics-reference.adoc[`/public_metrics`] endpoint provides a smaller set of important metrics that can be queried and ingested more quickly and inexpensively.
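
As a rough sketch of how the two endpoints might be compared, the following Python builds the endpoint URL and extracts metric names from Prometheus text-format output. It assumes plain HTTP with no TLS or authentication on the Admin API, and the helper names (`metrics_url`, `metric_names`, `fetch_metric_names`) are hypothetical, not part of any Redpanda tooling.

```python
from urllib.request import urlopen

ADMIN_PORT = 9644  # default Admin API port mentioned in the text


def metrics_url(host, endpoint="/public_metrics", port=ADMIN_PORT):
    """Build the URL for a metrics endpoint (assumes plain HTTP)."""
    return f"http://{host}:{port}{endpoint}"


def metric_names(exposition_text):
    """Collect the distinct metric names from Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The name is everything before the first "{" (labels) or space (value).
        names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return names


def fetch_metric_names(host, endpoint="/public_metrics"):
    """Scrape one endpoint on a reachable broker and return its metric names."""
    with urlopen(metrics_url(host, endpoint), timeout=5) as resp:
        return metric_names(resp.read().decode("utf-8"))
```

Against a reachable broker, comparing `len(fetch_metric_names(host, "/metrics"))` with `len(fetch_metric_names(host))` would give a sense of how much smaller the public set is.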

[NOTE]
====
To maximize monitoring performance by minimizing the cardinality of data, some metrics are exported when their underlying features are in use, and are not exported when not in use. For example, a metric for consumer groups, xref:reference:public-metrics-reference.adoc#redpanda_kafka_consumer_group_committed_offset[`redpanda_kafka_consumer_group_committed_offset`], is not exported when no groups are registered.
@@ -25,8 +23,6 @@ This topic covers the following about monitoring Redpanda metrics:

https://prometheus.io/[Prometheus^] is a system monitoring and alerting tool. It collects and stores metrics as time-series data identified by a metric name and key/value pairs.

NOTE: Redpanda Data recommends creating monitoring dashboards with `/public_metrics`.

ifdef::env-kubernetes[]

To configure Prometheus to monitor Redpanda metrics in Kubernetes, you can use the https://prometheus-operator.dev/[Prometheus Operator^]:
2 changes: 1 addition & 1 deletion modules/shared/partials/metrics-usage-tip.adoc
@@ -1,6 +1,6 @@
[TIP]
====
Use xref:reference:public-metrics-reference.adoc[/public_metrics] for your primary dashboards for system health.
Use xref:reference:public-metrics-reference.adoc[/public_metrics] for your primary dashboards for monitoring system health.

Use xref:reference:internal-metrics-reference.adoc[/metrics] for detailed analysis and debugging.
====