✨(backend) add prometheus metrics, liveness and readiness probe endpoints #562

lindenb1 · 2025-01-16T14:45:10Z

Provides Prometheus metrics, a custom metrics exporter, a CIDR filter for monitoring targeted views,
and readiness/liveness probe endpoints for Kubernetes.

Purpose

Simplify operational observability and health checking in containerized environments. By exposing both standard and custom metrics from Django for Prometheus, you can gain deeper insights into application performance. The integrated CIDR filter ensures that only allowed IPs can access the monitoring and probe endpoints of the application, and the built-in readiness/liveness probes provide health checks for Kubernetes to manage pod lifecycles effectively.

Proposal

Integrate these changes to benefit from metrics collection and streamlined health checks. By centralizing these responsibilities, teams can reduce complexity and maintain a clearer view of application behavior within Kubernetes clusters and similar environments.

lindenb1 · 2025-01-20T14:32:09Z

@virgile-dev Hey Virgile, could you please review the Prometheus implementation? At the moment, it is designed as an on/off switch, so it should not interfere significantly (if at all) with your existing code.
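As a minimal sketch of the on/off switch, the check below mirrors the `MONITORING_PROBING` test used in `src/backend/impress/urls.py`. The helper name `env_flag` is hypothetical and is not part of the PR.

```python
import os


def env_flag(name, default="False"):
    """Return True only when the variable equals "true", ignoring case.

    Hypothetical helper: it reproduces the comparison
    os.environ.get(NAME, "False").lower() == "true".
    """
    return os.environ.get(name, default).lower() == "true"


if __name__ == "__main__":
    # With the variable unset, the feature stays disabled.
    print(env_flag("MONITORING_PROBING"))
```

With this comparison, values such as `1` or `yes` do not enable the feature; only `true` in any letter case does.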

In addition to the Prometheus metrics, I have added the standard probes required by Kubernetes (readiness and liveness checks). I also included a CIDR filter decorator so that the metrics, readiness, and liveness endpoints are accessible only from specific networks, such as the private Kubernetes network where the application is running.
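To illustrate the idea behind the CIDR filter, here is a framework-free sketch built on the standard `ipaddress` module. The names `ip_in_cidrs` and `cidr_protected` are hypothetical, and the `(status, body)` tuple is only a stand-in for a real HTTP response; the decorator in the PR may differ, for example in how it reads the client address behind a proxy.

```python
import ipaddress
from functools import wraps


def ip_in_cidrs(remote_addr, allowed_cidrs):
    """Return True if remote_addr falls inside any allowed CIDR range."""
    try:
        ip = ipaddress.ip_address(remote_addr)
    except ValueError:
        return False  # malformed or missing address: deny
    # strict=False accepts ranges written with host bits set, e.g. "10.0.0.1/8"
    return any(
        ip in ipaddress.ip_network(cidr, strict=False) for cidr in allowed_cidrs
    )


def cidr_protected(allowed_cidrs):
    """Decorator that rejects requests from outside the allowed ranges.

    Assumes the request object exposes a META mapping with REMOTE_ADDR,
    as Django's HttpRequest does.
    """
    def decorator(view):
        @wraps(view)
        def wrapper(request, *args, **kwargs):
            remote_addr = request.META.get("REMOTE_ADDR", "")
            if not ip_in_cidrs(remote_addr, allowed_cidrs):
                return 403, "Forbidden"  # stand-in for an HTTP 403 response
            return view(request, *args, **kwargs)
        return wrapper
    return decorator
```

In a Django view, the stand-in tuple would typically be replaced by `django.http.HttpResponseForbidden()`.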

securitykernel · 2025-02-25T08:24:48Z

@sampaccoud Can you or @AntoLC please do a review?

sampaccoud · 2025-03-02T21:25:26Z

src/backend/impress/urls.py

+if os.environ.get("MONITORING_PROBING", "False").lower() == "true":
+    urlpatterns.append(
+        path(
+            "probes/liveness/",
+            monitoring_cidr_protected_view(liveness_check),
+            name="liveness-probe",
+        ),
+    )
+    urlpatterns.append(
+        path(
+            "probes/readiness/",
+            monitoring_cidr_protected_view(readiness_check),
+            name="readiness-probe",
+        ),
+    )


My understanding is that this would be redundant with what we already have on /lbheartbeat and /heartbeat with Mozilla's dockerflow.
Can you check before I review the rest?

simonpasquier · 2025-03-18T08:42:46Z

src/backend/core/api/custom_metrics_exporter.py

+        if doc_ages["oldest"]:
+            metrics.append(
+                GaugeMetricFamily(
+                    prefixed_metric_name("oldest_document_date"),


It's recommended to include "timestamp" in the metric name and to use a _seconds suffix (see https://prometheus.io/docs/practices/naming/#metric-names).

Suggested change

-                    prefixed_metric_name("oldest_document_date"),
+                    prefixed_metric_name("oldest_document_timestamp_seconds"),
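A metric named with a `_timestamp_seconds` suffix is conventionally expected to carry a Unix epoch value in seconds. The sketch below shows one way to produce such a value from a `datetime`. The helper `to_unix_seconds` is hypothetical, and treating naive datetimes as UTC is an assumption; the conversion actually used in the PR is not shown in this hunk.

```python
from datetime import datetime, timezone


def to_unix_seconds(dt):
    """Convert a datetime to Unix epoch seconds as a float.

    Assumption: naive datetimes are interpreted as UTC.
    """
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.timestamp()


if __name__ == "__main__":
    # Example input only; any document creation date would be passed the same way.
    print(to_unix_seconds(datetime(2025, 1, 16, 14, 45, 10, tzinfo=timezone.utc)))
```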

simonpasquier · 2025-03-18T08:43:00Z

src/backend/core/api/custom_metrics_exporter.py

+        if doc_ages["newest"]:
+            metrics.append(
+                GaugeMetricFamily(
+                    prefixed_metric_name("newest_document_date"),


Suggested change

-                    prefixed_metric_name("newest_document_date"),
+                    prefixed_metric_name("newest_document_timestamp_seconds"),

simonpasquier · 2025-03-18T08:43:57Z

src/backend/core/api/custom_metrics_exporter.py

+
+        # -- User document distribution
+        user_distribution_metric = GaugeMetricFamily(
+            prefixed_metric_name("user_document_distribution"),


(suggestion) I'm not sure the "_distribution" suffix is meaningful.

Suggested change

-            prefixed_metric_name("user_document_distribution"),
+            prefixed_metric_name("user_documents"),

AntoLC · 2025-04-29T13:04:49Z

I added the "keep track" label so that interesting PRs and issues are easy to find after they are closed.

lindenb1 mentioned this pull request Jan 16, 2025

Prometheus Metrics #455

Open

lindenb1 force-pushed the prometheus-django-metrics branch 20 times, most recently from 4b6f8be to 25b8d52, on January 20, 2025 13:04

✨(backend) add prometheus metrics and probe endpoints

755538d

Provides Prometheus metrics, a custom metrics exporter, a CIDR filter for monitoring targeted views, and readiness/liveness probe endpoints for Kubernetes. Signed-off-by: lindenb1 <[email protected]>

lindenb1 force-pushed the prometheus-django-metrics branch from 25b8d52 to 755538d on January 20, 2025 13:07

lindenb1 marked this pull request as ready for review January 20, 2025 14:17

sampaccoud reviewed Mar 2, 2025

simonpasquier reviewed Mar 18, 2025

AntoLC added the backend and experiment labels Apr 29, 2025

AntoLC added the keep track label ("Label to keep track of interesting things") Apr 29, 2025

AntoLC closed this Apr 29, 2025
