Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨(backend) add prometheus metrics, liveness and readiness probe endpoints #562

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

lindenb1
Copy link
Collaborator

Provides Prometheus metrics, a custom metrics exporter, a CIDR filter for monitoring targeted views,
and readiness/liveness probe endpoints for Kubernetes.

Purpose

Simplify operational observability and health checking in containerized environments. By exposing both standard and custom metrics from Django for Prometheus, you can gain deeper insights into application performance. The integrated CIDR filter ensures that only allowed IPs can access the monitoring and probe endpoints of the application, and the built-in readiness/liveness probes provide health checks for Kubernetes to manage pod lifecycles effectively.

Proposal

Integrate these changes to benefit from metrics collection, and streamlined health checks. By centralizing these responsibilities, teams can reduce complexity and maintain a clearer view of application behavior within Kubernetes clusters and similar environments.

@lindenb1 lindenb1 mentioned this pull request Jan 16, 2025
@lindenb1 lindenb1 force-pushed the prometheus-django-metrics branch 20 times, most recently from 4b6f8be to 25b8d52 Compare January 20, 2025 13:04
Provides Prometheus metrics, a custom metrics exporter,
a CIDR filter for monitoring targeted views,
and readiness/liveness probe endpoints for Kubernetes.

Signed-off-by: lindenb1 <[email protected]>
@lindenb1 lindenb1 force-pushed the prometheus-django-metrics branch from 25b8d52 to 755538d Compare January 20, 2025 13:07
@lindenb1 lindenb1 marked this pull request as ready for review January 20, 2025 14:17
@lindenb1
Copy link
Collaborator Author

@virgile-dev Hey Virgile, could you please review the Prometheus implementation? At the moment, it is designed as an on/off switch, so it should not interfere significantly (if at all) with your existing code.

In addition to the Prometheus metrics, I have added the standard probes required by Kubernetes (readiness and liveness checks). I also included a CIDR filter decorator so that the metrics, readiness, and liveness endpoints are accessible only from specific networks, such as the private Kubernetes network where the application is running.

@securitykernel
Copy link
Collaborator

@sampaccoud Can you or @AntoLC please do a review?

Comment on lines +37 to +51
if os.environ.get("MONITORING_PROBING", "False").lower() == "true":
urlpatterns.append(
path(
"probes/liveness/",
monitoring_cidr_protected_view(liveness_check),
name="liveness-probe",
),
)
urlpatterns.append(
path(
"probes/readiness/",
monitoring_cidr_protected_view(readiness_check),
name="readiness-probe",
),
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that this would be redundant with what we already have on /lbheartbeat and /heartbeat with Mozilla's dockerflow.
Can you check before I review the rest?

if doc_ages["oldest"]:
metrics.append(
GaugeMetricFamily(
prefixed_metric_name("oldest_document_date"),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's recommended to include "timestamp"in the metric name and have a _seconds suffix (from https://prometheus.io/docs/practices/naming/#metric-names)

Suggested change
prefixed_metric_name("oldest_document_date"),
prefixed_metric_name("oldest_document_timestamp_seconds"),

if doc_ages["newest"]:
metrics.append(
GaugeMetricFamily(
prefixed_metric_name("newest_document_date"),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
prefixed_metric_name("newest_document_date"),
prefixed_metric_name("newest_document_timestamp_seconds"),


# -- User document distribution
user_distribution_metric = GaugeMetricFamily(
prefixed_metric_name("user_document_distribution"),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(suggestion) not sure that the "_distribution" is meaningful?

Suggested change
prefixed_metric_name("user_document_distribution"),
prefixed_metric_name("user_documents"),

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants