Make the daemonset rollout stuck alert configurable.
In larger Kubernetes clusters with high node churn (for instance,
cloud clusters with spot nodes), DaemonSet rollouts often stay stuck
for longer than 15 minutes. As a result, the alert can easily misfire
even when the delay is legitimate.

This PR makes the alert's `for` duration configurable through the new
`kubeDaemonSetRolloutStuckFor` setting. The default remains the original
`15m`, so the only visible difference is a slight change in the alert
message formatting: the description now reads "15m" instead of
"15 minutes".

Signed-off-by: Milan Plzik <[email protected]>
mplzik committed Nov 18, 2024
1 parent bdbf7f4 commit 58acafd
1 changed file with 3 additions and 2 deletions: alerts/apps_alerts.libsonnet
```diff
@@ -4,6 +4,7 @@ local utils = import '../lib/utils.libsonnet';
   _config+:: {
     kubeStateMetricsSelector: error 'must provide selector for kube-state-metrics',
     kubeJobTimeoutDuration: error 'must provide value for kubeJobTimeoutDuration',
+    kubeDaemonSetRolloutStuckFor: '15m',
     namespaceSelector: null,
     prefixedNamespaceSelector: if self.namespaceSelector != null then self.namespaceSelector + ',' else '',
   },
@@ -204,10 +205,10 @@ local utils = import '../lib/utils.libsonnet';
               severity: 'warning',
             },
             annotations: {
-              description: 'DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} has not finished or progressed for at least 15 minutes.',
+              description: 'DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} has not finished or progressed for at least %(kubeDaemonSetRolloutStuckFor)s.' % $._config,
               summary: 'DaemonSet rollout is stuck.',
             },
-            'for': '15m',
+            'for': $._config.kubeDaemonSetRolloutStuckFor,
           },
           {
             expr: |||
```
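With this change, the threshold can be raised per deployment instead of patching the rule. The following is a minimal sketch of a downstream override, assuming the mixin is vendored and importable as `kubernetes-mixin/mixin.libsonnet` (a hypothetical path; adjust it to your setup) and that this import already supplies the other required `_config` fields such as `kubeStateMetricsSelector`:

```jsonnet
// Hypothetical import path: depends on where the mixin is vendored.
local kubernetesMixin = import 'kubernetes-mixin/mixin.libsonnet';

kubernetesMixin {
  _config+:: {
    // Tolerate slow DaemonSet rollouts (e.g. on spot-node clusters) for up to
    // one hour before the stuck-rollout alert fires; the default is '15m'.
    kubeDaemonSetRolloutStuckFor: '1h',
  },
}
```

Because both the rule's `'for'` field and its description read the same `_config` value, the alert would then wait one hour before firing and its description would end with "for at least 1h."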
