Make the daemonset rollout stuck alert configurable. (#989)
* Make the daemonset rollout stuck alert configurable.

In larger Kubernetes clusters with high node churn (for instance,
cloud clusters with spot nodes), daemonset rollouts often stay stuck
for longer than 15 minutes. As a result, the alert can easily misfire
even when the delay is legitimate.

This PR introduces a configurable `for` value to allow for customization.
The default remains the original `15m`, so the only visible difference
is a slight change in the alert message formatting.
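
A minimal, self-contained sketch of how a consumer might raise the value. The `mixin` object below is a hypothetical stand-in that only mimics the pattern used in `alerts/apps_alerts.libsonnet`; it is not the real mixin, and `30m` is an assumed value for a cluster with heavy node churn:

```jsonnet
// Hypothetical stand-in for the mixin: one hidden config field and one
// alert that reads it, mirroring the pattern introduced by this change.
local mixin = {
  _config+:: {
    kubeDaemonSetRolloutStuckFor: '15m',
  },
  alert: {
    // The description is formatted from the config object.
    description: 'DaemonSet rollout has not finished or progressed for at least %(kubeDaemonSetRolloutStuckFor)s.' % $._config,
    // The same config value drives the `for` clause.
    'for': $._config.kubeDaemonSetRolloutStuckFor,
  },
};

// Override the default; '30m' is an assumed value, not a recommendation.
mixin {
  _config+:: {
    kubeDaemonSetRolloutStuckFor: '30m',
  },
}
```

Because Jsonnet fields are late-bound, the single override is expected to reach both the `for` clause and the description, which is why the two cannot drift apart.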

Signed-off-by: Milan Plzik <[email protected]>

* Fix the tests.

Signed-off-by: Milan Plzik <[email protected]>

---------

Signed-off-by: Milan Plzik <[email protected]>
mplzik authored Nov 27, 2024
1 parent 72a1a23 commit a3fbf21
Showing 2 changed files with 7 additions and 6 deletions.
5 changes: 3 additions & 2 deletions alerts/apps_alerts.libsonnet
@@ -4,6 +4,7 @@ local utils = import '../lib/utils.libsonnet';
_config+:: {
kubeStateMetricsSelector: error 'must provide selector for kube-state-metrics',
kubeJobTimeoutDuration: error 'must provide value for kubeJobTimeoutDuration',
+ kubeDaemonSetRolloutStuckFor: '15m',
namespaceSelector: null,
prefixedNamespaceSelector: if self.namespaceSelector != null then self.namespaceSelector + ',' else '',
},
@@ -204,10 +205,10 @@ local utils = import '../lib/utils.libsonnet';
severity: 'warning',
},
annotations: {
- description: 'DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} has not finished or progressed for at least 15 minutes.',
+ description: 'DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} has not finished or progressed for at least %(kubeDaemonSetRolloutStuckFor)s.' % $._config,
summary: 'DaemonSet rollout is stuck.',
},
- 'for': '15m',
+ 'for': $._config.kubeDaemonSetRolloutStuckFor,
},
{
expr: |||
8 changes: 4 additions & 4 deletions tests.yaml
@@ -822,7 +822,7 @@ tests:
severity: warning
exp_annotations:
summary: "DaemonSet rollout is stuck."
- description: 'DaemonSet monitoring/node-exporter has not finished or progressed for at least 15 minutes.'
+ description: 'DaemonSet monitoring/node-exporter has not finished or progressed for at least 15m.'
runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedaemonsetrolloutstuck
- eval_time: 34m
alertname: KubeDaemonSetRolloutStuck
@@ -878,7 +878,7 @@ tests:
severity: warning
exp_annotations:
summary: "DaemonSet rollout is stuck."
- description: 'DaemonSet monitoring/node-exporter has not finished or progressed for at least 15 minutes.'
+ description: 'DaemonSet monitoring/node-exporter has not finished or progressed for at least 15m.'
runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedaemonsetrolloutstuck
- eval_time: 34m
alertname: KubeDaemonSetRolloutStuck
@@ -909,7 +909,7 @@ tests:
severity: warning
exp_annotations:
summary: "DaemonSet rollout is stuck."
- description: 'DaemonSet monitoring/node-exporter has not finished or progressed for at least 15 minutes.'
+ description: 'DaemonSet monitoring/node-exporter has not finished or progressed for at least 15m.'
runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedaemonsetrolloutstuck
- eval_time: 34m
alertname: KubeDaemonSetRolloutStuck
@@ -940,7 +940,7 @@ tests:
severity: warning
exp_annotations:
summary: "DaemonSet rollout is stuck."
- description: 'DaemonSet monitoring/node-exporter has not finished or progressed for at least 15 minutes.'
+ description: 'DaemonSet monitoring/node-exporter has not finished or progressed for at least 15m.'
runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubedaemonsetrolloutstuck
- eval_time: 36m
alertname: KubeDaemonSetRolloutStuck