I've never encountered ksonnet before, so I'm not sure if I can translate that job file in a timely fashion. I'm also not sure whether it should be added to this existing file or if it warrants putting it in a completely separate file for jobs. I would appreciate any guidance or suggestions.
groups:
- name: job.rules
  rules:
  - alert: CronJobRunning
    expr: time() - kube_cronjob_next_schedule_time > 3600
    for: 1h
    labels:
      severity: warning
    annotations:
      description: CronJob {{$labels.namespace}}/{{$labels.cronjob}} is taking more than 1h to complete
      summary: CronJob didn't finish after 1h
  - alert: JobCompletion
    expr: kube_job_spec_completions - kube_job_status_succeeded > 0
    for: 1h
    labels:
      severity: warning
    annotations:
      description: Job completion is taking more than 1h to complete
        cronjob {{$labels.namespace}}/{{$labels.job}}
      summary: Job {{$labels.job}} didn't complete after 1h
  - alert: JobFailed
    expr: kube_job_status_failed > 0
    for: 1h
    labels:
      severity: warning
    annotations:
      description: Job {{$labels.namespace}}/{{$labels.job}} failed to complete
      summary: Job failed
At $work we have CronJobs that take more than 1h to complete. We know it's a problem, but we really only care that the Job starts every 24h and successfully completes.
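A rough sketch of a rule for the "starts every 24h" half of that requirement, assuming kube-state-metrics exposes `kube_cronjob_status_last_schedule_time` in your setup. The alert name `CronJobNotScheduled` and the 24h threshold are illustrative choices, not something taken from the rules above. The "successfully completes" half would still rely on a rule like `JobFailed`.

```yaml
groups:
- name: job.rules
  rules:
  # Hypothetical alert name; fires when a CronJob has not been scheduled
  # for more than 24h (86400 seconds), with 1h of slack from `for`.
  - alert: CronJobNotScheduled
    expr: time() - kube_cronjob_status_last_schedule_time > 86400
    for: 1h
    labels:
      severity: warning
    annotations:
      description: CronJob {{$labels.namespace}}/{{$labels.cronjob}} has not been scheduled in the last 24h
      summary: CronJob not scheduled for 24h
```

This avoids alerting on run duration, so CronJobs that legitimately take longer than 1h would not trigger it.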
Nice, thanks folks! I'll take a look at the blog post and see what we can do. PRs are always welcome. I'm very much of the opinion that something is better than nothing, and it gives us a base to iterate on.
Hello, as mentioned in the #monitoring-mixin channel on the Kubernetes Slack, I don't see anything equivalent to the job alerting rules above.