Add Job and CronJob rules #24

Open
StevenACoffman opened this issue Jun 5, 2018 · 3 comments

StevenACoffman commented Jun 5, 2018

Hello, as mentioned in the Kubernetes Slack #monitoring-mixin channel, I don't see anything in this project equivalent to the job rules below.

I've never encountered ksonnet before, so I'm not sure I can translate that job file in a timely fashion. I'm also not sure whether the rules should be added to the existing file or warrant a completely separate file for jobs. I would appreciate any guidance or suggestions.

groups:
- name: job.rules
  rules:
  - alert: CronJobRunning
    expr: time() - kube_cronjob_next_schedule_time > 3600
    for: 1h
    labels:
      severity: warning
    annotations:
      description: CronJob {{$labels.namespace}}/{{$labels.cronjob}} is taking more than 1h to complete
      summary: CronJob didn't finish after 1h

  - alert: JobCompletion
    expr: kube_job_spec_completions - kube_job_status_succeeded > 0
    for: 1h
    labels:
      severity: warning
    annotations:
      description: Job {{$labels.namespace}}/{{$labels.job}} is taking more than 1h to complete
      summary: Job {{$labels.job}} did not complete within 1h

  - alert: JobFailed
    expr: kube_job_status_failed > 0
    for: 1h
    labels:
      severity: warning
    annotations:
      description: Job {{$labels.namespace}}/{{$labels.job}} failed to complete
      summary: Job failed
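
The CronJobRunning rule combines a 3600-second threshold in `expr` with `for: 1h`. The Python below is a simplified model of an alerting rule's pending/firing behaviour, not Prometheus code. It assumes a 1m evaluation interval and that `kube_cronjob_next_schedule_time` stays at the missed schedule time while the run is overdue. `first_firing_time` is a hypothetical helper. Under those assumptions, the alert would fire about 2h after the missed schedule, not 1h.

```python
from typing import Callable, Optional


def first_firing_time(
    condition: Callable[[int], bool],
    for_seconds: int,
    start: int,
    end: int,
    step: int,
) -> Optional[int]:
    """Return the first evaluation time (Unix seconds) at which the alert fires.

    Simplified model of an alerting rule's `for` clause: the alert turns
    pending at the first evaluation where `condition` holds and fires once
    it has held for at least `for_seconds`. A false evaluation resets it.
    """
    pending_since: Optional[int] = None
    for t in range(start, end + 1, step):
        if condition(t):
            if pending_since is None:
                pending_since = t  # alert becomes pending
            if t - pending_since >= for_seconds:
                return t  # held long enough: alert fires
        else:
            pending_since = None  # condition cleared: back to inactive
    return None


# Hypothetical scenario: a run was due at t=0 and, by assumption,
# kube_cronjob_next_schedule_time stays at 0 while the run is overdue.
next_schedule_time = 0
fired_at = first_firing_time(
    condition=lambda t: t - next_schedule_time > 3600,  # the rule's expr
    for_seconds=3600,  # the rule's `for: 1h`
    start=0,
    end=3 * 3600,
    step=60,  # assumed 1m evaluation interval
)
print(fired_at)  # 7260, i.e. about 2h after the missed schedule time
```

Under these assumptions, dropping either the `> 3600` threshold or the `for: 1h` clause would bring the alert closer to the 1h stated in its description.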

adamdecaf commented Jun 5, 2018

At $work we have CronJobs that take more than 1h to complete. We know it's a problem, but we really only care that the Job starts every 24h and successfully completes.

https://medium.com/@tristan_96324/prometheus-k8s-cronjob-alerts-94bee7b90511

That article describes how to join metrics to build alerts for CronJobs. I've been meaning to try porting those alerts to this project.
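
The requirement above (a Job starts at least every 24h, and the latest run succeeds) can be sketched outside Prometheus. The Python below is not the PromQL from the linked article. It only models the same idea: join Job-level data to the owning CronJob and judge each CronJob by its most recent Job. `JobSample` and `cronjobs_needing_attention` are hypothetical names. The input is assumed to already carry each Job's owner, start time, and succeeded/failed pod counts. In Prometheus, those values would have to come from kube-state-metrics series joined in PromQL.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

DAY_SECONDS = 24 * 60 * 60


@dataclass
class JobSample:
    """Hypothetical snapshot of one Job, assembled from per-Job metrics."""
    namespace: str
    job_name: str
    owner_cronjob: Optional[str]  # owning CronJob, or None for standalone Jobs
    start_time: int  # Unix seconds
    succeeded: int  # succeeded pod count
    failed: int  # failed pod count


def cronjobs_needing_attention(
    jobs: List[JobSample], now: int, max_age: int = DAY_SECONDS
) -> Dict[str, str]:
    """Map "namespace/cronjob" to a reason, judging each CronJob by its latest Job.

    CronJobs with no Jobs in `jobs` at all are not reported by this sketch.
    """
    # "Join" step: group Jobs by owning CronJob, keeping only the newest one.
    latest: Dict[Tuple[str, str], JobSample] = {}
    for job in jobs:
        if job.owner_cronjob is None:
            continue  # not created by a CronJob
        key = (job.namespace, job.owner_cronjob)
        current = latest.get(key)
        if current is None or job.start_time > current.start_time:
            latest[key] = job

    problems: Dict[str, str] = {}
    for (namespace, cronjob), job in latest.items():
        name = f"{namespace}/{cronjob}"
        if now - job.start_time > max_age:
            problems[name] = f"no Job started in the last {max_age // 3600}h"
        elif job.succeeded == 0 and job.failed > 0:
            # Jobs without failed pods (e.g. still running) are not flagged,
            # however long they take.
            problems[name] = f"latest Job {job.job_name} has failed pods and no successes"
    return problems


example_jobs = [
    JobSample("batch", "nightly-1", "nightly", 0, 1, 0),
    JobSample("batch", "nightly-2", "nightly", 90_000, 0, 3),
    JobSample("batch", "report-1", "report", 1_000, 1, 0),
]
print(cronjobs_needing_attention(example_jobs, now=100_000))
```

Looking only at the newest Job means an older failed Job stops mattering once a later run succeeds. A plain `kube_job_status_failed > 0` rule, by contrast, would presumably keep firing for as long as the failed Job object exists.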

tomwilkie commented

Nice, thanks folks! I'll take a look at the blog post and see what we can do. PRs are always welcome; I'm very much of the opinion that something is better than nothing, and then we have a base to iterate from.


This issue has not had any activity in the past 30 days, so the stale label has been added to it.

  • The stale label will be removed if there is new activity
  • The issue will be closed in 7 days if there is no new activity
  • Add the keepalive label to exempt this issue from the stale check action

Thank you for your contributions!

@github-actions github-actions bot added the stale label Dec 12, 2024