Alerts are getting fire after every minute #4253

Open
amolngt opened this issue Feb 13, 2025 · 1 comment


amolngt commented Feb 13, 2025

Hi all,

  1. I want the same alert (alert rule) to fire again only after 5 minutes. Currently I receive the same alert (same alert rule) every minute, with the same '{{ $value }}'.
    If the threshold is crossed and the value changes, it is fine for the same alert rule to fire multiple alerts. But while '{{ $value }}' is unchanged, the same alert rule should not fire again for the next 5 minutes. How can I achieve this?

  2. Even when the application is not down, alerts are sent every minute. How can I debug this? I am using the rule below: alert: "Instance Down", expr: up == 0
    What do for, keep_firing_for and evaluation_interval do?

prometheus.yml

global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - ip:port

rule_files:
  - "alerts_rules.yml"

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["ip:port"]

alertmanager.yml

global:
  resolve_timeout: 5m
route:
  group_wait: 5s
  group_interval: 5m
  repeat_interval: 15m
  receiver: webhook_receiver
receivers:
  - name: webhook_receiver
    webhook_configs:
      - url: 'http://ip:port'
        send_resolved: false
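
For reference, here is an annotated sketch of the route timers used in alertmanager.yml above, with each one described as the Alertmanager documentation describes it. The group_by line is an assumption added for illustration only and is not part of the original configuration. This is a reading aid, not a diagnosis of the once-per-minute notifications.

```yaml
route:
  receiver: webhook_receiver
  # Assumption (not in the original config): alerts sharing these label
  # values are batched into one notification group.
  group_by: ['alertname']
  # How long to wait before sending the first notification for a new group,
  # so that alerts arriving close together can be batched.
  group_wait: 5s
  # How long to wait before notifying about new alerts added to a group
  # that has already been notified.
  group_interval: 5m
  # How long to wait before re-sending a notification that was already
  # sent successfully while the alerts are still firing.
  repeat_interval: 15m
```

These timers apply per notification group, so how alerts are grouped affects how often the webhook is called.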

alerts_rules.yml

groups:
- name: instance_alerts
  rules:
  - alert: "Instance Down"
    expr: up == 0
    # for: 30s
    # keep_firing_for: 30s
    labels:
      severity: "Critical"
    annotations:
      summary: "Endpoint {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 30 sec."

- name: rabbitmq_alerts
  rules:
    - alert: "Consumer down for last 1 min"
      expr: rabbitmq_queue_consumers == 0
      # for: 1m
      # keep_firing_for: 30s
      labels:
        severity: Critical
      annotations:
        summary: "shortify | '{{ $labels.queue }}' has no consumers"
        description: "The queue '{{ $labels.queue }}' in vhost '{{ $labels.vhost }}' has zero consumers for more than 30 sec. Immediate attention is required."


    - alert: "Total Messages > 10k in last 1 min"
      expr: rabbitmq_queue_messages > 10000
      # for: 1m
      # keep_firing_for: 30s
      labels:
        severity: Critical
      annotations:
        summary: "'{{ $labels.queue }}' has total '{{ $value }}' messages for more than 1 min."
        description: |
          Queue {{ $labels.queue }} in RabbitMQ has total {{ $value }} messages for more than 1 min.
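
For reference on the for and keep_firing_for fields asked about above, here is a minimal sketch of the "Instance Down" rule with both fields enabled. The 30s values are taken from the commented-out lines in the original rule. The comments summarize the documented Prometheus semantics and are not tuned recommendations.

```yaml
groups:
  - name: instance_alerts
    rules:
      - alert: "Instance Down"
        expr: up == 0
        # The alert is "pending" until the expression has been true on every
        # evaluation for this long; only then does it become "firing".
        # Rules are evaluated once per evaluation_interval (15s in the
        # prometheus.yml above).
        for: 30s
        # Once firing, keep the alert firing for this long after the
        # expression stops being true. Needs a Prometheus release that
        # supports keep_firing_for.
        keep_firing_for: 30s
        labels:
          severity: "Critical"
        annotations:
          summary: "Endpoint {{ $labels.instance }} down"
```

With both fields commented out, as in the original file, an alert fires on the first evaluation where the expression is true and stops on the first evaluation where it is false.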

amolngt changed the title from "prometheus alerting - alerts are getting fire after every minute" to "Alerts are getting fire after every minute" on Feb 13, 2025
grobinson-grafana (Contributor) commented

Please use https://groups.google.com/g/prometheus-users for help. GitHub issues are for feature requests and bug reports.
