[BUG] Changes to alerts require reload of prometheus #64

Open
B1ue-W01f opened this issue Jun 29, 2021 · 2 comments
Labels
bug Something isn't working

Comments


B1ue-W01f commented Jun 29, 2021

Your setup

Formula commit hash / release tag

n/a

Versions reports (master & minion)

n/a

Pillar / config used

prometheus:
  extra_files:
    apache_rules:
      file: service_rules/apache
      component: alertmanager
      config:
        groups:
          - name: 'apache.rules'
            rules:
              - alert: ApacheDown
                expr: apache_up == 0
                for: 0m
                labels:
                  severity: critical
                annotations:
                  summary: {% raw %} Apache down (instance {{ $labels.instance }}) {% endraw %}
                  description: {% raw %} "Apache down\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}" {% endraw %}
              - alert: ApacheWorkersLoad
                expr: (sum by (instance) (apache_workers{state="busy"}) / sum by (instance) (apache_scoreboard) ) * 100 > 80
                for: 2m
                labels:
                  severity: warning
                annotations:
                  summary: {% raw %} Apache workers load (instance {{ $labels.instance }}) {% endraw %}
                  description: {% raw %} "Apache workers in busy state approach the max workers count 80% workers busy on {{ $labels.instance }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}" {% endraw %}
              - alert: ApacheRestart
                expr: apache_uptime_seconds_total / 60 < 1
                for: 0m
                labels:
                  severity: warning
                annotations:
                  summary: {% raw %} Apache restart (instance {{ $labels.instance }}) {% endraw %}
                  description: {% raw %} "Apache has just been restarted.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}" {% endraw %}

Bug details

Describe the bug

Changing the extra_files alerts appears to restart only the alertmanager service. The prometheus process needs to be restarted as well; otherwise the changes aren't picked up by Prometheus.

Steps to reproduce the bug

  1. Run a highstate with the alerts pillar.
  2. Remove an alert from the pillar.
  3. Run the highstate again.
  4. Note that the alert has not been removed from Prometheus.
  5. Restart Prometheus.
  6. Note that the alert has now been removed.

Expected behaviour

Prometheus service should be restarted on change to extra_files / alerts

Attempts to fix the bug

None yet.

B1ue-W01f added the bug label Jun 29, 2021
@mdschmitt
Contributor

I think the problem here is just misconfiguration.

Your pillar has component: alertmanager set for apache_rules. Thing is, rules aren't handled by Alertmanager. Prometheus itself loads and evaluates them, and Alertmanager is just used to send out the alerts that Prometheus fires. Remove the component key (so the default value, prometheus, is used), or set it to component: prometheus. That will make Prometheus reload instead of Alertmanager, and you should be set to jet.
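
For reference, a minimal sketch of what the adjusted pillar might look like, trimmed to the first rule from the report with its annotations left out. It assumes the formula reads component exactly as described in this comment; nothing else is changed from the original pillar.

```yaml
prometheus:
  extra_files:
    apache_rules:
      file: service_rules/apache
      # Rule files are read by Prometheus, so tie this file to that component.
      # Per the comment above, prometheus is the default, so dropping this
      # key entirely should behave the same way.
      component: prometheus
      config:
        groups:
          - name: 'apache.rules'
            rules:
              - alert: ApacheDown
                expr: apache_up == 0
                for: 0m
                labels:
                  severity: critical
```

With this in place, re-running the highstate after removing a rule should trigger the Prometheus service rather than Alertmanager, which is the behaviour the report expected.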

@mdschmitt
Contributor

It looks like pillar.example is misleading in this regard.
