apps sc: S3 quota alerts per bucket #2568

Open
wants to merge 1 commit into
base: main
from

Conversation

@Mlundm Mlundm commented Jun 19, 2025

Warning

This is a public repository, ensure not to disclose:

  • personal data beyond what is necessary for interacting with this pull request, nor
  • business confidential information, such as customer names.

What kind of PR is this?

Required: Mark one of the following that is applicable:

  • kind/feature
  • kind/improvement
  • kind/deprecation
  • kind/documentation
  • kind/clean-up
  • kind/bug
  • kind/other

Optional: Mark one or more of the following that are applicable:

Important

Breaking changes should be marked kind/admin-change or kind/dev-change depending on type
Critical security fixes should be marked with kind/security

  • kind/admin-change
  • kind/dev-change
  • kind/security
  • [kind/adr](set-me)

What does this PR do / why do we need this PR?

...

Information to reviewers

  • I would welcome suggestions or improvements for the schema titles/descriptions of the new buckets section.

  • I changed how the S3 bucket alert expression works in general, because the old expression left gaps in the data when queried. While testing, I noticed that changing for: 1h to for: 10m made the alert go pending and then return to Normal because there was no data, when it should have fired. If you see any issue with this, let me know. See the examples below.

  • New query/expression
    image

  • Old query/expression
    image

  • Example per bucket alerts (sc-config)

prometheus:
  s3BucketAlerts:
    objects:
      enabled: true
    size:
      enabled: true
    buckets:
      - name: marcus-v2-thanos
        size:
          enabled: true
          percent: 80
          sizeQuotaGB: 1000
        objects:
          enabled: true
          percent: 80
          count: 1638400
    exclude:
      - marcus-v2-thanos
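
As a rough illustration of what the per-bucket values above imply, the sketch below computes the absolute thresholds from `percent`, `sizeQuotaGB` and `count`. It assumes an alert triggers once usage exceeds `percent` of the configured quota; the helper name `thresholds` is hypothetical and not part of the chart.

```python
# Hypothetical helper, not part of the chart: it only illustrates the arithmetic
# implied by the per-bucket config, assuming an alert triggers once usage
# exceeds `percent` of the configured quota.

def thresholds(bucket: dict) -> dict:
    """Return the absolute size (GB) and object-count thresholds for one bucket."""
    size = bucket["size"]
    objects = bucket["objects"]
    return {
        "name": bucket["name"],
        "size_gb": size["sizeQuotaGB"] * size["percent"] / 100,
        "objects": objects["count"] * objects["percent"] / 100,
    }


# Values copied from the example sc-config above.
bucket = {
    "name": "marcus-v2-thanos",
    "size": {"enabled": True, "percent": 80, "sizeQuotaGB": 1000},
    "objects": {"enabled": True, "percent": 80, "count": 1638400},
}

result = thresholds(bucket)
print(result)
```

With the example values, the size threshold works out to 800 GB and the object threshold to 1310720 objects.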

Screenshot from 2025-06-19 16-06-53
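
The gap problem described in the information to reviewers can be sketched with a simplified model of how Prometheus handles `for:`. An alert stays pending only while its expression keeps returning a sample, and any evaluation without a result resets it. This is an illustration under assumptions, not the rule from this PR: the gap pattern and the `simulate` function are made up, and the real gaps are only visible in the screenshots.

```python
# Simplified model of Prometheus alert state handling; an illustration only,
# not the rule from this PR. Each evaluation either returns a sample for the
# alert's series (True) or returns nothing (False).

def simulate(samples: list[bool], for_evals: int) -> list[str]:
    """Return the alert state after each evaluation.

    `for_evals` is the `for:` duration expressed in evaluation intervals.
    """
    states = []
    pending_since = None  # index of the evaluation where the alert became active
    for i, has_sample in enumerate(samples):
        if not has_sample:
            # No result for this series: the alert resets and pending time is lost.
            pending_since = None
            states.append("inactive")
            continue
        if pending_since is None:
            pending_since = i
        elapsed = i - pending_since
        states.append("firing" if elapsed >= for_evals else "pending")
    return states


# Hypothetical data: a gap on every 4th evaluation vs. a continuous result.
gappy = [i % 4 != 3 for i in range(12)]
continuous = [True] * 12

print(simulate(gappy, for_evals=5))
print(simulate(continuous, for_evals=5))
```

In this model the gappy series never leaves the pending/inactive cycle, while the continuous series fires once the `for:` duration has elapsed.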

Checklist

  • Proper commit message prefix on all commits
  • Change checks:
    • The change is transparent
    • The change is disruptive
    • The change requires no migration steps
    • The change requires migration steps
    • The change updates CRDs
    • The change updates the config and the schema
  • Documentation checks:
  • Metrics checks:
    • The metrics are still exposed and present in Grafana after the change
    • The metrics names didn't change (Grafana dashboards and Prometheus alerts required no updates)
    • The metrics names did change (Grafana dashboards and Prometheus alerts required an update)
  • Logs checks:
    • The logs do not show any errors after the change
  • PodSecurityPolicy checks:
    • Any changed Pod is covered by Kubernetes Pod Security Standards
    • Any changed Pod is covered by Gatekeeper Pod Security Policies
    • The change does not cause any Pods to be blocked by Pod Security Standards or Policies
  • NetworkPolicy checks:
    • Any changed Pod is covered by Network Policies
    • The change does not cause any dropped packets in the NetworkPolicy Dashboard
  • Audit checks:
    • The change does not cause any unnecessary Kubernetes audit events
    • The change requires changes to Kubernetes audit policy
  • Falco checks:
    • The change does not cause any alerts to be generated by Falco
  • Bug checks:
    • The bug fix is covered by regression tests

@Mlundm Mlundm force-pushed the marcus/specific-quota-alerts branch 2 times, most recently from 0c15241 to d5674b3 June 19, 2025 14:34
@Mlundm Mlundm marked this pull request as ready for review June 19, 2025 14:35
@Mlundm Mlundm requested a review from a team as a code owner June 19, 2025 14:35
Contributor

@anders-elastisys anders-elastisys left a comment

Very nice work. I think the templating, config and schema look great. I added some potential improvements and one concern regarding the Bucket36hActivityCheck alert.

Contributor

There is another alert, Bucket36hActivityCheck, defined in this file that also uses the exclude list, and we probably still want it for the specified buckets. Should the exclusion of buckets listed in s3BucketAlerts.buckets perhaps be done in the templating instead of manually adding them to the exclude list? Or should we also have configurable per-bucket alerts for Bucket36hActivityCheck? 🤔

Author

@Mlundm Mlundm Jun 24, 2025

Good catch. I added templating for the exclusion of buckets, so it now creates a combined list from .exclude and the names in .buckets. I don't think separate per-bucket alerts for the activity check are necessary, since there is no configurable value there, such as number of objects or size.

I tried to make a helper function for the templating, but ended up spending too much time debugging templating map/list issues 😩

I ended up just doing it in the template file, which I think is fine, since a helper function might have been overkill either way.
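
The combined list described in this comment can be sketched roughly as follows. This is not the actual Helm template from the PR: the function name `combined_exclude`, the bucket `some-other-bucket`, and the de-duplication step are assumptions made for illustration.

```python
# Hypothetical sketch of the merge described above; the real logic lives in a
# Helm template, and the function and variable names here are illustrative.

def combined_exclude(exclude: list[str], buckets: list[dict]) -> list[str]:
    """Merge `.exclude` with the names from `.buckets`, dropping duplicates
    while keeping first-seen order (de-duplication is an assumption)."""
    merged = []
    for name in exclude + [b["name"] for b in buckets]:
        if name not in merged:
            merged.append(name)
    return merged


# "some-other-bucket" is a made-up entry; "marcus-v2-thanos" is from the PR example.
s3_bucket_alerts = {
    "exclude": ["some-other-bucket"],
    "buckets": [{"name": "marcus-v2-thanos"}],
}

names = combined_exclude(s3_bucket_alerts["exclude"], s3_bucket_alerts["buckets"])
print(names)
```

With a merge like this, a bucket listed under `buckets` no longer has to be repeated under `exclude` to be left out of the regular alerts.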

Contributor

@Zash Zash left a comment

Schema looks good, with one tweak.

@Mlundm Mlundm force-pushed the marcus/specific-quota-alerts branch from d5674b3 to 94502b0 Compare June 24, 2025 10:16
@Mlundm Mlundm requested review from anders-elastisys and Zash June 24, 2025 12:25
Contributor

@anders-elastisys anders-elastisys left a comment

LGTM, just added some comments about updating comments 😅

@@ -450,6 +450,18 @@ prometheus:
percent: 80
count: 1638400
exclude: []
# Custom per-bucket alerts
buckets: []
# - name: <cluster>-thanos # Also add name to exclude from regular alerts.
Contributor

With the most recent changes, you do not need to add bucket names to the exclude list, right?

Suggested change
# - name: <cluster>-thanos # Also add name to exclude from regular alerts.
# - name: <cluster>-thanos # This gets excluded from regular object storage alerts

@@ -18,6 +18,16 @@ s3BucketAlerts:
percent: 80
count: 1638400
exclude: []
buckets: []
# - name: <cluster>-thanos # Also add name to exclude from regular alerts.
Contributor

Same thing here:

Suggested change
# - name: <cluster>-thanos # Also add name to exclude from regular alerts.
# - name: <cluster>-thanos # This gets excluded from regular object storage alerts

@Mlundm
Author

Mlundm commented Jun 26, 2025

Added some ops people in case they have opinions on the alerts 🙂

@Mlundm Mlundm removed the request for review from linus-astrom June 26, 2025 13:21
Successfully merging this pull request may close these issues.

[2] S3 quota alerts per bucket
3 participants