observability(alerts): define Sentry ownership and alert rules #1739

Description

@JSONbored

Part of #998. Related: #1002, #1202, and #1667.

Context

Additional Sentry coverage is only useful if its signals are routed to an owner and noisy event classes are kept under control. We need a small, intentional alert set for production-impacting failures, plus clear ownership and runbook notes describing what to do when each alert fires.

This is not a broad log-ingestion project. Keep alerts focused on operational failures that need maintainer action.

Requirements

  • Define alert rules for production-only new issues/regressions and selected operational event classes.
  • Cover at least: REES source-map upload failure, REES analyzer degradation spike, self-host job dead-lettering, relay event drops, gate/check-run permission gaps, AI provider exhaustion, and scheduled monitor misses.
  • Set thresholds so one-off fail-open noise does not page, but persistent failures are visible quickly.
  • Define owner/routing expectations for each alert class.
  • Add a short runbook for first checks, likely causes, and where to verify recovery.
  • Keep alert configuration maintainer-only; do not require contributor workflow changes.
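The threshold requirement above (one-off fail-open noise should not page, persistent failures should surface quickly) can be illustrated with a sliding-window count. This is a minimal sketch of the logic only, not Sentry's rule format; `WindowThreshold`, `min_events`, and `window_seconds` are hypothetical names, and the numbers in the usage note are placeholders rather than proposed thresholds.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WindowThreshold:
    """Hypothetical helper: fire only when enough events land inside one window."""

    min_events: int          # assumed knob: events required before alerting
    window_seconds: float    # assumed knob: length of the sliding window
    _timestamps: deque = field(default_factory=deque)

    def record(self, ts: float) -> bool:
        """Record an event at time `ts` (seconds) and report whether to alert."""
        self._timestamps.append(ts)
        # Discard events that are older than the window ending at `ts`.
        cutoff = ts - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.min_events
```

With `WindowThreshold(min_events=3, window_seconds=600)`, a single failure followed by a long quiet period never alerts, while three failures within ten minutes do. The same shape maps onto a "seen more than N times in an interval" style condition, if that is how the rules end up being expressed.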

Deliverables

  • Sentry alert rule inventory and desired thresholds.
  • Ownership/routing notes for the JSONbored org/project setup.
  • Runbook docs for each high-priority alert class.
  • Optional configuration-as-code or script if Sentry's API support is stable enough for our use.
  • Smoke validation plan for each alert class.
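One way to keep the inventory honest is a small audit over the alert classes named in the requirements. The sketch below is an assumption about how that could look: the slug names, the `AlertRule` fields, and the `audit` function are all hypothetical, and no owners, thresholds, or runbook locations are filled in because the issue does not define them yet.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Alert classes listed in the requirements; the slug spellings are assumptions.
ALERT_CLASSES = [
    "rees-source-map-upload-failure",
    "rees-analyzer-degradation-spike",
    "self-host-job-dead-lettering",
    "relay-event-drops",
    "gate-check-run-permission-gaps",
    "ai-provider-exhaustion",
    "scheduled-monitor-misses",
]

REQUIRED_FIELDS = ("owner", "threshold", "runbook", "smoke_test")


@dataclass
class AlertRule:
    """Hypothetical inventory entry for one alert class."""

    name: str
    owner: Optional[str] = None
    threshold: Optional[str] = None
    runbook: Optional[str] = None
    smoke_test: Optional[str] = None


def audit(rules: list[AlertRule]) -> dict[str, list[str]]:
    """Return, per alert class, the deliverables that are still missing."""
    by_name = {rule.name: rule for rule in rules}
    report: dict[str, list[str]] = {}
    for name in ALERT_CLASSES:
        rule = by_name.get(name)
        if rule is None:
            report[name] = ["rule not defined"]
            continue
        gaps = [f for f in REQUIRED_FIELDS if not getattr(rule, f)]
        if gaps:
            report[name] = gaps
    return report
```

An empty report would mean every listed class has an owner, a threshold, a runbook, and a smoke validation entry; anything else names the gap directly.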

Acceptance criteria

  • New production regressions and persistent operational failures route to the maintainer with actionable context.
  • Expected fail-open/transient paths do not create chronic alert noise.
  • Every alert has a documented first-response path and recovery verification step.
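The first two criteria reduce to a routing predicate: production-only, and only for new issues, regressions, or operational failures that have crossed their threshold. A minimal sketch follows, assuming a hypothetical `Signal` record and an environment literally named `"production"`; neither comes from the existing project setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    """Hypothetical summary of one candidate alert."""

    environment: str
    is_new_issue: bool = False
    is_regression: bool = False
    threshold_breached: bool = False  # persistent operational failure


def routes_to_maintainer(signal: Signal) -> bool:
    """Route only production signals that are new, regressed, or persistent."""
    if signal.environment != "production":  # assumed environment name
        return False
    return (
        signal.is_new_issue
        or signal.is_regression
        or signal.threshold_breached
    )
```

A transient fail-open event in production that has not breached its threshold returns `False` here, which is the behavior the second criterion asks for.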

Metadata

Labels

maintainer-only: Work to be completed solely by jsonbored - yields no gittensor points.

Projects

Status
Todo

