Part of #998. Related: #1002, #1202, and #1667.
Context
More Sentry coverage is only useful if its signals reach an owner and noisy event classes are kept under control. We need a small, deliberate alert set for production-impacting failures, plus clear ownership and runbook notes for what to do when each alert fires.
This is not a broad log-ingestion project. Keep alerts focused on operational failures that need maintainer action.
Requirements
- Define alert rules for production-only new issues/regressions and selected operational event classes.
- Cover at least: REES source-map upload failure, REES analyzer degradation spike, self-host job dead-lettering, relay event drops, gate/check-run permission gaps, AI provider exhaustion, and scheduled monitor misses.
- Set thresholds so one-off fail-open noise does not page, but persistent failures are visible quickly.
- Define owner/routing expectations for each alert class.
- Add a short runbook for first checks, likely causes, and where to verify recovery.
- Keep alert configuration maintainer-only; do not require contributor workflow changes.
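One way to meet the threshold requirement is to require both a minimum event count and persistence across several short time buckets. A single burst of fail-open events then stays quiet, while a failure that keeps recurring trips the rule within one window. The sketch below is hypothetical. The alert class IDs paraphrase the coverage list above, and every number in `thresholds` is a placeholder, not an agreed value. It does not use or model Sentry's own alert-rule API.

```typescript
// Hypothetical IDs for the alert classes listed in the requirements.
type AlertClass =
  | "rees-sourcemap-upload-failure"
  | "rees-analyzer-degradation"
  | "selfhost-job-dead-letter"
  | "relay-event-drop"
  | "gate-checkrun-permission-gap"
  | "ai-provider-exhaustion"
  | "scheduled-monitor-miss";

interface Threshold {
  windowMinutes: number;    // how far back to look from "now"
  minEvents: number;        // total events required inside the window
  minActiveBuckets: number; // distinct 5-minute buckets that must contain an event
}

const BUCKET_MS = 5 * 60 * 1000;

// Placeholder values only; real thresholds still need to be decided.
const thresholds: Record<AlertClass, Threshold> = {
  "rees-sourcemap-upload-failure": { windowMinutes: 60, minEvents: 2, minActiveBuckets: 2 },
  "rees-analyzer-degradation": { windowMinutes: 15, minEvents: 20, minActiveBuckets: 3 },
  "selfhost-job-dead-letter": { windowMinutes: 30, minEvents: 3, minActiveBuckets: 2 },
  "relay-event-drop": { windowMinutes: 15, minEvents: 10, minActiveBuckets: 3 },
  "gate-checkrun-permission-gap": { windowMinutes: 60, minEvents: 2, minActiveBuckets: 2 },
  "ai-provider-exhaustion": { windowMinutes: 30, minEvents: 5, minActiveBuckets: 3 },
  "scheduled-monitor-miss": { windowMinutes: 120, minEvents: 2, minActiveBuckets: 2 },
};

// Returns true only when events are both numerous enough and spread over time,
// so a one-off burst (all events in one bucket) does not alert.
function shouldAlert(eventTimesMs: number[], nowMs: number, t: Threshold): boolean {
  const windowStart = nowMs - t.windowMinutes * 60 * 1000;
  const inWindow = eventTimesMs.filter((ts) => ts > windowStart && ts <= nowMs);
  if (inWindow.length < t.minEvents) return false;
  const buckets = new Set(inWindow.map((ts) => Math.floor(ts / BUCKET_MS)));
  return buckets.size >= t.minActiveBuckets;
}
```

The bucket check is the design choice that matters here: raising `minEvents` alone would still page on one large burst, while requiring several active buckets separates "noisy once" from "still broken".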
Deliverables
- Sentry alert rule inventory and desired thresholds.
- Ownership/routing notes for the JSONbored org/project setup.
- Runbook docs for each high-priority alert class.
- Optional configuration-as-code or script if Sentry's API support is stable enough for our use.
- Smoke validation plan for each alert class.
Acceptance criteria
- New production regressions and persistent operational failures route to the maintainer with actionable context.
- Expected fail-open/transient paths do not create chronic alert noise.
- Every alert has a documented first-response path and recovery verification step.
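If the alert inventory is kept as data, the last acceptance criterion can be checked mechanically rather than by review. The sketch below assumes a hypothetical `AlertEntry` shape whose fields mirror the deliverables (owner/routing, first checks, likely causes, recovery verification, smoke validation). None of these field names come from Sentry or from an existing file in the repo.

```typescript
// Hypothetical inventory entry; fields mirror the deliverables in this issue.
interface AlertEntry {
  id: string;
  owner: string;          // who acts when the alert fires
  route: string;          // where the notification is delivered
  firstChecks: string[];  // runbook: first things to look at
  likelyCauses: string[]; // runbook: usual suspects
  recoveryCheck: string;  // how to confirm the failure has cleared
  smokeTest: string;      // how to trigger the alert deliberately
}

// Lists every missing piece so gaps are visible before an alert ever fires.
function findGaps(entries: AlertEntry[]): string[] {
  const gaps: string[] = [];
  const seen = new Set<string>();
  for (const e of entries) {
    if (seen.has(e.id)) gaps.push(`${e.id}: duplicate id`);
    seen.add(e.id);
    if (!e.owner.trim()) gaps.push(`${e.id}: missing owner`);
    if (!e.route.trim()) gaps.push(`${e.id}: missing route`);
    if (e.firstChecks.length === 0) gaps.push(`${e.id}: no first-response checks`);
    if (e.likelyCauses.length === 0) gaps.push(`${e.id}: no likely causes`);
    if (!e.recoveryCheck.trim()) gaps.push(`${e.id}: no recovery verification step`);
    if (!e.smokeTest.trim()) gaps.push(`${e.id}: no smoke validation`);
  }
  return gaps;
}
```

An empty result from `findGaps` would mean every alert has an owner, a first-response path, a recovery check, and a smoke test, which is what the criteria above ask for.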