Bug 1942867 - Add an alert management system for telemetry alerts. #9015

gmierz · 2025-10-09T12:30:51Z

This set of patches adds a new alert management system for telemetry alerts. The commits below split the system into logical chunks, with later commits building on earlier ones.

Some generic base and utility classes are added directly to the auto_perf_sheriffing folder. These are not specific to telemetry alerting and could be used in other performance sheriffing automation.

The concrete classes for telemetry alert management are found in the treeherder/perf/auto_perf_sheriffing/telemetry_alerting folder. These are then integrated into the telemetry detection code in Sherlock through the TelemetryAlertManager and run from TelemetryAlertManager.manage_alerts.

The manage_alerts method is defined generically in the AlertManager class. It starts by updating the DB with any changes made to telemetry bugs in Bugzilla; at the moment, only their resolutions are synced. Next, bugs are filed for alerts on probes that request one (by setting the monitor.alert field to True in their probe definition). Once bugs are filed, these bugs and any existing bugs are modified as needed. Currently this only updates the see_also field so that all bugs filed for the same detection range, in other words all bugs that belong to the same PerformanceTelemetryAlertSummary, are linked together. At the end of this "bug handling" phase, emails are produced for any alerts that request them (an alert produces either a bug or an email, never both, to reduce spam). Finally, either the bug modifications or the emails can fail. In that case, a "housekeeping" stage retries the failed alerts on a daily basis.
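To make the ordering of these phases concrete, here is a minimal sketch of a generic manager driving them. It is not the code in this PR: the `Alert` dataclass, the `InMemoryAlertManager` subclass, and every method name other than `manage_alerts` are hypothetical stand-ins, and Bugzilla, the DB, and email delivery are replaced with in-memory structures.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alert:
    """Hypothetical stand-in for a telemetry alert (not the real model)."""
    probe: str
    summary_id: int            # stands in for the PerformanceTelemetryAlertSummary
    file_bug: bool             # mirrors monitor.alert being True for the probe
    bug_id: Optional[int] = None
    notified: bool = False


class AlertManager(ABC):
    """Generic driver; subclasses supply the bug and email specifics."""

    def manage_alerts(self, alerts):
        self.update_alerts(alerts)   # mirror bug resolutions into the DB
        self.file_bugs([a for a in alerts if a.file_bug and a.bug_id is None])
        self.modify_bugs(alerts)     # e.g. link related bugs through see_also
        # An alert gets a bug or an email, never both.
        self.email_alerts([a for a in alerts if not a.file_bug and not a.notified])
        self.house_keeping(alerts)   # retry anything that failed earlier

    @abstractmethod
    def update_alerts(self, alerts): ...

    @abstractmethod
    def file_bugs(self, alerts): ...

    @abstractmethod
    def modify_bugs(self, alerts): ...

    @abstractmethod
    def email_alerts(self, alerts): ...

    @abstractmethod
    def house_keeping(self, alerts): ...


class InMemoryAlertManager(AlertManager):
    """Toy implementation that records what a real manager would send out."""

    def __init__(self):
        self.next_bug_id = 1
        self.see_also = {}   # bug id -> set of related bug ids
        self.emails = []     # probes that were emailed about

    def update_alerts(self, alerts):
        pass  # a real manager would sync resolutions from Bugzilla here

    def file_bugs(self, alerts):
        for alert in alerts:
            alert.bug_id = self.next_bug_id
            self.next_bug_id += 1

    def modify_bugs(self, alerts):
        # Group bugs by summary so bugs from one detection range reference each other.
        by_summary = defaultdict(list)
        for alert in alerts:
            if alert.bug_id is not None:
                by_summary[alert.summary_id].append(alert.bug_id)
        for bug_ids in by_summary.values():
            for bug_id in bug_ids:
                self.see_also[bug_id] = set(bug_ids) - {bug_id}

    def email_alerts(self, alerts):
        for alert in alerts:
            self.emails.append(alert.probe)
            alert.notified = True

    def house_keeping(self, alerts):
        pass  # a real manager would retry failed modifications and emails


alerts = [
    Alert("probe_a", summary_id=1, file_bug=True),
    Alert("probe_b", summary_id=1, file_bug=True),
    Alert("probe_c", summary_id=1, file_bug=False),
]
manager = InMemoryAlertManager()
manager.manage_alerts(alerts)
```

In this toy run, the two bug-filing alerts share a summary, so their bugs end up cross-referenced, while the third alert only produces an email.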

For treeherder-admins, the relevant changes are in the first commit, which adds a new env field so that a locally set BUG_COMMENTER_API_KEY is picked up. This is needed for testing the bug modification aspect of the management system.
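As a rough illustration of the idea (an API key that may come from the process environment or from a local .env file), the sketch below uses only the standard library. It is an assumption-laden example, not Treeherder's settings code: the helper names `load_env_file` and `get_bug_commenter_api_key` are hypothetical, and the real first commit may read the value differently.

```python
import os
from pathlib import Path


def load_env_file(path):
    """Parse simple NAME=value lines from a .env file; return {} if it is missing."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text().splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        values[name.strip()] = value.strip().strip('"').strip("'")
    return values


def get_bug_commenter_api_key(env_file=".env", environ=None):
    """Prefer the process environment, then fall back to the .env file."""
    environ = os.environ if environ is None else environ
    key = environ.get("BUG_COMMENTER_API_KEY")
    if not key:
        key = load_env_file(env_file).get("BUG_COMMENTER_API_KEY")
    return key or None
```

Returning None when the key is absent lets callers skip bug modification in local runs where no key is configured.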

gmierz · 2025-10-17T14:35:40Z

Here's a sample bug that is filed by this: https://bugzilla.mozilla.org/show_bug.cgi?id=1993145

treeherder/perf/auto_perf_sheriffing/base_email_manager.py

treeherder/perf/auto_perf_sheriffing/base_alert_manager.py

treeherder/perf/auto_perf_sheriffing/base_bug_manager.py

treeherder/perf/auto_perf_sheriffing/bug_searcher.py

treeherder/perf/auto_perf_sheriffing/telemetry_alerting/email_manager.py

treeherder/perf/auto_perf_sheriffing/telemetry_alerting/alert_manager.py

Andrej1198

Thanks for addressing my concerns, LGTM

gmierz · 2025-10-29T14:54:52Z

Another DB migration has landed before this one, so I had to remake the ones we're doing here.

gmierz requested review from a team, beatrice-acasandrei and esanuandra as code owners October 9, 2025 12:30

gmierz force-pushed the telemetry-alert-manager-comp branch from 44fab00 to 0e26ca1 Compare October 9, 2025 12:51

gmierz requested a review from Andrej1198 October 9, 2025 15:43

Andrej1198 reviewed Oct 24, 2025

gmierz force-pushed the telemetry-alert-manager-comp branch from 94d89a2 to 4f7ecc0 Compare October 24, 2025 16:08

gmierz requested a review from Andrej1198 October 24, 2025 16:10

Andrej1198 approved these changes Oct 24, 2025

beatrice-acasandrei approved these changes Oct 28, 2025

gmierz added 19 commits October 29, 2025 10:43

Get BUG_COMMENTER_API_KEY from .env file if it exists.

05a694c

Update telemetry alert models with additional information.

a0f97cf

Add pytest fixtures for unit tests for generic classes.

ea235b8

Add base EmailManager with unit tests.

c85db71

Add base BugManager with unit tests.

86ecc0a

Add base AlertManager with unit tests.

58a548f

Add base BugSearcher with unit tests.

e570f67

Add init files, and pytest fixtures for concrete telemetry unit tests.

6c0d7b5

Add file that contains utility methods and constants for alert management.

77c4c9a

Add a TelemetryProbe data class to represent telemetry probes.

a675659

Add a concrete TelemetryAlert class to represent telemetry alerts.

3615424

Add TelemetryEmailManager for handling telemetry alert emails.

4b32afe

Add TelemetryBugManager for handling filing, commenting, and modifying telemetry bugs.

1069d9d

Add TelemetryAlertModifier to handle mirroring bug updates into the DB.

9bea0e6

Add TelemetryBugModifier to handle telemetry bug modifications.

ea574db

Add TelemetryAlertManager for the overall management of alerts, emails, and bugs.

f8e312a

Integrate alert management into telemetry change detection.

00ce333

Don't send alert emails for non-alerting probes.

a5cd28c

Rename get_email_func to get_notify_func.

739307e

gmierz added 4 commits October 29, 2025 10:43

Include alerts to modify as output from modifiers.

a9b9aa7

Remove question-mark from query URL, and reduce line lengths.

7a994d0

Replace query usage with filter in BugSearcher.

dd9cc6d

Rework DB migrations into a single one.

6f2e463

gmierz force-pushed the telemetry-alert-manager-comp branch from 4f7ecc0 to 6f2e463 Compare October 29, 2025 14:54

gmierz merged commit 5bcfc9c into mozilla:master Oct 29, 2025
6 checks passed
