monitor failing jobs #2740

Open
majamassarini opened this issue Mar 5, 2025 · 1 comment
Labels
area/general Related to whole service, not a specific part/integration. complexity/single-task Regular task, should be done within days. kind/internal Doesn't affect users directly, may be e.g. infrastructure, DB related.

Comments

@majamassarini
Member

A failed job is not always marked as a failed Celery task.

Image

Image

To detect this kind of situation automatically, we should create new variables (successful builds/tests, failed builds/tests), collect them, and send them to the pushgateway (as we do for the queued and started builds/tests). We should then raise an alert when the share of failures stays near 100% over a broad time frame (10 minutes?), or something similar.
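To make the proposed alert condition concrete, here is a minimal sketch of a failure-rate check over a time window. It is only an illustration: the class name `FailureRateWindow`, the 0.95 threshold, and the minimum sample count are hypothetical, and in practice the check would more likely live in an alerting rule evaluated on the pushed metrics than in service code.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FailureRateWindow:
    """Track job outcomes and report the failure rate over a recent window."""

    window_seconds: float = 600.0  # the 10 minutes floated in the issue
    threshold: float = 0.95  # hypothetical cut-off for "near 100%"
    min_samples: int = 5  # hypothetical guard against alerting on very few jobs
    _events: deque = field(default_factory=deque)  # (timestamp, failed) pairs

    def record(self, timestamp: float, failed: bool) -> None:
        # Timestamps are assumed to arrive in non-decreasing order.
        self._events.append((timestamp, failed))

    def failure_rate(self, now: float) -> Optional[float]:
        # Drop outcomes that are older than the window.
        while self._events and self._events[0][0] < now - self.window_seconds:
            self._events.popleft()
        # Too few jobs in the window to say anything meaningful.
        if len(self._events) < self.min_samples:
            return None
        failures = sum(1 for _, failed in self._events if failed)
        return failures / len(self._events)

    def should_alert(self, now: float) -> bool:
        rate = self.failure_rate(now)
        return rate is not None and rate >= self.threshold
```

The `min_samples` guard is there so that a single failed build during a quiet period does not count as a 100% failure rate.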

@lbarcziova
Member

> A failed job is not always marked as a failed Celery task.

I think this is correct behaviour, because the task successfully finishes.

> To detect this kind of situation automatically, we should create new variables (successful builds/tests, failed builds/tests), collect them, and send them to the pushgateway (as we do for the queued and started builds/tests).

We could maybe just adjust the existing metrics, e.g. copr_builds_finished/test_runs_finished to have status labels.
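A rough sketch of what status labels on those metrics could look like with `prometheus_client`. The metric names are taken from the comment above; the `Counter` type, the help strings, the `success`/`failure` label values, and the helper function names are assumptions for illustration and may not match the actual monitoring code.

```python
from prometheus_client import CollectorRegistry, Counter

# A dedicated registry, as is common when metrics are pushed to a Pushgateway.
registry = CollectorRegistry()

# Assumption: both metrics are counters; help strings are made up for the sketch.
copr_builds_finished = Counter(
    "copr_builds_finished",
    "Number of Copr builds finished",
    ["status"],  # hypothetical label values: "success" or "failure"
    registry=registry,
)
test_runs_finished = Counter(
    "test_runs_finished",
    "Number of test runs finished",
    ["status"],
    registry=registry,
)


def record_copr_build(succeeded: bool) -> None:
    """Count a finished Copr build under the matching status label."""
    status = "success" if succeeded else "failure"
    copr_builds_finished.labels(status=status).inc()


def record_test_run(succeeded: bool) -> None:
    """Count a finished test run under the matching status label."""
    status = "success" if succeeded else "failure"
    test_runs_finished.labels(status=status).inc()
```

With one labeled metric per job type, the failure share can be derived from the `status` label on the monitoring side, and the registry can be sent as before with `prometheus_client.push_to_gateway`.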

@nforro added the complexity/single-task, area/general, and kind/internal labels on Mar 6, 2025
@nforro moved this from new to backlog in Packit Kanban Board on Mar 6, 2025
Projects
Status: backlog