Add backoff strategy to auto retry of failing CI jobs #289
I can indeed see a bunch of lines like this in the log:
Then:
Then:
Then:
The retry threshold is not reset because it relies on a count provided by GitLab, which doesn't distinguish who did the retry. However, adopting an exponential backoff strategy based on the number of retries should be quite doable, and it may allow us to increase the maximum number of retries. I don't think it is necessary to write an algorithm as complex as the one proposed by Google, though. We already have some exponential backoff strategies in other parts of the code (lines 2594 to 2621 at commit 77b3898).
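For illustration, a retry-count-based exponential backoff could look like the sketch below. It is not the code at lines 2594 to 2621 (which is not reproduced here). The helper name `backoff_delay` is hypothetical, and the sketch assumes that the GitLab retry count starts at 1 and that delays grow by a factor of 5 from a 5-second base (5 s, 25 s, 125 s).

```ocaml
(* Hypothetical helper, not the actual coq/bot code: compute the delay (in
   seconds) before the next automatic retry from the retry count reported
   by GitLab. Assumes the count starts at 1 for the first retry. *)
let backoff_delay ?(base = 5) ?(factor = 5) count =
  (* Integer exponentiation: pow b e = b^e for e >= 0. *)
  let rec pow b e = if e <= 0 then 1 else b * pow b (e - 1) in
  (* delay = base * factor^(count - 1) *)
  base * pow factor (count - 1)

let () =
  List.iter
    (fun count ->
      Printf.printf "retry %d: wait %d s\n" count (backoff_delay count))
    [ 1; 2; 3 ]
```

With the default parameters this prints waits of 5 s, 25 s and 125 s for the first three retries.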
Thanks @Zimmi48 for your feedback! 👍👍
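OK, but then I don't see why it would be impossible to implement the feature I was suggesting, assuming the count is the one provided by GitLab:

    let should_auto_retry count =
      (count - 1) mod (max_number_of_retry + 1) < max_number_of_retry

As a result, the bot would stop after `max_number_of_retry` automatic retries, and a manual retry would start a new series of automatic retries.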
IMO, it would be great to combine this modular-arithmetic idea with the (5 s, 25 s, 125 s) backoff strategy you mentioned. WDYT?
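A minimal sketch of how the modular-arithmetic predicate and the 5 s / 25 s / 125 s schedule could be combined, assuming the GitLab count starts at 1 and grows with every retry (manual or automatic), as described earlier in the thread. The value `max_number_of_retry = 3` and the names `pow` and `auto_retry_delay` are hypothetical; the backoff restarts at 5 s at the beginning of each series.

```ocaml
(* Hypothetical value: number of automatic retries per series. *)
let max_number_of_retry = 3

(* Integer exponentiation: pow b e = b^e for e >= 0. *)
let rec pow b e = if e <= 0 then 1 else b * pow b (e - 1)

(* The modular-arithmetic predicate proposed in the thread: only the
   position of [count] within its series of [max_number_of_retry + 1]
   matters. Assumes [count >= 1]. *)
let should_auto_retry count =
  (count - 1) mod (max_number_of_retry + 1) < max_number_of_retry

(* Hypothetical combination with exponential backoff: [Some delay] (in
   seconds) if the bot should retry automatically, [None] otherwise. *)
let auto_retry_delay count =
  let r = (count - 1) mod (max_number_of_retry + 1) in
  if should_auto_retry count then Some (5 * pow 5 r) else None

let () =
  for count = 1 to 8 do
    match auto_retry_delay count with
    | Some d -> Printf.printf "count %d: auto retry after %d s\n" count d
    | None -> Printf.printf "count %d: no auto retry (wait for a manual one)\n" count
  done
```

With these values, counts 1, 2 and 3 are retried automatically after 5 s, 25 s and 125 s, count 4 waits for a manual retry, and count 5 starts a new series at 5 s.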
Relying on modular arithmetic for this purpose is really smart 😮
Thanks @Zimmi48! BTW, while we are at it, even if we are sure there will be no "looping", we could still bound the number of retry series, e.g.:

    let should_auto_retry count =
      let c = count - 1 in
      let m = max_number_of_retry + 1 in
      let q, r = c / m, c mod m in
      r < max_number_of_retry && q < max_number_of_retries_series

but YMMV!
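The bounded variant above can be exercised as in this sketch. The values of `max_number_of_retry` and `max_number_of_retries_series` are hypothetical (3 and 2), chosen only to make the behaviour visible, and the count is again assumed to start at 1.

```ocaml
(* Hypothetical values, not taken from coq/bot's configuration. *)
let max_number_of_retry = 3
let max_number_of_retries_series = 2

(* [q] is the index of the current series of retries and [r] the position
   within it; auto-retry only while both are below their limits.
   Assumes [count >= 1]. *)
let should_auto_retry count =
  let c = count - 1 in
  let m = max_number_of_retry + 1 in
  let q, r = (c / m, c mod m) in
  r < max_number_of_retry && q < max_number_of_retries_series

let () =
  (* List the counts from 1 to 12 for which the bot would auto-retry. *)
  let auto = List.filter should_auto_retry (List.init 12 (fun i -> i + 1)) in
  print_endline (String.concat " " (List.map string_of_int auto))
```

This prints `1 2 3 5 6 7`: two series of three automatic retries, after which every later failure has to be retried by hand.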
Copied from Zulip:
FTR, today (2024-07-02) in a 1-job pipeline, gitlab.inria.fr failed with: https://gitlab.inria.fr/math-comp/docker-mathcomp/-/jobs/4509247
I restarted the job manually.
FYI @Zimmi48: @palmskog and I got new 502 errors. I believe this form of the error message is not covered by lines 510 to 521 at commit 53ef64c. Do you think we should add it?
Edit: I opened a PR with a more general regexp.
Hi @Zimmi48,
FYI, despite the line https://github.com/coq/bot/blob/master/src/actions.ml#L545, out of the 49 jobs of the docker-mathcomp pipeline https://gitlab.inria.fr/math-comp/docker-mathcomp/-/pipelines/833107/ there are always a dozen jobs that fail and that we have to retry by hand (cf. the list of all jobs: https://gitlab.inria.fr/math-comp/docker-mathcomp/-/pipelines/833107/builds).
Three questions:
The check `|| test "unexpected status code .*: 401 Unauthorized"` has indeed been used according to the logs, e.g. after this job deploy_1_2.0.0-coq-8.16: https://gitlab.inria.fr/math-comp/docker-mathcomp/-/jobs/3279705

With kind regards