When the server is briefly unavailable during a heartbeat, the client currently panics and quits. Instead, it should retry for a limited time (the task timeout duration) and only then quit.
This should make server restarts less risky.
Additionally, when the heartbeat fails, the Go process currently exits but the subprocess does not. This may ultimately cause the task to be completed twice, unexpectedly. We should keep trying until the task is successfully marked as failed or succeeded.
dhrp linked a pull request on Jan 12, 2024 that will close this issue
FWIW: Judging from the logs, the moulin server took 3m30s to become available again this morning. This suggests that a retry window of, say, 5 minutes would have been sufficient (for this morning's specific incident, at least).