Don't immediately quit when the server is unavailable on a heartbeat #7

dhrp · 2024-01-09T12:13:21Z

When the server is briefly unavailable during a heartbeat, the client currently panics and quits. Instead, it should keep retrying for a bounded period (the task timeout duration) and only quit once that period has elapsed.

This should make server restarts less risky.

Additionally, when the heartbeat currently fails, the Go process exits but the subprocess does not. This may ultimately cause the task to be completed twice, unexpectedly. We should keep trying until the task is successfully marked as failed or succeeded.

Metamess · 2024-11-13T09:58:04Z

FWIW: Judging from the logs, the moulin server took 3m30s to become available again this morning. This suggests that a retry window of, say, 5 minutes would have been sufficient (for this morning's specific incident, at least).

dhrp linked a pull request Jan 12, 2024 that will close this issue

add retryPolicy stub #14

Draft
