Return DriverStatus::timeout when appropriate #1135
…nstead of complete
LGTM!
LGTM
LGTM
Given that it's changing behavior (though, in fact, now doing what it was supposed to do), I think adding a short entry to the Changelog would be good before merging.
It seems the macOS test consistently fails -- whereas it passed for #1047 that just went in yesterday.
The sparse-advection example returns
I can change it to return
Another option would be to add an error handler to the
Then the
I think this is a good idea.
I added an error handler to
@@ -152,7 +152,7 @@ DriverStatus EvolutionDriver::Execute() {
     pmesh->UserWorkAfterLoop(pmesh, pinput, tm);
   }

-  DriverStatus status = DriverStatus::complete;
+  DriverStatus status = tm.KeepGoing() ? DriverStatus::timeout : DriverStatus::complete;
I am confused now.
How can we ever set this to timeout here?
Above, if (status != TaskListStatus::complete) { means we're already returning with failed.
Otherwise, there's just one break in the while loop (checking with final), and with final is a clean exit and not a timeout, or am I missing something?
CheckSignalFlags returns OutputSignal::final if any of the signalflags are non-zero, which I think catches the wall time alarm.
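To make the control flow discussed above concrete, here is a minimal, self-contained sketch. It is not Parthenon's implementation: the enums, the SimTime struct, and the alarm_cycle and step_ok parameters are hypothetical stand-ins that only mimic the shape of the driver loop. The sketch assumes a signal check that reports final once a simulated wall-time alarm has fired, so the loop breaks while KeepGoing() is still true.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; not Parthenon's actual types.
enum class DriverStatus { complete, timeout, failed };
enum class TaskListStatus { complete, fail };
enum class OutputSignal { none, final };

struct SimTime {
  int ncycle;  // cycles done so far
  int nlim;    // cycle limit
  bool KeepGoing() const { return ncycle < nlim; }
};

// alarm_cycle: cycle at which the simulated wall-time alarm fires (negative = never).
// step_ok: whether each step's task list completes (assumed constant for simplicity).
DriverStatus Execute(SimTime &tm, int alarm_cycle, bool step_ok = true) {
  assert(tm.nlim >= 0);
  while (tm.KeepGoing()) {
    TaskListStatus status = step_ok ? TaskListStatus::complete : TaskListStatus::fail;
    if (status != TaskListStatus::complete) {
      return DriverStatus::failed;  // early return, as in the review comment
    }
    ++tm.ncycle;

    // Stand-in for a signal check: reports final once the alarm has fired.
    OutputSignal signal = (alarm_cycle >= 0 && tm.ncycle >= alarm_cycle)
                              ? OutputSignal::final
                              : OutputSignal::none;
    if (signal == OutputSignal::final) break;  // clean exit, but work may remain
  }
  // If the loop was left while there was still work to do, report a timeout.
  return tm.KeepGoing() ? DriverStatus::timeout : DriverStatus::complete;
}
```

In this sketch, a break that happens on the very last cycle still yields complete, because KeepGoing() is already false at that point.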
Gotcha. My bad. I misinterpreted timeout as a failure mode (e.g., a communication timeout when we hit the max number of receives tried).
In AthenaPK, we're currently not even passing the result of the driver (which I should probably update).
I'll push a minor update to the changelog momentarily (indicating that this is potentially breaking downstream if the driver status is passed as final return code, which now will result in a failed run despite the run having exited cleanly) and then enable automerge.
PR Summary
I don't think DriverStatus::timeout was ever returned -- only failed and complete were. This change checks whether tm.KeepGoing() still returns true after the driver loop has exited and, if so, returns DriverStatus::timeout instead of DriverStatus::complete.
Useful if downstream codes want to do something different for a timeout.
Breaking behavior
If a downstream code passes the driver status as final exit status, then this new version will result in a failed application run even though the code exited gracefully (following the walltime limit).
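One way a downstream code could guard against this is to map the driver status to an exit code explicitly instead of passing it through. The following is only a sketch under assumptions: the DriverStatus enum is redeclared here as a stand-in, ExitCodeFor is a hypothetical helper, and whether a timeout should count as success is a policy choice for each application.

```cpp
#include <cstdlib>

// Stand-in for the driver status enum; redeclared here for illustration only.
enum class DriverStatus { complete, timeout, failed };

// Hypothetical helper: treat a graceful walltime exit as a successful run.
int ExitCodeFor(DriverStatus status) {
  switch (status) {
    case DriverStatus::complete:
    case DriverStatus::timeout:  // exited cleanly after hitting the walltime limit
      return EXIT_SUCCESS;
    case DriverStatus::failed:
      return EXIT_FAILURE;
  }
  return EXIT_FAILURE;  // defensive default for unexpected values
}
```

A downstream main function would then end with something like return ExitCodeFor(driver_status); rather than returning the status itself, and could branch on timeout separately if it wants to, for example, schedule a restart.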
PR Checklist