-
Notifications
You must be signed in to change notification settings - Fork 115
Nexus cancellation types #981
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
bugbot run |
aac8dcd
to
d5039c2
Compare
d5039c2
to
134072e
Compare
bugbot run |
The test_cancellation_type tests were hanging indefinitely when run with the time-skipping environment, causing CI failures. These tests work fine with the regular environment but have compatibility issues with time-skipping, similar to other Nexus tests. Fixes the CI timeout failures seen on PR #981.
The test_cancellation_type tests were hanging indefinitely when run with the time-skipping environment, causing CI failures. These tests work fine with the regular environment but have compatibility issues with time-skipping, similar to other Nexus tests. Fixes the CI timeout failures seen on PR #981.
652f2b7
to
771aa96
Compare
aeb1b39
to
e1319f5
Compare
d58f06b
to
314b0f3
Compare
314b0f3
to
5f1f8c9
Compare
5f1f8c9
to
fc9af07
Compare
0e9143c
to
b1090f3
Compare
# We want the cancel handler to be invoked, so this workflow must not close before | ||
# then. | ||
await self.cancel_handler_released.wait() | ||
# TODO: there is technically a race now between (1) caller server writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may want to just note somewhere that nobody should use this as a safe example, just in case someone stumbles upon it.
[ | ||
(caller_wf, EventType.EVENT_TYPE_NEXUS_OPERATION_CANCEL_REQUESTED), | ||
(caller_wf, EventType.EVENT_TYPE_NEXUS_OPERATION_CANCEL_REQUEST_FAILED), | ||
(handler_wf, EventType.EVENT_TYPE_WORKFLOW_EXECUTION_COMPLETED), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if I'm following correctly, even if wait_for_cancellation_completed
just works like wait_for_cancellation_requested
, this will sometimes succeed, yes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for bringing this up. And also related to your comment above about my comment about the race condition. Basically I hadn't quite pushed hard enough on this corner of the tests. I've now added an additional wait on a future/signal, so that the WAIT_REQUESTED
assertion
assert op_cancel_request_failed < caller_op_future_resolved < handler_wf_completed
never evaluates equal to the WAIT_COMPLETED
assertion
assert handler_wf_completed < caller_op_future_resolved
and removed the race condition TODO.
c90c07b
to
8dad318
Compare
8dad318
to
3c6088a
Compare
Implements cancellation types for Python Nexus. This PR is mainly tests. These provide coverage for the state machine changes in temporalio/sdk-core#977. See temporalio/sdk-core#911 (comment)
The gist for anyone who needs a refresher is:
Abandon
- caller workflow never even send a request cancel command. The state machine is such that the operation handle future is resolved with a cancellation exception in a second activation in the same WFTTryCancel
- same asAbandon
regarding the immediate resolution of the future, but do emit the cancel nexus op request command (maybe it could also have been called RequestAsync or something)For the next two, one has to keep two concepts clearly distinct: the Nexus cancel handler invocation, and cancellation of the nexus op itself. The typical case is that the handler issues a
RequestCancelWorkflowExecution
to the workflow that's backing the nexus operation and then when that workflow is eventually cancelled, that triggers cancellation of the nexus op itself (the worfklow reaching a terminal state triggers some nexus stuff server-side)WaitRequested
- caller workflow issues the cancel request command, and waits for an event saying that the cancel handler invocation is complete (that event is the new event for which Rust state machine changes were made)WaitCompleted
- caller workflow issues the cancel request command and waits for the operation to be cancelled (isn't affected by the new event)And then finally, it's not one new event; there's two because the cancel handler might succeed or fail. Under
WaitRequested
the fail event also resolves the nexus operation handle future in the caller workflow, but as a normal nexus error, not as cancelled.