Skip to content

Nexus cancellation types #981

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

dandavison
Copy link
Contributor

@dandavison dandavison commented Jul 20, 2025

Implements cancellation types for Python Nexus. This PR is mainly tests. These provide coverage for the state machine changes in temporalio/sdk-core#977. See temporalio/sdk-core#911 (comment)

The gist for anyone who needs a refresher is:

  • Abandon - caller workflow never even send a request cancel command. The state machine is such that the operation handle future is resolved with a cancellation exception in a second activation in the same WFT

  • TryCancel - same as Abandon regarding the immediate resolution of the future, but do emit the cancel nexus op request command (maybe it could also have been called RequestAsync or something)

For the next two, one has to keep two concepts clearly distinct: the Nexus cancel handler invocation, and cancellation of the nexus op itself. The typical case is that the handler issues a RequestCancelWorkflowExecution to the workflow that's backing the nexus operation and then when that workflow is eventually cancelled, that triggers cancellation of the nexus op itself (the worfklow reaching a terminal state triggers some nexus stuff server-side)

  • WaitRequested - caller workflow issues the cancel request command, and waits for an event saying that the cancel handler invocation is complete (that event is the new event for which Rust state machine changes were made)

  • WaitCompleted - caller workflow issues the cancel request command and waits for the operation to be cancelled (isn't affected by the new event)

And then finally, it's not one new event; there's two because the cancel handler might succeed or fail. Under WaitRequested the fail event also resolves the nexus operation handle future in the caller workflow, but as a normal nexus error, not as cancelled.

@dandavison
Copy link
Contributor Author

bugbot run

cursor[bot]

This comment was marked as outdated.

@dandavison dandavison force-pushed the dan-0001-nexus-cancellation-types branch from aac8dcd to d5039c2 Compare July 21, 2025 23:38
@dandavison dandavison force-pushed the dan-0001-nexus-cancellation-types branch from d5039c2 to 134072e Compare August 3, 2025 03:58
@dandavison
Copy link
Contributor Author

bugbot run

cursor[bot]

This comment was marked as outdated.

dandavison added a commit that referenced this pull request Aug 3, 2025
The test_cancellation_type tests were hanging indefinitely when run with
the time-skipping environment, causing CI failures. These tests work fine
with the regular environment but have compatibility issues with time-skipping,
similar to other Nexus tests.

Fixes the CI timeout failures seen on PR #981.
dandavison added a commit that referenced this pull request Aug 3, 2025
The test_cancellation_type tests were hanging indefinitely when run with
the time-skipping environment, causing CI failures. These tests work fine
with the regular environment but have compatibility issues with time-skipping,
similar to other Nexus tests.

Fixes the CI timeout failures seen on PR #981.
@dandavison dandavison force-pushed the dan-0001-nexus-cancellation-types branch from 652f2b7 to 771aa96 Compare August 3, 2025 09:09
@dandavison dandavison marked this pull request as ready for review August 4, 2025 18:20
@dandavison dandavison requested a review from a team as a code owner August 4, 2025 18:20
@dandavison dandavison force-pushed the dan-0001-nexus-cancellation-types branch from aeb1b39 to e1319f5 Compare August 4, 2025 18:26
cursor[bot]

This comment was marked as outdated.

@dandavison dandavison force-pushed the dan-0001-nexus-cancellation-types branch 2 times, most recently from d58f06b to 314b0f3 Compare August 13, 2025 18:19
cursor[bot]

This comment was marked as outdated.

@dandavison dandavison force-pushed the dan-0001-nexus-cancellation-types branch from 314b0f3 to 5f1f8c9 Compare August 15, 2025 19:40
@dandavison dandavison marked this pull request as draft August 15, 2025 22:51
@dandavison dandavison force-pushed the dan-0001-nexus-cancellation-types branch from 5f1f8c9 to fc9af07 Compare August 18, 2025 16:29
@dandavison dandavison marked this pull request as ready for review August 18, 2025 16:33
@dandavison dandavison force-pushed the dan-0001-nexus-cancellation-types branch 3 times, most recently from 0e9143c to b1090f3 Compare August 19, 2025 10:18
# We want the cancel handler to be invoked, so this workflow must not close before
# then.
await self.cancel_handler_released.wait()
# TODO: there is technically a race now between (1) caller server writing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to just note somewhere that nobody should use this as a safe example, just in case someone stumbles upon it.

[
(caller_wf, EventType.EVENT_TYPE_NEXUS_OPERATION_CANCEL_REQUESTED),
(caller_wf, EventType.EVENT_TYPE_NEXUS_OPERATION_CANCEL_REQUEST_FAILED),
(handler_wf, EventType.EVENT_TYPE_WORKFLOW_EXECUTION_COMPLETED),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if I'm following correctly, even if wait_for_cancellation_completed just works like wait_for_cancellation_requested, this will sometimes succeed, yes?

Copy link
Contributor Author

@dandavison dandavison Aug 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for bringing this up. And also related to your comment above about my comment about the race condition. Basically I hadn't quite pushed hard enough on this corner of the tests. I've now added an additional wait on a future/signal, so that the WAIT_REQUESTED assertion

assert op_cancel_request_failed < caller_op_future_resolved < handler_wf_completed

never evaluates equal to the WAIT_COMPLETED assertion

assert handler_wf_completed < caller_op_future_resolved

and removed the race condition TODO.

@dandavison dandavison force-pushed the dan-0001-nexus-cancellation-types branch 2 times, most recently from c90c07b to 8dad318 Compare August 20, 2025 20:10
@dandavison dandavison force-pushed the dan-0001-nexus-cancellation-types branch from 8dad318 to 3c6088a Compare August 21, 2025 08:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants