-
Notifications
You must be signed in to change notification settings - Fork 78
[DO NOT MERGE] Use Temporal Failures for Nexus APIs #627
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
| ChildWorkflowExecutionFailureInfo child_workflow_execution_failure_info = 12; | ||
| NexusOperationFailureInfo nexus_operation_execution_failure_info = 13; | ||
| NexusHandlerFailureInfo nexus_handler_failure_info = 14; | ||
| NexusSDKOperationFailureInfo nexus_sdk_operation_failure_info = 15; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am worried about adding this, can we confirm this error won't end up as the cause of a nexus operation failure in history?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the idea is that it would only be used in non-workflow callers and in handlers. But it may still show up in the cause chain in workflow, just not the top level failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When would this be part of the cause chain in a workflow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since any failure can appear as a cause, there's nothing preventing users from setting OperationError as the cause of an ApplicationError or HandlerError. I don't expect this to typically happen though.
Maybe when we have non-workflow callers and your operation handler uses the client to forward the operation and the handler wraps that error. Here's an example:
func (*myHandler) StartOperation(...) (...) {
sc := temporalnexus.GetClient(...).NexusServiceClient(...)
_, err := sc.StartOperation(...)
return fmt.Errorf("failed to forward call: %w", err)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that's true, my main concern is we don't change the current error chains type users see in workflows. Can we confirm that will remain the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, those will stay the same except for that we won't construct an implicit ApplicationError as the cause when users call construct a handler error with a message and both caller and handler have been upgraded.
|
|
||
| // Representation of the Nexus SDK OperationError object. | ||
| message NexusSDKOperationFailureInfo { | ||
| string state = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is "state" here? May need just a quick comment about what's accepted or what it refers to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's failed | canceled per the Nexus spec. We leave it as a string to allow adding statuses on the Nexus side without requiring a Temporal upgrade.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Worth adding a comment here saying this is an enum defined in the Nexus spec
| // aip.dev/not-precedent: Not following linter rules. --) | ||
| google.protobuf.Timestamp scheduled_time = 2; | ||
|
|
||
| Capabilities capabilities = 100; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Who is responsible for setting this? I assume the server correct? So is this Nexus Request message only expected to ever be constructed by the server? If this is a Temporal-server-set-only thing, should it instead be on PollNexusTaskQueueResponse?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either of these are constructed only by the server. No strong opinion on where this should go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of being on the poll response myself, but I similarly do not have a strong opinion
| UnsuccessfulOperationError operation_error = 3; | ||
| // The operation completed unsuccessfully (failed or canceled). | ||
| // Failure object must contain a NexusSDKOperationFailureInfo object. | ||
| temporal.api.failure.v1.Failure failure = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May want to say if this is set, the operation_error is ignored, but not that important
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's already a oneof so that's not possible.
0066531 to
225b5bc
Compare
Also add a failure representation of the Nexus SDK's OperationError.
225b5bc to
7b57017
Compare
Also add a failure representation of the Nexus SDK's OperationError.
The main motivation for this change is for consistency with the rest of the Temporal APIs and to make payload visiting in proxies simpler.
I do not intend to merge this PR until I have the corresponding Go SDK and server PRs.
I do want to get early feedback on this direction before I polish the work on the two remaining PRs.