-
Notifications
You must be signed in to change notification settings - Fork 270
Defer updating workflow completion metrics until completion accepted by server #2112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
909b34e to
0b7f8f1
Compare
|
|
||
| // WorkflowExecutionContext represents one instance of workflow execution state in memory. Lock must be obtained before | ||
| // calling into any methods. | ||
| WorkflowExecutionContext interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is in internal_public but I don't see it referenced in internalbindings or having any side effects of my changing this interface (some methods already accepted parameters that couldn't be instantiated externally anyways).
@Quinn-With-Two-Ns - any concerns with the altering of this interface?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this interface even used? My IDE can't find a us of it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same, I doubt it is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 don't see it being used anywhere
| if errors.As(workflowContext.err, &canceledErr) { | ||
| // Workflow canceled | ||
| metricsHandler.Counter(metrics.WorkflowCanceledCounter).Inc(1) | ||
| metricCounterToIncrement = metrics.WorkflowCanceledCounter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you check if the existing test coverage for these completion metrics is sufficient to catch any potential bugs here? (I can check if you haven't just let me know)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not check, only confirmed existing tests pass and added my check w/ unhandled command that wouldn't have passed before. Doing a quick glance, I don't think any of these completion metrics have any coverage. Do you feel that new tests covering these existing metrics should be in this PR? I don't mind doing it if I must, will just add some time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should have test coverage to make sure these metrics are still being emitted correctly. Unfortunately we lack test coverage of some basic metrics in the Go SDK so we have to keep an eye out. I don't think it adds a lot of complexity to your test to iterate through the few other ways a workflow can complete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do as part of this PR (may not be right away though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added test
yuandrew
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, deferring to quinn to approve after the tests he mentioned are added
| switch request := completedRequest.(type) { | ||
| if taskCompletion == nil { | ||
| // should not happen | ||
| panic("unknown request type from ProcessWorkflowTask()") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we panic here instead of logging an error? Seems like Core's dbg_panic would be nice here, where we can panic for tests but only log an error when running an real worker
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we panic here instead of logging an error?
Because we always did, see the default case of the type switch below, I just also put it up here to check the situation where taskCompletion could be nil. I have no opinion on whether we should panic though.
What was changed
workflowTaskCompletionstruct to represent workflow task completion instead ofinterface{}applyCompletionMetricsfunc to be called when the task completion is acceptedapplyCompletionMetricswith the workflow completed metrics to be applied on task completion acceptedChecklist