fix data race in module reload tests #5433
Conversation
cli/progress_manager.go
Outdated
pm.currentSpinner.Success(" " + prefix + message + elapsed)
pm.currentSpinner = nil
// Give the spinner goroutine time to finish to avoid race conditions
time.Sleep(10 * time.Millisecond)
Do we know what the root cause is? I feel this isn't a fix but rather just a guess at making something less likely.
Oops, I did not mean to commit this. It was originally for troubleshooting. I'll remove this and the other two below.
looks reasonable to me, modulo Dan's comment.
map[string]any{
	moduleFlagPath: manifestPath, generalFlagPartID: "part-123",
	moduleBuildFlagNoBuild: true, moduleFlagLocal: true,
	generalFlagNoProgress: true, // Disable progress spinner to avoid race conditions in tests
It seems the progress spinner is considered harmful. Did we identify what the race is?
Sorry, I missed that part of your comment earlier! It looks like the race condition is internal to the pterm library, specifically in the Success() call. I tried adding a mutex wrapper around all calls to the library, for example the following method, but it still resulted in a data race:
// Success forwards to the wrapped pterm spinner while holding the wrapper's mutex.
func (s *synchronizedSpinner) Success(message ...any) {
s.mu.Lock()
defer s.mu.Unlock()
if s.spinner == nil {
return
}
s.spinner.Success(message...)
}
I found an open GH issue related to this problem: pterm/pterm#482.
When running the CLI outside of tests, I haven't seen any issues while running viam module reload (I tested it a lot over the past two weeks).
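To illustrate why the wrapper above can't help, here is a minimal, self-contained sketch (not pterm's actual code) of the general pattern: when a library starts its own render goroutine that touches shared state without locking, caller-side synchronization never covers the conflicting access.

package main

import (
	"sync"
	"time"
)

// spinner mimics a library whose background render goroutine reads `text`
// without synchronization. Purely illustrative; not pterm's implementation.
type spinner struct {
	text string
	done chan struct{}
}

func startSpinner() *spinner {
	s := &spinner{done: make(chan struct{})}
	go func() { // library-internal render loop
		for {
			select {
			case <-s.done:
				return
			default:
				_ = s.text // unsynchronized read
				time.Sleep(time.Millisecond)
			}
		}
	}()
	return s
}

func (s *spinner) Success(msg string) {
	s.text = msg // unsynchronized write
	close(s.done)
}

// wrapper adds caller-side locking, but the render goroutine above never
// takes this mutex, so the read/write conflict on s.text remains.
type wrapper struct {
	mu sync.Mutex
	s  *spinner
}

func (w *wrapper) Success(msg string) {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.s.Success(msg)
}

func main() {
	w := &wrapper{s: startSpinner()}
	time.Sleep(5 * time.Millisecond) // let the render loop read s.text at least once
	w.Success("done")                // running with -race flags the conflicting access
}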
Will think about it. There aren't really any good libs for this in Go.
Alright, all cleaned up, and I made sure the tests pass in --race mode.
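For reference, the race-mode run here is the standard Go toolchain invocation; assuming the CLI package layout matches the file paths above, it would look like:

go test -race ./cli/...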
I'm bought in on the approach. Now I'm just curious why there are a bunch of other code changes as part of the PR.
cli/module_reload_test.go
Outdated
t.Run("addsServiceWhenMissing", func(t *testing.T) {
	part, _ := vc.getRobotPart("id")
	_, err := addShellService(cCtx, vc, logging.NewTestLogger(t), part.Part, false)
	// Create isolated setup for this subtest to avoid shared state
I buy that the underlying library is the cause and that disabling it serves our goals here.
What's with these other changes? Were there other races not reported by the failure I saw that this fixes?
Yeah, I was too aggressive with trying out different mechanisms to avoid the race. My bad; I will clean up my PR and delete everything except setting the --no-progress flag.
cli/module_reload_test.go
Outdated
)

err = reloadModuleActionInner(cCtx, vc, parseStructFromCtx[reloadModuleArgs](cCtx), logger, false)
// Create isolated logger for this subtest
I see this is done elsewhere -- but do we know what the consequence is of not creating a separate logger? And just using the top-level test logger?
My last PR introduced a data race issue in tests that wasn't seen during development. This PR suppresses the status spinners while tests are running. There are separate tests in progress_manager_test.go that cover the spinner functionality, so I think it's fine not to test it at the higher level.
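As a rough sketch of the approach (the type, field, and method names below are assumptions for illustration, not the CLI's actual progress-manager API): when the no-progress flag is set, spinner operations become no-ops, so tests never start pterm's background goroutine at all.

package cli

import "github.com/pterm/pterm"

// Illustrative only: with --no-progress set, no pterm spinner (and therefore
// no render goroutine) is ever created, so there is nothing for the race
// detector to flag.
type progressManager struct {
	noProgress     bool
	currentSpinner *pterm.SpinnerPrinter
}

func (pm *progressManager) startSpinner(msg string) {
	if pm.noProgress {
		return // nothing rendered in tests
	}
	pm.currentSpinner, _ = pterm.DefaultSpinner.Start(msg)
}

func (pm *progressManager) succeed(msg string) {
	if pm.noProgress || pm.currentSpinner == nil {
		return
	}
	pm.currentSpinner.Success(msg)
	pm.currentSpinner = nil
}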