Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Struct members passed to sync/atomic calls need to be 64-bit aligned on ARM and MIPS-32 #158

Open
shaunco opened this issue Nov 20, 2021 · 5 comments

Comments

@shaunco
Copy link
Contributor

shaunco commented Nov 20, 2021

Per the Go docs for sync/atomic at https://pkg.go.dev/sync/atomic#pkg-note-BUG:

On ARM, 386, and 32-bit MIPS, it is the caller's responsibility to arrange for 64-bit alignment of 64-bit words accessed atomically. The first word in a variable or in an allocated struct, array, or slice can be relied upon to be 64-bit aligned.

taskq uses sync/atomic operations on quite a few struct members that are 32 bits in size, which in turn makes them 32-bit aligned in memory. Under heavy stress on ARM64 and MIPS-32 platforms, we have seen quite a few instances of this causing issues.

Since none of the int32/uint32 values are repeated (that is, they are all just a single value rather than in arrays), the easiest fix for this is to simply make them int64 values and update the corresponding sync/atomic calls. Making this change has minimal impact on overall memory usage - approximately 100 bytes (25 single instance int32/uint32 variables change to int64/uint64).

@vmihailenco
Copy link
Owner

it is the caller's responsibility to arrange for 64-bit alignment of 64-bit words accessed atomically

That is why we use 32-bit words (uint32), not uint64 as you suggest. AFAIK uint32 can be safely used with atomic.

@shaunco
Copy link
Contributor Author

shaunco commented Nov 21, 2021

Unfortunately it is NOT safe due to the bug, which is why the sync/atomic docs point that out. The very first uint32 in a struct is safe to pass, but the remaining ones alternate between being aligned on a 32-bit boundary and being aligned on a 64-bit boundary.

Slightly more info in https://pkg.go.dev/golang.org/x/tools/go/analysis/passes/atomicalign?utm_source=godoc (not part of go vet yet)

@vmihailenco
Copy link
Owner

I believe what you are quoting only applies to 64-bit numbers. Previously, I had to switch from uint64 to uint32 to make the code work.

You could try to reproduce the problem with race detector. It is easily reproducible on 386 with uint64, but uint32 passes the check.

@shaunco
Copy link
Contributor Author

shaunco commented Nov 21, 2021

We were seeing very frequent memory corruption leading to pretty random panics on ARM Cortex-A72 running 64-bit Yocto Dunfell (3.1) with the 32-bit atomic operations and variables. We made this change in July and have not had any memory corruption or random panics on ARM since. We had sat on it for the last 5 months so we could keep observing production systems with the change to ensure this fixed things.

That said, looking at the atomicalign source code, they do only check for use of the 64-bit atomic methods.

Let this PR sit for this week and I will switch back to the 32-bit atomic methods and variables to see if the corruption and random panics come back.

@vmihailenco
Copy link
Owner

👍 Just in case, you can easily run race detector for 386 platform:

env GOOS=linux GOARCH=386 go test -race

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants