- Feature Name: `atomic_memcpy`
- Start Date: 2022-08-14
- RFC PR: [rust-lang/rfcs#3301](https://github.com/rust-lang/rfcs/pull/3301)
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)

# Summary

This is a proposal to add `AtomicPerByte<T>`, a type representing _tearable atomics_.
This makes it possible to properly implement a _sequence lock_ in Rust.

# The Problem

It's currently not possible to implement an efficient and perfectly
(theoretically) correct sequence lock in Rust.

Unlike most locking mechanisms, a sequence lock doesn't prevent a race
to access the data it protects.
Instead, it detects a race only after the load operation has already happened,
and retries the load if it raced with a write operation.

A sequence lock in Rust looks something like this:

```rust
// Incomplete example

pub struct SeqLock<T> {
    seq: AtomicUsize,
    data: UnsafeCell<T>,
}

unsafe impl<T> Sync for SeqLock<T> {}

impl<T> SeqLock<T> {
    /// Safety: Only call from one thread.
    pub unsafe fn write(&self, value: T) {
        self.seq.fetch_add(1, Relaxed);
        write_data(&self.data, value, Release);
        self.seq.fetch_add(1, Release);
    }

    pub fn read(&self) -> T {
        loop {
            let s1 = self.seq.load(Acquire);
            let data = read_data(&self.data, Acquire);
            let s2 = self.seq.load(Relaxed);
            if s1 & 1 == 0 && s1 == s2 {
                return unsafe { assume_valid(data) };
            }
        }
    }
}
```

The `write_data` and `read_data` calls can happen concurrently.
The `write` method increments the counter before and after writing,
such that the counter is odd while `write_data` is in progress.
The `read` method repeats `read_data` until
the counter was even and identical both before and after the read.
This way, `assume_valid` is only ever called on data that was
not the result of a race.

A big question is how to implement `write_data`, `read_data`, and `assume_valid`
in Rust in an efficient way while satisfying the memory model.

The somewhat popular `seqlock` crate and similar implementations found in the ecosystem
all use a regular non-atomic write (preceded by an atomic fence) for writing,
and `ptr::read_volatile` (followed by an atomic fence) for reading.
This "works" "fine", but is technically undefined behavior.

The C++ and Rust memory model doesn't allow data races,
so it doesn't allow a data race to be detected after the fact;
that's too late.

To avoid a data race, all of the data would have to be written and read through
atomic operations.
We don't need the atomicity of the data as a whole, though:
it's fine if there's tearing, since we restart on a race anyway.

Additionally, memory fences technically only "interact" with atomic operations,
not with volatile operations.

# The C++ Solution

C++'s [P1478] proposes the addition of these two functions to the C++ standard
library to solve this problem:

```cpp
void *atomic_load_per_byte_memcpy(void *dest, const void *source, size_t, memory_order);
void *atomic_store_per_byte_memcpy(void *dest, const void *source, size_t, memory_order);
```

The first one is effectively a series of `AtomicU8::load`s followed by a memory fence,
while the second one is basically a memory fence followed by a series of `AtomicU8::store`s,
except that the implementation can be much more efficient:
it is allowed to load/store the bytes in any order,
and doesn't have to operate on individual bytes.
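To make those semantics concrete, here is a deliberately naive model of the two operations in Rust. The function names are ours for illustration only; a real implementation is free to reorder the accesses and use wider-than-byte operations.

```rust
use std::sync::atomic::{fence, AtomicU8, Ordering};

/// Naive model of `atomic_load_per_byte_memcpy` with acquire ordering:
/// a relaxed atomic load per byte, followed by one acquire fence.
///
/// Safety: `src` must be valid for reads of `len` bytes, `dest` for
/// writes of `len` bytes, and all concurrent writers to `src` must
/// also use atomic operations.
unsafe fn load_per_byte(dest: *mut u8, src: *const u8, len: usize) {
    for i in 0..len {
        // Atomic per byte: a concurrent store may cause tearing
        // across bytes, but never a data race.
        let byte = (*src.add(i).cast::<AtomicU8>()).load(Ordering::Relaxed);
        dest.add(i).write(byte);
    }
    fence(Ordering::Acquire);
}

/// Naive model of `atomic_store_per_byte_memcpy` with release ordering:
/// one release fence, then a relaxed atomic store per byte.
unsafe fn store_per_byte(dest: *mut u8, src: *const u8, len: usize) {
    fence(Ordering::Release);
    for i in 0..len {
        (*dest.add(i).cast::<AtomicU8>()).store(src.add(i).read(), Ordering::Relaxed);
    }
}
```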
The memory order can only be relaxed, acquire (for load), or release (for store).
Sequentially consistent ordering for these operations is disallowed,
since it's not obvious what that would mean for these tearable operations.

# The Rust Solution

While C++'s solution could easily be copy-pasted into Rust with a nearly identical signature,
it wouldn't fit with the rest of our atomic APIs.

All our atomic operations happen through the `Atomic*` types,
and we don't have atomic operations that operate on raw pointers.
(Other than as unstable intrinsics.)

Adding this functionality as a variant of `ptr::copy_nonoverlapping`, similar to the C++ solution,
would not be very ergonomic and can easily result in subtle bugs causing undefined behavior.

Instead, I propose to add an `AtomicPerByte<T>` type
similar to our existing atomic types: a `Sync` storage for a `T`
that can be written to and read from by multiple threads concurrently.

The `SeqLock` implementation above would use this type instead of an `UnsafeCell<T>`.
It'd no longer need an unsafe `Sync` implementation,
since the `AtomicPerByte<T>` type can be shared between threads safely.

This type has a (safe!) `store` method consuming a `T`,
and a (safe!) `load` method producing a `MaybeUninit<T>`.
The `MaybeUninit<T>` type is used to represent the potentially invalid state
the data might be in, since it might be the result of tearing during a race.

Only after confirming that there was no race and the data is valid
can one safely use `MaybeUninit::assume_init` to get the actual `T` out.

```rust
pub struct SeqLock<T> {
    seq: AtomicUsize,
    data: AtomicPerByte<T>,
}

impl<T> SeqLock<T> {
    /// Safety: Only call from one thread.
    pub unsafe fn write(&self, value: T) {
        self.seq.fetch_add(1, Relaxed);
        self.data.store(value, Release);
        self.seq.fetch_add(1, Release);
    }

    pub fn read(&self) -> T {
        loop {
            let s1 = self.seq.load(Acquire);
            let data = self.data.load(Acquire);
            let s2 = self.seq.load(Relaxed);
            if s1 & 1 == 0 && s1 == s2 {
                return unsafe { data.assume_init() };
            }
        }
    }
}
```
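For illustration, here's how this `SeqLock` could be exercised from multiple threads. This is only a sketch: it assumes the proposed `AtomicPerByte<T>` API plus a hypothetical `SeqLock::new` constructor, so it doesn't compile today.

```rust
fn main() {
    // Hypothetical constructor; not part of the example above.
    let lock = SeqLock::new([0u64; 4]);

    std::thread::scope(|s| {
        // Exactly one writer thread, upholding `write`'s safety contract.
        s.spawn(|| {
            for i in 1..=100 {
                unsafe { lock.write([i; 4]) };
            }
        });

        // Readers simply retry internally until they observe a
        // consistent (untorn) value.
        s.spawn(|| {
            let data = lock.read();
            // Every element stems from the same `write` call.
            assert!(data.iter().all(|&x| x == data[0]));
        });
    });
}
```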
# Full API Overview

The `AtomicPerByte<T>` type can be thought of as
the `Sync` (data race free) equivalent of `MaybeUninit<T>`.
It can contain a `T`, but the value might be invalid in various ways
due to concurrent store operations.
Its interface resembles a mix of the interfaces of `MaybeUninit<T>` and the atomic types.

```rust
#[repr(transparent)]
struct AtomicPerByte<T> {
    inner: UnsafeCell<MaybeUninit<T>>,
}

unsafe impl<T> Sync for AtomicPerByte<T> {}

impl<T> AtomicPerByte<T> {
    pub const fn new(value: T) -> Self;
    pub const fn uninit() -> Self;

    pub fn store(&self, value: T, ordering: Ordering);
    pub fn load(&self, ordering: Ordering) -> MaybeUninit<T>;

    pub fn store_from(&self, src: &MaybeUninit<T>, ordering: Ordering);
    pub fn load_to(&self, dest: &mut MaybeUninit<T>, ordering: Ordering);

    pub fn store_from_slice(this: &[Self], src: &[MaybeUninit<T>], ordering: Ordering);
    pub fn load_to_slice(this: &[Self], dest: &mut [MaybeUninit<T>], ordering: Ordering);

    pub const fn into_inner(self) -> MaybeUninit<T>;

    pub const fn as_ptr(&self) -> *const T;
    pub const fn as_mut_ptr(&self) -> *mut T;

    pub const fn get_mut(&mut self) -> &mut MaybeUninit<T>;
    pub const fn get_mut_slice(this: &mut [Self]) -> &mut [MaybeUninit<T>];

    pub const fn from_mut(value: &mut MaybeUninit<T>) -> &mut Self;
    pub const fn from_mut_slice(slice: &mut [MaybeUninit<T>]) -> &mut [Self];
}

impl<T> Debug for AtomicPerByte<T>;
impl<T> From<MaybeUninit<T>> for AtomicPerByte<T>;
```

Note how the entire interface is safe.
All potential unsafety is captured by the use of `MaybeUninit`.

The load functions panic if the `ordering` is not `Relaxed` or `Acquire`.
The store functions panic if the `ordering` is not `Relaxed` or `Release`.
The slice functions panic if the slices are not of the same length.
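To make the `MaybeUninit` plumbing concrete, here is a small usage sketch of the proposed API. It is hypothetical, since the type does not exist yet, but it uses only the methods listed above.

```rust
use std::mem::MaybeUninit;
use std::sync::atomic::Ordering::{Acquire, Release};

fn example(shared: &AtomicPerByte<[u32; 8]>) -> [u32; 8] {
    // Safe to call from any number of threads concurrently;
    // racing stores may tear each other, but cause no data race.
    shared.store([1; 8], Release);

    // Also safe, but the result may be torn, hence `MaybeUninit`.
    let maybe: MaybeUninit<[u32; 8]> = shared.load(Acquire);

    // The caller must prove validity through external means, e.g. a
    // sequence counter check as in `SeqLock::read` above.
    unsafe { maybe.assume_init() }
}
```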
# Drawbacks

- In order for this to be efficient, we need an additional intrinsic hooking into
  special support in LLVM. (Which LLVM needs to have anyway for C++.)

- It's not immediately obvious that this type behaves like a `MaybeUninit<T>`,
  making it easy to forget to manually drop values of types that implement `Drop`.

  This could be solved by requiring `T: Copy`, or by using a better name for this type.
  (See the alternatives below.)

  Very clear documentation might be enough, though.

- `MaybeUninit<T>` today isn't as ergonomic as it should be.

  For a simple `Copy` type like `u8`, it might be nicer to be able to use types like `&[u8]`
  rather than `&[MaybeUninit<u8>]`, etc.
  (But that's a larger problem affecting many other things, like `MaybeUninit`'s interface,
  `Read::read_buf`, etc. Maybe this should be solved separately.)

# Alternatives

- Instead of a type, this could all be just two functions on raw pointers,
  such as something like `std::ptr::copy_nonoverlapping_load_atomic_per_byte`.

  This means having to use `UnsafeCell` and more unsafe code wherever this functionality is used.

  It'd also be inconsistent with the other atomic operations.
  We don't have e.g. a `std::ptr::load_atomic` that operates on raw pointers either.

- Require `T: Copy` for `AtomicPerByte<T>`, such that we don't need to worry about
  duplicating non-`Copy` data.

  There are valid use cases with non-`Copy` data, though, such as
  [in crossbeam-deque](https://github.com/crossbeam-rs/crossbeam/blob/2d9e7e0f81d3dd3efb1975b6379ea8b05fcf9bdd/crossbeam-deque/src/deque.rs#L60-L78).
  Also, not all "memcpy'able" data is marked as `Copy` (e.g. to prevent implicit copies).

- Leave this to the ecosystem, outside of the standard library.

  Since this requires special compiler support, a community crate would
  have to use (platform-specific) inline assembly
  or (probably technically unsound) hacks like volatile operations.

- Use a new `MaybeTorn<T>` instead of `MaybeUninit<T>`.

  `AtomicPerByte<T>` doesn't _have_ to support uninitialized bytes,
  but it does need a wrapper type to represent potentially torn values.

  If Rust had a `MaybeTorn<T>`, we could make it possible to load types like `[bool; _]` or even `f32` without any unsafe code,
  since, for those types, combining bytes from different values always results in a valid value.

  However, the use cases for this are very limited, it would require a new trait to mark the types for which this is valid,
  and it would make the API a lot more complicated and verbose to use.

  Also, such an API for safely handling torn values can be built on top of the proposed API,
  so we can leave that to a (niche) ecosystem crate.

- Don't allow an uninitialized state.

  Even if we use `MaybeUninit<T>` to represent a 'potentially torn value',
  we could still attempt to design an API that does not allow an uninitialized state.

  It might seem like that would result in a much simpler API, with `MaybeUninit<T>` replaced by `T` in
  methods like `into_inner()` and `get_mut()`, but that is not the case:
  as long as `store()` can be called concurrently by multiple threads,
  it is not only the `load()` method that can result in a torn value,
  since the `AtomicPerByte<T>` object itself might end up storing a torn value.

  Therefore, even if we disallow uninitialized values,
  every method will still have `MaybeUninit<T>` in its signature,
  at which point we lose basically all the benefits of removing the uninitialized state.

  Removing the uninitialized state does result in a big downside for users who need to add that state back,
  as the interface of an `AtomicPerByte<MaybeUninit<T>>` would result in a doubly wrapped `MaybeUninit<MaybeUninit<T>>` in many places,
  which can be quite unergonomic and confusing.

# Unresolved questions

- Should we require `T: Copy`?

  There might be some valid use cases for non-`Copy` data,
  but it's easy to accidentally cause undefined behavior by using `load`
  to make an extra copy of data that shouldn't be copied.

- Naming: `AtomicPerByte`? `TearableAtomic`? `NoDataRace`? `NotQuiteAtomic`?

[P1478]: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p1478r7.html