From 9168c06741c93cd2c3b1c7b459a1815b0c4275ac Mon Sep 17 00:00:00 2001
From: Chris Beck
Date: Mon, 27 Feb 2023 23:56:11 -0700
Subject: [PATCH] improve performance for slice zeroization (issue #743)

The purpose of the change is to make calls to
`x.as_mut_slice().zeroize()` considerably faster, particularly for
types like `[u8; n]`.

The reason it becomes faster is that the call to `volatile_set` before
this change appears not to be easily optimizable, and (for example)
leads to setting bytes one at a time instead of the compiler
consolidating the writes into SIMD instructions.

In the modified code, we don't use `volatile_set`; instead we loop over
the slice, setting each element to `Default::default()`, and to ensure
that the writes are not optimized out, we use an empty asm block.
There is discussion in the issue of the correct asm options to use
here. Because the asm block potentially reads from the pointer and
makes a syscall of some kind, the compiler cannot optimize out the
zeroizing writes without risking a change in observable side effects.
In the improved code, we create such an optimization barrier only once,
rather than after each byte that is written.

The call to `atomic_fence()` is not changed.

---

This change may help give users a way to improve performance when they
have to zeroize very large objects, or frequently have to zeroize many
small objects. We tested code-gen here in godbolt (in addition to the
tests posted in the github issue) and found that this change is
typically enough for llvm to start adding in SIMD optimizations that
zero many bytes at once.

---
 zeroize/src/lib.rs | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/zeroize/src/lib.rs b/zeroize/src/lib.rs
index 36631223..afb252a6 100644
--- a/zeroize/src/lib.rs
+++ b/zeroize/src/lib.rs
@@ -471,7 +471,17 @@ where
         // object for at least `self.len()` elements of type `Z`.
         // `self.len()` is also not larger than an `isize`, because of the assertion above.
         // The memory of the slice should not wrap around the address space.
-        unsafe { volatile_set(self.as_mut_ptr(), Z::default(), self.len()) };
+        for z in self.iter_mut() {
+            *z = Z::default();
+        }
+
+        unsafe {
+            core::arch::asm!(
+                "/* {ptr} */",
+                ptr = in(reg) self.as_mut_ptr(),
+                options(nostack, readonly, preserves_flags),
+            );
+        }
         atomic_fence();
     }
 }
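
Note: for anyone who wants to experiment with this optimization-barrier
pattern in isolation (e.g. in godbolt), here is a minimal,
self-contained sketch. The free function `zeroize_slice` and the `u8`
element type are hypothetical, chosen only for illustration; the asm
block and its options mirror the ones in the patch above.

    /// Hypothetical standalone demo of the barrier pattern (not part
    /// of the `zeroize` API): zero a slice with ordinary writes, then
    /// use an empty asm block so the writes cannot be elided as dead
    /// stores.
    pub fn zeroize_slice(data: &mut [u8]) {
        // Plain writes: the compiler is free to vectorize this loop.
        for byte in data.iter_mut() {
            *byte = 0;
        }

        // Empty asm block taking the pointer as an input. The compiler
        // must assume the asm may read the pointed-to memory, so the
        // writes above cannot be optimized away.
        unsafe {
            core::arch::asm!(
                "/* {ptr} */",
                ptr = in(reg) data.as_mut_ptr(),
                options(nostack, readonly, preserves_flags),
            );
        }
    }

    fn main() {
        let mut secret = [0xff_u8; 64];
        zeroize_slice(&mut secret);
        assert!(secret.iter().all(|&b| b == 0));
    }

With optimizations enabled, the loop typically compiles down to a few
wide SIMD stores rather than 64 single-byte writes, while the asm block
keeps llvm from discarding the stores entirely.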