-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Safety concerns regarding compiler optimizations that introduce FPU/SIMD code that may bypass kernel_fpu_begin/end() guards designed to prevent the kernel from clobbering userspace FP/SIMD register context #95
Comments
I'm afraid most of what you're citing above is over my head. I think I understand at a conceptual level, but not enough to know the implications of adding flags. @sirlucjan @AdelKS @montjoie ? |
Hey! If you check this line in the kernel's make file https://github.com/gregkh/linux/blob/master/arch/x86/Makefile#L70 #
# Prevent GCC from generating any FP code by mistake.
#
# This must happen before we try the -mpreferred-stack-boundary, see:
#
# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53383
#
KBUILD_CFLAGS += -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx
KBUILD_RUSTFLAGS += -Ctarget-feature=-sse,-sse2,-sse3,-ssse3,-sse4.1,-sse4.2,-avx,-avx2 It is already adding the safeguards you are referring to, so yeah the patch in this repo does not actually end up enabling those the CPU instructions. This repo would've been flooded with issues long time ago haha. |
Thanks for pointing that out, @AdelKS |
I just went and checked the patch @jevinskie referenced and he is aware of those lines in the It doesn't seem to be documented in GCC's docs but Here's an example in Godbolt |
I lost a comment mentioning I did a -march=native -Ofast -ftree-vectorize -save-temps build with a defconfig trying to goad gcc into being super aggressive and didn’t see any ymm or zmm registers generated. Still need to test with an allyes config. I had a suspicion that a -mno-avx might inhibit further SIMD usage as @AdelKS points out but I’ll have to dig deeper into gcc and clang to see if that’s truly/always the case. |
I also figured some sort of issue would have popped up by now but I’m just a bit paranoid. |
I think the Linux kernel is relying on the fact that |
I was wondering how -mno-avx ended up in there when -march=core2 and -march=atom (alias for Bonnell) are the latest specific µarchs the kernel provides options for and they don’t have AVX. Turns out it was paranoia. Still digging into if -mno-EARLIERFEATURE implies all of the -mno-LATERFEATUREs in both GCC and Clang. I hope to have a definitive answer today since chasing down and adding all of them would be a pain (there are a lot) and it is a moving target. Not to mention old compilers will probably scoff at -mno-avx10.1 so you’d have to introduce build time make calls to see if they’re supported by the compiler (as used to be the case for -mno-avx before the minimum GCC/Clang versions were bumped and checking for support was unnecessary). Though I imagine users of this patch set tend to use newer compilers as well. ;) |
We are also disabling avx2 and avx512 via the makefile for savety, since we provide x86-64-v3 and x86-64-v4 kernels: We havent noticed any performance degradation from this, yet. |
Howdy!
Nifty patch set! I’d like to use it since I yak-shaved essentially the same thing for Rocket Lake a couple days ago before I found this repo.
I am concerned though that without adding extra feature disable flags like
-mno-avx2 -mno-avx512f
flags, the compilers might sneak in use of these instructions and bypass thekernel_fpu_begin/end()
guards that, to my understanding, are used to ensure the kernel doesn’t clobber the userspace FPU/SIMD context. That is, in-kernel use of FPU/SIMD must be explicitly handled via these calls (orkernel_fpu_begin_mask()
) to ensure the userspace context is saved and restored before return to userland. Other related identifiers areTIF_NEED_FPU_LOAD
,fpregs_mark_activate()
,switch_fpu_return()
,__cpu_invalidate_fpregs_state()
, etc.And example of kernel code using AVX2 illustrating the careful preservation of userspace context can be found in
nft_set_pipapo_avx2.c
’snft_pipapo_avx2_lookup()
.While it would be cool to let the compiler run wild and utilize all the extensions available to it for a given µarch, I think that would require compiler modifications/plugins to call the appropriate kernel API’s when it wants to use instructions that modify the FPU/SIMD register context. That doesn’t exist today and it’s unclear to me if it can be correctly inferred where it would be safe to use given that it seems that use of those APIs disables preemption in at least some cases. Additionally, even if it was possible, it could end up being that many more context switches into the kernel would start saving/restoring the FPU/SIMD context and the overhead of that would negate any gains achieved by the use of the extensions in kernel mode.
Nevertheless, even if the use of
-mo-xxx
flags prevents use of automatic SIMD optimizations in the kernel, the-march
flags should still provide some benefit from better scheduling models and use of other features like enabling fastrep movsb
codegen that don’t clobber FPU/SIMD context.Please reference my patch (modifications to
cflags
andrustflags
inx86/Makefile
and a modification tocflags
inx86/Makefile_32.cpu
that involves amake
call tune
that I haven’t dived into and groked yet) for what I hope is a safe way to implement these-march
enhancements, at least for therocketlake
case (though I could very well be missing other necessary-mno-xxx
flags!).The text was updated successfully, but these errors were encountered: