Selectively disable fused multiply-add (FMA) instructions on GPU? #2120
-
Hi, I am benchmarking my hydrodynamics code that uses AMReX. I have noticed that the default CUDA floating-point compiler settings lead to a directional asymmetry in 2D problems that should be exactly symmetric. Disabling FMA removes the asymmetry, but it causes a ~30% performance hit. It appears that I only need to disable FMA instructions for specific kernels that combine information from multiple directions, while leaving the Riemann solver kernels free to use FMA. Is this possible to do in CUDA (or on any other architecture)?
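A minimal host-side sketch of how contraction can break an expression that should be exactly symmetric, assuming the asymmetry comes from terms of the form a*b - c*d. The function names are hypothetical, and std::fma stands in for what a compiler may do when it fuses only one of the two products.

```cpp
#include <cmath>

// Hypothetical helpers for illustration; not taken from AMReX or the code in question.

// Both products are rounded to double before the subtraction.
// The volatile temporaries keep the host compiler from contracting
// the expression into an FMA.
double diff_unfused(double a, double b, double c, double d) {
    volatile double ab = a * b;
    volatile double cd = c * d;
    return ab - cd;
}

// Only c*d is rounded; a*b stays exact inside the fused operation,
// mimicking a compiler that contracts a*b - c*d into fma(a, b, -(c*d)).
double diff_fused(double a, double b, double c, double d) {
    volatile double cd = c * d;
    return std::fma(a, b, -cd);
}
```

With x = 1.0 + std::ldexp(1.0, -27), diff_unfused(x, x, x, x) is exactly 0.0, while diff_fused(x, x, x, x) returns 2^-54, which is the rounding error of x*x. The two directions no longer cancel once one product is fused.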
Replies: 3 comments 9 replies
-
GCC does have a pragma for controlling the optimization of individual functions: https://gcc.gnu.org/onlinedocs/gcc/Function-Specific-Option-Pragmas.html#Function-Specific-Option-Pragmas
However, nvcc may not support it. @maxpkatz
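For host code, the function-specific pragmas might be used roughly as follows. This is a sketch that assumes GCC and assumes that the optimize pragma accepts "fp-contract=off" as the equivalent of -ffp-contract=off. The function name is hypothetical, and other compilers (including nvcc for device code) may ignore the pragmas.

```cpp
// Assumes GCC; other compilers may ignore these pragmas (possibly with a warning).
#pragma GCC push_options
#pragma GCC optimize ("fp-contract=off")

// Hypothetical function: with contraction off, GCC should round each
// product separately instead of fusing one of them with the subtraction.
double transverse_diff(double a, double b, double c, double d) {
    return a * b - c * d;
}

#pragma GCC pop_options
```

Functions defined after pop_options are compiled with the original options again, so only the code between the two pragmas gives up FMA contraction.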
-
From the NVCC documentation, you could write the multiply/add via intrinsics.
-
In GNU Make, the rule for compiling .cpp files is here: https://github.com/AMReX-Codes/amrex/blob/development/Tools/GNUMake/Make.rules#L197
It's a generic rule. You can add your own rule in your own makefile to override it for a specific file.
@ax3l can probably tell you the proper way of doing this in CMake.