Selectively disable fused multiply-add (FMA) instructions on GPU? #2120

BenWibking · 2021-06-23T03:33:55Z

BenWibking
Jun 23, 2021
Collaborator

Hi,

I am benchmarking my hydrodynamics code, which uses AMReX. I have noticed that the default CUDA floating-point compiler settings lead to a directional asymmetry in 2D problems that should be exactly symmetric. Setting --fmad=false fixes the issue and gives results that are exactly symmetric to every digit in 2D.

However, disabling FMA causes a ~30% performance hit. It appears that I only need to disable FMA instructions for specific kernels that combine information from multiple directions, while leaving the Riemann solver kernels free to use FMA. Is this possible in CUDA (or on any other architecture)?
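One way such an asymmetry can arise, sketched under assumptions: when the compiler contracts a*b - c*d into a single FMA, only one of the two products is rounded, so the result is no longer exactly zero when the two products are mathematically equal. The function names and input values below are hypothetical and are not taken from the code discussed here; std::fma stands in for the contraction that nvcc performs by default (--fmad=true).

```cpp
#include <cmath>

// Hypothetical stand-in for a kernel expression that combines two directions.
// Unfused: both products are rounded before the subtraction.
// The volatile temporaries keep the host compiler from contracting this
// expression into an FMA on its own.
double diff_unfused(double ax, double bx, double ay, double by) {
    volatile double px = ax * bx;
    volatile double py = ay * by;
    return px - py;
}

// Fused: mimics what an FMA-contracting compiler may emit. Only ay*by is
// rounded; ax*bx enters the subtraction exactly.
double diff_fused(double ax, double bx, double ay, double by) {
    return std::fma(ax, bx, -(ay * by));
}
```

With identical inputs in both directions, for example diff_unfused(0.1, 0.1, 0.1, 0.1), the unfused version returns exactly zero, while diff_fused returns the rounding error of 0.1 * 0.1, which is nonzero.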

Answered by WeiqunZhang

Jun 23, 2021

WeiqunZhang · 2021-06-23T04:01:33Z

WeiqunZhang
Jun 23, 2021
Maintainer

In GNU Make, the rule for compiling .cpp files is here: https://github.com/AMReX-Codes/amrex/blob/development/Tools/GNUMake/Make.rules#L197. It's a generic rule. You can add your own rule in your own makefile to override it for a specific file. For example,

$(objEXETempDir)/riemann.o: riemann.cpp
    $(CXX) ......

@ax3l can probably tell you the proper way of doing this in CMake.

3 replies

BenWibking Jun 23, 2021
Collaborator Author

I am currently only set up to build with CMake for this code, so if @ax3l could point me to the documentation on how to do this in CMake, that would be very helpful.

BenWibking Jun 23, 2021
Collaborator Author

Ah, never mind, this actually won't work for my code. All of my kernels are defined by function templates instantiated within (as it turns out) a single .cpp file, so setting the compilation options file by file doesn't help.

ax3l Jun 24, 2021
Collaborator

Ah I see, then pragmas (below) are probably the right way to be selective.

For the record, individual file flags can be defined via set_source_files_properties in CMake, e.g.

set_source_files_properties(foo.cpp PROPERTIES COMPILE_FLAGS -Wno-effc++)

https://stackoverflow.com/a/13639476/2719194

The flag value can potentially be combined with a generator expression so that it is applied only for compilers that understand the flag.

WeiqunZhang · 2021-06-23T04:12:33Z

WeiqunZhang
Jun 23, 2021
Maintainer

GCC does have a pragma for controlling optimization of individual functions. https://gcc.gnu.org/onlinedocs/gcc/Function-Specific-Option-Pragmas.html#Function-Specific-Option-Pragmas. But nvcc may not have it. @maxpkatz

5 replies

BenWibking Jun 23, 2021
Collaborator Author

If there is an nvcc __attribute__ for this as well, could you consider creating an AMReX-portable version by adding an AMREX_NO_FMA macro?

BenWibking Jun 23, 2021
Collaborator Author

For gcc on x86-64, it appears that the easiest way to do this is to add an attribute to the function declaration:
__attribute__ ((__target__ ("no-fma")))

BenWibking Jun 23, 2021
Collaborator Author

Just for reference, this attribute is also supported by Clang when building for x86.
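A minimal sketch of how that attribute might be wrapped in a portable macro. The macro name NO_FMA_HOST and the function below are hypothetical (this is not an existing AMReX macro), and the guard assumes GCC or Clang targeting x86; elsewhere the macro expands to nothing and the function is compiled with the default settings.

```cpp
// Hypothetical macro, not part of AMReX: expands to the target attribute
// only where it is known to be accepted (GCC/Clang on x86).
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#define NO_FMA_HOST __attribute__((__target__("no-fma")))
#else
#define NO_FMA_HOST
#endif

// Hypothetical function that combines contributions from two directions.
// With the attribute applied, the compiler may not use FMA instructions
// for this function even if the rest of the file is built with -mfma.
NO_FMA_HOST double combine_directions(double ax, double bx, double ay, double by) {
    return ax * bx - ay * by;
}
```

This covers CPU builds only; it does nothing for device code compiled by nvcc.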

maximumcats Jun 24, 2021
Collaborator

As far as I am aware, nvcc does not have equivalent functionality.

BenWibking Jun 24, 2021
Collaborator Author

Ah, that's unfortunate :/ Is there a way to submit a feature request for this? It would be extremely useful for me, but probably also for many AMReX codes.

philip-blakely · 2021-06-23T07:26:38Z

philip-blakely
Jun 23, 2021

From the NVCC documentation, you could write the multiply/add via intrinsics:
https://docs.nvidia.com/cuda/floating-point/index.html#controlling-fused-multiply-add (Section 4.3)
although this seems time-consuming to put in place, and it will make your code less readable.
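A minimal sketch of what the intrinsics approach might look like for double precision. The helper name mul_add_unfused and the HOST_DEVICE macro are hypothetical; __dmul_rn and __dadd_rn are the CUDA double-precision intrinsics, which the CUDA documentation describes as never being merged into a fused multiply-add. The host branch is a plain C++ fallback so that the helper also compiles without nvcc.

```cpp
// Hypothetical decoration macro: device-callable under nvcc, plain C++ otherwise.
#if defined(__CUDACC__)
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Computes a*b + c with the product rounded before the addition.
HOST_DEVICE inline double mul_add_unfused(double a, double b, double c) {
#if defined(__CUDA_ARCH__)
    // Device code: explicit round-to-nearest multiply and add, which nvcc
    // does not contract into an FMA.
    return __dadd_rn(__dmul_rn(a, b), c);
#else
    // Host code: the volatile temporary forces the product to be rounded
    // before the addition.
    volatile double p = a * b;
    return p + c;
#endif
}
```

Every multiply-add in the affected kernels would have to be routed through a helper like this, which is the readability cost mentioned above.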

1 reply

BenWibking Jun 23, 2021
Collaborator Author

Yes, I agree that this is not a very practical solution.
