[SYCL] Add support for -foffload-fp32-prec-div/sqrt options. #15836

zahiraam · 2024-10-23T15:39:51Z

Add support for options -f[no-]offload-fp32-prec-div and -f[no-]offload-fp32-prec-sqrt.
These options are added to allow users to control whether fdiv and sqrt operations in offload device code are required to return correctly rounded results. In order to communicate this to the device code, we need the front end to generate IR that reflects the choice.

When the correctly rounded setting is used, we can just generate the fdiv instruction and llvm.sqrt intrinsic, because these operations are required to be correctly rounded by default in LLVM IR.

When the result is not required to be correctly rounded, the front end should generate a call to the llvm.fpbuiltin.fdiv or llvm.fpbuiltin.sqrt intrinsic with the fpbuiltin-max-error attribute set. For single-precision fdiv, the setting should be 2.5. For single-precision sqrt, the setting should be 3.0.

If the -ffp-accuracy option is used, we should issue warnings if the settings conflict with an explicitly set -foffload-fp32-prec-div or -foffload-fp32-prec-sqrt option.

clang/lib/Driver/ToolChains/Clang.cpp

mdtoguchi · 2024-10-29T18:08:06Z

clang/lib/Driver/ToolChains/Clang.cpp

+    if (Args.getLastArg(options::OPT_fno_offload_fp32_prec_div))
+      CmdArgs.push_back("-fno-offload-fp32-prec-div");
+    else
+      CmdArgs.push_back("-foffload-fp32-prec-div");


Suggested change

-    if (Args.getLastArg(options::OPT_fno_offload_fp32_prec_div))
-      CmdArgs.push_back("-fno-offload-fp32-prec-div");
-    else
-      CmdArgs.push_back("-foffload-fp32-prec-div");
+    if (!Args.hasFlag(options::OPT_foffload_fp32_prec_div,
+                      options::OPT_fno_offload_fp32_prec_div, true))
+      CmdArgs.push_back("-fno-offload-fp32-prec-div");

Since -foffload-fp32-prec-div is the default.

mdtoguchi · 2024-10-29T18:08:26Z

clang/lib/Driver/ToolChains/Clang.cpp

+    if (Args.getLastArg(options::OPT_fno_offload_fp32_prec_sqrt))
+      CmdArgs.push_back("-fno-offload-fp32-prec-sqrt");
+    else
+      CmdArgs.push_back("-foffload-fp32-prec-sqrt");


Similar comment to above.

elizabethandrews

@premanandrao can you review this please?

MrSidims

LGTM assuming that the code doesn't affect C stdlib's div function, see the comment above.

MrSidims · 2024-11-21T20:05:06Z

clang/test/CodeGenSYCL/offload-fp32-div-sqrt.cpp

+      // ROUNDED-SQRT-PREC-DIV: call reassoc nnan ninf nsz arcp afn float @llvm.fpbuiltin.sqrt.f32(float {{.*}}) #[[ATTR_SQRT:[0-9]+]]
+      // ROUNDED-DIV-PREC-SQRT: call reassoc nnan ninf nsz arcp afn spir_func nofpclass(nan inf) float @sqrt(float noundef nofpclass(nan inf) {{.*}})
+      // ROUNDED-DIV-ROUNDED-SQRT-FAST: call reassoc nnan ninf nsz arcp afn float @llvm.fpbuiltin.sqrt.f32(float {{.*}}) #[[ATTR_SQRT:[0-9]+]]
+      // LOW-PREC-DIV: call float @llvm.fpbuiltin.sqrt.f32(float {{.*}}) #[[ATTR_SQRT_LOW:[0-9]+]]


Just want to check that I understand this correctly. In this case we pass the flags -fno-offload-fp32-prec-div -ffp-builtin-accuracy=high. The 1.0 ULP fpbuiltin-max-error attribute for llvm.fpbuiltin.sqrt.f32 was generated in response to the -ffp-builtin-accuracy=high flag, and we don't expect precise calculations here because high != precise, right?

That's correct.

clang/lib/CodeGen/CGCall.cpp

zahiraam · 2024-11-25T16:12:02Z

The FE work has been approved. This change is an attempt to fix the LIT test DeviceLib/cmath_test.cpp. I am not really sure the change is correct. Will revert it if the LIT failure persists.

zahiraam · 2025-01-31T12:32:18Z

@MrSidims Your PR fixed the issue with cmath_test.cpp. Thanks!
@intel/llvm-gatekeepers Can this be merged in please? Thanks.

MrSidims · 2025-02-17T13:02:08Z

clang/test/Driver/offload-fp32-div-sqrt.cu

+
+// CHECK-NOT: "-foffload-fp32-prec-div"
+// CHECK-NOT: "-foffload-fp32-prec-sqrt"
+// FPACC: "-ffp-builtin-accuracy=high"


@zahiraam @mdtoguchi hi, I'm revisiting some parts of the feature implementation to ensure that everything works correctly. I have a question here: if I'm not mistaken, -ffp-builtin-accuracy=high means that all fp builtins will be computed with high precision. Shouldn't we restrict it to only the sqrt/div fpbuiltins (see the example in clang/test/Driver/offload-fp32-div-sqrt.cpp above)?

@npmiller I see that a while ago you added a similar option, "fsycl-fp32-prec-sqrt", that currently works for CUDA and HIP. I believe the option added in this patch should be merged with yours (whether or not you agree, please share your thoughts in #17033), but I've also noticed that the two options are handled very differently. Your patch passes an alias of -prec-sqrt=true to the compiler, resulting in metadata that (I guess) is handled by the CUDA backend. This patch instead makes clang generate llvm.fpbuiltin.sqrt with max-error=0.5 in response to the option, which is then lowered to llvm.sqrt by code added here, which also seems to be a correct approach: https://godbolt.org/z/5TKxbzcKE . If we decide to merge the options, I guess we would need to either remove one of the mechanisms for CUDA or duplicate them. What would be your preference?

@MrSidims The options -f[no-]offload-fp32-prec-div/sqrt are only available when using -fsycl. In a CUDA environment the options have no effect.

I think it would make sense to merge the options.

However, we need to keep the existing way the flag is implemented, because CUDA ships with a bitcode library that we link against our kernels, which provides implementations for some of the math built-ins.

That library makes use of the metadata setup by the current way the flag is handled to pick out between precise and approx versions of sqrt in different math built-ins. So we could maybe use llvm.fpbuiltin.sqrt for direct calls to sqrt from the kernel, but we'll still need to setup the metadata for the bitcode library, so it's probably easier to keep it as-is.

Note that there's a similar mechanism for HIP, although I'm not sure we use it as much, but we probably should keep the current setup there to make sure the metadata is set properly.

@npmiller thanks!

@zahiraam maybe I'm wrongly equating the CUDA and NVPTX environments for SYCL, or I'm missing something else. But we observed a regression in one of the CUDA math tests with this very patch, so the options do have an effect on it, don't they (at least -fno-offload-fp32-prec)?

@MrSidims If the expected behavior for the failing test is to generate llvm.fpbuiltin.sqrt.f32 and llvm.fpbuiltin.fdiv.f32, then we should have -fno-offload-fp32-prec-div/sqrt expand to the cc1 option -fno-offload-fp32-prec-div/sqrt when compiling CUDA code. Currently that is the case only when compiling SYCL code.

@zahiraam I mean that in this very PR, in this pre-commit run, the test SYCL :: sycl-in-tree/DeviceLib/~cmath_test.cpp failed in the CUDA environment because llvm.fpbuiltin.fdiv.f32 has no lowering in AltMathLib. So I conclude that these options, as currently defined, do have an effect on CUDA, don't they? What I was trying to understand is whether that effect is limited to intrinsic generation, or whether "// FPACC: "-ffp-builtin-accuracy=high"" (from this test) is also part of it.

Discussed offline. -f[no-]offload-fp32-prec-div/sqrt is not responsible for propagating any further options for CUDA and HIP. It only affects whether llvm.fpbuiltin.fdiv/sqrt with max-error=2.5/3.0 is generated or the 'standard' LLVM instructions/intrinsics/builtins are used.

zahiraam added 2 commits October 23, 2024 08:38

Add support for -ftarget-prec-div/sqrt options.

f8caf83

Added fast-math run lines to LIT tests.

00ffb5a

zahiraam requested a review from mdtoguchi October 23, 2024 19:11

zahiraam temporarily deployed to WindowsCILock October 23, 2024 19:12 — with GitHub Actions Inactive

zahiraam requested review from jcranmer-intel and gmlueck October 23, 2024 19:12

zahiraam temporarily deployed to WindowsCILock October 23, 2024 20:34 — with GitHub Actions Inactive

Renamed the options accordingly.

795dd38

zahiraam changed the title ~~Add support for -ftarget-prec-div/sqrt options.~~ Add support for -foffload-fp32-prec-div/sqrt options. Oct 24, 2024

zahiraam had a problem deploying to WindowsCILock October 24, 2024 15:09 — with GitHub Actions Error

Fix format.

78a9005

zahiraam temporarily deployed to WindowsCILock October 24, 2024 15:21 — with GitHub Actions Inactive

zahiraam temporarily deployed to WindowsCILock October 24, 2024 17:29 — with GitHub Actions Inactive

Changed the place where the options are added in order for the options to be applied to OpenMP too.

50e71c0

zahiraam marked this pull request as ready for review October 28, 2024 17:25

zahiraam requested review from a team as code owners October 28, 2024 17:25

zahiraam temporarily deployed to WindowsCILock October 28, 2024 17:26 — with GitHub Actions Inactive

zahiraam had a problem deploying to WindowsCILock October 28, 2024 19:51 — with GitHub Actions Error

Fix format.

54f2409

zahiraam temporarily deployed to WindowsCILock October 28, 2024 21:34 — with GitHub Actions Inactive

zahiraam temporarily deployed to WindowsCILock October 29, 2024 00:53 — with GitHub Actions Inactive

zahiraam changed the title ~~Add support for -foffload-fp32-prec-div/sqrt options.~~ [SYCL] Add support for -foffload-fp32-prec-div/sqrt options. Oct 29, 2024

mdtoguchi reviewed Oct 29, 2024

clang/lib/Driver/ToolChains/Clang.cpp (outdated, resolved)

mdtoguchi reviewed Oct 29, 2024

Addressed review comments.

bdf78d7

zahiraam temporarily deployed to WindowsCILock October 29, 2024 20:24 — with GitHub Actions Inactive

elizabethandrews reviewed Oct 29, 2024

zahiraam temporarily deployed to WindowsCILock October 29, 2024 21:42 — with GitHub Actions Inactive

Put the code to handle the options in RenderFloatingPointOptions function instead of adding a JobAction to handle it.

8cd6d8b

zahiraam had a problem deploying to WindowsCILock November 21, 2024 16:24 — with GitHub Actions Failure

MrSidims self-requested a review November 21, 2024 17:27

zahiraam temporarily deployed to WindowsCILock November 21, 2024 18:08 — with GitHub Actions Inactive

MrSidims reviewed Nov 21, 2024

Renamed div to fdiv to avoid confusion.

f2fb8b2

zahiraam temporarily deployed to WindowsCILock November 22, 2024 13:33 — with GitHub Actions Inactive

zahiraam temporarily deployed to WindowsCILock November 22, 2024 14:53 — with GitHub Actions Inactive

MrSidims approved these changes Nov 24, 2024

This is an attempt to fix the DeviceLib/cmath_test.cpp issue.

83c9b31

zahiraam requested a review from a team as a code owner November 25, 2024 16:10

zahiraam had a problem deploying to WindowsCILock November 25, 2024 16:11 — with GitHub Actions Failure

zahiraam temporarily deployed to WindowsCILock November 25, 2024 17:57 — with GitHub Actions Inactive

Removing the latest change that attempted to fix the LIT issue.

0efc825

zahiraam temporarily deployed to WindowsCILock December 2, 2024 16:10 — with GitHub Actions Inactive

zahiraam temporarily deployed to WindowsCILock December 2, 2024 18:52 — with GitHub Actions Inactive

Merge remote-tracking branch 'origin/sycl' into TargetPrecOption

e1de775

zahiraam had a problem deploying to WindowsCILock January 15, 2025 14:21 — with GitHub Actions Failure

zahiraam added 2 commits January 15, 2025 13:26

Merge remote-tracking branch 'origin/sycl' into TargetPrecOption

34f07cc

Fix sync error.

e18930f

zahiraam temporarily deployed to WindowsCILock January 16, 2025 18:34 — with GitHub Actions Inactive

zahiraam temporarily deployed to WindowsCILock January 16, 2025 19:52 — with GitHub Actions Inactive

MrSidims mentioned this pull request Jan 21, 2025

[SYCL][NVPTX] Set default fdiv and sqrt for llvm.fpbuiltin #16714

Merged

Merge remote-tracking branch 'origin/sycl' into TargetPrecOption

410856d

zahiraam temporarily deployed to WindowsCILock January 30, 2025 18:32 — with GitHub Actions Inactive

zahiraam temporarily deployed to WindowsCILock January 30, 2025 19:57 — with GitHub Actions Inactive

martygrant merged commit 5823125 into intel:sycl Jan 31, 2025
16 checks passed

KseniyaTikhomirova mentioned this pull request Feb 12, 2025

Verification results differ across vendors' GPUs #16636

Closed

MrSidims reviewed Feb 17, 2025
