[SYCL] Add support for -foffload-fp32-prec-div/sqrt options. #15836
base: sycl
to be applied to OpenMP too.
if (!strcmp(A->getValue(), "fast")) {
  CmdArgs.push_back("-fno-offload-fp32-prec-div");
  CmdArgs.push_back("-fno-offload-fp32-prec-sqrt");
}
Should we allow users to override with -foffload-fp32-prec-div|sqrt?
Suggested change:
if (!strcmp(A->getValue(), "fast")) {
  if (!Args.hasFlag(options::OPT_foffload_fp32_prec_div,
                    options::OPT_fno_offload_fp32_prec_div, false))
    CmdArgs.push_back("-fno-offload-fp32-prec-div");
  if (!Args.hasFlag(options::OPT_foffload_fp32_prec_sqrt,
                    options::OPT_fno_offload_fp32_prec_sqrt, false))
    CmdArgs.push_back("-fno-offload-fp32-prec-sqrt");
}
Not sure. I would think that users could choose to compile with:
clang -fsycl -ffp-model=fast -foffload-fp32-prec-sqrt hello.cpp
or:
clang -fsycl -foffload-fp32-prec-sqrt -ffp-model=fast hello.cpp
These shouldn't give the same result. In the first one, the sqrt results are precise. In the second one, they are not.
I think that's just following the last-option-wins rule, in which case we need a complicated process here to work out the order in which the options interact with one another.
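A minimal sketch of the last-option-wins behaviour described above, using plain strings rather than the driver's real argument classes. The type `OffloadFP32Prec` and the function `resolveOffloadFP32Prec` are hypothetical names for illustration, and only `-ffp-model=fast` is modelled; how other `-ffp-model` values should interact is left open.

```cpp
#include <string>
#include <vector>

// Hypothetical result type: whether fp32 div/sqrt must be correctly rounded.
struct OffloadFP32Prec {
  bool PrecDiv = true;  // -foffload-fp32-prec-div is the default
  bool PrecSqrt = true; // -foffload-fp32-prec-sqrt is the default
};

// Walk the arguments left to right so that a later option overrides an
// earlier one ("last option wins").
OffloadFP32Prec resolveOffloadFP32Prec(const std::vector<std::string> &Args) {
  OffloadFP32Prec R;
  for (const std::string &A : Args) {
    if (A == "-ffp-model=fast") {
      // Assumption: the fast model relaxes both operations.
      R.PrecDiv = false;
      R.PrecSqrt = false;
    } else if (A == "-foffload-fp32-prec-div") {
      R.PrecDiv = true;
    } else if (A == "-fno-offload-fp32-prec-div") {
      R.PrecDiv = false;
    } else if (A == "-foffload-fp32-prec-sqrt") {
      R.PrecSqrt = true;
    } else if (A == "-fno-offload-fp32-prec-sqrt") {
      R.PrecSqrt = false;
    }
  }
  return R;
}
```

With this shape, the two command lines in the comment resolve differently: the explicit sqrt option survives only when it comes after `-ffp-model=fast`.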
Hmm... If that's the case, we may want to integrate the logic into where all of the other FP model options are being manipulated, in the larger for loop here:
llvm/clang/lib/Driver/ToolChains/Clang.cpp, line 2994 in bdf78d7:
static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D,
-cc1 option under the IsDeviceOffloading condition.
Okay and that would work for OpenMP too!
if (Args.getLastArg(options::OPT_fno_offload_fp32_prec_div))
  CmdArgs.push_back("-fno-offload-fp32-prec-div");
else
  CmdArgs.push_back("-foffload-fp32-prec-div");
Suggested change:
if (!Args.hasFlag(options::OPT_foffload_fp32_prec_div,
                  options::OPT_fno_offload_fp32_prec_div, true))
  CmdArgs.push_back("-fno-offload-fp32-prec-div");
Since -foffload-fp32-prec-div is the default.
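To illustrate why the suggestion forwards only the negative spelling, here is a small sketch of a positive/negative flag lookup with a default. `lastFlagWins` and `renderPrecDiv` are hypothetical helpers written for this example over plain strings; they only approximate the driver's real lookup, in which the last occurrence of either spelling decides and the default applies when neither is present.

```cpp
#include <string>
#include <vector>

// Simplified stand-in for the driver's flag lookup: the last occurrence of
// either spelling decides; if neither is present, Default is returned.
bool lastFlagWins(const std::vector<std::string> &Args, const std::string &Pos,
                  const std::string &Neg, bool Default) {
  for (auto It = Args.rbegin(); It != Args.rend(); ++It) {
    if (*It == Pos)
      return true;
    if (*It == Neg)
      return false;
  }
  return Default;
}

// Forward only the non-default spelling to the -cc1 command line.
std::vector<std::string> renderPrecDiv(const std::vector<std::string> &Args) {
  std::vector<std::string> CmdArgs;
  if (!lastFlagWins(Args, "-foffload-fp32-prec-div",
                    "-fno-offload-fp32-prec-div", /*Default=*/true))
    CmdArgs.push_back("-fno-offload-fp32-prec-div");
  return CmdArgs;
}
```

One design consequence: a check that only asks whether the negative spelling appears anywhere would keep forwarding it even when a later positive spelling should override it, whereas the paired lookup honours the order.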
if (Args.getLastArg(options::OPT_fno_offload_fp32_prec_sqrt))
  CmdArgs.push_back("-fno-offload-fp32-prec-sqrt");
else
  CmdArgs.push_back("-foffload-fp32-prec-sqrt");
Similar comment to above.
@premanandrao can you review this please?
function instead of adding a JobAction to handle it.
OPTION(OffloadFp32PrecDiv, bool, 1, ComplexRange)
OPTION(OffloadFp32PrecSqrt, bool, 1, OffloadFp32PrecDiv)
Suggested change:
OPTION(OffloadFP32PrecDiv, bool, 1, ComplexRange)
OPTION(OffloadFP32PrecSqrt, bool, 1, OffloadFP32PrecDiv)
LANGOPT(OffloadFp32PrecDiv, 1, 1, "Return correctly rounded results of fdiv")
LANGOPT(OffloadFp32PrecSqrt, 1, 1, "Return correctly rounded results of sqrt")
Suggested change:
LANGOPT(OffloadFP32PrecDiv, 1, 1, "Return correctly rounded results of fdiv")
LANGOPT(OffloadFP32PrecSqrt, 1, 1, "Return correctly rounded results of sqrt")
@@ -1157,6 +1157,22 @@ defm cx_fortran_rules: BoolOptionWithoutMarshalling<"f", "cx-fortran-rules",
  NegFlag<SetFalse, [], [ClangOption, CC1Option], "Range reduction is disabled "
  "for complex arithmetic operations">>;

defm offload_fp32_prec_div: BoolOption<"f", "offload-fp32-prec-div",
  LangOpts<"OffloadFp32PrecDiv">, DefaultTrue,
Suggested change:
  LangOpts<"OffloadFP32PrecDiv">, DefaultTrue,
  Group<f_Group>;

defm offload_fp32_prec_sqrt: BoolOption<"f", "offload-fp32-prec-sqrt",
  LangOpts<"OffloadFp32PrecSqrt">, DefaultTrue,
Suggested change:
  LangOpts<"OffloadFP32PrecSqrt">, DefaultTrue,
clang/lib/CodeGen/CGBuiltin.cpp
Outdated
!LangOpts.FPAccuracyVal.empty() || !LangOpts.OffloadFp32PrecDiv ||
!LangOpts.OffloadFp32PrecSqrt) {
Suggested change:
!LangOpts.FPAccuracyVal.empty() || !LangOpts.OffloadFP32PrecDiv ||
!LangOpts.OffloadFP32PrecSqrt) {
// DEFINE: -fsycl-is-device -emit-llvm -triple spirv32-unknown-unknown

// DEFINE: %{common_opts_spirv64} = -internal-isystem %S/Inputs \
// DEFINE: -fsycl-is-device -emit-llvm -triple spirv32-unknown-unknown
Suggested change:
// DEFINE: -fsycl-is-device -emit-llvm -triple spirv64-unknown-unknown
// DEFINE: -fsycl-is-device -emit-llvm -triple spirv32-unknown-unknown

// DEFINE: %{common_opts_spir} = -internal-isystem %S/Inputs \
// DEFINE: -fsycl-is-device -emit-llvm -triple spirv32-unknown-unknown
Suggested change:
// DEFINE: -fsycl-is-device -emit-llvm -triple spir32-unknown-unknown
// DEFINE: -fsycl-is-device -emit-llvm -triple spirv32-unknown-unknown

// DEFINE: %{common_opts_spir64} = -internal-isystem %S/Inputs \
// DEFINE: -fsycl-is-device -emit-llvm -triple spirv32-unknown-unknown
Suggested change:
// DEFINE: -fsycl-is-device -emit-llvm -triple spir64-unknown-unknown
clang/lib/CodeGen/CGCall.cpp
Outdated
if (Name == "fdiv" && !getLangOpts().OffloadFp32PrecDiv)
  FPAccuracyVal = "2.5";
if (!FPAccuracyVal.empty())
  FuncAttrs.addAttribute("fpbuiltin-max-error", FPAccuracyVal);
How is the combination supposed to work? If the condition in 1894 was true, would two fpbuiltin-max-error attributes get added? Once in 1898 and again in 1907?
If the condition in 1894 is satisfied, then FuncAttrs.size() != 0, so we will not get into this code.
Sorry, maybe I wasn't clear before. Let me type out what I am asking:
if (FuncAttrs.attrs().size() == 0) {
  StringRef FPAccuracyVal;
  if (!getLangOpts().FPAccuracyVal.empty()) {
    ...
    FPAccuracyVal = llvm::fp::getAccuracyForFPBuiltin(...);
    FuncAttrs.addAttribute("fpbuiltin-max-error", FPAccuracyVal); // #Attr here 1
    ...
  }
  if (Name == "sqrt" && !getLangOpts().OffloadFp32PrecSqrt)
    FPAccuracyVal = "3.0";
  if (Name == "fdiv" && !getLangOpts().OffloadFp32PrecDiv)
    FPAccuracyVal = "2.5";
  if (!FPAccuracyVal.empty())
    FuncAttrs.addAttribute("fpbuiltin-max-error", FPAccuracyVal); // #Attr here 2
}
Couldn't you get into the size == 0 block, set FPAccuracyVal, add the attribute (#1), and, if the name is one of sqrt or fdiv, set FPAccuracyVal again and then add the attribute again (#2)?
Is this combination supposed to work?
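One possible shape of a fix for the double-add concern, sketched with stand-in types; the actual change made in the PR is not shown in this conversation. `PrecOpts`, `AttrList`, and `addMaxErrorAttr` are hypothetical names, and the precedence (the offload options overriding a value derived from -ffp-accuracy) is an assumption that simply mirrors the assignment order in the snippet above.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the language options and the attribute builder.
struct PrecOpts {
  bool OffloadFP32PrecDiv = true;
  bool OffloadFP32PrecSqrt = true;
};
using AttrList = std::vector<std::pair<std::string, std::string>>;

// Decide the max-error value first, then add the attribute at most once.
// AccuracyFromOption is whatever -ffp-accuracy produced ("" if unset).
void addMaxErrorAttr(AttrList &Attrs, const std::string &Name,
                     const PrecOpts &LO, std::string AccuracyFromOption) {
  std::string Val = std::move(AccuracyFromOption);
  // Assumption: the offload options override -ffp-accuracy here.
  if (Name == "sqrt" && !LO.OffloadFP32PrecSqrt)
    Val = "3.0";
  if (Name == "fdiv" && !LO.OffloadFP32PrecDiv)
    Val = "2.5";
  if (!Val.empty())
    Attrs.emplace_back("fpbuiltin-max-error", Val);
}
```

Because the attribute is appended in exactly one place, a call that has both an -ffp-accuracy value and a relaxed offload option ends up with a single fpbuiltin-max-error entry.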
Oh! good catch. I think this will fix it.
@@ -1781,7 +1781,6 @@ void Clang::RenderTargetOptions(const llvm::Triple &EffectiveTriple,
  switch (TC.getArch()) {
  default:
    break;

Inadvertent line removal?
clang/lib/Driver/ToolChains/Clang.h
Outdated
@@ -55,6 +55,9 @@ class LLVM_LIBRARY_VISIBILITY Clang : public Tool {
                         const llvm::opt::ArgList &Args,
                         llvm::opt::ArgStringList &CmdArgs,
                         bool KernelOrKext) const;
  void AddSPIRTargetArgs(const llvm::opt::ArgList &Args,
                         llvm::opt::ArgStringList &CmdArgs, const JobAction &JA,
                         const Driver &D) const;
Changes not needed anymore?
Not sure if it should go to this PR (guess it should). The feature uses the SPV_INTEL_fp_max_error SPIR-V extension. Currently it's enabled only for AOT CPU compilation, so now we should also enable it when the options are passed.
@MrSidims Are you saying these new options should only be enabled with
Thanks @mdtoguchi for the explanation. I will add the extension to this PR.
@zahiraam I believe https://github.com/intel/llvm/blob/sycl/clang/lib/Driver/ToolChains/Clang.cpp#L10742 should be changed to something like:
// RUN: %clang -target x86_64-unknown-linux-gnu -fsycl --no-offload-new-driver -fsycl-targets=spir64_x86_64-unknown-unknown -fno-offload-fp32-prec-div -fno-offload-fp32-prec-sqrt %s -### 2>&1 \
// RUN: | FileCheck %s -check-prefixes=CHECK-CPU
// RUN: %clang -target x86_64-unknown-linux-gnu -fsycl --no-offload-new-driver -fsycl-targets=spir64_x86_64-unknown-unknown -foffload-fp32-prec-sqrt %s -### 2>&1 \
// RUN: | FileCheck %s -check-prefixes=CHECK-CPU-NFPME
To reduce the duplication of all of the extensions, can you do something like: -check-prefixes=CHECK-CPU,CHECK-CPU-NFPME
Here, CHECK-CPU does not add the fp_max_error, but is rather checked with a CHECK-CPU-NFPME-NOT
@@ -129,3 +186,110 @@
// CHECK-CPU-SAME:,+SPV_KHR_non_semantic_info
// CHECK-CPU-SAME:,+SPV_KHR_cooperative_matrix
// CHECK-CPU-SAME:,+SPV_INTEL_fp_max_error"
Suggested change:
// CHECK-CPU-FPME:,+SPV_INTEL_fp_max_error"
To match the suggested change with -check-prefixes above.
// CHECK-CPU-NFPME: llvm-spirv{{.*}}"-spirv-allow-unknown-intrinsics=llvm.genx.,llvm.fpbuiltin"
// CHECK-CPU-NFPME-SAME: {{.*}}"-spirv-ext=-all
// CHECK-CPU-NFPME-SAME:,+SPV_EXT_shader_atomic_float_add
// CHECK-CPU-NFPME-SAME:,+SPV_EXT_shader_atomic_float_min_max
// CHECK-CPU-NFPME-SAME:,+SPV_KHR_no_integer_wrap_decoration,+SPV_KHR_float_controls
// CHECK-CPU-NFPME-SAME:,+SPV_KHR_expect_assume,+SPV_KHR_linkonce_odr
// CHECK-CPU-NFPME-SAME:,+SPV_INTEL_subgroups,+SPV_INTEL_media_block_io
// CHECK-CPU-NFPME-SAME:,+SPV_INTEL_device_side_avc_motion_estimation
// CHECK-CPU-NFPME-SAME:,+SPV_INTEL_fpga_loop_controls
// CHECK-CPU-NFPME-SAME:,+SPV_INTEL_unstructured_loop_controls,+SPV_INTEL_fpga_reg
// CHECK-CPU-NFPME-SAME:,+SPV_INTEL_blocking_pipes,+SPV_INTEL_function_pointers
// CHECK-CPU-NFPME-SAME:,+SPV_INTEL_kernel_attributes,+SPV_INTEL_io_pipes
// CHECK-CPU-NFPME-SAME:,+SPV_INTEL_inline_assembly,+SPV_INTEL_arbitrary_precision_integers
// CHECK-CPU-NFPME-SAME:,+SPV_INTEL_float_controls2
// CHECK-CPU-NFPME-SAME:,+SPV_INTEL_vector_compute,+SPV_INTEL_fast_composite
// CHECK-CPU-NFPME-SAME:,+SPV_INTEL_arbitrary_precision_fixed_point
// CHECK-CPU-NFPME-SAME:,+SPV_INTEL_arbitrary_precision_floating_point
// CHECK-CPU-NFPME-SAME:,+SPV_INTEL_variable_length_array,+SPV_INTEL_fp_fast_math_mode
// CHECK-CPU-NFPME-SAME:,+SPV_INTEL_long_constant_composite
// CHECK-CPU-NFPME-SAME:,+SPV_INTEL_arithmetic_fence
// CHECK-CPU-NFPME-SAME:,+SPV_INTEL_cache_controls
// CHECK-CPU-NFPME-SAME:,+SPV_INTEL_fpga_buffer_location
// CHECK-CPU-NFPME-SAME:,+SPV_INTEL_fpga_argument_interfaces
// CHECK-CPU-NFPME-SAME:,+SPV_INTEL_fpga_invocation_pipelining_attributes
// CHECK-CPU-NFPME-SAME:,+SPV_INTEL_fpga_latency_control
// CHECK-CPU-NFPME-SAME:,+SPV_INTEL_task_sequence
// CHECK-CPU-NFPME-SAME:,+SPV_INTEL_token_type
// CHECK-CPU-NFPME-SAME:,+SPV_INTEL_bfloat16_conversion
// CHECK-CPU-NFPME-SAME:,+SPV_INTEL_joint_matrix
// CHECK-CPU-NFPME-SAME:,+SPV_INTEL_hw_thread_queries
// CHECK-CPU-NFPME-SAME:,+SPV_KHR_uniform_group_instructions
// CHECK-CPU-NFPME-SAME:,+SPV_INTEL_masked_gather_scatter
// CHECK-CPU-NFPME-SAME:,+SPV_INTEL_tensor_float32_conversion
// CHECK-CPU-NFPME-SAME:,+SPV_INTEL_optnone
// CHECK-CPU-NFPME-SAME:,+SPV_KHR_non_semantic_info
// CHECK-CPU-NFPME-SAME:,+SPV_KHR_cooperative_matrix
// CHECK-CPU-NFPME-NOT:,+SPV_INTEL_fp_max_error"
Suggested change (replacing the whole CHECK-CPU-NFPME block quoted above):
// CHECK-CPU-NFPME-NOT:,+SPV_INTEL_fp_max_error"
To match up with the -check-prefixes suggestion above.
// CHECK-FPGA-HW-FPME-SAME:,+SPV_INTEL_fpga_dsp_control
// CHECK-FPGA-HW-FPME-SAME:,+SPV_INTEL_fpga_memory_accesses
// CHECK-FPGA-HW-FPME-SAME:,+SPV_INTEL_fpga_memory_attributes
// CHECK-FPGA-HW-FPME-SAME:,+SPV_INTEL_fp_max_error"
Similar changes here as suggested above to reduce string redundancy.
if (IsCPU && hasNoOffloadFP32PrecOption(TCArgs) ||
    !IsCPU && shouldUseOffloadFP32PrecOption(TCArgs)) {
Suggested change:
if ((IsCPU && hasNoOffloadFP32PrecOption(TCArgs)) ||
    shouldUseOffloadFP32PrecOption(TCArgs)) {
I believe the option settings should always trigger, regardless of whether we are doing AOT for CPU.
// DEFINE: -fsycl-is-device -emit-llvm -triple spirv-unknown-unknown

// DEFINE: %{common_opts_spir64} = -internal-isystem %S/Inputs \
// DEFINE: -fsycl-is-device -emit-llvm -triple spirv64-unknown-unknown
common_opts_spir64 seems identical to common_opts_spirv64.
Fixed now.
SPV_INTEL_fp_max_error related changes LGTM
  }
};

auto ParseFPAccOption = [&](StringRef Val, bool &NoOffloadFlag) {
Suggested change:
auto parseFPAccOption = [&](StringRef Val, bool &NoOffloadFlag) {
Function naming should start with lowercase.
Add support for options -f[no-]offload-fp32-prec-div and -f[no-]offload-fp32-prec-sqrt.
These options are added to allow users to control whether fdiv and sqrt operations in offload device code are required to return correctly rounded results. In order to communicate this to the device code, we need the front end to generate IR that reflects the choice.
When the correctly rounded setting is used, we can just generate the fdiv instruction and llvm.sqrt intrinsic, because these operations are required to be correctly rounded by default in LLVM IR.
When the result is not required to be correctly rounded, the front end should generate a call to the llvm.fpbuiltin.fdiv or llvm.fpbuiltin.sqrt intrinsic with the fpbuiltin-max-error attribute set. For single-precision fdiv, the setting should be 2.5. For single-precision sqrt, the setting should be 3.0.
If the -ffp-accuracy option is used, we should issue warnings if the settings conflict with an explicitly set -foffload-fp32-prec-div or -foffload-fp32-prec-sqrt option.
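The lowering rule in the description can be summarised as a small decision table. The sketch below is not the front-end code; `FP32Op`, `Lowering`, and `chooseLowering` are hypothetical names, and the result is modelled as plain strings. Only the instruction and intrinsic names and the 2.5 / 3.0 values come from the description above.

```cpp
#include <string>

enum class FP32Op { FDiv, Sqrt };

struct Lowering {
  std::string Callee;   // instruction or intrinsic to emit
  std::string MaxError; // fpbuiltin-max-error value; empty if not set
};

// Pick the IR construct for a single-precision operation depending on
// whether a correctly rounded result is required.
Lowering chooseLowering(FP32Op Op, bool CorrectlyRounded) {
  if (CorrectlyRounded)
    return Op == FP32Op::FDiv ? Lowering{"fdiv", ""}
                              : Lowering{"llvm.sqrt", ""};
  return Op == FP32Op::FDiv ? Lowering{"llvm.fpbuiltin.fdiv", "2.5"}
                            : Lowering{"llvm.fpbuiltin.sqrt", "3.0"};
}
```

For instance, `chooseLowering(FP32Op::Sqrt, false)` selects the llvm.fpbuiltin.sqrt intrinsic with a max error of 3.0, while the correctly rounded case carries no max-error value at all.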