-
Notifications
You must be signed in to change notification settings - Fork 13.3k
[OpenMP] Fix target data region codegen being omitted for device pass #85218
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-flang-openmp Author: Akash Banerjee (TIFitis) ChangesFull diff: https://github.com/llvm/llvm-project/pull/85218.diff 2 Files Affected:
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index f6cf358119fb71..bd58f9ebba1c21 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -4791,8 +4791,11 @@ OpenMPIRBuilder::InsertPointTy OpenMPIRBuilder::createTargetData(
return InsertPointTy();
// Disable TargetData CodeGen on Device pass.
- if (Config.IsTargetDevice.value_or(false))
+ if (Config.IsTargetDevice.value_or(false)){
+ if(BodyGenCB)
+ Builder.restoreIP(BodyGenCB(Builder.saveIP(), BodyGenTy::NoPriv));
return Builder.saveIP();
+ }
Builder.restoreIP(CodeGenIP);
bool IsStandAlone = !BodyGenCB;
diff --git a/openmp/libomptarget/test/offloading/fortran/target-nested-target-data.f90 b/openmp/libomptarget/test/offloading/fortran/target-nested-target-data.f90
new file mode 100644
index 00000000000000..5508690c82ddd9
--- /dev/null
+++ b/openmp/libomptarget/test/offloading/fortran/target-nested-target-data.f90
@@ -0,0 +1,31 @@
+! Basic offloading test of arrays with provided lower
+! and upper bounds as specified by OpenMP's sectioning
+! REQUIRES: flang
+! UNSUPPORTED: nvptx64-nvidia-cuda-LTO
+! UNSUPPORTED: aarch64-unknown-linux-gnu
+! UNSUPPORTED: aarch64-unknown-linux-gnu-LTO
+! UNSUPPORTED: x86_64-pc-linux-gnu
+! UNSUPPORTED: x86_64-pc-linux-gnu-LTO
+
+! RUN: %libomptarget-compile-fortran-run-and-check-generic
+program main
+ integer :: A(10), B(10), C(10)
+
+ do I = 1, 10
+ A(I) = 1
+ B(I) = 2
+ end do
+ !$omp target data map(to: A, B) map(alloc: C)
+ !$omp target map(from: C)
+ do I = 1, 10
+ C(I) = A(I) + B(I) ! assigns 3, A:1 + B:2
+ end do
+ !$omp end target
+ !$omp target update from(C) ! updates C device -> host
+ !$omp end target data
+
+ print *, C ! should be all 3's
+
+end program
+
+! CHECK: 3 3 3 3 3 3 3 3 3 3
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM, minus the code formatter issues! Please do wait on a secondary review though. It would also be very cool to have another runtime test or two as well, just to see if we can dig up any other issues if they exist or get a little more code coverage. But that's only if you wish to do so, I imagine there's other more critical things to be done elsewhere.
Thanks Akash, it looks ok for me as well, but I'm missing one OMPIRBuilder test to check not just the overall behavior but also the IR produced (llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp). |
@skatrak I've added a test to the OpenMPIRBuilderTest file. Let me know if it's how you wanted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you Akash, LGTM. Could you add a brief commit description before you merge?
Could you elaborate on this? What code associated with the target data is required on the device? That doesn't connect in my mind and I'm worried this hides a different problem. |
Take the following test:
We don't need to emit any code for target data on the device pass, but the target directive nested inside it needs device pass codegen. By not having the BodyGenCB callback, we were omitting this codegen which this patch brings back. Note that both Clang and Flang don't actually emit any code for the target data region unless it is target code. The following example still doesn't produce any code for the device pass related to target data or it's nested region.
|
This seems to be yet another manifestation of this problem: #85239, #84611. It appears the workaround implemented in this patch wouldn't be needed (at least for Flang) after solving the problem of skipping host operations during device compilation while still processing target regions nested inside them. |
…llvm#85218) This patch enables the BodyCodeGen callback to still trigger for the TargetData nested region during the device pass. There maybe Target code nested within the TargetData region for which this is required. Also add tests for the same.
I think I agree with @skatrak, this looks like a narrow workaround. |
This patch enables the BodyCodeGen callback to still trigger for the TargetData nested region during the device pass. There maybe Target code nested within the TargetData region for which this is required.
Also add tests for the same.