[MLIR][OpenMP] Skip host omp ops when compiling for the target device #85239
Conversation
@llvm/pr-subscribers-flang-openmp @llvm/pr-subscribers-mlir-openmp

Author: Jan Leyonberg (jsjodin)

Changes

This patch separates the lowering dispatch for host and target devices. For the target device, if the current operation is not a top-level operation (e.g. omp.target) or is inside a target device code region, it will be ignored, since it belongs to the host code. This is an alternative approach to #84611; the new test in this PR was taken from there.

Full diff: https://github.com/llvm/llvm-project/pull/85239.diff

6 Files Affected:
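To make the behavior concrete, here is a minimal MLIR sketch (hand-written for this description, not taken from the PR; the function name is hypothetical) of a module compiled for the target device. With this patch, host-only ops such as the omp.barrier below are skipped during device translation, while the omp.target region is still lowered because it contains device code.

module attributes {omp.is_target_device = true} {
  llvm.func @hypothetical_host_func() {
    // Host-only construct: skipped when translating for the device.
    omp.barrier
    // Top-level target op: still lowered; its region is device code.
    omp.target {
      omp.terminator
    }
    llvm.return
  }
}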
diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index bef227f2c583ed..9ad0b99078005b 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -2922,6 +2922,178 @@ convertDeclareTargetAttr(Operation *op, mlir::omp::DeclareTargetAttr attribute,
return success();
}
+static bool isInternalTargetDeviceOp(Operation *op) {
+ // Assumes no reverse offloading
+ if (op->getParentOfType<omp::TargetOp>())
+ return true;
+
+ auto parentFn = op->getParentOfType<LLVM::LLVMFuncOp>();
+ if (auto declareTargetIface =
+ llvm::dyn_cast<mlir::omp::DeclareTargetInterface>(
+ parentFn.getOperation()))
+ if (declareTargetIface.isDeclareTarget() &&
+ declareTargetIface.getDeclareTargetDeviceType() !=
+ mlir::omp::DeclareTargetDeviceType::host)
+ return true;
+
+ return false;
+}
+
+/// Given an OpenMP MLIR operation, create the corresponding LLVM IR
+/// (including OpenMP runtime calls).
+static LogicalResult
+convertCommonOperation(Operation *op, llvm::IRBuilderBase &builder,
+ LLVM::ModuleTranslation &moduleTranslation) {
+
+ llvm::OpenMPIRBuilder *ompBuilder = moduleTranslation.getOpenMPBuilder();
+
+ return llvm::TypeSwitch<Operation *, LogicalResult>(op)
+ .Case([&](omp::BarrierOp) {
+ ompBuilder->createBarrier(builder.saveIP(), llvm::omp::OMPD_barrier);
+ return success();
+ })
+ .Case([&](omp::TaskwaitOp) {
+ ompBuilder->createTaskwait(builder.saveIP());
+ return success();
+ })
+ .Case([&](omp::TaskyieldOp) {
+ ompBuilder->createTaskyield(builder.saveIP());
+ return success();
+ })
+ .Case([&](omp::FlushOp) {
+ // No support in Openmp runtime function (__kmpc_flush) to accept
+ // the argument list.
+ // OpenMP standard states the following:
+ // "An implementation may implement a flush with a list by ignoring
+ // the list, and treating it the same as a flush without a list."
+ //
+ // The argument list is discarded so that, flush with a list is treated
+ // same as a flush without a list.
+ ompBuilder->createFlush(builder.saveIP());
+ return success();
+ })
+ .Case([&](omp::ParallelOp op) {
+ return convertOmpParallel(op, builder, moduleTranslation);
+ })
+ .Case([&](omp::ReductionOp reductionOp) {
+ return convertOmpReductionOp(reductionOp, builder, moduleTranslation);
+ })
+ .Case([&](omp::MasterOp) {
+ return convertOmpMaster(*op, builder, moduleTranslation);
+ })
+ .Case([&](omp::CriticalOp) {
+ return convertOmpCritical(*op, builder, moduleTranslation);
+ })
+ .Case([&](omp::OrderedRegionOp) {
+ return convertOmpOrderedRegion(*op, builder, moduleTranslation);
+ })
+ .Case([&](omp::OrderedOp) {
+ return convertOmpOrdered(*op, builder, moduleTranslation);
+ })
+ .Case([&](omp::WsLoopOp) {
+ return convertOmpWsLoop(*op, builder, moduleTranslation);
+ })
+ .Case([&](omp::SimdLoopOp) {
+ return convertOmpSimdLoop(*op, builder, moduleTranslation);
+ })
+ .Case([&](omp::AtomicReadOp) {
+ return convertOmpAtomicRead(*op, builder, moduleTranslation);
+ })
+ .Case([&](omp::AtomicWriteOp) {
+ return convertOmpAtomicWrite(*op, builder, moduleTranslation);
+ })
+ .Case([&](omp::AtomicUpdateOp op) {
+ return convertOmpAtomicUpdate(op, builder, moduleTranslation);
+ })
+ .Case([&](omp::AtomicCaptureOp op) {
+ return convertOmpAtomicCapture(op, builder, moduleTranslation);
+ })
+ .Case([&](omp::SectionsOp) {
+ return convertOmpSections(*op, builder, moduleTranslation);
+ })
+ .Case([&](omp::SingleOp op) {
+ return convertOmpSingle(op, builder, moduleTranslation);
+ })
+ .Case([&](omp::TeamsOp op) {
+ return convertOmpTeams(op, builder, moduleTranslation);
+ })
+ .Case([&](omp::TaskOp op) {
+ return convertOmpTaskOp(op, builder, moduleTranslation);
+ })
+ .Case([&](omp::TaskGroupOp op) {
+ return convertOmpTaskgroupOp(op, builder, moduleTranslation);
+ })
+ .Case<omp::YieldOp, omp::TerminatorOp, omp::ReductionDeclareOp,
+ omp::CriticalDeclareOp>([](auto op) {
+ // `yield` and `terminator` can be just omitted. The block structure
+ // was created in the region that handles their parent operation.
+ // `reduction.declare` will be used by reductions and is not
+ // converted directly, skip it.
+ // `critical.declare` is only used to declare names of critical
+ // sections which will be used by `critical` ops and hence can be
+ // ignored for lowering. The OpenMP IRBuilder will create unique
+ // name for critical section names.
+ return success();
+ })
+ .Case([&](omp::ThreadprivateOp) {
+ return convertOmpThreadprivate(*op, builder, moduleTranslation);
+ })
+ .Case<omp::DataOp, omp::EnterDataOp, omp::ExitDataOp, omp::UpdateDataOp>(
+ [&](auto op) {
+ return convertOmpTargetData(op, builder, moduleTranslation);
+ })
+ .Case([&](omp::TargetOp) {
+ return convertOmpTarget(*op, builder, moduleTranslation);
+ })
+ .Case<omp::MapInfoOp, omp::DataBoundsOp, omp::PrivateClauseOp>(
+ [&](auto op) {
+ // No-op, should be handled by relevant owning operations e.g.
+ // TargetOp, EnterDataOp, ExitDataOp, DataOp etc. and then
+ // discarded
+ return success();
+ })
+ .Default([&](Operation *inst) {
+ return inst->emitError("unsupported OpenMP operation: ")
+ << inst->getName();
+ });
+}
+
+static LogicalResult
+convertInternalTargetOp(Operation *op, llvm::IRBuilderBase &builder,
+ LLVM::ModuleTranslation &moduleTranslation) {
+ return convertCommonOperation(op, builder, moduleTranslation);
+}
+
+static LogicalResult
+convertTopLevelTargetOp(Operation *op, llvm::IRBuilderBase &builder,
+ LLVM::ModuleTranslation &moduleTranslation) {
+ return llvm::TypeSwitch<Operation *, LogicalResult>(op)
+ .Case<omp::DataOp, omp::EnterDataOp, omp::ExitDataOp, omp::UpdateDataOp>(
+ [&](auto op) {
+ return convertOmpTargetData(op, builder, moduleTranslation);
+ })
+ .Case([&](omp::TargetOp) {
+ return convertOmpTarget(*op, builder, moduleTranslation);
+ })
+ // Skip omp ops that are not legal top level ops for the target device
+ .Case<omp::BarrierOp, omp::TaskwaitOp, omp::TaskyieldOp, omp::FlushOp,
+ omp::ParallelOp, omp::ReductionOp, omp::MasterOp, omp::CriticalOp,
+ omp::OrderedRegionOp, omp::OrderedOp, omp::WsLoopOp,
+ omp::SimdLoopOp, omp::AtomicReadOp, omp::AtomicWriteOp,
+ omp::AtomicUpdateOp, omp::AtomicCaptureOp, omp::SectionsOp,
+ omp::SingleOp, omp::TeamsOp, omp::TaskOp, omp::TaskGroupOp,
+ omp::YieldOp, omp::TerminatorOp, omp::ReductionDeclareOp,
+ omp::ThreadprivateOp, omp::DistributeOp, omp::MapInfoOp,
+ omp::DataBoundsOp, omp::CriticalDeclareOp>(
+ [&](auto op) {
+ return success();
+ })
+ .Default([&](Operation *inst) {
+ return inst->emitError("unsupported OpenMP operation: ")
+ << inst->getName();
+ });
+}
+
namespace {
/// Implementation of the dialect interface that converts operations belonging
@@ -2937,8 +3109,8 @@ class OpenMPDialectLLVMIRTranslationInterface
convertOperation(Operation *op, llvm::IRBuilderBase &builder,
LLVM::ModuleTranslation &moduleTranslation) const final;
- /// Given an OpenMP MLIR attribute, create the corresponding LLVM-IR, runtime
- /// calls, or operation amendments
+ /// Given an OpenMP MLIR attribute, create the corresponding LLVM-IR,
+ /// runtime calls, or operation amendments
LogicalResult
amendOperation(Operation *op, ArrayRef<llvm::Instruction *> instructions,
NamedAttribute attribute,
@@ -3043,116 +3215,15 @@ LogicalResult OpenMPDialectLLVMIRTranslationInterface::convertOperation(
LLVM::ModuleTranslation &moduleTranslation) const {
llvm::OpenMPIRBuilder *ompBuilder = moduleTranslation.getOpenMPBuilder();
+ if (ompBuilder->Config.isTargetDevice()) {
+ if (isInternalTargetDeviceOp(op)) {
+ return convertInternalTargetOp(op, builder, moduleTranslation);
+ } else {
+ return convertTopLevelTargetOp(op, builder, moduleTranslation);
+ }
+ }
- return llvm::TypeSwitch<Operation *, LogicalResult>(op)
- .Case([&](omp::BarrierOp) {
- ompBuilder->createBarrier(builder.saveIP(), llvm::omp::OMPD_barrier);
- return success();
- })
- .Case([&](omp::TaskwaitOp) {
- ompBuilder->createTaskwait(builder.saveIP());
- return success();
- })
- .Case([&](omp::TaskyieldOp) {
- ompBuilder->createTaskyield(builder.saveIP());
- return success();
- })
- .Case([&](omp::FlushOp) {
- // No support in Openmp runtime function (__kmpc_flush) to accept
- // the argument list.
- // OpenMP standard states the following:
- // "An implementation may implement a flush with a list by ignoring
- // the list, and treating it the same as a flush without a list."
- //
- // The argument list is discarded so that, flush with a list is treated
- // same as a flush without a list.
- ompBuilder->createFlush(builder.saveIP());
- return success();
- })
- .Case([&](omp::ParallelOp op) {
- return convertOmpParallel(op, builder, moduleTranslation);
- })
- .Case([&](omp::ReductionOp reductionOp) {
- return convertOmpReductionOp(reductionOp, builder, moduleTranslation);
- })
- .Case([&](omp::MasterOp) {
- return convertOmpMaster(*op, builder, moduleTranslation);
- })
- .Case([&](omp::CriticalOp) {
- return convertOmpCritical(*op, builder, moduleTranslation);
- })
- .Case([&](omp::OrderedRegionOp) {
- return convertOmpOrderedRegion(*op, builder, moduleTranslation);
- })
- .Case([&](omp::OrderedOp) {
- return convertOmpOrdered(*op, builder, moduleTranslation);
- })
- .Case([&](omp::WsLoopOp) {
- return convertOmpWsLoop(*op, builder, moduleTranslation);
- })
- .Case([&](omp::SimdLoopOp) {
- return convertOmpSimdLoop(*op, builder, moduleTranslation);
- })
- .Case([&](omp::AtomicReadOp) {
- return convertOmpAtomicRead(*op, builder, moduleTranslation);
- })
- .Case([&](omp::AtomicWriteOp) {
- return convertOmpAtomicWrite(*op, builder, moduleTranslation);
- })
- .Case([&](omp::AtomicUpdateOp op) {
- return convertOmpAtomicUpdate(op, builder, moduleTranslation);
- })
- .Case([&](omp::AtomicCaptureOp op) {
- return convertOmpAtomicCapture(op, builder, moduleTranslation);
- })
- .Case([&](omp::SectionsOp) {
- return convertOmpSections(*op, builder, moduleTranslation);
- })
- .Case([&](omp::SingleOp op) {
- return convertOmpSingle(op, builder, moduleTranslation);
- })
- .Case([&](omp::TeamsOp op) {
- return convertOmpTeams(op, builder, moduleTranslation);
- })
- .Case([&](omp::TaskOp op) {
- return convertOmpTaskOp(op, builder, moduleTranslation);
- })
- .Case([&](omp::TaskGroupOp op) {
- return convertOmpTaskgroupOp(op, builder, moduleTranslation);
- })
- .Case<omp::YieldOp, omp::TerminatorOp, omp::ReductionDeclareOp,
- omp::CriticalDeclareOp>([](auto op) {
- // `yield` and `terminator` can be just omitted. The block structure
- // was created in the region that handles their parent operation.
- // `reduction.declare` will be used by reductions and is not
- // converted directly, skip it.
- // `critical.declare` is only used to declare names of critical
- // sections which will be used by `critical` ops and hence can be
- // ignored for lowering. The OpenMP IRBuilder will create unique
- // name for critical section names.
- return success();
- })
- .Case([&](omp::ThreadprivateOp) {
- return convertOmpThreadprivate(*op, builder, moduleTranslation);
- })
- .Case<omp::DataOp, omp::EnterDataOp, omp::ExitDataOp, omp::UpdateDataOp>(
- [&](auto op) {
- return convertOmpTargetData(op, builder, moduleTranslation);
- })
- .Case([&](omp::TargetOp) {
- return convertOmpTarget(*op, builder, moduleTranslation);
- })
- .Case<omp::MapInfoOp, omp::DataBoundsOp, omp::PrivateClauseOp>(
- [&](auto op) {
- // No-op, should be handled by relevant owning operations e.g.
- // TargetOp, EnterDataOp, ExitDataOp, DataOp etc. and then
- // discarded
- return success();
- })
- .Default([&](Operation *inst) {
- return inst->emitError("unsupported OpenMP operation: ")
- << inst->getName();
- });
+ return convertCommonOperation(op, builder, moduleTranslation);
}
void mlir::registerOpenMPDialectTranslation(DialectRegistry &registry) {
diff --git a/mlir/test/Target/LLVMIR/omptarget-parallel-wsloop.mlir b/mlir/test/Target/LLVMIR/omptarget-parallel-wsloop.mlir
index 8ab50f05f07167..b0fe642238f14f 100644
--- a/mlir/test/Target/LLVMIR/omptarget-parallel-wsloop.mlir
+++ b/mlir/test/Target/LLVMIR/omptarget-parallel-wsloop.mlir
@@ -4,10 +4,10 @@
// for nested omp do loop inside omp target region
module attributes {dlti.dl_spec = #dlti.dl_spec<#dlti.dl_entry<"dlti.alloca_memory_space", 5 : ui32>>, llvm.data_layout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8", llvm.target_triple = "amdgcn-amd-amdhsa", omp.is_gpu = true, omp.is_target_device = true } {
- llvm.func @target_parallel_wsloop(%arg0: !llvm.ptr) attributes {
+ llvm.func @target_parallel_wsloop(%arg0: !llvm.ptr) attributes {omp.declare_target = #omp.declaretarget<device_type = (any), capture_clause = (to)>,
target_cpu = "gfx90a",
- target_features = #llvm.target_features<["+gfx9-insts", "+wavefrontsize64"]>
- } {
+ target_features = #llvm.target_features<["+gfx9-insts", "+wavefrontsize64"]>}
+ {
omp.parallel {
%loop_ub = llvm.mlir.constant(9 : i32) : i32
%loop_lb = llvm.mlir.constant(0 : i32) : i32
diff --git a/mlir/test/Target/LLVMIR/omptarget-teams-llvm.mlir b/mlir/test/Target/LLVMIR/omptarget-teams-llvm.mlir
index 96cced7a1d584b..c5f89eb2c3274c 100644
--- a/mlir/test/Target/LLVMIR/omptarget-teams-llvm.mlir
+++ b/mlir/test/Target/LLVMIR/omptarget-teams-llvm.mlir
@@ -5,7 +5,7 @@
module attributes {omp.is_target_device = true} {
llvm.func @foo(i32)
- llvm.func @omp_target_teams_shared_simple(%arg0 : i32) {
+ llvm.func @omp_target_teams_shared_simple(%arg0 : i32) attributes {omp.declare_target = #omp.declaretarget<device_type = (any), capture_clause = (to)>} {
omp.teams {
llvm.call @foo(%arg0) : (i32) -> ()
omp.terminator
diff --git a/mlir/test/Target/LLVMIR/omptarget-wsloop-collapsed.mlir b/mlir/test/Target/LLVMIR/omptarget-wsloop-collapsed.mlir
index e246c551886cfa..0d77423abcb4f1 100644
--- a/mlir/test/Target/LLVMIR/omptarget-wsloop-collapsed.mlir
+++ b/mlir/test/Target/LLVMIR/omptarget-wsloop-collapsed.mlir
@@ -4,7 +4,7 @@
// for nested omp do loop with collapse clause inside omp target region
module attributes {dlti.dl_spec = #dlti.dl_spec<#dlti.dl_entry<"dlti.alloca_memory_space", 5 : ui32>>, llvm.data_layout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8", llvm.target_triple = "amdgcn-amd-amdhsa", omp.is_gpu = true, omp.is_target_device = true } {
- llvm.func @target_collapsed_wsloop(%arg0: !llvm.ptr) {
+ llvm.func @target_collapsed_wsloop(%arg0: !llvm.ptr) attributes {omp.declare_target = #omp.declaretarget<device_type = (any), capture_clause = (to)>} {
%loop_ub = llvm.mlir.constant(99 : i32) : i32
%loop_lb = llvm.mlir.constant(0 : i32) : i32
%loop_step = llvm.mlir.constant(1 : index) : i32
diff --git a/mlir/test/Target/LLVMIR/omptarget-wsloop.mlir b/mlir/test/Target/LLVMIR/omptarget-wsloop.mlir
index 220eb85b3483ec..0f3f503dfa5377 100644
--- a/mlir/test/Target/LLVMIR/omptarget-wsloop.mlir
+++ b/mlir/test/Target/LLVMIR/omptarget-wsloop.mlir
@@ -4,7 +4,7 @@
// for nested omp do loop inside omp target region
module attributes {dlti.dl_spec = #dlti.dl_spec<#dlti.dl_entry<"dlti.alloca_memory_space", 5 : ui32>>, llvm.data_layout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8", llvm.target_triple = "amdgcn-amd-amdhsa", omp.is_gpu = true, omp.is_target_device = true } {
- llvm.func @target_wsloop(%arg0: !llvm.ptr ){
+ llvm.func @target_wsloop(%arg0: !llvm.ptr ) attributes {omp.declare_target = #omp.declaretarget<device_type = (any), capture_clause = (to)>} {
%loop_ub = llvm.mlir.constant(9 : i32) : i32
%loop_lb = llvm.mlir.constant(0 : i32) : i32
%loop_step = llvm.mlir.constant(1 : i32) : i32
@@ -16,7 +16,7 @@ module attributes {dlti.dl_spec = #dlti.dl_spec<#dlti.dl_entry<"dlti.alloca_memo
llvm.return
}
- llvm.func @target_empty_wsloop(){
+ llvm.func @target_empty_wsloop() attributes {omp.declare_target = #omp.declaretarget<device_type = (any), capture_clause = (to)>} {
%loop_ub = llvm.mlir.constant(9 : i32) : i32
%loop_lb = llvm.mlir.constant(0 : i32) : i32
%loop_step = llvm.mlir.constant(1 : i32) : i32
diff --git a/mlir/test/Target/LLVMIR/openmp-tark-target-device.mlir b/mlir/test/Target/LLVMIR/openmp-tark-target-device.mlir
new file mode 100644
index 00000000000000..c167604bcd12c0
--- /dev/null
+++ b/mlir/test/Target/LLVMIR/openmp-tark-target-device.mlir
@@ -0,0 +1,27 @@
+// RUN: mlir-translate -mlir-to-llvmir %s | FileCheck %s
+
+// This tests the fix for https://github.com/llvm/llvm-project/issues/84606
+// We are only interested in ensuring that the -mlir-to-llvmir pass doesn't crash.
+// CHECK: {{.*}} = add i32 {{.*}}, 5
+module attributes {omp.is_target_device = true } {
+ llvm.func @_QQmain() attributes {fir.bindc_name = "main", omp.declare_target = #omp.declaretarget<device_type = (host), capture_clause = (to)>} {
+ %0 = llvm.mlir.constant(0 : i32) : i32
+ %1 = llvm.mlir.constant(1 : i64) : i64
+ %2 = llvm.alloca %1 x i32 {bindc_name = "a"} : (i64) -> !llvm.ptr<5>
+ %3 = llvm.addrspacecast %2 : !llvm.ptr<5> to !llvm.ptr
+ omp.task {
+ llvm.store %0, %3 : i32, !llvm.ptr
+ omp.terminator
+ }
+ %4 = omp.map_info var_ptr(%3 : !llvm.ptr, i32) map_clauses(tofrom) capture(ByRef) -> !llvm.ptr {name = "a"}
+ omp.target map_entries(%4 -> %arg0 : !llvm.ptr) {
+ ^bb0(%arg0: !llvm.ptr):
+ %5 = llvm.mlir.constant(5 : i32) : i32
+ %6 = llvm.load %arg0 : !llvm.ptr -> i32
+ %7 = llvm.add %6, %5 : i32
+ llvm.store %7, %arg0 : i32, !llvm.ptr
+ omp.terminator
+ }
+ llvm.return
+ }
+}
✅ With the latest revision this PR passed the C/C++ code formatter.
.Case([&](omp::TargetOp) {
  return convertOmpTarget(*op, builder, moduleTranslation);
})
// Skip omp ops that are not legal top level ops for the target device
If I understand this correctly, this target region would be ignored:
omp.task {
  omp.target {
  }
}
In general, by not processing the body of host operations, we will miss target regions nested within them. I don't know if there's a clean way to support that while following this approach.
Of course, if it isn't legal to nest target regions inside of any other constructs, this is a non-issue, but I haven't noticed anything like that in the spec. Maybe @mjklemm can enlighten us on that.
Thinking about it a bit more, maybe it's not too difficult to support these cases. For those host ops that might contain a target region nested (immediately or not), we could just call op.walk<omp::TargetOp>(...) and trigger lowering for these target ops. The only problem this could have (I'm not sure, though) is that some arguments of that omp.target operation might not have been lowered to LLVM IR by then, since we would have skipped parent regions.
program main
integer :: A(10), B(10), C(10)
do I = 1, 10
A(I) = 1
B(I) = 2
end do
!$omp target data map(to: A, B) map(alloc: C)
!$omp target map(from: C)
do I = 1, 10
C(I) = A(I) + B(I) ! assigns 3, A:1 + B:2
end do
!$omp end target
!$omp target update from(C) ! updates C device -> host
!$omp end target data
print *, C ! should be all 3's
end program
This is another use-case we need to keep in mind: target data is a host construct whose region can contain more than one target within it, as well as other host-related directives (e.g. the update in the example).
This currently has a small bug in it that @TIFitis has looked into and attempted to address in #85218.
This case does seem to be addressed above via DataOp in the switch case, but I thought I'd bring it up to shed light on that particular use-case.
It's possibly worth stating that, to my understanding, Exit/Enter/Update are also host directives; unlike Data, they do not own a region that can contain device code, and they likely should not be processed in device passes (there's been at least one error with this in the past). I don't think they can be invoked inside of a target directive/region either as far as the specification is concerned, but that would need some verification as it isn't something I've dug into too deeply.
If I understand this correctly, this target region would be ignored:
omp.task { omp.target { } }
In general, by not processing the body of host operations, we will miss target regions nested within them. I don't know if there's a clean way to support that while following this approach.
Of course, if it isn't legal to nest target regions inside of any other constructs, this is a non-issue, but I haven't noticed anything like that in the spec. Maybe @mjklemm can enlighten us on that.
This exact structure is what both of my PRs (#85130 & #83966) will generate. According to the spec, this shouldn't be illegal (@mjklemm can correct me if I am wrong). Quoting the spec:
"When a target construct is encountered, a new target task is generated. The target task region encloses the target region."
This target task could be interpreted as an implicit task, but in the case of the depend clause (and I think the nowait clause as well) it needs an explicit task created (AFAICT, as implied by the following statement in spec 5.2, Section 13.8):
"If a depend clause is present, it is associated with the target task."
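For illustration, a rough MLIR sketch of the structure being described: an explicit omp.task generated for the depend clause, wrapping the target region. (This is hand-written for this discussion, not output from those PRs; the depend-clause syntax and the %x value are assumptions.)

omp.task depend(taskdependin -> %x : !llvm.ptr) {
  omp.target {
    // ... device code ...
    omp.terminator
  }
  omp.terminator
}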
I put up a fix for handling inner regions before reading these comments, so keep that in mind if you look at the latest commit. But yes, the omp.task { omp.target { } } case was ignored in the original code. I added a case where the inner region of an op is traversed, calling the top-level dispatch function recursively if it is an omp op. Probably not all operations with inner regions can contain top-level ops, but that should be handled earlier in the compiler, so we shouldn't have to check for illegal combinations here.
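For reference, a minimal C++ sketch of what that traversal could look like (an assumption reconstructing the described fix, not the committed code; traverseInnerRegions is a hypothetical helper name):

static LogicalResult
traverseInnerRegions(Operation *op, llvm::IRBuilderBase &builder,
                     LLVM::ModuleTranslation &moduleTranslation) {
  for (Region &region : op->getRegions())
    for (Block &block : region)
      for (Operation &inner : block) {
        // Lower any target op we find; otherwise keep descending through
        // OpenMP ops, since omp.target may be nested arbitrarily deep.
        if (isa<omp::TargetOp>(&inner)) {
          if (failed(convertOmpTarget(inner, builder, moduleTranslation)))
            return failure();
        } else if (isa<omp::OpenMPDialect>(inner.getDialect()) &&
                   failed(traverseInnerRegions(&inner, builder,
                                               moduleTranslation))) {
          return failure();
        }
      }
  return success();
}

As the rest of the thread shows, this was later simplified into a single op->walk over omp::TargetOp.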
Thinking about it a bit more, maybe it's not too difficult to support these cases. For those host ops that might contain a target region nested (immediately or not), we could just call op.walk<omp::TargetOp>(...) and trigger lowering for these target ops. The only problem this could have (I'm not sure, though) is that some arguments of that omp.target operation might not have been lowered to LLVM IR by then, since we would have skipped parent regions.

I think we are okay since omp.target is isolated from above, so I believe there shouldn't be any values that need to be translated (codegen seems to work for the omp.task { omp.target { } } case). I'm not sure if it is possible for the top-level target data ops to have this issue.
Probably not all operations with inner regions can contain top-level ops, but that should be handled earlier in the compiler, so we shouldn't have to check for illegal combinations.

I agree, this would be way too late to realize that OpenMP operations were nested in an unsupported way.

I think we are okay since omp.target is isolated from above, so I believe there shouldn't be any values that need to be translated (codegen seems to work for the omp.task { omp.target { } } case).

I would think that the region inside of omp.target should be translatable for that reason. My concern was more about arguments to that operation. For example, there's some processing of map arguments as part of convertOmpTarget that is done for both host and device. Will that still work if we haven't translated them to LLVM IR as part of lowering its enclosing region?
%c1 = llvm.mlir.constant(1 : i64) : i64
%x = llvm.alloca %c1 x i32 : (i64) -> !llvm.ptr
omp.task {
  %map = omp.map_info var_ptr(%x : !llvm.ptr, i32) map_clauses(tofrom) capture(ByRef) -> !llvm.ptr
  omp.target map_entries(%map -> %arg0 : !llvm.ptr) {
  ^bb0(%arg0: !llvm.ptr):
    ...
    omp.terminator
  }
  omp.terminator
}
If that code makes this approach crash, then we would have to rethink the approach. We would need to create some dummy values, or somehow detect missing LLVM IR values and create them on demand. I think the former was basically what the early outlining did with the outlined function's arguments, if I remember correctly.
The map arguments are already skipped in the common case, since they are supposed to be lowered by the omp.target op lowering and that lowering should not go beyond the omp.map_info operations, because that would include host code in the target codegen.
Of course, if it isn't legal to nest target regions inside of any other constructs, this is a non-issue, but I haven't noticed anything like that in the spec. Maybe @mjklemm can enlighten us on that.

It's allowed to nest target in other OpenMP constructs like task, parallel, for, etc.
omp::DataBoundsOp, omp::CriticalDeclareOp>(
    [&](auto op) { return success(); })
// Recursively traverse inner regions
.Case<omp::ParallelOp, omp::MasterOp, omp::CriticalOp,
I am wary of such bespoke listings of ops. It makes for one more arguably hard-to-find place to update if and when we add new ops.
My initial intuition was that if an op contains a region, we could recurse into it and find top-level target ops; ops that don't have regions can then be ignored. This logic works for all ops listed on line 3079 except SimdLoopOp and ReductionDeclareOp, which have regions. :( So that won't work unless we are willing to needlessly walk all the ops inside these two, knowing full well we won't find top-level target ops there.
+1 on trying to process the op's contents based on the existence of regions attached to it rather than listing the types. Probably, rather than the current type switch, we could do something similar to this instead:
if (isa<TargetOp>(op))
  return convertOmpTarget(*op, builder, moduleTranslation);

bool interrupted =
    op->walk<WalkOrder::PreOrder>([&](omp::TargetOp targetOp) {
        if (failed(convertOmpTarget(*targetOp, builder, moduleTranslation)))
          return WalkResult::interrupt();
        return WalkResult::skip();
      }).wasInterrupted();
return failure(interrupted);
That should do the trick of recursively looking for target regions, I think.
Yes, this is probably easier maintenance-wise, since we don't really need to control what is allowed and what is not, and the micro-optimization of skipping ops that should not contain omp.target ops is probably not worth the effort. I will change the code to do the generic handling of sub-regions.
The code above works fine, so I used that. Thanks!
return llvm::TypeSwitch<Operation *, LogicalResult>(op)
    .Case<omp::DataOp, omp::EnterDataOp, omp::ExitDataOp, omp::UpdateDataOp>(
        [&](auto op) {
          return convertOmpTargetData(op, builder, moduleTranslation);
I think there is a risk here. @agozillon is right that omp::DataOp is a host op that can contain a region with omp.target ops. So we need to recurse into this, and calling convertOmpTargetData is fine, but only from the point of view of looking inside it for other omp.target ops. I am sure convertOmpTargetData does more than that, which on the device (i.e. when ompBuilder->Config.isTargetDevice() == true) is unnecessary. Hopefully it doesn't lead to state that needs to be rolled back.
Secondly, we shouldn't be processing EnterDataOp, UpdateDataOp, ExitDataOp at all, because their actions are performed on the host and they don't have regions that we need to recurse into to look for other omp.target ops, as we have to in the case of omp.task below.
Okay, in that case only omp.target is a top-level target device op.
Thank you Jan for addressing previous comments. I think this approach looks good; I just have a couple of nits to try to make the names a bit easier to understand.
There's also a comment to try out a small modification to one of the tests, which is the main concern I had with this approach. Even if it doesn't work, I think it's still worth merging this PR, but hopefully it does.
@@ -2922,6 +2922,162 @@ convertDeclareTargetAttr(Operation *op, mlir::omp::DeclareTargetAttr attribute,
return success();
}
static bool isInternalTargetDeviceOp(Operation *op) { |
Suggested change:
-static bool isInternalTargetDeviceOp(Operation *op) {
+static bool isTargetDeviceOp(Operation *op) {
I'll make this change once we agree on the other comments.
✅ With the latest revision this PR passed the Python code formatter.
Thanks Jan for working on this and addressing my comments. I think this looks good; I'd suggest giving it just a day or so so others have a chance to express any remaining concerns before merging.
Thanks Jan for the changes to make this work for omp.target_data as well. In my opinion this looks ready; just note when you rebase that some of the MLIR operations have been renamed (DataOp -> TargetDataOp, DataBoundsOp -> MapBoundsOp, ...). Some of the ops in the new tests will have to be updated as well (omp.map_info -> omp.map.info, ...).
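For instance, taking the omp.map_info line from the test added in this PR, the rename would look like this (a hypothetical before/after based on the renames listed above):

// Before the rebase (as written in this PR's test):
%4 = omp.map_info var_ptr(%3 : !llvm.ptr, i32) map_clauses(tofrom) capture(ByRef) -> !llvm.ptr {name = "a"}
// After the rename:
%4 = omp.map.info var_ptr(%3 : !llvm.ptr, i32) map_clauses(tofrom) capture(ByRef) -> !llvm.ptr {name = "a"}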
Force-pushed from aaadaad to eaa836c
This patch separates the lowering dispatch for host and target devices. For the target device, if the current operation is not a top-level operation (e.g. omp.target) or is inside a target device code region it will be ignored, since it belongs to the host code.
Force-pushed from 47e56ce to 98f256c
This patch separates the lowering dispatch for host and target devices. For the target device, if the current operation is not a top-level operation (e.g. omp.target) or is inside a target device code region it will be ignored, since it belongs to the host code.
This is an alternative approach to #84611, the new test in this PR was taken from there.