Skip to content

[MLIR][OpenMP] Skip host omp ops when compiling for the target device #85239

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 5, 2024

Conversation

jsjodin
Copy link
Contributor

@jsjodin jsjodin commented Mar 14, 2024

This patch separates the lowering dispatch for host and target devices. For the target device, if the current operation is not a top-level operation (e.g. omp.target) or is inside a target device code region it will be ignored, since it belongs to the host code.

This is an alternative approach to #84611, the new test in this PR was taken from there.

@llvmbot
Copy link
Member

llvmbot commented Mar 14, 2024

@llvm/pr-subscribers-flang-openmp
@llvm/pr-subscribers-mlir
@llvm/pr-subscribers-mlir-llvm

@llvm/pr-subscribers-mlir-openmp

Author: Jan Leyonberg (jsjodin)

Changes

This patch separates the lowering dispatch for host and target devices. For the target device, if the current operation is not a top-level operation (e.g. omp.target) or is inside a target device code region it will be ignored, since it belongs to the host code.

This is an alternative approach to #84611, the new test in this PR was taken from there.


Full diff: https://github.com/llvm/llvm-project/pull/85239.diff

6 Files Affected:

  • (modified) mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+182-111)
  • (modified) mlir/test/Target/LLVMIR/omptarget-parallel-wsloop.mlir (+3-3)
  • (modified) mlir/test/Target/LLVMIR/omptarget-teams-llvm.mlir (+1-1)
  • (modified) mlir/test/Target/LLVMIR/omptarget-wsloop-collapsed.mlir (+1-1)
  • (modified) mlir/test/Target/LLVMIR/omptarget-wsloop.mlir (+2-2)
  • (added) mlir/test/Target/LLVMIR/openmp-tark-target-device.mlir (+27)
diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index bef227f2c583ed..9ad0b99078005b 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -2922,6 +2922,178 @@ convertDeclareTargetAttr(Operation *op, mlir::omp::DeclareTargetAttr attribute,
   return success();
 }
 
+static bool isInternalTargetDeviceOp(Operation *op) {
+  // Assumes no reverse offloading
+  if (op->getParentOfType<omp::TargetOp>())
+    return true;
+
+  auto parentFn = op->getParentOfType<LLVM::LLVMFuncOp>();
+  if (auto declareTargetIface =
+          llvm::dyn_cast<mlir::omp::DeclareTargetInterface>(
+              parentFn.getOperation()))
+    if (declareTargetIface.isDeclareTarget() &&
+        declareTargetIface.getDeclareTargetDeviceType() !=
+            mlir::omp::DeclareTargetDeviceType::host)
+      return true;
+
+  return false;
+}
+
+/// Given an OpenMP MLIR operation, create the corresponding LLVM IR
+/// (including OpenMP runtime calls).
+static LogicalResult
+convertCommonOperation(Operation *op, llvm::IRBuilderBase &builder,
+                       LLVM::ModuleTranslation &moduleTranslation) {
+
+  llvm::OpenMPIRBuilder *ompBuilder = moduleTranslation.getOpenMPBuilder();
+
+  return llvm::TypeSwitch<Operation *, LogicalResult>(op)
+      .Case([&](omp::BarrierOp) {
+        ompBuilder->createBarrier(builder.saveIP(), llvm::omp::OMPD_barrier);
+        return success();
+      })
+      .Case([&](omp::TaskwaitOp) {
+        ompBuilder->createTaskwait(builder.saveIP());
+        return success();
+      })
+      .Case([&](omp::TaskyieldOp) {
+        ompBuilder->createTaskyield(builder.saveIP());
+        return success();
+      })
+      .Case([&](omp::FlushOp) {
+        // No support in Openmp runtime function (__kmpc_flush) to accept
+        // the argument list.
+        // OpenMP standard states the following:
+        //  "An implementation may implement a flush with a list by ignoring
+        //   the list, and treating it the same as a flush without a list."
+        //
+        // The argument list is discarded so that, flush with a list is treated
+        // same as a flush without a list.
+        ompBuilder->createFlush(builder.saveIP());
+        return success();
+      })
+      .Case([&](omp::ParallelOp op) {
+        return convertOmpParallel(op, builder, moduleTranslation);
+      })
+      .Case([&](omp::ReductionOp reductionOp) {
+        return convertOmpReductionOp(reductionOp, builder, moduleTranslation);
+      })
+      .Case([&](omp::MasterOp) {
+        return convertOmpMaster(*op, builder, moduleTranslation);
+      })
+      .Case([&](omp::CriticalOp) {
+        return convertOmpCritical(*op, builder, moduleTranslation);
+      })
+      .Case([&](omp::OrderedRegionOp) {
+        return convertOmpOrderedRegion(*op, builder, moduleTranslation);
+      })
+      .Case([&](omp::OrderedOp) {
+        return convertOmpOrdered(*op, builder, moduleTranslation);
+      })
+      .Case([&](omp::WsLoopOp) {
+        return convertOmpWsLoop(*op, builder, moduleTranslation);
+      })
+      .Case([&](omp::SimdLoopOp) {
+        return convertOmpSimdLoop(*op, builder, moduleTranslation);
+      })
+      .Case([&](omp::AtomicReadOp) {
+        return convertOmpAtomicRead(*op, builder, moduleTranslation);
+      })
+      .Case([&](omp::AtomicWriteOp) {
+        return convertOmpAtomicWrite(*op, builder, moduleTranslation);
+      })
+      .Case([&](omp::AtomicUpdateOp op) {
+        return convertOmpAtomicUpdate(op, builder, moduleTranslation);
+      })
+      .Case([&](omp::AtomicCaptureOp op) {
+        return convertOmpAtomicCapture(op, builder, moduleTranslation);
+      })
+      .Case([&](omp::SectionsOp) {
+        return convertOmpSections(*op, builder, moduleTranslation);
+      })
+      .Case([&](omp::SingleOp op) {
+        return convertOmpSingle(op, builder, moduleTranslation);
+      })
+      .Case([&](omp::TeamsOp op) {
+        return convertOmpTeams(op, builder, moduleTranslation);
+      })
+      .Case([&](omp::TaskOp op) {
+        return convertOmpTaskOp(op, builder, moduleTranslation);
+      })
+      .Case([&](omp::TaskGroupOp op) {
+        return convertOmpTaskgroupOp(op, builder, moduleTranslation);
+      })
+      .Case<omp::YieldOp, omp::TerminatorOp, omp::ReductionDeclareOp,
+            omp::CriticalDeclareOp>([](auto op) {
+        // `yield` and `terminator` can be just omitted. The block structure
+        // was created in the region that handles their parent operation.
+        // `reduction.declare` will be used by reductions and is not
+        // converted directly, skip it.
+        // `critical.declare` is only used to declare names of critical
+        // sections which will be used by `critical` ops and hence can be
+        // ignored for lowering. The OpenMP IRBuilder will create unique
+        // name for critical section names.
+        return success();
+      })
+      .Case([&](omp::ThreadprivateOp) {
+        return convertOmpThreadprivate(*op, builder, moduleTranslation);
+      })
+      .Case<omp::DataOp, omp::EnterDataOp, omp::ExitDataOp, omp::UpdateDataOp>(
+          [&](auto op) {
+            return convertOmpTargetData(op, builder, moduleTranslation);
+          })
+      .Case([&](omp::TargetOp) {
+        return convertOmpTarget(*op, builder, moduleTranslation);
+      })
+      .Case<omp::MapInfoOp, omp::DataBoundsOp, omp::PrivateClauseOp>(
+          [&](auto op) {
+            // No-op, should be handled by relevant owning operations e.g.
+            // TargetOp, EnterDataOp, ExitDataOp, DataOp etc. and then
+            // discarded
+            return success();
+          })
+      .Default([&](Operation *inst) {
+        return inst->emitError("unsupported OpenMP operation: ")
+               << inst->getName();
+      });
+}
+
+static LogicalResult
+convertInternalTargetOp(Operation *op, llvm::IRBuilderBase &builder,
+                        LLVM::ModuleTranslation &moduleTranslation) {
+  return convertCommonOperation(op, builder, moduleTranslation);
+}
+
+static LogicalResult
+convertTopLevelTargetOp(Operation *op, llvm::IRBuilderBase &builder,
+                        LLVM::ModuleTranslation &moduleTranslation) {
+  return llvm::TypeSwitch<Operation *, LogicalResult>(op)
+      .Case<omp::DataOp, omp::EnterDataOp, omp::ExitDataOp, omp::UpdateDataOp>(
+          [&](auto op) {
+            return convertOmpTargetData(op, builder, moduleTranslation);
+          })
+      .Case([&](omp::TargetOp) {
+        return convertOmpTarget(*op, builder, moduleTranslation);
+      })
+      // Skip omp ops that are not legal top level ops for the target device
+      .Case<omp::BarrierOp, omp::TaskwaitOp, omp::TaskyieldOp, omp::FlushOp,
+            omp::ParallelOp, omp::ReductionOp, omp::MasterOp, omp::CriticalOp,
+            omp::OrderedRegionOp, omp::OrderedOp, omp::WsLoopOp,
+            omp::SimdLoopOp, omp::AtomicReadOp, omp::AtomicWriteOp,
+            omp::AtomicUpdateOp, omp::AtomicCaptureOp, omp::SectionsOp,
+            omp::SingleOp, omp::TeamsOp, omp::TaskOp, omp::TaskGroupOp,
+            omp::YieldOp, omp::TerminatorOp, omp::ReductionDeclareOp,
+            omp::ThreadprivateOp, omp::DistributeOp, omp::MapInfoOp,
+            omp::DataBoundsOp, omp::CriticalDeclareOp>(
+          [&](auto op) {
+            return success();
+          })
+      .Default([&](Operation *inst) {
+        return inst->emitError("unsupported OpenMP operation: ")
+               << inst->getName();
+      });
+}
+
 namespace {
 
 /// Implementation of the dialect interface that converts operations belonging
@@ -2937,8 +3109,8 @@ class OpenMPDialectLLVMIRTranslationInterface
   convertOperation(Operation *op, llvm::IRBuilderBase &builder,
                    LLVM::ModuleTranslation &moduleTranslation) const final;
 
-  /// Given an OpenMP MLIR attribute, create the corresponding LLVM-IR, runtime
-  /// calls, or operation amendments
+  /// Given an OpenMP MLIR attribute, create the corresponding LLVM-IR,
+  /// runtime calls, or operation amendments
   LogicalResult
   amendOperation(Operation *op, ArrayRef<llvm::Instruction *> instructions,
                  NamedAttribute attribute,
@@ -3043,116 +3215,15 @@ LogicalResult OpenMPDialectLLVMIRTranslationInterface::convertOperation(
     LLVM::ModuleTranslation &moduleTranslation) const {
 
   llvm::OpenMPIRBuilder *ompBuilder = moduleTranslation.getOpenMPBuilder();
+  if (ompBuilder->Config.isTargetDevice()) {
+    if (isInternalTargetDeviceOp(op)) {
+      return convertInternalTargetOp(op, builder, moduleTranslation);
+    } else {
+      return convertTopLevelTargetOp(op, builder, moduleTranslation);
+    }
+  }
 
-  return llvm::TypeSwitch<Operation *, LogicalResult>(op)
-      .Case([&](omp::BarrierOp) {
-        ompBuilder->createBarrier(builder.saveIP(), llvm::omp::OMPD_barrier);
-        return success();
-      })
-      .Case([&](omp::TaskwaitOp) {
-        ompBuilder->createTaskwait(builder.saveIP());
-        return success();
-      })
-      .Case([&](omp::TaskyieldOp) {
-        ompBuilder->createTaskyield(builder.saveIP());
-        return success();
-      })
-      .Case([&](omp::FlushOp) {
-        // No support in Openmp runtime function (__kmpc_flush) to accept
-        // the argument list.
-        // OpenMP standard states the following:
-        //  "An implementation may implement a flush with a list by ignoring
-        //   the list, and treating it the same as a flush without a list."
-        //
-        // The argument list is discarded so that, flush with a list is treated
-        // same as a flush without a list.
-        ompBuilder->createFlush(builder.saveIP());
-        return success();
-      })
-      .Case([&](omp::ParallelOp op) {
-        return convertOmpParallel(op, builder, moduleTranslation);
-      })
-      .Case([&](omp::ReductionOp reductionOp) {
-        return convertOmpReductionOp(reductionOp, builder, moduleTranslation);
-      })
-      .Case([&](omp::MasterOp) {
-        return convertOmpMaster(*op, builder, moduleTranslation);
-      })
-      .Case([&](omp::CriticalOp) {
-        return convertOmpCritical(*op, builder, moduleTranslation);
-      })
-      .Case([&](omp::OrderedRegionOp) {
-        return convertOmpOrderedRegion(*op, builder, moduleTranslation);
-      })
-      .Case([&](omp::OrderedOp) {
-        return convertOmpOrdered(*op, builder, moduleTranslation);
-      })
-      .Case([&](omp::WsLoopOp) {
-        return convertOmpWsLoop(*op, builder, moduleTranslation);
-      })
-      .Case([&](omp::SimdLoopOp) {
-        return convertOmpSimdLoop(*op, builder, moduleTranslation);
-      })
-      .Case([&](omp::AtomicReadOp) {
-        return convertOmpAtomicRead(*op, builder, moduleTranslation);
-      })
-      .Case([&](omp::AtomicWriteOp) {
-        return convertOmpAtomicWrite(*op, builder, moduleTranslation);
-      })
-      .Case([&](omp::AtomicUpdateOp op) {
-        return convertOmpAtomicUpdate(op, builder, moduleTranslation);
-      })
-      .Case([&](omp::AtomicCaptureOp op) {
-        return convertOmpAtomicCapture(op, builder, moduleTranslation);
-      })
-      .Case([&](omp::SectionsOp) {
-        return convertOmpSections(*op, builder, moduleTranslation);
-      })
-      .Case([&](omp::SingleOp op) {
-        return convertOmpSingle(op, builder, moduleTranslation);
-      })
-      .Case([&](omp::TeamsOp op) {
-        return convertOmpTeams(op, builder, moduleTranslation);
-      })
-      .Case([&](omp::TaskOp op) {
-        return convertOmpTaskOp(op, builder, moduleTranslation);
-      })
-      .Case([&](omp::TaskGroupOp op) {
-        return convertOmpTaskgroupOp(op, builder, moduleTranslation);
-      })
-      .Case<omp::YieldOp, omp::TerminatorOp, omp::ReductionDeclareOp,
-            omp::CriticalDeclareOp>([](auto op) {
-        // `yield` and `terminator` can be just omitted. The block structure
-        // was created in the region that handles their parent operation.
-        // `reduction.declare` will be used by reductions and is not
-        // converted directly, skip it.
-        // `critical.declare` is only used to declare names of critical
-        // sections which will be used by `critical` ops and hence can be
-        // ignored for lowering. The OpenMP IRBuilder will create unique
-        // name for critical section names.
-        return success();
-      })
-      .Case([&](omp::ThreadprivateOp) {
-        return convertOmpThreadprivate(*op, builder, moduleTranslation);
-      })
-      .Case<omp::DataOp, omp::EnterDataOp, omp::ExitDataOp, omp::UpdateDataOp>(
-          [&](auto op) {
-            return convertOmpTargetData(op, builder, moduleTranslation);
-          })
-      .Case([&](omp::TargetOp) {
-        return convertOmpTarget(*op, builder, moduleTranslation);
-      })
-      .Case<omp::MapInfoOp, omp::DataBoundsOp, omp::PrivateClauseOp>(
-          [&](auto op) {
-            // No-op, should be handled by relevant owning operations e.g.
-            // TargetOp, EnterDataOp, ExitDataOp, DataOp etc. and then
-            // discarded
-            return success();
-          })
-      .Default([&](Operation *inst) {
-        return inst->emitError("unsupported OpenMP operation: ")
-               << inst->getName();
-      });
+  return convertCommonOperation(op, builder, moduleTranslation);
 }
 
 void mlir::registerOpenMPDialectTranslation(DialectRegistry &registry) {
diff --git a/mlir/test/Target/LLVMIR/omptarget-parallel-wsloop.mlir b/mlir/test/Target/LLVMIR/omptarget-parallel-wsloop.mlir
index 8ab50f05f07167..b0fe642238f14f 100644
--- a/mlir/test/Target/LLVMIR/omptarget-parallel-wsloop.mlir
+++ b/mlir/test/Target/LLVMIR/omptarget-parallel-wsloop.mlir
@@ -4,10 +4,10 @@
 // for nested omp do loop inside omp target region
 
 module attributes {dlti.dl_spec = #dlti.dl_spec<#dlti.dl_entry<"dlti.alloca_memory_space", 5 : ui32>>, llvm.data_layout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8", llvm.target_triple = "amdgcn-amd-amdhsa", omp.is_gpu = true, omp.is_target_device = true } {
-  llvm.func @target_parallel_wsloop(%arg0: !llvm.ptr) attributes {
+  llvm.func @target_parallel_wsloop(%arg0: !llvm.ptr) attributes {omp.declare_target = #omp.declaretarget<device_type = (any), capture_clause = (to)>,
     target_cpu = "gfx90a",
-    target_features = #llvm.target_features<["+gfx9-insts", "+wavefrontsize64"]>
-  } {
+    target_features = #llvm.target_features<["+gfx9-insts", "+wavefrontsize64"]>}
+   {
     omp.parallel {
       %loop_ub = llvm.mlir.constant(9 : i32) : i32
       %loop_lb = llvm.mlir.constant(0 : i32) : i32
diff --git a/mlir/test/Target/LLVMIR/omptarget-teams-llvm.mlir b/mlir/test/Target/LLVMIR/omptarget-teams-llvm.mlir
index 96cced7a1d584b..c5f89eb2c3274c 100644
--- a/mlir/test/Target/LLVMIR/omptarget-teams-llvm.mlir
+++ b/mlir/test/Target/LLVMIR/omptarget-teams-llvm.mlir
@@ -5,7 +5,7 @@
 
 module attributes {omp.is_target_device = true} {
   llvm.func @foo(i32)
-  llvm.func @omp_target_teams_shared_simple(%arg0 : i32)  {
+  llvm.func @omp_target_teams_shared_simple(%arg0 : i32)  attributes {omp.declare_target = #omp.declaretarget<device_type = (any), capture_clause = (to)>} {
     omp.teams {
       llvm.call @foo(%arg0) : (i32) -> ()
       omp.terminator
diff --git a/mlir/test/Target/LLVMIR/omptarget-wsloop-collapsed.mlir b/mlir/test/Target/LLVMIR/omptarget-wsloop-collapsed.mlir
index e246c551886cfa..0d77423abcb4f1 100644
--- a/mlir/test/Target/LLVMIR/omptarget-wsloop-collapsed.mlir
+++ b/mlir/test/Target/LLVMIR/omptarget-wsloop-collapsed.mlir
@@ -4,7 +4,7 @@
 // for nested omp do loop with collapse clause inside omp target region
 
 module attributes {dlti.dl_spec = #dlti.dl_spec<#dlti.dl_entry<"dlti.alloca_memory_space", 5 : ui32>>, llvm.data_layout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8", llvm.target_triple = "amdgcn-amd-amdhsa", omp.is_gpu = true, omp.is_target_device = true } {
-  llvm.func @target_collapsed_wsloop(%arg0: !llvm.ptr) {
+  llvm.func @target_collapsed_wsloop(%arg0: !llvm.ptr) attributes {omp.declare_target = #omp.declaretarget<device_type = (any), capture_clause = (to)>} {
     %loop_ub = llvm.mlir.constant(99 : i32) : i32
     %loop_lb = llvm.mlir.constant(0 : i32) : i32
     %loop_step = llvm.mlir.constant(1 : index) : i32
diff --git a/mlir/test/Target/LLVMIR/omptarget-wsloop.mlir b/mlir/test/Target/LLVMIR/omptarget-wsloop.mlir
index 220eb85b3483ec..0f3f503dfa5377 100644
--- a/mlir/test/Target/LLVMIR/omptarget-wsloop.mlir
+++ b/mlir/test/Target/LLVMIR/omptarget-wsloop.mlir
@@ -4,7 +4,7 @@
 // for nested omp do loop inside omp target region
 
 module attributes {dlti.dl_spec = #dlti.dl_spec<#dlti.dl_entry<"dlti.alloca_memory_space", 5 : ui32>>, llvm.data_layout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8", llvm.target_triple = "amdgcn-amd-amdhsa", omp.is_gpu = true, omp.is_target_device = true } {
-  llvm.func @target_wsloop(%arg0: !llvm.ptr ){
+  llvm.func @target_wsloop(%arg0: !llvm.ptr ) attributes {omp.declare_target = #omp.declaretarget<device_type = (any), capture_clause = (to)>} {
       %loop_ub = llvm.mlir.constant(9 : i32) : i32
       %loop_lb = llvm.mlir.constant(0 : i32) : i32
       %loop_step = llvm.mlir.constant(1 : i32) : i32
@@ -16,7 +16,7 @@ module attributes {dlti.dl_spec = #dlti.dl_spec<#dlti.dl_entry<"dlti.alloca_memo
     llvm.return
   }
 
-  llvm.func @target_empty_wsloop(){
+  llvm.func @target_empty_wsloop() attributes {omp.declare_target = #omp.declaretarget<device_type = (any), capture_clause = (to)>} {
       %loop_ub = llvm.mlir.constant(9 : i32) : i32
       %loop_lb = llvm.mlir.constant(0 : i32) : i32
       %loop_step = llvm.mlir.constant(1 : i32) : i32
diff --git a/mlir/test/Target/LLVMIR/openmp-tark-target-device.mlir b/mlir/test/Target/LLVMIR/openmp-tark-target-device.mlir
new file mode 100644
index 00000000000000..c167604bcd12c0
--- /dev/null
+++ b/mlir/test/Target/LLVMIR/openmp-tark-target-device.mlir
@@ -0,0 +1,27 @@
+// RUN: mlir-translate -mlir-to-llvmir %s | FileCheck %s
+
+// This tests the fix for https://github.com/llvm/llvm-project/issues/84606
+// We are only interested in ensuring that the -mlir-to-llmvir pass doesn't crash.
+// CHECK: {{.*}} = add i32 {{.*}}, 5
+module attributes {omp.is_target_device = true } {
+  llvm.func @_QQmain() attributes {fir.bindc_name = "main", omp.declare_target = #omp.declaretarget<device_type = (host), capture_clause = (to)>} {
+    %0 = llvm.mlir.constant(0 : i32) : i32
+    %1 = llvm.mlir.constant(1 : i64) : i64
+    %2 = llvm.alloca %1 x i32 {bindc_name = "a"} : (i64) -> !llvm.ptr<5>
+    %3 = llvm.addrspacecast %2 : !llvm.ptr<5> to !llvm.ptr
+    omp.task {
+      llvm.store %0, %3 : i32, !llvm.ptr
+      omp.terminator
+    }
+    %4 = omp.map_info var_ptr(%3 : !llvm.ptr, i32) map_clauses(tofrom) capture(ByRef) -> !llvm.ptr {name = "a"}
+    omp.target map_entries(%4 -> %arg0 : !llvm.ptr) {
+    ^bb0(%arg0: !llvm.ptr):
+      %5 = llvm.mlir.constant(5 : i32) : i32
+      %6 = llvm.load %arg0  : !llvm.ptr -> i32
+      %7 = llvm.add %6, %5  : i32
+      llvm.store %7, %arg0  : i32, !llvm.ptr
+      omp.terminator
+    }
+    llvm.return
+  }
+}

Copy link

github-actions bot commented Mar 14, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

.Case([&](omp::TargetOp) {
return convertOmpTarget(*op, builder, moduleTranslation);
})
// Skip omp ops that are not legal top level ops for the target device
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand this correctly, this target region would be ignored:

omp.task {
  omp.target {
  }
}

In general, by not processing the body of host operations, we will miss target regions nested within them. I don't know if there's a clean way to support that while following this approach.

Of course, if it isn't legal to nest target regions inside of any other constructs, this is a non-issue, but I haven't noticed anything like that in the spec. Maybe @mjklemm can enlighten us on that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about it a bit more, maybe it's not too difficult to support these cases. For those host ops that might contain a target region nested (immediately or not), we could just call op.walk<omp::TargetOp>(...) and trigger lowering for these target ops. The only problem this could have (I'm not sure, though) is that some arguments of that omp.target operation might not have been lowered to LLVM IR by then, since we would have skipped parent regions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

program main
   integer :: A(10), B(10), C(10)

   do I = 1, 10
      A(I) = 1
      B(I) = 2
   end do
   !$omp target data map(to: A, B) map(alloc: C)
   !$omp target map(from: C)
   do I = 1, 10
      C(I) = A(I) + B(I) ! assigns 3, A:1 + B:2
   end do
   !$omp end target
   !$omp target update from(C) ! updates C device -> host
   !$omp end target data

   print *, C ! should be all 3's
end program

This is another use-case we need to be keep in mind, target data is a host construct that can contain a region that can have more than one target's within it as well as other host related directives (e.g. the update in the example).

This already has a little bit of a bug in it just now that @TIFitis has looked into and attempted to address in: #85218

Copy link
Contributor

@agozillon agozillon Mar 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case does seem to be addressed in the above however, via DataOp in the switch case, but thought I'd bring it up just to bring light to that particular use-case.

It's possibly worth stating that I think Exit/Enter/Update are also host directives from my understanding and they do not own a region like Data that can contain device code in this case and likely should not be performed in device passes (there's been at least one error with this in the past). I don't think they can be invoked inside of a target directive/region either as far as the specification is concerned, but that would need some verification as it isn't something I've dug into too deeply.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand this correctly, this target region would be ignored:

omp.task {
  omp.target {
  }
}

In general, by not processing the body of host operations, we will miss target regions nested within them. I don't know if there's a clean way to support that while following this approach.

Of course, if it isn't legal to nest target regions inside of any other constructs, this is a non-issue, but I haven't noticed anything like that in the spec. Maybe @mjklemm can enlighten us on that.

This exact structure is what both of my PRs (#85130 & #83966) will generate. According to the spec, this shouldn't be illegal (@mjklemm can correct me if I am wrong) - Quoting the spec

When a target construct is encountered, a new target task is generated. The target task region
encloses the target region.

This target task could be interpreted as an implicit task but in the case of the depend clause (and I think the nowait clause as well) it needs an explicit task created (AFAICT as implied from the following statement in the spec 5.2 Section 13.8)

If a depend clause is present, it is associated with the target task

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put up a fix for handling inner regions before reading these comments, so keep that in mind if you look at the latest commit. But, yes the omp.task { omp.target { }} case was ignored in the original code. I added a case where the inner region of an op is traversed and calls the top level dispatch function recursively if it is an omp op. Probably not all operations with inner regions can contain top-level ops, but that should be handled earlier in the compiler, so we shouldn't have to check for illegal combinations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about it a bit more, maybe it's not too difficult to support these cases. For those host ops that might contain a target region nested (immediately or not), we could just call op.walk<omp::TargetOp>(...) and trigger lowering for these target ops. The only problem this could have (I'm not sure, though) is that some arguments of that omp.target operation might not have been lowered to LLVM IR by then, since we would have skipped parent regions.

I think we are okay since omp.target is isolated from above, so I believe there shouldn't be any values that need to be translated (codegen seems to work for the omp.task { omp.target {}} case. I'm not sure if it is possible for the top level target data ops to have this issue.

Copy link
Member

@skatrak skatrak Mar 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not all operations with inner regions can contain top-level ops, but that should be handled earlier in the compiler, so we shouldn't have to check for illegal combinations.

I agree, this would be way too late to realize that OpenMP operations were nested in an unsupported way.

I think we are okay since omp.target is isolated from above, so I believe there shouldn't be any values that need to be translated (codegen seems to work for the omp.task { omp.target {}} case.

I would think that the region inside of omp.target should be translatable for that reason. My concern was more about arguments to that operation. For example, there's some processing of map arguments as part of convertOmpTarget that is done for both host and device. Will that still work if we haven't translated them to LLVM IR as part of lowering its enclosing region?

%c1 = llvm.mlir.constant(1 : i64) : i64
%x = llvm.alloca %c1 x i32 : (i64) -> !llvm.ptr
omp.task {
  %map = omp.map_info var_ptr(%x : !llvm.ptr, i32) map_clauses(tofrom) capture(ByRef) -> !llvm.ptr
  omp.target map_entries(%map -> %arg0 : !llvm.ptr) {
  ^bb0(%arg0: !llvm.ptr):
    ...
    omp.terminator
  }
  omp.terminator
}

If that code makes this approach crash, then we would have to rethink the approach. We would need to create some dummy values, or somehow detect missing LLVM IR values and create them on demand. I think the former was basically what the early outlining did with the outlined function's arguments, if I remember correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The map arguments are already skipped in the common case, since they are supposed to be lowered by the omp.target op lowering and that lowering should not go beyond the omp.map_info operations, because that would include host code in the target codegen.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, if it isn't legal to nest target regions inside of any other constructs, this is a non-issue, but I haven't noticed anything like that in the spec. Maybe @mjklemm can enlighten us on that.

It's allowed to nest target in other OpenMP constructs like task, parallel, for, etc.

@jsjodin jsjodin requested a review from skatrak March 15, 2024 14:40
omp::DataBoundsOp, omp::CriticalDeclareOp>(
[&](auto op) { return success(); })
// Recursively traverse inner regions
.Case<omp::ParallelOp, omp::MasterOp, omp::CriticalOp,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wary of such bespoke listings of ops. It makes it one more arguably hard-to-find place to update if and when we add new ops.

My initial intuition was to see if an op contains a region, then we could recurse into it and find top level target ops. Ops that don't have regions can then be ignored. This logic works for all ops listed on line 3079 except SimdLoopOp and ReductionDeclareOp which have regions. :( So that wont work unless we are willing to needlessly go over all the ops in these two ops knowing fully well we won't find top level target ops.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on trying to process the op's contents based on the existence of regions attached to it rather than listing the types. Probably, rather than the current type switch, we could do something similar to this instead:

if (isa<TargetOp>(op))
  return convertOmpTarget(*op, builder, moduleTranslation);

bool interrupted = op->walk<WalkOrder::PreOrder>([&](omp::TargetOp targetOp) {
  if (failed(convertOmpTarget(*targetOp, builder, moduleTranslation)))
    return WalkResult::interrupt();
  return WalkResult::skip();
}).wasInterrupted();
return failure(interrupted);

That should do the trick of recursively looking for target regions, I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is probably easier maintenance wise since we don't really need to control what is allowed and not, and the micro-optimization to skip ops that should not contain omp.target ops is probably not worth the effort, so I will change the code to do the generic handling of sub-regions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code above works fine, so I used that. Thanks!

return llvm::TypeSwitch<Operation *, LogicalResult>(op)
.Case<omp::DataOp, omp::EnterDataOp, omp::ExitDataOp, omp::UpdateDataOp>(
[&](auto op) {
return convertOmpTargetData(op, builder, moduleTranslation);
Copy link
Contributor

@bhandarkar-pranav bhandarkar-pranav Mar 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is as risk here. @agozillon is right that omp::DataOp is a host op that can contain a region with omp.target ops. So, we need to recurse into this and calling convertOmpTargetData is fine, but only from the point of view of looking inside it for other omp.target ops. I am sure convertOmpTargetData does more than that which on the device (i.e. when ompBuilder->Config.isTargetDevice() == true) is unnecessary. Hopefully it doesn't lead to state that needs to be rolled back.

Secondly, we shouldn't be processing EnterDataOp, UpdateDataOp, ExitDataOp at all because their actions are performed on the host and they don't have regions that we need to recurse into to look for other omp.target ops like we have to in the case of omp.task below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, in that case only omp.target is a top-level target device op.

Copy link
Member

@skatrak skatrak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Jan for addressing previous comments. I think this approach looks good, I just have a couple of nits to try make names hopefully a bit easier to understand.

There's also a comment to try out a small modification to one of the tests, which is the main concern I had with this approach. Even if it doesn't work, I think it's still worth merging this PR, but hopefully it does.

@@ -2922,6 +2922,162 @@ convertDeclareTargetAttr(Operation *op, mlir::omp::DeclareTargetAttr attribute,
return success();
}

static bool isInternalTargetDeviceOp(Operation *op) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
static bool isInternalTargetDeviceOp(Operation *op) {
static bool isTargetDeviceOp(Operation *op) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll make this change once we agree on the other comments.

@jsjodin jsjodin requested a review from skatrak March 22, 2024 13:57
Copy link

✅ With the latest revision this PR passed the Python code formatter.

Copy link
Member

@skatrak skatrak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Jan for working on this and addressing my comments. I think this looks good, I'd suggest giving it just a day or so to give others a chance to express any remaining concerns before merging.

Copy link
Member

@skatrak skatrak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Jan for the changes to make this work for omp.target_data as well. In my opinion this looks ready, just notice when you rebase that some of the MLIR operations have been renamed (DataOp -> TargetDataOp, DataBoundsOp -> MapBoundsOp, ...). Some of the ops in the new tests will have to be updated as well (omp.map_info -> omp.map.info, ...)

@jsjodin jsjodin force-pushed the jleyonberg/target_dispatch branch from aaadaad to eaa836c Compare April 4, 2024 00:10
This patch separates the lowering dispatch for host and target devices. For the
target device, if the current operation is not a top-level operation
(e.g. omp.target) or is inside a target device code region it will be ignored,
since it belongs to the host code.
@jsjodin jsjodin force-pushed the jleyonberg/target_dispatch branch from 47e56ce to 98f256c Compare April 4, 2024 14:24
@jsjodin jsjodin merged commit 9708d09 into llvm:main Apr 5, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants