Open
Commits
85 commits
cb9b219
Added Op for ParFor in dialect
Ossinking May 20, 2025
8ed4044
Added parfor statement in grammar
Ossinking May 20, 2025
3d1a0ef
added dummy visiter function for parfor
Ossinking May 20, 2025
34db92f
Added first simple test in control flow test-suit
Ossinking May 20, 2025
2b6e4ce
Extend ParForOp with argument for args
yesoer May 21, 2025
ec652b4
Implement visitParForStatement
yesoer May 21, 2025
f51f3a6
Fix type of args in ParForOp
yesoer May 21, 2025
d0f5c50
grammar fixed; rudementary lowering for parfor op
Ossinking May 22, 2025
335010f
Fix in lowering; parfor is no more mapped directly to kernel call
Ossinking May 24, 2025
efb31ad
Restrict the region of ParForOp
yesoer May 26, 2025
bd1b48d
Prepare ParFor body for IR
yesoer May 26, 2025
7127ce6
Fix name for body region of ParForOp
yesoer May 26, 2025
0e74f4d
simplified first test; visitor fix (may be); debug logging in kernel …
Ossinking May 27, 2025
8a0d543
saving of block operands in forOperands in parforOp
Ossinking May 27, 2025
2809bff
struct for parameters instead of array; catching of daphne context in…
Ossinking May 29, 2025
93ac119
simplified kernel call
Ossinking May 31, 2025
22b0394
creation of func & passing func ptr to kernel
Ossinking Jun 4, 2025
e6f3bcc
debug stuff removed
Ossinking Jun 4, 2025
b86935a
capture outer ssa
Ossinking Jun 5, 2025
93d13bf
removed debug prints
Ossinking Jun 5, 2025
46ba5eb
Add inputs, numInputs, isScalar params to kernel
yesoer Jun 12, 2025
c9aed26
Add license and pragma once to ParFor kernel
yesoer Jun 12, 2025
15bfd6a
don't remove uncalled function beginning with parfor_body_
Ossinking Jun 6, 2025
6de9cb3
Use variadicPack inputs to pass args to kernel
yesoer Jun 16, 2025
3290814
Add OpenMP to kernels CMakeLists
yesoer Jun 18, 2025
f10fc59
mapping of outer scope ssa for parfor
Ossinking Jun 22, 2025
1a445de
Handling of induction variable for parfor
Ossinking Jun 23, 2025
a910aee
simple test case for result aggregation added; removed debug dump
Ossinking Jun 23, 2025
45e6ee7
some refactoring for llvm lowering of parfor op to kernel call (there…
Ossinking Jun 23, 2025
2f7492d
comments and refactoring
Ossinking Jun 24, 2025
a07f993
refactoring
Ossinking Jun 24, 2025
5b8a19e
alloca of parameter array type issue fixed
Ossinking Jun 27, 2025
b8c0d93
no result in dummy return op in parfor
Ossinking Jun 27, 2025
85d313b
Handling of result values in parfor. There are still some issues.
Ossinking Jun 29, 2025
6a04a9f
output handling (complete)
Ossinking Jun 29, 2025
d5aaa79
output handling test
Ossinking Jun 29, 2025
c44e760
workaround for DecRefOp: we don't decrease the reference counter in t…
Ossinking Jun 30, 2025
cc8156a
separation of result and block argument handling
Ossinking Jun 30, 2025
b498db5
handling of void return in kernel.
Ossinking Jun 30, 2025
6324632
refactored handling of induction variable and daphne context
Ossinking Jun 30, 2025
a0f3405
removed unused.
Ossinking Jun 30, 2025
ec676a0
Add support for non-constant steps
yesoer Jun 30, 2025
8af59a9
Refactor step handling in kernel for omp
yesoer Jun 30, 2025
a5c294a
Fix wrong IV scoping
yesoer Jul 1, 2025
884eb1f
Add test for multiple parfors
yesoer Jul 1, 2025
727fee4
Fix nested parfor daphne ctx reference
yesoer Jul 3, 2025
2b082a1
Error if daphne ctx not present in parfor args
yesoer Jul 3, 2025
9b4918b
first steps to dependency analysis
Ossinking Jun 30, 2025
f2520f6
multi-block support in parfor
Ossinking Jul 10, 2025
ed26685
missing reducation of iteration results example
Ossinking Jul 10, 2025
5a06c3e
Fix reference counting in parfor
yesoer Jul 11, 2025
a39a8a4
some attempt to get reduction into parfor
Ossinking Jul 11, 2025
ed461dc
context is not set as argument yet
Ossinking Jul 12, 2025
494029f
initial buffering for parfor
Ossinking Jul 12, 2025
722606c
removed debug stuff
Ossinking Jul 12, 2025
770272e
Fix parfor return lowering
yesoer Jul 13, 2025
7309db4
skeleton for conversion to in-place operations inside of parfor loop …
yesoer Jul 13, 2025
83d3459
skeleton of rewrite modified
Ossinking Jul 14, 2025
899d1da
Continue linking of return pointers
yesoer Jul 14, 2025
10ec4b9
in-place update of results
Ossinking Jul 14, 2025
8a06361
Re-Enable garbage collection and Fix parfor case
yesoer Jul 14, 2025
a14ad4c
Add --enable-parallel-parfor build flag
yesoer Jul 14, 2025
279e4e6
Fix don't inc parfor operand ref counter if no args present
yesoer Jul 14, 2025
97d3691
debug prints removed & fixed problem with if statements handling comp…
Ossinking Jul 14, 2025
83209bf
test cases for previous fixed issues
Ossinking Jul 14, 2025
27bd651
block argument conversion refactored
Ossinking Jul 14, 2025
1ec759a
Fix lowering for parfor with no results
yesoer Jul 14, 2025
f06fe5f
Add test for empty parfor
yesoer Jul 14, 2025
f192742
IncRef only for daphne types
Ossinking Jul 15, 2025
61f5abd
Correct rewiring and determination of loop carried variables in condi…
Ossinking Jul 15, 2025
91ce9c5
some test cases added & removed debug outputs in the kernel
Ossinking Jul 15, 2025
8bc9eec
correct upward traversal of parent blocks if parfor is nested inside …
Ossinking Jul 15, 2025
c5df512
unused pass removed
Ossinking Jul 15, 2025
6465a94
unused attribute removed
Ossinking Jul 15, 2025
710b028
failure using function inside of an parfor / using parfor inside of a…
Ossinking Jul 15, 2025
38bd7ca
Fix too many args to parfor op build
yesoer Jul 15, 2025
ec8b2f6
Add build flag for timing parfor
yesoer Jul 15, 2025
229b3fd
Cleanup and minor Refactor for ParFor Visitor
yesoer Jul 15, 2025
f182224
decomposition of the lowering pass for parfor
Ossinking Jul 15, 2025
7f5d21e
refactoring of LinkParForOutputPass
Ossinking Jul 15, 2025
b55093c
void return parfor fix
Ossinking Jul 15, 2025
b992c0c
Fix memleak from first block in daphne::parfor
yesoer Jul 15, 2025
70b5f17
Fix protection of multiple parfor args
yesoer Jul 15, 2025
978016f
rectify return op inside parfor
Ossinking Jul 15, 2025
000031e
Merge branch 'main' into main
Ossinking Jul 19, 2025
10 changes: 10 additions & 0 deletions CMakeLists.txt
@@ -38,6 +38,16 @@ set(CMAKE_CXX_FLAGS_DEBUG="${CMAKE_CXX_FLAGS_DEBUG} -g -fno-omit-frame-pointer")
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} -O3")
set(CMAKE_CXX_FLAGS_RELWITHDEBINFO "${CMAKE_CXX_FLAGS_RELWITHDEBINFO} -g -O3 -fno-omit-frame-pointer")

option(ENABLE_PARALLEL_PARFOR "Compile parfor statements with OpenMP support" OFF)
if(ENABLE_PARALLEL_PARFOR)
add_definitions(-DPARALLEL_PARFOR)
endif()

option(ENABLE_TIME_PARFOR "Time parfor execution and print the elapsed time" OFF)
if(ENABLE_TIME_PARFOR)
add_definitions(-DTIME_PARFOR)
endif()

# silence a warning about DEPFILE path transformations (used in LLVM)
cmake_policy(SET CMP0116 OLD)

15 changes: 15 additions & 0 deletions build.sh
@@ -62,6 +62,8 @@ function printHelp {
echo " --hdfs Compile with support for HDFS"
echo " --io_uring Compile with support for io_uring"
echo " --no-papi Compile without support for PAPI"
echo " --enable-parallel-parfor Compile with OpenMP support for parfor statements"
echo " --enable-time-parfor Run a timer during parfor execution and print the elapsed time"
}

#******************************************************************************
@@ -456,6 +458,8 @@ BUILD_IO_URING="-DUSE_IO_URING=OFF"
BUILD_PAPI="-DUSE_PAPI=ON"
WITH_DEPS=1
WITH_SUBMODULE_UPDATE=1
enableParallelParfor="OFF"
enableTimeParfor="OFF"

while [[ $# -gt 0 ]]; do
key=$1
@@ -528,6 +532,14 @@ while [[ $# -gt 0 ]]; do
-ns | --no-submodule-update)
WITH_SUBMODULE_UPDATE=0
;;
--enable-parallel-parfor)
echo "Enabling OpenMP parallelization for parfor"
enableParallelParfor="ON"
;;
--enable-time-parfor)
echo "Timing parfor execution; the elapsed time will be printed"
enableTimeParfor="ON"
;;
*)
unknown_options="${unknown_options} ${key}"
;;
@@ -1109,6 +1121,9 @@ daphne_msg "Build Daphne"

cmake -S "$projectRoot" -B "$daphneBuildDir" -G Ninja -DANTLR_VERSION="$antlrVersion" \
-DCMAKE_PREFIX_PATH="$installPrefix" \
-DCMAKE_EXPORT_COMPILE_COMMANDS=1 \
-DENABLE_PARALLEL_PARFOR="$enableParallelParfor" \
-DENABLE_TIME_PARFOR="$enableTimeParfor" \
$BUILD_CUDA $BUILD_FPGAOPENCL $BUILD_DEBUG $BUILD_MPI $BUILD_HDFS $BUILD_PAPI

cmake --build "$daphneBuildDir" --target "$target"
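
Note: the two build flags above only turn into the preprocessor definitions PARALLEL_PARFOR and TIME_PARFOR. The following is a minimal sketch of how a kernel might consume them. It is not the kernel from this PR: runParFor, its signature, the inclusive upper bound, and the restriction to positive steps are all assumptions made for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <iostream>

// Hypothetical helper (not part of the PR): calls body(i) for
// i = from, from + step, ... up to and including `to`.
inline void runParFor(int64_t from, int64_t to, int64_t step,
                      const std::function<void(int64_t)> &body) {
    // Assumption of this sketch: only positive steps are handled.
    assert(step > 0);
#ifdef TIME_PARFOR
    auto start = std::chrono::steady_clock::now();
#endif
    // Precompute the trip count so the loop has the canonical form
    // that an OpenMP worksharing loop requires.
    const int64_t trips = (to >= from) ? (to - from) / step + 1 : 0;
#ifdef PARALLEL_PARFOR
#pragma omp parallel for
#endif
    for (int64_t k = 0; k < trips; ++k)
        body(from + k * step);
#ifdef TIME_PARFOR
    auto end = std::chrono::steady_clock::now();
    std::cerr << "parfor took "
              << std::chrono::duration_cast<std::chrono::microseconds>(end - start).count()
              << " us\n";
#endif
}
```

Without either definition the helper is a plain sequential loop, which matches both options defaulting to OFF. With PARALLEL_PARFOR the compiler would also need OpenMP enabled (for example -fopenmp on GCC and Clang), and the body must not write to shared state without synchronization.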
11 changes: 5 additions & 6 deletions src/compiler/execution/DaphneIrExecutor.cpp
@@ -84,16 +84,14 @@ bool DaphneIrExecutor::runPasses(mlir::ModuleOp module) {

// This flag is really useful to figure out why the lowering failed
llvm::DebugFlag = userConfig_.debug_llvm;

context_.disableMultithreading();
mlir::PassManager pm(&context_);
// TODO Enable the verifier for all passes where it is possible.
// Originally, it was only turned off for the
// SpecializeGenericFunctionsPass.
pm.enableVerifier(false);
pm.enableIRPrinting();

if (userConfig_.explain_parsing)
pm.addPass(mlir::daphne::createPrintIRPass("IR after parsing:"));

pm.addPass(mlir::daphne::createPrintIRPass("IR after parfor lowering:"));
pm.addPass(mlir::createCanonicalizerPass());
pm.addPass(mlir::createCSEPass());
if (userConfig_.explain_parsing_simplified)
@@ -132,6 +130,7 @@ bool DaphneIrExecutor::runPasses(mlir::ModuleOp module) {
pm.addNestedPass<mlir::func::FuncOp>(mlir::daphne::createInferencePass());
// Simplify the IR.
pm.addPass(mlir::createCanonicalizerPass());

// Remove unused ops after simplifications.
// TODO The CSE pass seems to eliminate only "one row" of dead code at a time, so we need it as many times as
// the longest chain of ops we reduce; how to apply CSE until a fixpoint?
@@ -185,7 +184,6 @@ bool DaphneIrExecutor::runPasses(mlir::ModuleOp module) {
pm.addNestedPass<mlir::func::FuncOp>(mlir::daphne::createProfilingPass());

pm.addNestedPass<mlir::func::FuncOp>(mlir::daphne::createInsertDaphneContextPass(userConfig_));

#ifdef USE_CUDA
if (userConfig_.use_cuda)
pm.addNestedPass<mlir::func::FuncOp>(mlir::daphne::createMarkCUDAOpsPass(userConfig_));
@@ -216,6 +214,7 @@ bool DaphneIrExecutor::runPasses(mlir::ModuleOp module) {
pm.addPass(mlir::createConvertSCFToCFPass());
pm.addNestedPass<mlir::func::FuncOp>(mlir::LLVM::createRequestCWrappersPass());
pm.addPass(mlir::daphne::createLowerToLLVMPass(userConfig_));
pm.addNestedPass<mlir::LLVM::LLVMFuncOp>(mlir::daphne::createLinkParForOutputPass());
pm.addPass(mlir::createReconcileUnrealizedCastsPass());
if (userConfig_.explain_llvm)
pm.addPass(mlir::daphne::createPrintIRPass("IR after llvm lowering:"));
2 changes: 1 addition & 1 deletion src/compiler/lowering/CMakeLists.txt
@@ -36,7 +36,7 @@ add_mlir_dialect_library(MLIRDaphneTransforms
AggDimOpLowering.cpp
TransposeOpLowering.cpp
SparsityExploitationPass.cpp

LinkParForOutputPass.cpp
DEPENDS
MLIRDaphneOpsIncGen
MLIRDaphneTransformsIncGen
165 changes: 165 additions & 0 deletions src/compiler/lowering/LinkParForOutputPass.cpp
@@ -0,0 +1,165 @@
/*
* Copyright 2021 The DAPHNE Consortium
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#include "ir/daphneir/Passes.h"
#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
#include "mlir/Pass/Pass.h"
#include <llvm/ADT/SetVector.h>
#include <llvm/ADT/StringRef.h>
#include <llvm/Support/Casting.h>
#include <memory>
#include <mlir/Analysis/SliceAnalysis.h>
#include <mlir/Dialect/Func/IR/FuncOps.h>
#include <vector>

using namespace mlir;

namespace {
/**
* @brief Exploits the canonical form of ParFor output handling to rewrite the ParFor loop body function
* so that it conducts in-place updates on the shared buffer.
*/
struct LinkParForOutputPass : public PassWrapper<LinkParForOutputPass, OperationPass<LLVM::LLVMFuncOp>> {
void runOnOperation() override {
LLVM::LLVMFuncOp func = getOperation();
llvm::StringRef fName = func.getSymName();

if (!fName.starts_with("parfor_body") || !func->hasAttr("parfor_inplace_rewrite_needed"))
return;

auto &blocks = func.getBody().getBlocks();
if (blocks.empty())
return;

// find all GEPOps that have the return arg as their base pointer
// output pointer is always passed first for kernels
auto funcOutArg = blocks.front().getArgument(0);

std::vector<LLVM::GEPOp> outGEPOps;
for (auto *user : funcOutArg.getUsers()) {
if (auto gep = dyn_cast<LLVM::GEPOp>(user)) {
if (gep.getBase() == funcOutArg)
outGEPOps.push_back(gep);
}
}

// find stores to the GEPOps; these stores currently represent the returns
std::vector<LLVM::StoreOp> returnStores;
for (auto gepOp : outGEPOps) {
for (auto user : gepOp->getUsers()) {
if (auto store = dyn_cast<LLVM::StoreOp>(user)) {
if (store.getOperand(1) == gepOp) {
returnStores.push_back(store);
}
}
}
}
OpBuilder b(&getContext());

// rewire outputs of last kernel calls to the respective output of the function.
SetVector<Operation *> toErase = {};
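for (auto store : returnStores) {
auto retVal = store->getOperand(0);
auto retValDef = retVal.getDefiningOp();
// skip values that are not produced by an op with at least one operand (e.g. block arguments)
if (!retValDef || retValDef->getNumOperands() == 0)
continue;
mlir::Value operand = retValDef->getOperand(0);
// load the output from the kernel
if (auto retValDef2 = operand.getDefiningOp()) {
// getDefiningOp() returns nullptr for block arguments, so only rewire if a load op was found
auto *load = retValDef2->getNumOperands() > 0 ? retValDef2->getOperand(0).getDefiningOp() : nullptr;
if (load)
setInPlaceCalcKernelCall(load, store, b, &toErase);
} else if (auto blockArg = operand.dyn_cast<mlir::BlockArgument>()) {
setInPlaceCalcKernelCallInPrevBlocks(blockArg, store, b, &toErase);
}
}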

// erase old operations after rewire
for (auto opToErase : toErase) {
opToErase->erase();
}
func->removeAttr("parfor_inplace_rewrite_needed");
}

/**
* @brief Traverses the predecessor blocks to determine the `LLVM::LoadOp` that loads the result of the last kernel
* call.
*/
void setInPlaceCalcKernelCallInPrevBlocks(BlockArgument blockArg, mlir::Operation *store, OpBuilder b, SetVector<Operation *> *toErase) {
mlir::Block *parentBlock = blockArg.getOwner();
unsigned argIndex = blockArg.getArgNumber();

for (mlir::Block *pred : parentBlock->getPredecessors()) {
mlir::Operation &terminator = pred->back();

if (auto branchInterface = mlir::dyn_cast<mlir::BranchOpInterface>(terminator)) {
auto successors = terminator.getSuccessors();

for (unsigned succIdx = 0; succIdx < successors.size(); ++succIdx) {
if (successors[succIdx] == parentBlock) {
auto succOperands = branchInterface.getSuccessorOperands(succIdx);
if (argIndex < succOperands.size()) {
setInPlaceCalcKernelCall(succOperands[argIndex].getDefiningOp(), store, b, toErase);
}
}
}
} else if (terminator.getNumSuccessors() == 1 && terminator.getSuccessor(0) == parentBlock) {
auto succOperands = terminator.getOperands();
if (argIndex < succOperands.size()) {
setInPlaceCalcKernelCall(succOperands[argIndex].getDefiningOp(), store, b, toErase);
}
}
}
}

/**
* @brief Replaces the output argument of the kernel CallOp with a pointer into the output argument of the parfor
* body function, so that the kernel conducts in-place updates in the shared buffer.
*/
void setInPlaceCalcKernelCall(mlir::Operation *load, mlir::Operation *store, OpBuilder b,
SetVector<Operation *> *toErase) {
// The loaded value may stem from a block argument; then there is no defining op to rewire
if (!load || load->getNumOperands() == 0)
return;
auto ptr = load->getOperand(0);
// Find a kernel call that uses the same pointer as the load
mlir::Operation *lastUpdate = nullptr;
for (auto usr : ptr.getUsers()) {
if ((lastUpdate = llvm::dyn_cast<LLVM::CallOp>(usr))) {
break;
}
}
// Get the defining operation of gep and of its offset
auto gep = store->getOperand(1);
auto gepOp = gep.getDefiningOp();
// Without a kernel call or a GEP with a dynamic index, there is nothing to rewire
if (!lastUpdate || !gepOp || gepOp->getNumOperands() < 2)
return;
auto offset = gepOp->getOperand(1).getDefiningOp();
if (!offset)
return;
// Set the insertion point before lastUpdate
b.setInsertionPoint(lastUpdate);

// Clone gepOp and its offset, so that the output address is available before the kernel call
auto *clonedGepOp = gepOp->clone();
auto *clonedOffset = offset->clone();

clonedGepOp->setOperand(1, clonedOffset->getResult(0));

b.insert(clonedOffset);
b.insert(clonedGepOp);
// Update lastUpdate operand to use the result of the newly inserted gepOp
lastUpdate->setOperand(0, clonedGepOp->getResult(0));
// Load the in-place updated result
load->setOperand(0, clonedGepOp->getResult(0));
// Erase the old operations
toErase->insert(store);
toErase->insert(gepOp);
toErase->insert(offset);
}
};

} // end anonymous namespace

std::unique_ptr<Pass> daphne::createLinkParForOutputPass() { return std::make_unique<LinkParForOutputPass>(); }
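
Note: the effect of the rewrite in LinkParForOutputPass can be pictured outside of MLIR. The sketch below is only an analogy under simplifying assumptions: results are plain doubles instead of pointers to DAPHNE objects, and addOneKernel, bodyBefore and bodyAfter are hypothetical names that do not appear in the PR.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a kernel call: the result is written through `res`,
// mirroring the convention that the output pointer is passed first.
inline void addOneKernel(double *res, double arg) {
    assert(res != nullptr);
    *res = arg + 1.0;
}

// Shape of the body before the pass: the kernel writes to a temporary,
// the temporary is loaded, and a separate store copies it to the output slot.
inline void bodyBefore(double *out, std::size_t slot, double arg) {
    double tmp = 0.0;
    addOneKernel(&tmp, arg);
    out[slot] = tmp; // the "return store" that the pass looks for
}

// Shape of the body after the pass: the address computation (the GEP) is
// placed before the kernel call and used as the kernel's output pointer,
// so the separate store is no longer needed.
inline void bodyAfter(double *out, std::size_t slot, double arg) {
    double *slotPtr = out + slot; // corresponds to the cloned GEP
    addOneKernel(slotPtr, arg);
}
```

Both variants leave the same values in the shared buffer; the second one avoids the temporary and the extra store, which is the point of writing results in place.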