[WIP] PFT rewrite-based do-concurrent parallelization #230
This is a proof of concept of a PFT rewrite-based approach to OpenMP-based parallelization of `do concurrent` Fortran loops. The main advantage of this approach over an MLIR pass-based one is that it should allow us to avoid re-implementing, or having to share, significant pieces of the PFT-to-MLIR lowering between Flang lowering and the MLIR pass, potentially also making it much simpler to keep feature parity.

The current WIP replicates the PFT structure of an `!$omp parallel do` when encountering a `do concurrent` loop. It is still in very early stages and the resulting PFT cannot be lowered to MLIR yet, as it seems to be missing some symbol updates. However, it can already be tested:

```sh
$ cat test.f90
subroutine foo()
  implicit none
  integer :: i

  do concurrent(i=1:10)
  end do

  !$omp parallel do
  do i=1,10
  end do
end subroutine

$ flang-new -fc1 -fdebug-unparse -fopenmp test.f90
SUBROUTINE foo
 IMPLICIT NONE
 INTEGER i
!$OMP PARALLEL DO
 DO i=1_4,10_4
 END DO
!$OMP PARALLEL DO
 DO i=1_4,10_4
 END DO
END SUBROUTINE

$ flang-new -fc1 -fdebug-dump-parse-tree -fopenmp test.f90
Program -> ProgramUnit -> SubroutineSubprogram
| SubroutineStmt
| | Name = 'foo'
| SpecificationPart
| | ImplicitPart -> ImplicitPartStmt -> ImplicitStmt ->
| | DeclarationConstruct -> SpecificationConstruct -> TypeDeclarationStmt
| | | DeclarationTypeSpec -> IntrinsicTypeSpec -> IntegerTypeSpec ->
| | | EntityDecl
| | | | Name = 'i'
| ExecutionPart -> Block
| | ExecutionPartConstruct -> ExecutableConstruct -> OpenMPConstruct -> OpenMPLoopConstruct
| | | OmpBeginLoopDirective
| | | | OmpLoopDirective -> llvm::omp::Directive = parallel do
| | | | OmpClauseList ->
| | | DoConstruct
| | | | NonLabelDoStmt
| | | | | LoopControl -> LoopBounds
| | | | | | Scalar -> Name = 'i'
| | | | | | Scalar -> Expr = '1_4'
| | | | | | | LiteralConstant -> IntLiteralConstant = '1'
| | | | | | Scalar -> Expr = '10_4'
| | | | | | | LiteralConstant -> IntLiteralConstant = '10'
| | | | Block
| | | | EndDoStmt ->
| | ExecutionPartConstruct -> ExecutableConstruct -> OpenMPConstruct -> OpenMPLoopConstruct
| | | OmpBeginLoopDirective
| | | | OmpLoopDirective -> llvm::omp::Directive = parallel do
| | | | OmpClauseList ->
| | | DoConstruct
| | | | NonLabelDoStmt
| | | | | LoopControl -> LoopBounds
| | | | | | Scalar -> Name = 'i'
| | | | | | Scalar -> Expr = '1_4'
| | | | | | | LiteralConstant -> IntLiteralConstant = '1'
| | | | | | Scalar -> Expr = '10_4'
| | | | | | | LiteralConstant -> IntLiteralConstant = '10'
| | | | Block
| | | | EndDoStmt ->
| EndSubroutineStmt ->
```
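To make the idea of the rewrite concrete, here is a minimal, self-contained sketch of replacing a `do concurrent` node with the same tree shape a user-written `!$omp parallel do` would produce. It is a toy model only: `DoLoop`, `DoConcurrent`, `OmpLoop`, `Construct`, `rewriteDoConcurrent`, and `unparse` are hypothetical names invented for this illustration and are not Flang's parser classes or the code in this PR. It also ignores locality specifiers, symbol updates, and every other analysis a real rewrite would need.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Toy stand-ins for parse-tree nodes (hypothetical, NOT Flang's classes).
struct DoLoop {        // plain `do var = lower, upper`
  std::string var;
  int lower;
  int upper;
};
struct DoConcurrent {  // `do concurrent (var = lower:upper)`
  std::string var;
  int lower;
  int upper;
};
struct OmpLoop {       // `!$omp <directive>` wrapping a plain do loop
  std::string directive;
  DoLoop loop;
};
using Construct = std::variant<DoLoop, DoConcurrent, OmpLoop>;

// Replace each DoConcurrent in `block` with an OmpLoop shaped like a
// user-written `!$omp parallel do`. Other constructs are left untouched.
// Returns the number of loops that were rewritten.
std::size_t rewriteDoConcurrent(std::vector<Construct>& block) {
  std::size_t rewritten = 0;
  for (Construct& c : block) {
    if (const auto* dc = std::get_if<DoConcurrent>(&c)) {
      // Copy the loop header out before overwriting the variant.
      DoLoop loop{dc->var, dc->lower, dc->upper};
      c = OmpLoop{"parallel do", std::move(loop)};
      ++rewritten;
    }
  }
  return rewritten;
}

// Render a plain do-loop header as text.
std::string unparseDo(const DoLoop& l) {
  return "DO " + l.var + "=" + std::to_string(l.lower) + "," +
         std::to_string(l.upper);
}

// Render any construct as text, loosely imitating an unparse step.
std::string unparse(const Construct& c) {
  if (const auto* l = std::get_if<DoLoop>(&c)) return unparseDo(*l);
  if (const auto* dc = std::get_if<DoConcurrent>(&c)) {
    return "DO CONCURRENT (" + dc->var + "=" + std::to_string(dc->lower) +
           ":" + std::to_string(dc->upper) + ")";
  }
  const auto& omp = std::get<OmpLoop>(c);
  return "!$OMP " + omp.directive + "\n" + unparseDo(omp.loop);
}
```

Running `rewriteDoConcurrent` on a block that holds one `DoConcurrent` and one hand-written `OmpLoop` leaves two constructs that `unparse` to identical text, which mirrors the two identical `OpenMPLoopConstruct` subtrees in the dump above. The sketch requires C++17 for `std::variant`.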
Thanks Sergio for working on this. At this stage, this definitely looks simpler than the pass solution. The initial WIP for the pass (here: https://github.com/llvm/llvm-project/pull/77285/files) was similar to your proposal in terms of simplicity; if you remove all comments, tests, and boilerplate, you end up with a few lines of logic to do the actual conversion. However, I do understand that PFT rewriting will probably be much simpler than the pass when we map to

One important point I would like to make clear: the current issues we are facing now with

Additionally, the PFT rewriting approach is much simpler but, I think, quite limiting as well, for the following reasons:
Admittedly, the pass looks like a lot of code compared to the PFT rewriting at the current stage of the PR. However, much of that code is:
The pass has been validated on LBL's inference engine (which is a quite large codebase with annoying features):
I have to admit that I am biased though. The pass is one of my ugly babies that I contributed since I joined the team. Therefore, adding Michael Klemm and Michael Kruse to chime in. Maybe they have further input. And it is a very nice discussion to have regardless of the result, so thanks for opening the WIP.
I share @ergawy's concerns here. `DO CONCURRENT` will regularly need program analysis, for instance regarding localization rules. Just adding a

For our first implementation that explicitly has to be user-enabled using
I also agree that for a proper translation of
Thank you @ergawy, @Meinersbur and @mjklemm for your comments. The idea with this was mainly to give a preview of how this feature could be implemented following a different approach than what we currently have. The main benefit would be that we would be able to work directly in terms of Fortran code to OpenMP construct translations, and rely on the existing infrastructure to lower these resulting high-level constructs. I understand that there are many edge cases and analyses that are needed, and that we can't really translate every case in a straightforward way, but I'm not sure I follow the specific concern about doing these at the PFT level as opposed to MLIR. We'd be looking for specific language patterns, so the PFT seems to be a good place to do this, since it's also where semantic checks are done. It also seems like it would be easier to e.g. add a

In any case, I'm not against the current pass approach. It's already developed and supports many cases, so it makes sense for it to be the preferred approach unless we find out about important limitations or we find there's a much simpler alternative. I was hoping the PFT rewrite approach would potentially be that second case, but I can see that most of the important issues both approaches will have to deal with are actually the same. I think we just need to focus on making sure OpenMP lowering and the do concurrent transformation pass are able to effectively share code, to keep to a minimum the chance for divergence in supported features by both and to avoid making things harder for ourselves by having to re-implement significant amounts of MLIR code generation.