-
Notifications
You must be signed in to change notification settings - Fork 138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CIR][HIP|CUDA] Mirrors CUDARuntime skeleton of OG #1380
Open
koparasy
wants to merge
2,286
commits into
llvm:main
Choose a base branch
from
koparasy:features/fix-device-cg-structure
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This PR adds LLVMIR lowering support for `cir.assume`, `cir.assume.aligned`, and `cir.assume.separate_storage`.
In OG CodeGen, string literals has `private` linkage as default (marked by `cir_private` in CIR assembly). But CIR uses `internal`, which is probably an ancient typo. This PR keeps align with it and thus modifies the existing test files.
…bits (llvm#1068) This PR adds a partial support for so-called indirect function arguments for struct types with size > 128 bits for aarch64. #### Couple words about the implementation The hard part is that it's not one-to-one copy from the original codegen, but the code is inspired by it of course. In the original codegen there is no much job is done for the indirect arguments inside the loop in the `EmitFunctionProlog`, and additional alloca is added in the end, in the call for `EmitParamDecl` function. In our case, for the indirect argument (which is a pointer) we replace the original alloca with a new one, and store the pointer in there. And replace all the uses of the old alloca with the load from the new one, i.e. in both cases users works with the pointer to a structure. Also, I added several missed features in the `constructAttributeList` for indirect arguments, but didn't preserve the original code structure, so let me know if I need to do it.
Note that we lack two pieces of support for aliases in LLVM IR dialect globals: the `alias` keyword and function types `void (ptr)`, this needs to be done before we can nail this for good, but it's outside the scope of this commit. The behavior is slightly different under -O1, which will be addressed next.
…m#1073) This PR changes the naming format of string literals from `.str1` to `.str.1`, making it easier to reuse the existing testcases of OG CodeGen.
It's currently polluting the `cir` namespace with very generic symbols like `Integer` and `Memory`, which is pretty confusing. `X86_64ABIInfo` already has `Class` alias for `X86ArgClass`, so we can use that alias to qualify all uses.
Note that there are still missing pieces, which will be incrementally addressed.
Just verified this is actually done by some LLVM optimization, not by the frontend emitting directly, so this is a non-goal now, since CIR can also use LLVM opts to do the same once we have real global alias.
Also verified this does not apply anymore, we match -O0. The only remaing part is to lower to proper LLVM globals once LLVM IR dialect gets the global alias support.
test case is from [Traditional Clang CodeGen test file](https://github.com/llvm/clangir/blob/723e78afb5ae4fbd000269a057410913ade3ef44/clang/test/CodeGen/2004-02-13-BuiltinFrameReturnAddress.c#L5)
This commit extends the support for floating point attributes parsing by using the new `AsmParser::parseFloat(fltSemnatics, APFloat&)` interface. As a drive-by, this commit also harmonizes the cir.fp print/parse namespace usage, and adds the constraint of supporting only "CIRFPType"s for cir.fp in tablegen instead of verifying it manually in the parsing logic. --- This commit is based on top of a to-be-upstreamed commit which extends the upstream MLIR float type parsing. Upstream parsing of float type has full capability only through parsing the Builtin Dialect's `FloatAttr`. Thos commit exposes the same capabilities to downstream users. --- This PR should resolve (at least) `GCC-C-execute-ieee-fp-cmp-2` and `GCC-C-execute-ieee-fp-cmp-4`, paving the way to other `GCC-C-execute-ieee-*` tests passing from the SingleSource suite. It resolves llvm#559 .
The test case is from [clang/test/CodeGen/neon-crypto.c](https://github.com/llvm/clangir/blob/dbf320e5c3db0410566ae561067c595308870bad/clang/test/CodeGen/neon-crypto.c#L28) Need a dedicated test file as RunOptions has crypto related target features.
llvm#1025 explains why we want to move the CIR dialect from the `mlir::cir` to the `cir` namespace. To avoid overloading the `cir` namespace too much afterwards, move all symbols whose equivalents live inside the `clang::CodeGen` namespace to a new `clang::CIRGen` namespace, so that we match the original CodeGen's structure more closely. There's some symbols that live under `clang/include/clang/CIR` whose equivalents live in `clang/lib/CodeGen` and are in the `clang::CodeGen` namespace. We have these symbols in a common location since they're also used by lowering, so I've also left them in the `cir` namespace. Those symbols are: - AArch64ABIKind - ABIArgInfo - FnInfoOpts - TypeEvaluationKind - X86AVXABILevel This is a pretty large PR out of necessity. To make it slightly more reviewable, I've split it out into three commits (which will be squashed together when the PR lands): - The first commit manually switches places to the `clang::CIRGen` namespace. This has to be manual because we only want to move things selectively. - The second commit adjusts namespace prefixes to make builds work. I ran https://gist.github.com/smeenai/f4dd441fb61c53e835c4e6057f8c322f to make this change. The script is idempotent, and I added substitutions one at a time and reviewed each one afterwards (using `git diff --color-words=.`) to ensure only intended changes were being made. - The third commit runs `git clang-format`. Because I went one-by-one with all my substitutions and checked each one afterwards, I'm pretty confident in the validity of all the changes (despite the size of the PR).
As the title says, this PR adds support for unions for AArch64 lowering. The idea is basically the same as the [original](https://github.com/llvm/clangir/blob/dbf320e5c3db0410566ae561067c595308870bad/clang/lib/AST/RecordLayoutBuilder.cpp#L2111) codegen, and I added a couple of tests.
llvm#1025 explains why we want to move the CIR dialect from the `mlir::cir` to the `cir` namespace. This is a large PR, and I've split it out into four commits (that'll be squashed when landing). The first commit changes `mlir::cir` to `cir` everywhere. This was originally done mechanically with: ``` find clang \( -name '*.h' -o -name '*.cpp' -o -name '*.td' \) -print0 | xargs -0 perl -pi -e 's/mlir::cir/cir/g' find clang \( -name '*.h' -o -name '*.cpp' \) -print0 | xargs -0 perl -pi -e 's/::cir/cir/g' find clang \( -name '*.h' -o -name '*.cpp' \) -print0 | xargs -0 perl -0777 -pi -e 's/namespace mlir \{\nnamespace cir \{/namespace cir {/g' find clang \( -name '*.h' -o -name '*.cpp' \) -print0 | xargs -0 perl -0777 -pi -e 's!\} // namespace cir\n\} // namespace mlir!} // namespace cir!g' ``` It then required some manual fixups to address edge cases. Code that lived under `mlir::cir` could refer to the `mlir` namespace without qualification, but after the namespace change, we need to explicitly qualify all our usages. This is done in the second commit via https://gist.github.com/smeenai/996200fd45ad123bbf22b412d59479b6, which is an idempotent script to add all qualifications. I added cases to the script one at a time and reviewed each change afterwards to ensure we were only making the intended modifications, so I feel pretty confident in the end result. I also removed `using namespace llvm` from some headers to avoid conflicts, which in turn required adding some `llvm::` qualifiers as well. The third commit fixes a test, since an error message now contains the mlir namespace. Similar tests in flang also have the namespace in their error messages, so this is an expected change. The fourth change runs `git clang-format`. Unfortunately, that doesn't work for TableGen files, so we'll have a few instances of undesirable formatting left there. I'll look into fixing that as a follow-up. I validated the end result by examining the symbols in the built Clang binary. There's nothing in the `mlir::cir` namespace anymore. https://gist.github.com/smeenai/8438fd01588109fcdbde5c8652781dc0 had the symbols which lived in `cir` and should have moved to `clang::CIRGen`, and I validated that all the symbols were moved, with the exceptions noted in llvm#1082 and the duplicated symbols noted in llvm#1025.
We have both clang/include/clang/CIRFrontendAction/CIRGenAction.h and clang/include/clang/cir/FrontendAction/CIRGenAction.h, which is a historical artifact. The latter is what's being upstreamed, so merge the former into it to avoid any confusion.
The buildX naming convention originated when the CIRGen implementation was planned to be substantially different from original CodeGen. CIRGen is now a much closer adaption of CodeGen, and the emitX to buildX renaming just makes things more confusing, since CodeGen also has some helper functions whose names start with build or Build, so it's not immediately clear which CodeGen function corresponds to a CIRGen buildX function. Rename the buildX functions back to emitX to fix this. This diff was generated mostly mechanically. I searched for all buildX functions in CIRGen and all emitX or buildX functions in CodeGen: ``` rg '\b[Bb]uild[A-Z][A-Za-z0-9_]*\b' clang/lib/CIR/CodeGen -Io | sort -u -o /tmp/buildfuncs rg '\b([Ee]mit|[Bb]uild)[A-Z][A-Za-z0-9_]*\b' clang/lib/CodeGen -Io | sort -u -o /tmp/emitfuncs ``` I used a simple Python script to find corresponding functions: https://gist.github.com/smeenai/02be7ced8564cef5518df72606ec7b19. https://gist.github.com/smeenai/6ffd67be4249c8cebdd7fa99cfa4f13c is the resulting list of correspondences. This isn't 100% accurate because it's not accounting for the files that the functions are present in, but that's pretty unlikely to matter here, so I kept it simple. The buildX functions in CIRGen which correspond to an emitX function in CodeGen should be changed, and the ones which correspond to a BuildX function in CodeGen should not be changed. That leaves some functions without any correspondences, which required a judgement call. I scanned through all those functions, and buildVirtualMethodAttr was the only one that seemed like it shouldn't be changed to emit. I performed the replacement as follows: ``` funcs="$(awk '(/-> [Ee]/ || !/->/) && !/buildVirtualMethodAttr/ { print substr($1, 6) }' /tmp/corrfuncs | paste -sd '|')" find clang/include/clang/CIR clang/lib/CIR/{CodeGen,FrontendAction} \( -name '*.h' -o -name '*.cpp' \) -print0 | \ xargs -0 perl -pi -e "s/\bbuild($funcs)\\b/emit\\1/g" ``` The mechanical changes are in the first commit of this PR. There was a manual fixup required for a token pasting macro in CIRGenExprScalar.cpp, which is the second commit. I then ran `git clang-format`, which is the third commit. (They'll be squashed together when the PR is committed.)
Move LP into CIRGen and give it a handle on the CIRGenModule. A lot of code has been duplicated from CIRGen into cir/Dialect/Transforms in order to let LP live there, but with more necessary CIRGen features (e.g. EH scope and cleanups) going to be used in LP it doesn't make sense to keep it separate. Add this patch that just refactors LoweringPrepare into the CIRGen directory and give it a handle on the CGM.
This PR supports string literals in OpenCL end to end, making it possible to use `printf`. This involves two changes: * In CIRGen, ensure we create the global symbol for string literals with correct `constant` address space. * In LowerToLLVM, make the lowering of `GlobalViewAttr` aware of the upstream address space. Other proper refactors are also applied. Two test cases from OG CodeGen are reused. `str_literals.cl` is the primary test, while `printf.cl` is the bonus one.
This reverts commit 4463352.
This patch adds support for the `__builtin_signbit` intrinsic. The intrinsic requires special handling for PowerPC; however, since ClangIR does not currently support PowerPC, this handling is omitted in this implementation.
The test code is from [OG's clang/test/CodeGen/builtins-memcpy-inline.c](https://github.com/llvm/clangir/blob/5f1afad625f1292ffcf02c36402d292c46213c86/clang/test/CodeGen/builtins-memcpy-inline.c#L7) Also, a little design choice when introducing MemCpyInlineOp, I chose to let it inherit CIR_MemCpyOp, so in future when we optimize MemCpy like Ops, we'd have cleaner and more unified code. However, the cost is that during LLVM lowering I'd have to convert the length from ConstOp into IntegerAttr as that's [what LLVM dialect is expecting](https://mlir.llvm.org/docs/Dialects/LLVM/#llvmintrmemcpyinline-llvmmemcpyinlineop)
Notable change is to introduce helper function `buildBinaryAtomicPost` which models on [OG's `EmitBinaryAtomicPost`](https://github.com/llvm/clangir/blob/dbf320e5c3db0410566ae561067c595308870bad/clang/lib/CodeGen/CGBuiltin.cpp#L340C15-L340C35). Comparing to `EmitBinaryAtomicPost`, `buildBinaryAtomicPost` is more concise as OG's ``EmitBinaryAtomicPost`` duplicates quite a bit of code from [MakeBinaryAtomicValue](https://github.com/llvm/clangir/blob/dbf320e5c3db0410566ae561067c595308870bad/clang/lib/CodeGen/CGBuiltin.cpp#L340) Also, didn't implement invert as __sync_add_and_fetch does not need it, but will add it (which is a trivial work) when we implement a builtin that needs it. Test cases are from [OG](https://github.com/llvm/clangir/blob/dbf320e5c3db0410566ae561067c595308870bad/clang/test/CodeGen/Atomics.c#L134)
…lvm#1095) When emitting initializers for static declarations, it's essential to ensure that the `cir.global` operation aligns its type with that of the initializer. The original approach creates a new global op and copies every attribute from the old one. But just `setSymType` should work well. This also removes missing feature flags there.
… bf16 (llvm#1360) Lower vget_lane_bf16, vduph_lane f16 and bf16
Traditional Clang implementation: https://github.com/llvm/clangir/blob/a0091e38f1027e35d17819e02ee1ae257a12d296/clang/lib/CodeGen/CGBuiltin.cpp#L4116-L4118 I use the first argument type as the return type. It is OK for `__builtin_elementwise_acos`, however, I'm not sure it is OK for other builtin functions. Resolves: llvm#1361
…ance (llvm#1363) This PR removes XFAIL test_vec3
CUDA shared variables are device-only, accessible from all threads in a block of some kernel. It's similar to `local` variables in OpenCL which all threads in a work-group can access. Hence they are realized as `static` variables in addrspace(local). On the other hand, the local variables inside a kernel (without special attributes) are just regular variables, typically emitted by `CreateTempAlloca`. They are in the default address space. OG checks if the expected address space, denoted by the type, is the same as the actual address space indicated by attributes. If they aren't the same, a `addrspacecast` is emitted when a global variable is accessed. In CIR however, `cir.get_global` alreadys carries that information in `!cir.ptr` type, so we don't need a cast.
Add an implementation for array new initialization of multidimension arrays. This is able to leverage existing array element initialization with just a few changes to update the pointer types.
…1341) On HIP when launching a kernel we pass as a first argument a global variable that points to the device stub function. We follow OG design by having a map that pairs globals to symbols. In CUDA this is effectively a nop, as CUDA passes the device stub as a first argument.
Lower `neon_vshrd_n_u64`
Lower `neon_vsrad_n_u64`
This PR fixes a case with global vars replacement when an array element is neither a`GlobalView` nor `ConstArray` but just a null pointer . Previously we just skip such element. Basically, the case in question is `static S* global[] = {0, &another_global}`
The final part of llvm#1273.
llvm#1372) Lower vgetq_lane_bf16, vduph f16 and fb16
This is the second part of [PR#1290](llvm#1290), adding initial support for `__cxa_rethrow`. So, the last PR did not support statements that come after the `UnreachableOp` from the `rethrow`, we just ignored them, e.g: ``` struct S { S() {} }; void foo() { int r = 1; try { throw; S s; } catch (...) { ++r; } } ``` This PR fixes this. A few changes: - `rethrow` statements split their block into multiple blocks. - Tests with statements that come after the `rethrow` were added and old ones were modified. Concern: - The `ScopeOp` workaround still exists which I guess was one of the main concerns when we tried in the last PR. As usual, I am open to discussions on this and how to approach it better -:)
This patch introduces `CIR_TBAAStructAttr`, which encodes the type and offset of each field in a struct, although it may lead to some duplication in `CIR`. If we manage `cir::TBAAStructAttr` by adding a new method to `ASTRecordDeclInterface`, it will also introduce duplication between `CIRGen` and LLVM lowering.
…vm#1370) Currently, trying to generate CIR for the following code snippet `yield.cpp` fails. Using `bin/clang++ yield.cpp -Xclang -fclangir -Xclang -emit-cir -S -o -`: ``` struct S { S() {}; int a; }; void foo() { try { S s; } catch (...) { foo(); } } ``` The error: ``` loc("yield.cpp":6:6): error: 'cir.yield' op must be the last operation in the parent block ``` There is an extra YieldOp! The CIR dump before verification looks something like: ``` "cir.scope"() ({ %0 = "cir.alloca"() <{alignment = 4 : i64, allocaType = !cir.struct<struct "S" {!cir.int<s, 32>} #cir.record.decl.ast>, ast = #cir.var.decl.ast, init, name = "s"}> : () -> !cir.ptr<!cir.struct<struct "S" {!cir.int<s, 32>} #cir.record.decl.ast>> "cir.try"() <{catch_types = [#cir.all]}> ({ "cir.call"(%0) <{callee = @_ZN1SC1Ev, calling_conv = 1 : i32, exception, extra_attrs = #cir<extra({})>, side_effect = 1 : i32}> ({ "cir.yield"() : () -> () }) : (!cir.ptr<!cir.struct<struct "S" {!cir.int<s, 32>} #cir.record.decl.ast>>) -> () "cir.yield"() : () -> () }, { %1 = "cir.catch_param"() : () -> !cir.ptr<!cir.void> "cir.call"() <{ast = #cir.call.expr.ast, callee = @_Z3foov, calling_conv = 1 : i32, exception, extra_attrs = #cir<extra({})>, side_effect = 1 : i32}> ({ "cir.yield"() : () -> () "cir.yield"() : () -> () <--- **DUPLICATE** }) : () -> () "cir.yield"() : () -> () }) : () -> () "cir.yield"() : () -> () }, { }) : () -> () ``` This PR adds a check for an already existing terminator before creating a YieldOp during the cleanup.
Original Clang's implementation: https://github.com/llvm/clangir/blob/2df2022c90e00c0eac542eba4078e79306155c7b/clang/lib/CodeGen/CGBuiltin.cpp#L4131-L4133 https://github.com/llvm/clangir/blob/2df2022c90e00c0eac542eba4078e79306155c7b/clang/lib/CodeGen/CGBuiltin.cpp#L762-L776 Compared with non-elementwise exp https://github.com/llvm/clangir/blob/2df2022c90e00c0eac542eba4078e79306155c7b/clang/lib/CodeGen/CGBuiltin.cpp#L3006-L3016 https://github.com/llvm/clangir/blob/2df2022c90e00c0eac542eba4078e79306155c7b/clang/lib/CodeGen/CGBuiltin.cpp#L669-L685 elementwise version doesn't handle constrained situation. I'm not sure whether it is intended. For renaming, it is to match original clang's implementation closely. Resolves: llvm#1375
Lower vrsrad_n u64 and s64
There were problems with the pointer type and element type of the Address class getting out of sync. In the traditional codegen the pointer has no type, so it was sufficient for the Address class to simply track the type it was supposed to be pointing to. Since ClangIR pointer values are typed, the Address::withType function wasn't really doing what it was supposed to. It returned an object with the same pointer that the original object had, but with a mismatched element type. This change updates the Address::withType function to perform a bitcast to get the expected pointer type before creating a new Address object. It also adds assertions in the Address class to verify that pointer type and element type are consistent and updates many places that were causing those assertions to fire. These code changes cause extra bitcasts to be emitted in a few places. Regression tests have been updated as needed to reflect the CIR that is now generated.
conflict should be resolved |
seven-mile
added a commit
that referenced
this pull request
Mar 26, 2025
This is a rebased version of the inactive PR #1380. --------- Co-authored-by: koparasy <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
As discussed in #1341 CIRGen does not match OG CG. This PR mirrors the skeleton of OG.