Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CIR][HIP|CUDA] Mirrors CUDARuntime skeleton of OG #1380

Open
wants to merge 2,286 commits into
base: main
Choose a base branch
from

Conversation

koparasy
Copy link
Contributor

As discussed in #1341 CIRGen does not match OG CG. This PR mirrors the skeleton of OG.

Lancern and others added 30 commits January 27, 2025 12:04
This PR adds LLVMIR lowering support for `cir.assume`,
`cir.assume.aligned`, and `cir.assume.separate_storage`.
In OG CodeGen, string literals has `private` linkage as default (marked
by `cir_private` in CIR assembly). But CIR uses `internal`, which is
probably an ancient typo.

This PR keeps align with it and thus modifies the existing test files.
…bits (llvm#1068)

This PR adds a partial support for so-called indirect function arguments
for struct types with size > 128 bits for aarch64.

#### Couple words about the implementation
The hard part is that it's not one-to-one copy from the original
codegen, but the code is inspired by it of course.
In the original codegen there is no much job is done for the indirect
arguments inside the loop in the `EmitFunctionProlog`, and additional
alloca is added in the end, in the call for `EmitParamDecl` function.

In our case, for the indirect argument (which is a pointer) we replace
the original alloca with a new one, and store the pointer in there. And
replace all the uses of the old alloca with the load from the new one,
i.e. in both cases users works with the pointer to a structure.

Also, I added several missed features in the `constructAttributeList`
for indirect arguments, but didn't preserve the original code structure,
so let me know if I need to do it.
Note that we lack two pieces of support for aliases in LLVM IR dialect
globals: the `alias` keyword and function types `void (ptr)`, this needs
to be done before we can nail this for good, but it's outside the scope
of this commit.

The behavior is slightly different under -O1, which will be addressed next.
…m#1073)

This PR changes the naming format of string literals from `.str1` to
`.str.1`, making it easier to reuse the existing testcases of OG
CodeGen.
It's currently polluting the `cir` namespace with very generic symbols
like `Integer` and `Memory`, which is pretty confusing. `X86_64ABIInfo`
already has `Class` alias for `X86ArgClass`, so we can use that alias to
qualify all uses.
Note that there are still missing pieces, which will be incrementally
addressed.
Just verified this is actually done by some LLVM optimization, not by the
frontend emitting directly, so this is a non-goal now, since CIR can also use
LLVM opts to do the same once we have real global alias.
Also verified this does not apply anymore, we match -O0. The only
remaing part is to lower to proper LLVM globals once LLVM IR dialect
gets the global alias support.
This commit extends the support for floating point attributes parsing by
using
the new `AsmParser::parseFloat(fltSemnatics, APFloat&)` interface.
As a drive-by, this commit also harmonizes the cir.fp print/parse
namespace
usage, and adds the constraint of supporting only "CIRFPType"s for
cir.fp in
tablegen instead of verifying it manually in the parsing logic.

---

This commit is based on top of a to-be-upstreamed commit which extends
the upstream MLIR float type parsing. Upstream parsing of float type has
full capability only through parsing the Builtin Dialect's `FloatAttr`.
Thos commit exposes the same capabilities to downstream users.

---

This PR should resolve (at least) `GCC-C-execute-ieee-fp-cmp-2` and
`GCC-C-execute-ieee-fp-cmp-4`, paving the way to other
`GCC-C-execute-ieee-*` tests passing from the SingleSource suite. It
resolves llvm#559 .
The test case is from
[clang/test/CodeGen/neon-crypto.c](https://github.com/llvm/clangir/blob/dbf320e5c3db0410566ae561067c595308870bad/clang/test/CodeGen/neon-crypto.c#L28)
Need a dedicated test file as RunOptions has crypto related target
features.
llvm#1025 explains why we want to move
the CIR dialect from the `mlir::cir` to the `cir` namespace. To avoid
overloading the `cir` namespace too much afterwards, move all symbols
whose equivalents live inside the `clang::CodeGen` namespace to a new
`clang::CIRGen` namespace, so that we match the original CodeGen's
structure more closely.

There's some symbols that live under `clang/include/clang/CIR` whose
equivalents live in `clang/lib/CodeGen` and are in the `clang::CodeGen`
namespace. We have these symbols in a common location since they're also
used by lowering, so I've also left them in the `cir` namespace. Those
symbols are:
- AArch64ABIKind
- ABIArgInfo
- FnInfoOpts
- TypeEvaluationKind
- X86AVXABILevel

This is a pretty large PR out of necessity. To make it slightly more
reviewable, I've split it out into three commits (which will be squashed
together when the PR lands):
- The first commit manually switches places to the `clang::CIRGen`
  namespace. This has to be manual because we only want to move things
  selectively.
- The second commit adjusts namespace prefixes to make builds work. I
  ran https://gist.github.com/smeenai/f4dd441fb61c53e835c4e6057f8c322f
  to make this change. The script is idempotent, and I added
  substitutions one at a time and reviewed each one afterwards (using
  `git diff --color-words=.`) to ensure only intended changes were being
  made.
- The third commit runs `git clang-format`.

Because I went one-by-one with all my substitutions and checked each one
afterwards, I'm pretty confident in the validity of all the changes
(despite the size of the PR).
As the title says, this PR adds support for unions for AArch64 lowering.
The idea is basically the same as the
[original](https://github.com/llvm/clangir/blob/dbf320e5c3db0410566ae561067c595308870bad/clang/lib/AST/RecordLayoutBuilder.cpp#L2111)
codegen, and I added a couple of tests.
llvm#1025 explains why we want to move
the CIR dialect from the `mlir::cir` to the `cir` namespace.

This is a large PR, and I've split it out into four commits (that'll be
squashed when landing). The first commit changes `mlir::cir` to `cir`
everywhere. This was originally done mechanically with:

```
find clang \( -name '*.h' -o -name '*.cpp' -o -name '*.td' \) -print0 | xargs -0 perl -pi -e 's/mlir::cir/cir/g'
find clang \( -name '*.h' -o -name '*.cpp' \) -print0 | xargs -0 perl -pi -e 's/::cir/cir/g'
find clang \( -name '*.h' -o -name '*.cpp' \) -print0 | xargs -0 perl -0777 -pi -e 's/namespace mlir \{\nnamespace cir \{/namespace cir {/g'
find clang \( -name '*.h' -o -name '*.cpp' \) -print0 | xargs -0 perl -0777 -pi -e 's!\} // namespace cir\n\} // namespace mlir!} // namespace cir!g'
```

It then required some manual fixups to address edge cases.

Code that lived under `mlir::cir` could refer to the `mlir` namespace
without qualification, but after the namespace change, we need to
explicitly qualify all our usages. This is done in the second commit via
https://gist.github.com/smeenai/996200fd45ad123bbf22b412d59479b6, which
is an idempotent script to add all qualifications. I added cases to the
script one at a time and reviewed each change afterwards to ensure we
were only making the intended modifications, so I feel pretty confident
in the end result. I also removed `using namespace llvm` from some
headers to avoid conflicts, which in turn required adding some `llvm::`
qualifiers as well.

The third commit fixes a test, since an error message now contains the
mlir namespace. Similar tests in flang also have the namespace in their
error messages, so this is an expected change.

The fourth change runs `git clang-format`. Unfortunately, that doesn't
work for TableGen files, so we'll have a few instances of undesirable
formatting left there. I'll look into fixing that as a follow-up.

I validated the end result by examining the symbols in the built Clang
binary. There's nothing in the `mlir::cir` namespace anymore.
https://gist.github.com/smeenai/8438fd01588109fcdbde5c8652781dc0 had the
symbols which lived in `cir` and should have moved to `clang::CIRGen`,
and I validated that all the symbols were moved, with the exceptions
noted in llvm#1082 and the duplicated
symbols noted in llvm#1025.
We have both clang/include/clang/CIRFrontendAction/CIRGenAction.h and
clang/include/clang/cir/FrontendAction/CIRGenAction.h, which is a
historical artifact. The latter is what's being upstreamed, so merge the
former into it to avoid any confusion.
The buildX naming convention originated when the CIRGen implementation
was planned to be substantially different from original CodeGen. CIRGen
is now a much closer adaption of CodeGen, and the emitX to buildX
renaming just makes things more confusing, since CodeGen also has some
helper functions whose names start with build or Build, so it's not
immediately clear which CodeGen function corresponds to a CIRGen buildX
function. Rename the buildX functions back to emitX to fix this.

This diff was generated mostly mechanically. I searched for all buildX
functions in CIRGen and all emitX or buildX functions in CodeGen:

```
rg '\b[Bb]uild[A-Z][A-Za-z0-9_]*\b' clang/lib/CIR/CodeGen -Io | sort -u -o /tmp/buildfuncs
rg '\b([Ee]mit|[Bb]uild)[A-Z][A-Za-z0-9_]*\b' clang/lib/CodeGen -Io | sort -u -o /tmp/emitfuncs
```

I used a simple Python script to find corresponding functions:
https://gist.github.com/smeenai/02be7ced8564cef5518df72606ec7b19.
https://gist.github.com/smeenai/6ffd67be4249c8cebdd7fa99cfa4f13c is the
resulting list of correspondences. This isn't 100% accurate because it's
not accounting for the files that the functions are present in, but
that's pretty unlikely to matter here, so I kept it simple.

The buildX functions in CIRGen which correspond to an emitX function in
CodeGen should be changed, and the ones which correspond to a BuildX
function in CodeGen should not be changed. That leaves some functions
without any correspondences, which required a judgement call. I scanned
through all those functions, and buildVirtualMethodAttr was the only one
that seemed like it shouldn't be changed to emit. I performed the
replacement as follows:

```
funcs="$(awk '(/-> [Ee]/ || !/->/) && !/buildVirtualMethodAttr/ { print substr($1, 6) }' /tmp/corrfuncs | paste -sd '|')"
find clang/include/clang/CIR clang/lib/CIR/{CodeGen,FrontendAction} \( -name '*.h' -o -name '*.cpp' \) -print0 | \
  xargs -0 perl -pi -e "s/\bbuild($funcs)\\b/emit\\1/g"
```

The mechanical changes are in the first commit of this PR. There was a
manual fixup required for a token pasting macro in CIRGenExprScalar.cpp,
which is the second commit. I then ran `git clang-format`, which is the
third commit. (They'll be squashed together when the PR is committed.)
Move LP into CIRGen and give it a handle on the CIRGenModule. A lot of
code has been duplicated from CIRGen into cir/Dialect/Transforms in
order to let LP live there, but with more necessary CIRGen features
(e.g. EH scope and cleanups) going to be used in LP it doesn't make
sense to keep it separate. Add this patch that just refactors
LoweringPrepare into the CIRGen directory and give it a handle on the
CGM.
This PR supports string literals in OpenCL end to end, making it
possible to use `printf`. This involves two changes:

* In CIRGen, ensure we create the global symbol for string literals with
correct `constant` address space.
* In LowerToLLVM, make the lowering of `GlobalViewAttr` aware of the
upstream address space.

Other proper refactors are also applied.

Two test cases from OG CodeGen are reused. `str_literals.cl` is the
primary test, while `printf.cl` is the bonus one.
This patch adds support for the `__builtin_signbit` intrinsic. The
intrinsic requires special handling for PowerPC; however, since ClangIR
does not currently support PowerPC, this handling is omitted in this
implementation.
The test code is from [OG's
clang/test/CodeGen/builtins-memcpy-inline.c](https://github.com/llvm/clangir/blob/5f1afad625f1292ffcf02c36402d292c46213c86/clang/test/CodeGen/builtins-memcpy-inline.c#L7)
Also, a little design choice when introducing MemCpyInlineOp, I chose to
let it inherit CIR_MemCpyOp, so in future when we optimize MemCpy like
Ops, we'd have cleaner and more unified code.
However, the cost is that during LLVM lowering I'd have to convert the
length from ConstOp into IntegerAttr as that's [what LLVM dialect is
expecting](https://mlir.llvm.org/docs/Dialects/LLVM/#llvmintrmemcpyinline-llvmmemcpyinlineop)
Notable change is to introduce helper function `buildBinaryAtomicPost`
which models on [OG's
`EmitBinaryAtomicPost`](https://github.com/llvm/clangir/blob/dbf320e5c3db0410566ae561067c595308870bad/clang/lib/CodeGen/CGBuiltin.cpp#L340C15-L340C35).
Comparing to `EmitBinaryAtomicPost`, `buildBinaryAtomicPost` is more
concise as OG's ``EmitBinaryAtomicPost`` duplicates quite a bit of code
from
[MakeBinaryAtomicValue](https://github.com/llvm/clangir/blob/dbf320e5c3db0410566ae561067c595308870bad/clang/lib/CodeGen/CGBuiltin.cpp#L340)
Also, didn't implement invert as __sync_add_and_fetch does not need it,
but will add it (which is a trivial work) when we implement a builtin
that needs it.
Test cases are from
[OG](https://github.com/llvm/clangir/blob/dbf320e5c3db0410566ae561067c595308870bad/clang/test/CodeGen/Atomics.c#L134)
…lvm#1095)

When emitting initializers for static declarations, it's essential to
ensure that the `cir.global` operation aligns its type with that of the
initializer.

The original approach creates a new global op and copies every attribute
from the old one. But just `setSymType` should work well.

This also removes missing feature flags there.
FantasqueX and others added 22 commits February 19, 2025 11:24
… bf16 (llvm#1360)

Lower vget_lane_bf16, vduph_lane f16 and bf16
Traditional Clang implementation:
https://github.com/llvm/clangir/blob/a0091e38f1027e35d17819e02ee1ae257a12d296/clang/lib/CodeGen/CGBuiltin.cpp#L4116-L4118

I use the first argument type as the return type. It is OK for
`__builtin_elementwise_acos`, however, I'm not sure it is OK for other
builtin functions.

Resolves: llvm#1361
CUDA shared variables are device-only, accessible from all threads in a
block of some kernel. It's similar to `local` variables in OpenCL which
all threads in a work-group can access. Hence they are realized as
`static` variables in addrspace(local).

On the other hand, the local variables inside a kernel (without special
attributes) are just regular variables, typically emitted by
`CreateTempAlloca`. They are in the default address space.

OG checks if the expected address space, denoted by the type, is the
same as the actual address space indicated by attributes. If they aren't
the same, a `addrspacecast` is emitted when a global variable is
accessed. In CIR however, `cir.get_global` alreadys carries that
information in `!cir.ptr` type, so we don't need a cast.
Add an implementation for array new initialization of multidimension
arrays. This is able to leverage existing array element initialization
with just a few changes to update the pointer types.
…1341)

On HIP when launching a kernel we pass as a first argument a global
variable that points to the device stub function.

We follow OG design by having a map that pairs globals to symbols. In
CUDA this is effectively a nop, as CUDA passes the device stub as a
first argument.
This PR fixes a case with global vars replacement when an array element
is neither a`GlobalView` nor `ConstArray` but just a null pointer .
Previously we just skip such element.

Basically, the case in question is `static S* global[] = {0,
&another_global}`
This is the second part of
[PR#1290](llvm#1290), adding initial
support for `__cxa_rethrow`.

So, the last PR did not support statements that come after the
`UnreachableOp` from the `rethrow`, we just ignored them, e.g:
```
struct S {
  S() {}
};

void foo() {
  int r = 1;
  try {
    throw;
    S s;
  } catch (...) {
    ++r;
  }
}
```

This PR fixes this. 

A few changes: 
- `rethrow` statements split their block into multiple blocks.
- Tests with statements that come after the `rethrow` were added and old
ones were modified.

Concern:
- The `ScopeOp` workaround still exists which I guess was one of the
main concerns when we tried in the last PR.

As usual, I am open to discussions on this and how to approach it better
-:)
This patch introduces `CIR_TBAAStructAttr`, which encodes the type and
offset of each field in a struct, although it may lead to some
duplication in `CIR`. If we manage `cir::TBAAStructAttr` by adding a new
method to `ASTRecordDeclInterface`, it will also introduce duplication
between `CIRGen` and LLVM lowering.
…vm#1370)

Currently, trying to generate CIR for the following code snippet
`yield.cpp` fails. Using `bin/clang++ yield.cpp -Xclang -fclangir
-Xclang -emit-cir -S -o -`:
``` 
struct S {
  S() {};
  int a;
};

void foo() {
  try {
    S s;
  } catch (...) {
    foo();
  }
}
```

The error: 
```
loc("yield.cpp":6:6): error: 'cir.yield' op must be the last operation in the parent block
```
There is an extra YieldOp! The CIR dump before verification looks
something like:
```
"cir.scope"() ({
  %0 = "cir.alloca"() <{alignment = 4 : i64, allocaType = !cir.struct<struct "S" {!cir.int<s, 32>} #cir.record.decl.ast>, ast = #cir.var.decl.ast, init, name = "s"}> : () -> !cir.ptr<!cir.struct<struct "S" {!cir.int<s, 32>} #cir.record.decl.ast>>
  "cir.try"() <{catch_types = [#cir.all]}> ({
    "cir.call"(%0) <{callee = @_ZN1SC1Ev, calling_conv = 1 : i32, exception, extra_attrs = #cir<extra({})>, side_effect = 1 : i32}> ({
      "cir.yield"() : () -> ()
    }) : (!cir.ptr<!cir.struct<struct "S" {!cir.int<s, 32>} #cir.record.decl.ast>>) -> ()
    "cir.yield"() : () -> ()
  }, {
    %1 = "cir.catch_param"() : () -> !cir.ptr<!cir.void>
    "cir.call"() <{ast = #cir.call.expr.ast, callee = @_Z3foov, calling_conv = 1 : i32, exception, extra_attrs = #cir<extra({})>, side_effect = 1 : i32}> ({
      "cir.yield"() : () -> ()
      "cir.yield"() : () -> () <--- **DUPLICATE**
    }) : () -> ()
    "cir.yield"() : () -> ()
  }) : () -> ()
  "cir.yield"() : () -> ()
}, {
}) : () -> ()
```
This PR adds a check for an already existing terminator before creating
a YieldOp during the cleanup.
There were problems with the pointer type and element type of the
Address class getting out of sync. In the traditional codegen the
pointer has no type, so it was sufficient for the Address class to
simply track the type it was supposed to be pointing to. Since ClangIR
pointer values are typed, the Address::withType function wasn't really
doing what it was supposed to. It returned an object with the same
pointer that the original object had, but with a mismatched element
type.

This change updates the Address::withType function to perform a bitcast
to get the expected pointer type before creating a new Address object.
It also adds assertions in the Address class to verify that pointer type
and element type are consistent and updates many places that were
causing those assertions to fire.

These code changes cause extra bitcasts to be emitted in a few places.
Regression tests have been updated as needed to reflect the CIR that is
now generated.
@koparasy koparasy marked this pull request as ready for review February 21, 2025 05:33
@bcardosolopes
Copy link
Member

conflict should be resolved

seven-mile added a commit that referenced this pull request Mar 26, 2025
This is a rebased version of the inactive PR #1380.

---------

Co-authored-by: koparasy <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.