-
Notifications
You must be signed in to change notification settings - Fork 138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP][CIR][CIRGen][TBAA] Add support for -new-struct-path-tbaa #1403
Open
PikachuHyA
wants to merge
2,301
commits into
llvm:main
Choose a base branch
from
PikachuHyA:support_new_struct_path_tbaa
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The test case is from [clang/test/CodeGen/neon-crypto.c](https://github.com/llvm/clangir/blob/dbf320e5c3db0410566ae561067c595308870bad/clang/test/CodeGen/neon-crypto.c#L28) Need a dedicated test file as RunOptions has crypto related target features.
llvm#1025 explains why we want to move the CIR dialect from the `mlir::cir` to the `cir` namespace. To avoid overloading the `cir` namespace too much afterwards, move all symbols whose equivalents live inside the `clang::CodeGen` namespace to a new `clang::CIRGen` namespace, so that we match the original CodeGen's structure more closely. There's some symbols that live under `clang/include/clang/CIR` whose equivalents live in `clang/lib/CodeGen` and are in the `clang::CodeGen` namespace. We have these symbols in a common location since they're also used by lowering, so I've also left them in the `cir` namespace. Those symbols are: - AArch64ABIKind - ABIArgInfo - FnInfoOpts - TypeEvaluationKind - X86AVXABILevel This is a pretty large PR out of necessity. To make it slightly more reviewable, I've split it out into three commits (which will be squashed together when the PR lands): - The first commit manually switches places to the `clang::CIRGen` namespace. This has to be manual because we only want to move things selectively. - The second commit adjusts namespace prefixes to make builds work. I ran https://gist.github.com/smeenai/f4dd441fb61c53e835c4e6057f8c322f to make this change. The script is idempotent, and I added substitutions one at a time and reviewed each one afterwards (using `git diff --color-words=.`) to ensure only intended changes were being made. - The third commit runs `git clang-format`. Because I went one-by-one with all my substitutions and checked each one afterwards, I'm pretty confident in the validity of all the changes (despite the size of the PR).
As the title says, this PR adds support for unions for AArch64 lowering. The idea is basically the same as the [original](https://github.com/llvm/clangir/blob/dbf320e5c3db0410566ae561067c595308870bad/clang/lib/AST/RecordLayoutBuilder.cpp#L2111) codegen, and I added a couple of tests.
llvm#1025 explains why we want to move the CIR dialect from the `mlir::cir` to the `cir` namespace. This is a large PR, and I've split it out into four commits (that'll be squashed when landing). The first commit changes `mlir::cir` to `cir` everywhere. This was originally done mechanically with: ``` find clang \( -name '*.h' -o -name '*.cpp' -o -name '*.td' \) -print0 | xargs -0 perl -pi -e 's/mlir::cir/cir/g' find clang \( -name '*.h' -o -name '*.cpp' \) -print0 | xargs -0 perl -pi -e 's/::cir/cir/g' find clang \( -name '*.h' -o -name '*.cpp' \) -print0 | xargs -0 perl -0777 -pi -e 's/namespace mlir \{\nnamespace cir \{/namespace cir {/g' find clang \( -name '*.h' -o -name '*.cpp' \) -print0 | xargs -0 perl -0777 -pi -e 's!\} // namespace cir\n\} // namespace mlir!} // namespace cir!g' ``` It then required some manual fixups to address edge cases. Code that lived under `mlir::cir` could refer to the `mlir` namespace without qualification, but after the namespace change, we need to explicitly qualify all our usages. This is done in the second commit via https://gist.github.com/smeenai/996200fd45ad123bbf22b412d59479b6, which is an idempotent script to add all qualifications. I added cases to the script one at a time and reviewed each change afterwards to ensure we were only making the intended modifications, so I feel pretty confident in the end result. I also removed `using namespace llvm` from some headers to avoid conflicts, which in turn required adding some `llvm::` qualifiers as well. The third commit fixes a test, since an error message now contains the mlir namespace. Similar tests in flang also have the namespace in their error messages, so this is an expected change. The fourth change runs `git clang-format`. Unfortunately, that doesn't work for TableGen files, so we'll have a few instances of undesirable formatting left there. I'll look into fixing that as a follow-up. I validated the end result by examining the symbols in the built Clang binary. There's nothing in the `mlir::cir` namespace anymore. https://gist.github.com/smeenai/8438fd01588109fcdbde5c8652781dc0 had the symbols which lived in `cir` and should have moved to `clang::CIRGen`, and I validated that all the symbols were moved, with the exceptions noted in llvm#1082 and the duplicated symbols noted in llvm#1025.
We have both clang/include/clang/CIRFrontendAction/CIRGenAction.h and clang/include/clang/cir/FrontendAction/CIRGenAction.h, which is a historical artifact. The latter is what's being upstreamed, so merge the former into it to avoid any confusion.
The buildX naming convention originated when the CIRGen implementation was planned to be substantially different from original CodeGen. CIRGen is now a much closer adaption of CodeGen, and the emitX to buildX renaming just makes things more confusing, since CodeGen also has some helper functions whose names start with build or Build, so it's not immediately clear which CodeGen function corresponds to a CIRGen buildX function. Rename the buildX functions back to emitX to fix this. This diff was generated mostly mechanically. I searched for all buildX functions in CIRGen and all emitX or buildX functions in CodeGen: ``` rg '\b[Bb]uild[A-Z][A-Za-z0-9_]*\b' clang/lib/CIR/CodeGen -Io | sort -u -o /tmp/buildfuncs rg '\b([Ee]mit|[Bb]uild)[A-Z][A-Za-z0-9_]*\b' clang/lib/CodeGen -Io | sort -u -o /tmp/emitfuncs ``` I used a simple Python script to find corresponding functions: https://gist.github.com/smeenai/02be7ced8564cef5518df72606ec7b19. https://gist.github.com/smeenai/6ffd67be4249c8cebdd7fa99cfa4f13c is the resulting list of correspondences. This isn't 100% accurate because it's not accounting for the files that the functions are present in, but that's pretty unlikely to matter here, so I kept it simple. The buildX functions in CIRGen which correspond to an emitX function in CodeGen should be changed, and the ones which correspond to a BuildX function in CodeGen should not be changed. That leaves some functions without any correspondences, which required a judgement call. I scanned through all those functions, and buildVirtualMethodAttr was the only one that seemed like it shouldn't be changed to emit. I performed the replacement as follows: ``` funcs="$(awk '(/-> [Ee]/ || !/->/) && !/buildVirtualMethodAttr/ { print substr($1, 6) }' /tmp/corrfuncs | paste -sd '|')" find clang/include/clang/CIR clang/lib/CIR/{CodeGen,FrontendAction} \( -name '*.h' -o -name '*.cpp' \) -print0 | \ xargs -0 perl -pi -e "s/\bbuild($funcs)\\b/emit\\1/g" ``` The mechanical changes are in the first commit of this PR. There was a manual fixup required for a token pasting macro in CIRGenExprScalar.cpp, which is the second commit. I then ran `git clang-format`, which is the third commit. (They'll be squashed together when the PR is committed.)
Move LP into CIRGen and give it a handle on the CIRGenModule. A lot of code has been duplicated from CIRGen into cir/Dialect/Transforms in order to let LP live there, but with more necessary CIRGen features (e.g. EH scope and cleanups) going to be used in LP it doesn't make sense to keep it separate. Add this patch that just refactors LoweringPrepare into the CIRGen directory and give it a handle on the CGM.
This PR supports string literals in OpenCL end to end, making it possible to use `printf`. This involves two changes: * In CIRGen, ensure we create the global symbol for string literals with correct `constant` address space. * In LowerToLLVM, make the lowering of `GlobalViewAttr` aware of the upstream address space. Other proper refactors are also applied. Two test cases from OG CodeGen are reused. `str_literals.cl` is the primary test, while `printf.cl` is the bonus one.
This reverts commit 4463352.
This patch adds support for the `__builtin_signbit` intrinsic. The intrinsic requires special handling for PowerPC; however, since ClangIR does not currently support PowerPC, this handling is omitted in this implementation.
The test code is from [OG's clang/test/CodeGen/builtins-memcpy-inline.c](https://github.com/llvm/clangir/blob/5f1afad625f1292ffcf02c36402d292c46213c86/clang/test/CodeGen/builtins-memcpy-inline.c#L7) Also, a little design choice when introducing MemCpyInlineOp, I chose to let it inherit CIR_MemCpyOp, so in future when we optimize MemCpy like Ops, we'd have cleaner and more unified code. However, the cost is that during LLVM lowering I'd have to convert the length from ConstOp into IntegerAttr as that's [what LLVM dialect is expecting](https://mlir.llvm.org/docs/Dialects/LLVM/#llvmintrmemcpyinline-llvmmemcpyinlineop)
Notable change is to introduce helper function `buildBinaryAtomicPost` which models on [OG's `EmitBinaryAtomicPost`](https://github.com/llvm/clangir/blob/dbf320e5c3db0410566ae561067c595308870bad/clang/lib/CodeGen/CGBuiltin.cpp#L340C15-L340C35). Comparing to `EmitBinaryAtomicPost`, `buildBinaryAtomicPost` is more concise as OG's ``EmitBinaryAtomicPost`` duplicates quite a bit of code from [MakeBinaryAtomicValue](https://github.com/llvm/clangir/blob/dbf320e5c3db0410566ae561067c595308870bad/clang/lib/CodeGen/CGBuiltin.cpp#L340) Also, didn't implement invert as __sync_add_and_fetch does not need it, but will add it (which is a trivial work) when we implement a builtin that needs it. Test cases are from [OG](https://github.com/llvm/clangir/blob/dbf320e5c3db0410566ae561067c595308870bad/clang/test/CodeGen/Atomics.c#L134)
…lvm#1095) When emitting initializers for static declarations, it's essential to ensure that the `cir.global` operation aligns its type with that of the initializer. The original approach creates a new global op and copies every attribute from the old one. But just `setSymType` should work well. This also removes missing feature flags there.
…lvm#1096) Following llvm#1009 and llvm#1028, this PR removes the double white spaces in the assembly format of `cir.global` op. It's basically some `mlir-tablegen`-builtin magic: With `constBuilderCall` specified, we can apply `DefaultValuedAttr` with any default value we can construct from constant values. Then we can easily omit the default in assembly. Hence, we don't need to compromise anything for the wrapper attribute `cir::VisibilityAttr`. Similarly to llvm#1009, an empty literal ``` `` ``` is used to omit the leading space emitted by inner attribute. The test case `visibility-attribute.c` is modified to save the intermediate CIR to disk and reflect the effects. Double whitespaces in other test files are removed.
The code changes modify the `cir.if` and `cir.scope` operations to ensure that their code regions are properly terminated. Previously, the if/else and scope regions could be left completely empty which is non-trivially expected in most code inspecting these ops. This led, for example, to a crash when and if clause was left empty in the source code. Now, the child regions must be terminated, either explicitly or implicitly by the default builder and assembly parser. This change improves the clarity and correctness of the code.
This got misaligned after the namespace changes.
…vm#1107) Before the commit, when flattening if/else clauses - the else body came before the "then" body, as opposed to clang's output order. This commit reverses this and hopefully allows easier comparisson between clang's output and cir's.
This patch implements transformations for VAArg in X86_64 ABI **in shape**. `In shape` means it can't work properly due to the dependent X86_64 ABI is not robust. e.g., when we want to use VAArg with `long double`, we need llvm#1087. This patch literally implement https://github.com/llvm/llvm-project/blob/d233fedfb0de882353c348cd1ac57dab619efa6d/clang/lib/CodeGen/Targets/X86.cpp#L3015-L3240 in CIR. There some differences due to the traditional pipeline are converting AST to LLVM and we're transforming CIR to CIR. And also to get the ABI Info, I moved `X86_64ABIInfo` to the header.
…ry (llvm#1111) This PR adds a support for one more case of passing structs by value, with `memcpy` emitted. First of all, don't worry - despite the PR seems big, it's basically consist of helpers + refactoring. Also, there is a minor change in the `CIRBaseBuilder` - I made static the `getBestAllocaInsertPoint` method in order to call it from lowering - we discussed once - and I here we just need it (or copy-paste the code, which doesn't seem good). I will add several comments in order to simplify review.
follow llvm#1033 handle `LongDoubleType` with `FP80Type`.
This PR handles calls with unions passed by value in the calling convention pass. #### Implementation As one may know, data layout for unions in CIR and in LLVM differ one from another. In CIR we track all the union members, while in LLVM IR only the largest one. And here we need to take this difference into account: we need to find a type of the largest member and treat it as the first (and only) union member in order to preserve all the logic from the original codegen. There is a method `StructType::getLargestMember` - but looks like it produces different results (with the one I implemented or better to say copy-pasted). Maybe it's done intentionally, I don't know. The LLVM IR produced has also some difference from the original one. In the original IR `gep` is emitted - and we can not do the same. If we create `getMemberOp` we may fail on type checking for unions - since the first member type may differ from the largest type. This is why we create `bitcast` instead. Relates to the issue llvm#1061
…enBuiltinAArch64.cpp (llvm#1124) We are still seeing crash message like `NYI UNREACHABLE executed at clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp:3304`, which is not convenient for triaging as our code base changes so fast, line number doesn't help much. So, here we replaced most of `llvm_unreachable("NYI")` with more informative message.
Original Clang's implementation: https://github.com/llvm/clangir/blob/2df2022c90e00c0eac542eba4078e79306155c7b/clang/lib/CodeGen/CGBuiltin.cpp#L4131-L4133 https://github.com/llvm/clangir/blob/2df2022c90e00c0eac542eba4078e79306155c7b/clang/lib/CodeGen/CGBuiltin.cpp#L762-L776 Compared with non-elementwise exp https://github.com/llvm/clangir/blob/2df2022c90e00c0eac542eba4078e79306155c7b/clang/lib/CodeGen/CGBuiltin.cpp#L3006-L3016 https://github.com/llvm/clangir/blob/2df2022c90e00c0eac542eba4078e79306155c7b/clang/lib/CodeGen/CGBuiltin.cpp#L669-L685 elementwise version doesn't handle constrained situation. I'm not sure whether it is intended. For renaming, it is to match original clang's implementation closely. Resolves: llvm#1375
Lower vrsrad_n u64 and s64
There were problems with the pointer type and element type of the Address class getting out of sync. In the traditional codegen the pointer has no type, so it was sufficient for the Address class to simply track the type it was supposed to be pointing to. Since ClangIR pointer values are typed, the Address::withType function wasn't really doing what it was supposed to. It returned an object with the same pointer that the original object had, but with a mismatched element type. This change updates the Address::withType function to perform a bitcast to get the expected pointer type before creating a new Address object. It also adds assertions in the Address class to verify that pointer type and element type are consistent and updates many places that were causing those assertions to fire. These code changes cause extra bitcasts to be emitted in a few places. Regression tests have been updated as needed to reflect the CIR that is now generated.
Added a skeleton of NVPTX target lowering info. This enables lowering of `simple.cu` (as it hardly tests device side functionalities), so a test of LLVM IR is also added onto it.
This is a preparation of generating registration functions in LoweringPrepare. CUDA compilation works as follows (irrelevant arguments omitted): ```sh # First compile for device, generating PTX assembly clang++ test.cu -fcuda-is-device -o device.s # Convert that into a binary file ptxas device.s --output-file device.o fatbin --create device.fatbin --image=profile=sm_52,file=device.o # Pass that file as an argument to host clang++ test.cu -fcuda-include-gpubinary device.fatbin -cuid="some unique id" ``` And from the name of GPU binary, we can obtain a handle for registration. So we add an attribute to ModuleOp, recording that name. If that `-fcuda-include-gpubinary` is not specified (like in the test `simple.cu`), OG will not generate any registration function. We do the same here by not generating the attribute.
…m#1383) Lower neon_vrshrd_n for s64 and u64
…lvm#1385) Lower builtin_neon_vqshld_n s64 and u64
This PR adds support for LLVM lowering of pointers to member functions. The lowering is ABI-specific and this patch only considers Itanium ABI. Itanium ABI lowers pointers to member functions to a struct with two fields of type `ptrdiff_t`. To extract fields from such aggregate values, this PR includes a new operation `cir.extract_member` to accomplish this.
This PR adds CIRGen and LLVM lowering support for the `__builtin_bitreverse` family of builtin functions.
Lower neon_vrnd32x
This deals with some x86 aggregate types for CallConvLowering pass. Suppose we have a simple struct like this. ```cpp struct dim3 { int x, y, z; }; ``` It can be coerced into ```cpp struct dim3_ { uint64_t xy; int z; }; ``` And for a function that receives it as an argument, OG does the following transformation for x86: ```cpp void f(dim3 arg) { /* Before */ } void f(uint64_t xy, int z) { /* After */ } ``` Now this transformation is implemented in the CallConvLowering pass of CIR.
I checked https://github.com/llvm/clangir/blob/main/clang/test/CIR/CodeGen/globals.cpp and thought code works as expected. Although, test results need to be adjusted a bit. Resolves: llvm#1252
Lower vrnd32z and vrnd32zq
This PR adds support for returns inside of a TryOp, for example: ``` void foo() { int r = 1; try { return; ++r; } catch (...) { } } ``` Currently, it fails during the CodeGen with: ``` error: 'cir.return' op expects parent op to be one of 'cir.func, cir.scope, cir.if, cir.switch, cir.do, cir.while, cir.for, cir.case' ``` were TryOp's omitted on purpose?
Change the assembly format for `cir::FuncType` from ``` !cir.func<!returnType (!argType)> ``` to ``` !cir.func<(!argType) -> !returnType> ``` This change (1) is easier to parse because it doesn't require lookahead, (2) is consistent with the format of ClangIR `FuncOp` assembly, and (3) is consistent with function types in other MLIR dialects. Change all the tests to use or to expect the new format for function types. The contents and the semantics of `cir::FuncType` are unchanged. Only the assembly format is being changed. Functions that return `void` in C or C++ are still represented in MLIR as having no return type. Most of the changes are in `parseFuncType` and `printFuncType` and the functions they call in `CIRTypes.cpp`. A `FuncType::verify` function was added to check that an explicit return type is not `cir::VoidType`. `FuncType::isVoid()` was renamed to `FuncType::hasVoidReturn()` Some comments for `FuncType` were improved. An `llvm_unreachable` was added to `StructType::getKindAsStr` to suppress a compiler warning and to catch a memory error that corrupts the `RecordKind` field. (This was a drive-by fix and has nothing to do with the rest of this change.)
Includes function name while mangling C functions to avoid link errors. [This is the same way OG handles it](https://github.com/llvm/clangir/blob/c301b4a0d3d2d79b26c9c809c11b8a1137c0a9ec/clang/lib/CodeGen/CodeGenModule.cpp#L1896).
Commit the .clang-tidy files for ClangIR. The biggest different between these and the Clang files is the capitalization of variable names. Most ClangIR code follows the MLIR conventions instead of the Clang conventions.
Before reviewing this patch, please first review the upstream patch at llvm/llvm-project#119698. |
Nice, thanks for adding support for this in both LLVM IR dialect and ClangIR. You got good reviewers over there, so I'm gonna wait for it to land before taking a look at this, let me know if anything shows up that needs my attention! |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
see also https://reviews.llvm.org/D41394