-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BOLT]Identify indirect call #123305
base: main
Are you sure you want to change the base?
[BOLT]Identify indirect call #123305
Conversation
AArch64 instructions have a fixed size 4 bytes, no need to compute.
@llvm/pr-subscribers-bolt Author: Nicholas (liusy58) Changes(1)Use CFI directives OpDefCfaOffset 0 to identify indirect tail call like br x*. OpDefCfaOffset 0 implies that the stack frame from the caller to the target function remains unchanged. So if a br x* instruction is preceded by a OpDefCfaOffset 0 directive, we can conclude that it is an indirect tail call, under the assumption that no other instructions have modified the stack pointer (sp) between the OpDefCfaOffset 0 directive and the br x* instruction. (2)If there are no instructions manipulating the stack within a function, we can conclude that br x* is an indirect tail call. Full diff: https://github.com/llvm/llvm-project/pull/123305.diff 5 Files Affected:
diff --git a/bolt/include/bolt/Core/MCPlusBuilder.h b/bolt/include/bolt/Core/MCPlusBuilder.h
index 5d77e6faff2fc6..a38f17b2c81b09 100644
--- a/bolt/include/bolt/Core/MCPlusBuilder.h
+++ b/bolt/include/bolt/Core/MCPlusBuilder.h
@@ -610,6 +610,11 @@ class MCPlusBuilder {
virtual bool isLeave(const MCInst &Inst) const { return false; }
+ virtual bool hasUseOrDefofSPOrFP(const MCInst &Inst) const {
+ llvm_unreachable("not implemented");
+ return false;
+ }
+
virtual bool isADRP(const MCInst &Inst) const {
llvm_unreachable("not implemented");
return false;
diff --git a/bolt/lib/Core/BinaryFunction.cpp b/bolt/lib/Core/BinaryFunction.cpp
index 1c5cd62a095b24..f44303b52c7a0b 100644
--- a/bolt/lib/Core/BinaryFunction.cpp
+++ b/bolt/lib/Core/BinaryFunction.cpp
@@ -1961,6 +1961,59 @@ bool BinaryFunction::postProcessIndirectBranches(
bool IsEpilogue = llvm::any_of(BB, [&](const MCInst &Instr) {
return BC.MIB->isLeave(Instr) || BC.MIB->isPop(Instr);
});
+ if (BC.isAArch64()) {
+ // Any adr instruction of aarch64 will generate a new entry,
+ // Adr instruction cannt afford to do any optimizations
+ if (!IsEpilogue && !isMultiEntry()) {
+ BinaryBasicBlock::iterator LastDefCFAOffsetInstIter = BB.end();
+ // find the last OpDefCfaOffset 0 instruction.
+ for (BinaryBasicBlock::iterator Iter = BB.begin(); Iter != BB.end();
+ ++Iter) {
+ if (&*Iter == &Instr) {
+ break;
+ }
+ if (BC.MIB->isCFI(*Iter)) {
+ const MCCFIInstruction *CFIInst =
+ BB.getParent()->getCFIFor(*Iter);
+ if ((CFIInst->getOperation() ==
+ MCCFIInstruction::OpDefCfaOffset) &&
+ (CFIInst->getOffset() == 0)) {
+ LastDefCFAOffsetInstIter = Iter;
+ break;
+ }
+ }
+ }
+ if (LastDefCFAOffsetInstIter != BB.end()) {
+ IsEpilogue = true;
+ // make sure there is no instruction manipulating sp between the two
+ // instructions
+ BinaryBasicBlock::iterator Iter = LastDefCFAOffsetInstIter;
+ while (&*Iter != &Instr) {
+ if (BC.MIB->hasUseOrDefofSPOrFP(*Iter)) {
+ IsEpilogue = false;
+ break;
+ }
+ ++Iter;
+ }
+ }
+ }
+
+ if (!IsEpilogue) {
+ IsEpilogue = true;
+ BinaryFunction *Func = BB.getFunction();
+ for (const BinaryBasicBlock &BinaryBB : *Func) {
+ for (const MCInst &Inst : BinaryBB) {
+ if (BC.MIB->hasUseOrDefofSPOrFP(Inst)) {
+ IsEpilogue = false;
+ break;
+ }
+ }
+ if (!IsEpilogue) {
+ break;
+ }
+ }
+ }
+ }
if (IsEpilogue) {
BC.MIB->convertJmpToTailCall(Instr);
BB.removeAllSuccessors();
diff --git a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
index d752751c17932a..e6e79cd33605fc 100644
--- a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
+++ b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
@@ -1797,6 +1797,16 @@ class AArch64MCPlusBuilder : public MCPlusBuilder {
getInstructionSize(const MCInst &Inst) const override {
return 4;
}
+
+ bool hasUseOrDefofSPOrFP(const MCInst &Inst) const override {
+ if (isPseudo(Inst) || isNoop(Inst) || isCFI(Inst)) {
+ return false;
+ }
+ return hasDefOfPhysReg(Inst, AArch64::SP) ||
+ hasUseOfPhysReg(Inst, AArch64::SP) ||
+ hasDefOfPhysReg(Inst, AArch64::FP) ||
+ hasUseOfPhysReg(Inst, AArch64::FP);
+ }
};
} // end anonymous namespace
diff --git a/bolt/lib/Target/X86/X86MCPlusBuilder.cpp b/bolt/lib/Target/X86/X86MCPlusBuilder.cpp
index 63086c06d74fd9..ad998d8601e26f 100644
--- a/bolt/lib/Target/X86/X86MCPlusBuilder.cpp
+++ b/bolt/lib/Target/X86/X86MCPlusBuilder.cpp
@@ -89,6 +89,17 @@ class X86MCPlusBuilder : public MCPlusBuilder {
public:
using MCPlusBuilder::MCPlusBuilder;
+ virtual bool hasUseOrDefofSPOrFP(const MCInst &Inst) const override {
+ bool IsLoad, IsStore, IsStoreFromReg, IsSimple, IsIndexed;
+ MCPhysReg Reg;
+ int32_t SrcImm;
+ uint16_t StackPtrReg;
+ int64_t StackOffset;
+ uint8_t Size;
+ return isStackAccess(Inst, IsLoad, IsStore, IsStoreFromReg, Reg, SrcImm,
+ StackPtrReg, StackOffset, Size, IsSimple, IsIndexed);
+ }
+
std::unique_ptr<MCSymbolizer>
createTargetSymbolizer(BinaryFunction &Function,
bool CreateNewSymbols) const override {
diff --git a/bolt/test/AArch64/indirect-tail-call.s b/bolt/test/AArch64/indirect-tail-call.s
new file mode 100644
index 00000000000000..1c820b1ad94bbc
--- /dev/null
+++ b/bolt/test/AArch64/indirect-tail-call.s
@@ -0,0 +1,31 @@
+## This test checks that indirect tail call is properly identified by BOLT on aarch64.
+# REQUIRES: system-linux
+
+# RUN: llvm-mc -filetype=obj -triple aarch64-unknown-unknown %s -o %t.o
+# RUN: %clang %cflags -O0 %t.o -o %t.exe -Wl,-q
+# RUN: llvm-bolt --print-all --print-only=indirect \
+# RUN: %t.exe -o %t.bolt | FileCheck %s
+
+#CHECK: Binary Function "indirect" after building cfg {
+#CHECK-NOT: # UNKNOWN CONTROL FLOW
+#CHECK: End of Function "indirect"
+ .text
+ .globl indirect
+ .type indirect,@function
+indirect:
+ cbz x0, .LBB0_2
+ ldr x8, [x0]
+ ldr x1, [x8]
+ br x1
+.LBB0_2:
+ mov w0, #3
+ ret
+ .size indirect, .-indirect
+
+
+ .globl main
+ .type main,@function
+main:
+ mov w0, wzr
+ ret
+ .size main, .-main
|
@yavtuk Hi, could you please help me review this patch? |
I will try, need some time |
@liusy58 it's good idea to use dwarf information to identify indirect tail call 👍. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for your work Nicholas. Also, it's great that you are very active to BOLT!
I am not sure how valid this is, but is it guaranteed that the indirect branch won't somehow end up to another block of the same function? Because if it does, then it may not be a tail-call.
I presume this is why you require the function not to be a isMultiEntry
?
In that case will such block targets indeed be registered as secondary entrypoints, and therefore you safely ignore such edge case?
I have some more comments on code but let's hear on this first. Thanks again!
yes, you're right. In fact, even the target is in the same function, it is also safe to regard this call as an indirect tail call since the stack remains unchanged. |
Indeed, no stack changes but what about other side-effects? BTW I too think that your case (1) should be OK, given a CFI was emitted. |
Thank you for your advice! I will check now! |
I think we could first accept condition (1) to identify indirect tail calls, especially if there are concerns related to condition (2). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again @liusy58.
Adding a few more comments for case (1) until we get some updates on my previous concern.
Could you also restructure a bit your commit message, including the PR title to note that this is for 'indirect tail calls' specifically, adding also [AArch64]
tag?
Marking indirect branches as functions calls (incl. their possible targets) can be an interesting case for the ABI discussions (ARM-software/abi-aa/issues/297).
3605dc6
to
b5f4fbb
Compare
b5f4fbb
to
711c1d7
Compare
hi, @paschalis-mpeis , any update? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.cfi_def_cfa_offset 0 | ||
mov w3, w1 | ||
add w1, w0, w1 | ||
br x16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On your fist iteration you had a variant with no instructions between the CFI and the branch, but it has since been removed. It looked something like this:
...
.cfi_def_cfa_offset 0
br x16
...
Could you add it back as another test (or another function in this test)?
Also do all the current instructions meaningfully contribute to this test? If not, could the assembly be further minimized by removing any instructions that don't add to the testing logic?
// Any ADR instruction of AArch64 will generate a new entry, | ||
// ADR instruction cannot afford to do any optimizations. Because ADR | ||
// computes a PC-relative address within a limited range tied to the | ||
// current program counter, optimizing transformations (like code | ||
// rearrangements) can change address distances and potentially exceed | ||
// ADR’s range. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the comment on ADR still relevant for this patch?
// Find the last OpDefCfaOffset 0 instruction. | ||
for (BinaryBasicBlock::iterator Iter = BB.begin(); Iter != BB.end(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused about your goal here. IIUC you need the last CFI before the branch, right?
Wouldn't this loop retrieve the first CFI of the whole block instead?
If that's the case, wouldn't you need to start from II
and go backward until you hit the first CFI?
(1)Use CFI directives OpDefCfaOffset 0 to identify indirect tail call like br x*. OpDefCfaOffset 0 implies that the stack frame from the caller to the target function remains unchanged. So if a br x* instruction is preceded by a OpDefCfaOffset 0 directive, we can conclude that it is an indirect tail call, under the assumption that no other instructions have modified the stack pointer (sp) between the OpDefCfaOffset 0 directive and the br x* instruction.