Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BOLT]Identify indirect call #123305

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Conversation

liusy58
Copy link
Contributor

@liusy58 liusy58 commented Jan 17, 2025

(1)Use CFI directives OpDefCfaOffset 0 to identify indirect tail call like br x*. OpDefCfaOffset 0 implies that the stack frame from the caller to the target function remains unchanged. So if a br x* instruction is preceded by a OpDefCfaOffset 0 directive, we can conclude that it is an indirect tail call, under the assumption that no other instructions have modified the stack pointer (sp) between the OpDefCfaOffset 0 directive and the br x* instruction.

liusy58 and others added 2 commits January 17, 2025 17:09
AArch64 instructions have a fixed size 4 bytes, no need to compute.
@llvmbot
Copy link
Member

llvmbot commented Jan 17, 2025

@llvm/pr-subscribers-bolt

Author: Nicholas (liusy58)

Changes

(1)Use CFI directives OpDefCfaOffset 0 to identify indirect tail call like br x*. OpDefCfaOffset 0 implies that the stack frame from the caller to the target function remains unchanged. So if a br x* instruction is preceded by a OpDefCfaOffset 0 directive, we can conclude that it is an indirect tail call, under the assumption that no other instructions have modified the stack pointer (sp) between the OpDefCfaOffset 0 directive and the br x* instruction. (2)If there are no instructions manipulating the stack within a function, we can conclude that br x* is an indirect tail call.


Full diff: https://github.com/llvm/llvm-project/pull/123305.diff

5 Files Affected:

  • (modified) bolt/include/bolt/Core/MCPlusBuilder.h (+5)
  • (modified) bolt/lib/Core/BinaryFunction.cpp (+53)
  • (modified) bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp (+10)
  • (modified) bolt/lib/Target/X86/X86MCPlusBuilder.cpp (+11)
  • (added) bolt/test/AArch64/indirect-tail-call.s (+31)
diff --git a/bolt/include/bolt/Core/MCPlusBuilder.h b/bolt/include/bolt/Core/MCPlusBuilder.h
index 5d77e6faff2fc6..a38f17b2c81b09 100644
--- a/bolt/include/bolt/Core/MCPlusBuilder.h
+++ b/bolt/include/bolt/Core/MCPlusBuilder.h
@@ -610,6 +610,11 @@ class MCPlusBuilder {
 
   virtual bool isLeave(const MCInst &Inst) const { return false; }
 
+  virtual bool hasUseOrDefofSPOrFP(const MCInst &Inst) const {
+    llvm_unreachable("not implemented");
+    return false;
+  }
+
   virtual bool isADRP(const MCInst &Inst) const {
     llvm_unreachable("not implemented");
     return false;
diff --git a/bolt/lib/Core/BinaryFunction.cpp b/bolt/lib/Core/BinaryFunction.cpp
index 1c5cd62a095b24..f44303b52c7a0b 100644
--- a/bolt/lib/Core/BinaryFunction.cpp
+++ b/bolt/lib/Core/BinaryFunction.cpp
@@ -1961,6 +1961,59 @@ bool BinaryFunction::postProcessIndirectBranches(
       bool IsEpilogue = llvm::any_of(BB, [&](const MCInst &Instr) {
         return BC.MIB->isLeave(Instr) || BC.MIB->isPop(Instr);
       });
+      if (BC.isAArch64()) {
+        // Any adr instruction of aarch64 will generate a new entry,
+        // Adr instruction cannt afford to do any optimizations
+        if (!IsEpilogue && !isMultiEntry()) {
+          BinaryBasicBlock::iterator LastDefCFAOffsetInstIter = BB.end();
+          // find the last OpDefCfaOffset 0 instruction.
+          for (BinaryBasicBlock::iterator Iter = BB.begin(); Iter != BB.end();
+               ++Iter) {
+            if (&*Iter == &Instr) {
+              break;
+            }
+            if (BC.MIB->isCFI(*Iter)) {
+              const MCCFIInstruction *CFIInst =
+                  BB.getParent()->getCFIFor(*Iter);
+              if ((CFIInst->getOperation() ==
+                   MCCFIInstruction::OpDefCfaOffset) &&
+                  (CFIInst->getOffset() == 0)) {
+                LastDefCFAOffsetInstIter = Iter;
+                break;
+              }
+            }
+          }
+          if (LastDefCFAOffsetInstIter != BB.end()) {
+            IsEpilogue = true;
+            // make sure there is no instruction manipulating sp between the two
+            // instructions
+            BinaryBasicBlock::iterator Iter = LastDefCFAOffsetInstIter;
+            while (&*Iter != &Instr) {
+              if (BC.MIB->hasUseOrDefofSPOrFP(*Iter)) {
+                IsEpilogue = false;
+                break;
+              }
+              ++Iter;
+            }
+          }
+        }
+
+        if (!IsEpilogue) {
+          IsEpilogue = true;
+          BinaryFunction *Func = BB.getFunction();
+          for (const BinaryBasicBlock &BinaryBB : *Func) {
+            for (const MCInst &Inst : BinaryBB) {
+              if (BC.MIB->hasUseOrDefofSPOrFP(Inst)) {
+                IsEpilogue = false;
+                break;
+              }
+            }
+            if (!IsEpilogue) {
+              break;
+            }
+          }
+        }
+      }
       if (IsEpilogue) {
         BC.MIB->convertJmpToTailCall(Instr);
         BB.removeAllSuccessors();
diff --git a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
index d752751c17932a..e6e79cd33605fc 100644
--- a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
+++ b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
@@ -1797,6 +1797,16 @@ class AArch64MCPlusBuilder : public MCPlusBuilder {
   getInstructionSize(const MCInst &Inst) const override {
     return 4;
   }
+
+  bool hasUseOrDefofSPOrFP(const MCInst &Inst) const override {
+    if (isPseudo(Inst) || isNoop(Inst) || isCFI(Inst)) {
+      return false;
+    }
+    return hasDefOfPhysReg(Inst, AArch64::SP) ||
+           hasUseOfPhysReg(Inst, AArch64::SP) ||
+           hasDefOfPhysReg(Inst, AArch64::FP) ||
+           hasUseOfPhysReg(Inst, AArch64::FP);
+  }
 };
 
 } // end anonymous namespace
diff --git a/bolt/lib/Target/X86/X86MCPlusBuilder.cpp b/bolt/lib/Target/X86/X86MCPlusBuilder.cpp
index 63086c06d74fd9..ad998d8601e26f 100644
--- a/bolt/lib/Target/X86/X86MCPlusBuilder.cpp
+++ b/bolt/lib/Target/X86/X86MCPlusBuilder.cpp
@@ -89,6 +89,17 @@ class X86MCPlusBuilder : public MCPlusBuilder {
 public:
   using MCPlusBuilder::MCPlusBuilder;
 
+  virtual bool hasUseOrDefofSPOrFP(const MCInst &Inst) const override {
+    bool IsLoad, IsStore, IsStoreFromReg, IsSimple, IsIndexed;
+    MCPhysReg Reg;
+    int32_t SrcImm;
+    uint16_t StackPtrReg;
+    int64_t StackOffset;
+    uint8_t Size;
+    return isStackAccess(Inst, IsLoad, IsStore, IsStoreFromReg, Reg, SrcImm,
+                         StackPtrReg, StackOffset, Size, IsSimple, IsIndexed);
+  }
+
   std::unique_ptr<MCSymbolizer>
   createTargetSymbolizer(BinaryFunction &Function,
                          bool CreateNewSymbols) const override {
diff --git a/bolt/test/AArch64/indirect-tail-call.s b/bolt/test/AArch64/indirect-tail-call.s
new file mode 100644
index 00000000000000..1c820b1ad94bbc
--- /dev/null
+++ b/bolt/test/AArch64/indirect-tail-call.s
@@ -0,0 +1,31 @@
+## This test checks that indirect tail call is properly identified by BOLT on aarch64.
+# REQUIRES: system-linux
+
+# RUN: llvm-mc -filetype=obj -triple aarch64-unknown-unknown %s -o %t.o
+# RUN: %clang %cflags -O0 %t.o -o %t.exe -Wl,-q
+# RUN: llvm-bolt --print-all --print-only=indirect  \
+# RUN: %t.exe -o %t.bolt | FileCheck %s
+
+#CHECK: Binary Function "indirect" after building cfg {
+#CHECK-NOT: # UNKNOWN CONTROL FLOW
+#CHECK: End of Function "indirect"
+	.text
+	.globl	indirect                       
+	.type	indirect,@function
+indirect:      
+    cbz	x0, .LBB0_2                         
+	ldr	x8, [x0]
+	ldr	x1, [x8]
+	br	x1
+.LBB0_2:
+	mov	w0, #3
+	ret
+	.size	indirect, .-indirect
+
+
+	.globl	main                            
+	.type	main,@function
+main:                                   
+	mov	w0, wzr
+	ret
+	.size	main, .-main

@liusy58
Copy link
Contributor Author

liusy58 commented Jan 21, 2025

@yavtuk Hi, could you please help me review this patch?

@yavtuk
Copy link
Contributor

yavtuk commented Jan 21, 2025

@yavtuk Hi, could you please help me review this patch?

I will try, need some time

@yavtuk
Copy link
Contributor

yavtuk commented Jan 24, 2025

@liusy58 it's good idea to use dwarf information to identify indirect tail call 👍.
But in my opinion the postProcessIndirectBranches is not better place where the CFI records are parsed for this functionality. We have debugInfo (preprocessDebugInfo()) function which is called before disassemble stage, here we can add the parsing related to indirect calls and store metadata for later usage. This information is architecture independent and can be used for x86 as well. Inside the postProcessIndirectBranches we can request debug info about a hint for this indirect call. But I am not a owner and it's better to hear more thoughts about it.

Copy link
Member

@paschalis-mpeis paschalis-mpeis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for your work Nicholas. Also, it's great that you are very active to BOLT!

I am not sure how valid this is, but is it guaranteed that the indirect branch won't somehow end up to another block of the same function? Because if it does, then it may not be a tail-call.

I presume this is why you require the function not to be a isMultiEntry?
In that case will such block targets indeed be registered as secondary entrypoints, and therefore you safely ignore such edge case?

I have some more comments on code but let's hear on this first. Thanks again!

@liusy58
Copy link
Contributor Author

liusy58 commented Jan 24, 2025

I presume this is why you require the function not to be a isMultiEntry?

yes, you're right. In fact, even the target is in the same function, it is also safe to regard this call as an indirect tail call since the stack remains unchanged.

@paschalis-mpeis
Copy link
Member

Indeed, no stack changes but what about other side-effects?
What if we access a global, do I/O, or some other observable behavior?

BTW I too think that your case (1) should be OK, given a CFI was emitted.
My concern is whether something funny can happen in case (2).
Are we certain that secondary entrypoints will be populated for this case?

@liusy58
Copy link
Contributor Author

liusy58 commented Jan 25, 2025

Thank you for your advice! I will check now!

@liusy58
Copy link
Contributor Author

liusy58 commented Jan 26, 2025

I think we could first accept condition (1) to identify indirect tail calls, especially if there are concerns related to condition (2).

Copy link
Member

@paschalis-mpeis paschalis-mpeis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again @liusy58.

Adding a few more comments for case (1) until we get some updates on my previous concern.

Could you also restructure a bit your commit message, including the PR title to note that this is for 'indirect tail calls' specifically, adding also [AArch64] tag?


Marking indirect branches as functions calls (incl. their possible targets) can be an interesting case for the ABI discussions (ARM-software/abi-aa/issues/297).

@liusy58 liusy58 force-pushed the identify_indirect_call branch 3 times, most recently from 3605dc6 to b5f4fbb Compare January 30, 2025 13:41
@liusy58 liusy58 force-pushed the identify_indirect_call branch from b5f4fbb to 711c1d7 Compare January 31, 2025 01:35
@liusy58
Copy link
Contributor Author

liusy58 commented Feb 14, 2025

hi, @paschalis-mpeis , any update?

Copy link
Member

@paschalis-mpeis paschalis-mpeis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @liusy58 ,

Thanks for working on this! I've left few comments below.
Also, can you revise your description here ?

Comment on lines +38 to +41
.cfi_def_cfa_offset 0
mov w3, w1
add w1, w0, w1
br x16
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On your fist iteration you had a variant with no instructions between the CFI and the branch, but it has since been removed. It looked something like this:

...
	.cfi_def_cfa_offset 0
	br	x16
...

Could you add it back as another test (or another function in this test)?

Also do all the current instructions meaningfully contribute to this test? If not, could the assembly be further minimized by removing any instructions that don't add to the testing logic?

Comment on lines +1965 to +1970
// Any ADR instruction of AArch64 will generate a new entry,
// ADR instruction cannot afford to do any optimizations. Because ADR
// computes a PC-relative address within a limited range tied to the
// current program counter, optimizing transformations (like code
// rearrangements) can change address distances and potentially exceed
// ADR’s range.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the comment on ADR still relevant for this patch?

Comment on lines +1973 to +1974
// Find the last OpDefCfaOffset 0 instruction.
for (BinaryBasicBlock::iterator Iter = BB.begin(); Iter != BB.end();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused about your goal here. IIUC you need the last CFI before the branch, right?
Wouldn't this loop retrieve the first CFI of the whole block instead?

If that's the case, wouldn't you need to start from II and go backward until you hit the first CFI?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants