-
Notifications
You must be signed in to change notification settings - Fork 14.7k
[X86] Align f128 and i128 to 16 bytes when passing on x86-32 #138092
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I'm not sure why this seems to have no effect; the tests at llvm/test/CodeGen/X86/fp128-abi.ll should fail since they need to be updated, but there is no change in codegen. I wonder if this is getting fixed up somewhere. |
fp128 is not a legal type, so this won't reach here (this is a super frustrating detail of how calling convention handling is implemented, it would be easier if the backend directly consumed the original type) |
Should it be made a legal type? I’m not sure what the exact definition of that is, but it is in the i386 ABI as of recent versions to be returned and (presumably, since I haven’t seen otherwise) passed in memory. |
No. There are no f128 registers or operations |
How does stack alignment get handled for non-legal types, or where does this need to get fixed up? I’m surprised passing doesn’t use the type’s natural alignment. |
The type is broken up into legal pieces according to getRegisterTypeForCallingConv + getNumRegistersForCallingConv, and those pieces are processed by the call lowering. This code is entirely disconnected from the decision of where to pass the value, in a register or at some stack offset |
a3b1041
to
3b2a385
Compare
✅ With the latest revision this PR passed the C/C++ code formatter. |
3b2a385
to
ee3b81f
Compare
@llvm/pr-subscribers-backend-x86 Author: Trevor Gross (tgross35) ChangesThe i386 psABI specifies that i386 does not specify an Fixes: #77401 Patch is 551.12 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/138092.diff 52 Files Affected:
diff --git a/llvm/docs/ReleaseNotes.md b/llvm/docs/ReleaseNotes.md
index daf822388a2ff..e91460d3a551c 100644
--- a/llvm/docs/ReleaseNotes.md
+++ b/llvm/docs/ReleaseNotes.md
@@ -228,6 +228,8 @@ Changes to the X86 Backend
--------------------------
* `fp128` will now use `*f128` libcalls on 32-bit GNU targets as well.
+* On x86-32, `fp128` and `i128` are now passed with the expected 16-byte stack
+ alignment.
Changes to the OCaml bindings
-----------------------------
diff --git a/llvm/lib/Target/X86/X86CallingConv.cpp b/llvm/lib/Target/X86/X86CallingConv.cpp
index 0b4c63f7a81f7..eb39259f7166b 100644
--- a/llvm/lib/Target/X86/X86CallingConv.cpp
+++ b/llvm/lib/Target/X86/X86CallingConv.cpp
@@ -374,5 +374,37 @@ static bool CC_X86_64_I128(unsigned &ValNo, MVT &ValVT, MVT &LocVT,
return true;
}
+/// Special handling for i128 and fp128: on x86-32, i128 and fp128 get legalized
+/// as four i32s, but fp128 must be passed on the stack with 16-byte alignment.
+/// Technically only fp128 has a specified ABI, but it makes sense to handle
+/// i128 the same until we hear differently.
+static bool CC_X86_32_I128_FP128(unsigned &ValNo, MVT &ValVT, MVT &LocVT,
+ CCValAssign::LocInfo &LocInfo,
+ ISD::ArgFlagsTy &ArgFlags, CCState &State) {
+ assert(ValVT == MVT::i32 && "Should have i32 parts");
+ SmallVectorImpl<CCValAssign> &PendingMembers = State.getPendingLocs();
+ PendingMembers.push_back(
+ CCValAssign::getPending(ValNo, ValVT, LocVT, LocInfo));
+
+ if (!ArgFlags.isInConsecutiveRegsLast())
+ return true;
+
+ unsigned NumRegs = PendingMembers.size();
+ assert(NumRegs == 4 && "Should have two parts");
+
+ int64_t Offset = State.AllocateStack(16, Align(16));
+ PendingMembers[0].convertToMem(Offset);
+ PendingMembers[1].convertToMem(Offset + 4);
+ PendingMembers[2].convertToMem(Offset + 8);
+ PendingMembers[3].convertToMem(Offset + 12);
+
+ State.addLoc(PendingMembers[0]);
+ State.addLoc(PendingMembers[1]);
+ State.addLoc(PendingMembers[2]);
+ State.addLoc(PendingMembers[3]);
+ PendingMembers.clear();
+ return true;
+}
+
// Provides entry points of CC_X86 and RetCC_X86.
#include "X86GenCallingConv.inc"
diff --git a/llvm/lib/Target/X86/X86CallingConv.td b/llvm/lib/Target/X86/X86CallingConv.td
index 823e0caa02262..f020e0b55141c 100644
--- a/llvm/lib/Target/X86/X86CallingConv.td
+++ b/llvm/lib/Target/X86/X86CallingConv.td
@@ -859,6 +859,11 @@ def CC_X86_32_C : CallingConv<[
// The 'nest' parameter, if any, is passed in ECX.
CCIfNest<CCAssignToReg<[ECX]>>,
+ // i128 and fp128 need to be passed on the stack with a higher alignment than
+ // their legal types. Handle this with a custom function.
+ CCIfType<[i32],
+ CCIfConsecutiveRegs<CCCustom<"CC_X86_32_I128_FP128">>>,
+
// On swifttailcc pass swiftself in ECX.
CCIfCC<"CallingConv::SwiftTail",
CCIfSwiftSelf<CCIfType<[i32], CCAssignToReg<[ECX]>>>>,
diff --git a/llvm/lib/Target/X86/X86ISelLoweringCall.cpp b/llvm/lib/Target/X86/X86ISelLoweringCall.cpp
index 9ad355311527b..b4639ac2577e8 100644
--- a/llvm/lib/Target/X86/X86ISelLoweringCall.cpp
+++ b/llvm/lib/Target/X86/X86ISelLoweringCall.cpp
@@ -237,9 +237,18 @@ EVT X86TargetLowering::getSetCCResultType(const DataLayout &DL,
bool X86TargetLowering::functionArgumentNeedsConsecutiveRegisters(
Type *Ty, CallingConv::ID CallConv, bool isVarArg,
const DataLayout &DL) const {
- // i128 split into i64 needs to be allocated to two consecutive registers,
- // or spilled to the stack as a whole.
- return Ty->isIntegerTy(128);
+ // On x86-64 i128 is split into two i64s and needs to be allocated to two
+ // consecutive registers, or spilled to the stack as a whole. On x86-32 i128
+ // is split to four i32s and never actually passed in registers, but we use
+ // the consecutive register mark to match it in TableGen.
+ if (Ty->isIntegerTy(128))
+ return true;
+
+ // On x86-32, fp128 acts the same as i128.
+ if (Subtarget.is32Bit() && Ty->isFP128Ty())
+ return true;
+
+ return false;
}
/// Helper for getByValTypeAlignment to determine
diff --git a/llvm/test/CodeGen/X86/abds-neg.ll b/llvm/test/CodeGen/X86/abds-neg.ll
index f6d66ab47ce05..2911edfbfd409 100644
--- a/llvm/test/CodeGen/X86/abds-neg.ll
+++ b/llvm/test/CodeGen/X86/abds-neg.ll
@@ -367,44 +367,49 @@ define i128 @abd_ext_i128(i128 %a, i128 %b) nounwind {
; X86-LABEL: abd_ext_i128:
; X86: # %bb.0:
; X86-NEXT: pushl %ebp
+; X86-NEXT: movl %esp, %ebp
; X86-NEXT: pushl %ebx
; X86-NEXT: pushl %edi
; X86-NEXT: pushl %esi
-; X86-NEXT: pushl %eax
-; X86-NEXT: movl {{[0-9]+}}(%esp), %edi
-; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
-; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
-; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT: movl {{[0-9]+}}(%esp), %ebp
-; X86-NEXT: subl %ecx, %eax
-; X86-NEXT: movl %eax, (%esp) # 4-byte Spill
-; X86-NEXT: sbbl %edx, %ebp
-; X86-NEXT: movl {{[0-9]+}}(%esp), %ebx
-; X86-NEXT: sbbl %edi, %ebx
-; X86-NEXT: movl {{[0-9]+}}(%esp), %esi
-; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT: sbbl %esi, %eax
-; X86-NEXT: subl {{[0-9]+}}(%esp), %ecx
-; X86-NEXT: sbbl {{[0-9]+}}(%esp), %edx
-; X86-NEXT: sbbl {{[0-9]+}}(%esp), %edi
-; X86-NEXT: sbbl {{[0-9]+}}(%esp), %esi
-; X86-NEXT: cmovll %eax, %esi
-; X86-NEXT: cmovll %ebx, %edi
-; X86-NEXT: cmovll %ebp, %edx
-; X86-NEXT: cmovll (%esp), %ecx # 4-byte Folded Reload
-; X86-NEXT: xorl %ebx, %ebx
+; X86-NEXT: andl $-16, %esp
+; X86-NEXT: subl $16, %esp
+; X86-NEXT: movl 40(%ebp), %ecx
+; X86-NEXT: movl 44(%ebp), %eax
+; X86-NEXT: movl 24(%ebp), %edx
+; X86-NEXT: movl 28(%ebp), %esi
+; X86-NEXT: subl %ecx, %edx
+; X86-NEXT: movl %edx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT: movl %esi, %edx
+; X86-NEXT: sbbl %eax, %edx
+; X86-NEXT: movl %edx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT: movl 48(%ebp), %edx
+; X86-NEXT: movl 32(%ebp), %ebx
+; X86-NEXT: sbbl %edx, %ebx
+; X86-NEXT: movl 52(%ebp), %esi
+; X86-NEXT: movl 36(%ebp), %edi
+; X86-NEXT: sbbl %esi, %edi
+; X86-NEXT: subl 24(%ebp), %ecx
+; X86-NEXT: sbbl 28(%ebp), %eax
+; X86-NEXT: sbbl 32(%ebp), %edx
+; X86-NEXT: sbbl 36(%ebp), %esi
+; X86-NEXT: cmovll %edi, %esi
+; X86-NEXT: cmovll %ebx, %edx
+; X86-NEXT: cmovll {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Folded Reload
+; X86-NEXT: cmovll {{[-0-9]+}}(%e{{[sb]}}p), %ecx # 4-byte Folded Reload
+; X86-NEXT: xorl %edi, %edi
; X86-NEXT: negl %ecx
-; X86-NEXT: movl $0, %ebp
-; X86-NEXT: sbbl %edx, %ebp
-; X86-NEXT: movl $0, %edx
-; X86-NEXT: sbbl %edi, %edx
-; X86-NEXT: sbbl %esi, %ebx
-; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT: movl %ecx, (%eax)
-; X86-NEXT: movl %ebp, 4(%eax)
-; X86-NEXT: movl %edx, 8(%eax)
-; X86-NEXT: movl %ebx, 12(%eax)
-; X86-NEXT: addl $4, %esp
+; X86-NEXT: movl $0, %ebx
+; X86-NEXT: sbbl %eax, %ebx
+; X86-NEXT: movl $0, %eax
+; X86-NEXT: sbbl %edx, %eax
+; X86-NEXT: sbbl %esi, %edi
+; X86-NEXT: movl 8(%ebp), %edx
+; X86-NEXT: movl %ecx, (%edx)
+; X86-NEXT: movl %ebx, 4(%edx)
+; X86-NEXT: movl %eax, 8(%edx)
+; X86-NEXT: movl %edi, 12(%edx)
+; X86-NEXT: movl %edx, %eax
+; X86-NEXT: leal -12(%ebp), %esp
; X86-NEXT: popl %esi
; X86-NEXT: popl %edi
; X86-NEXT: popl %ebx
@@ -438,44 +443,49 @@ define i128 @abd_ext_i128_undef(i128 %a, i128 %b) nounwind {
; X86-LABEL: abd_ext_i128_undef:
; X86: # %bb.0:
; X86-NEXT: pushl %ebp
+; X86-NEXT: movl %esp, %ebp
; X86-NEXT: pushl %ebx
; X86-NEXT: pushl %edi
; X86-NEXT: pushl %esi
-; X86-NEXT: pushl %eax
-; X86-NEXT: movl {{[0-9]+}}(%esp), %edi
-; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
-; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
-; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT: movl {{[0-9]+}}(%esp), %ebp
-; X86-NEXT: subl %ecx, %eax
-; X86-NEXT: movl %eax, (%esp) # 4-byte Spill
-; X86-NEXT: sbbl %edx, %ebp
-; X86-NEXT: movl {{[0-9]+}}(%esp), %ebx
-; X86-NEXT: sbbl %edi, %ebx
-; X86-NEXT: movl {{[0-9]+}}(%esp), %esi
-; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT: sbbl %esi, %eax
-; X86-NEXT: subl {{[0-9]+}}(%esp), %ecx
-; X86-NEXT: sbbl {{[0-9]+}}(%esp), %edx
-; X86-NEXT: sbbl {{[0-9]+}}(%esp), %edi
-; X86-NEXT: sbbl {{[0-9]+}}(%esp), %esi
-; X86-NEXT: cmovll %eax, %esi
-; X86-NEXT: cmovll %ebx, %edi
-; X86-NEXT: cmovll %ebp, %edx
-; X86-NEXT: cmovll (%esp), %ecx # 4-byte Folded Reload
-; X86-NEXT: xorl %ebx, %ebx
+; X86-NEXT: andl $-16, %esp
+; X86-NEXT: subl $16, %esp
+; X86-NEXT: movl 40(%ebp), %ecx
+; X86-NEXT: movl 44(%ebp), %eax
+; X86-NEXT: movl 24(%ebp), %edx
+; X86-NEXT: movl 28(%ebp), %esi
+; X86-NEXT: subl %ecx, %edx
+; X86-NEXT: movl %edx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT: movl %esi, %edx
+; X86-NEXT: sbbl %eax, %edx
+; X86-NEXT: movl %edx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT: movl 48(%ebp), %edx
+; X86-NEXT: movl 32(%ebp), %ebx
+; X86-NEXT: sbbl %edx, %ebx
+; X86-NEXT: movl 52(%ebp), %esi
+; X86-NEXT: movl 36(%ebp), %edi
+; X86-NEXT: sbbl %esi, %edi
+; X86-NEXT: subl 24(%ebp), %ecx
+; X86-NEXT: sbbl 28(%ebp), %eax
+; X86-NEXT: sbbl 32(%ebp), %edx
+; X86-NEXT: sbbl 36(%ebp), %esi
+; X86-NEXT: cmovll %edi, %esi
+; X86-NEXT: cmovll %ebx, %edx
+; X86-NEXT: cmovll {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Folded Reload
+; X86-NEXT: cmovll {{[-0-9]+}}(%e{{[sb]}}p), %ecx # 4-byte Folded Reload
+; X86-NEXT: xorl %edi, %edi
; X86-NEXT: negl %ecx
-; X86-NEXT: movl $0, %ebp
-; X86-NEXT: sbbl %edx, %ebp
-; X86-NEXT: movl $0, %edx
-; X86-NEXT: sbbl %edi, %edx
-; X86-NEXT: sbbl %esi, %ebx
-; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT: movl %ecx, (%eax)
-; X86-NEXT: movl %ebp, 4(%eax)
-; X86-NEXT: movl %edx, 8(%eax)
-; X86-NEXT: movl %ebx, 12(%eax)
-; X86-NEXT: addl $4, %esp
+; X86-NEXT: movl $0, %ebx
+; X86-NEXT: sbbl %eax, %ebx
+; X86-NEXT: movl $0, %eax
+; X86-NEXT: sbbl %edx, %eax
+; X86-NEXT: sbbl %esi, %edi
+; X86-NEXT: movl 8(%ebp), %edx
+; X86-NEXT: movl %ecx, (%edx)
+; X86-NEXT: movl %ebx, 4(%edx)
+; X86-NEXT: movl %eax, 8(%edx)
+; X86-NEXT: movl %edi, 12(%edx)
+; X86-NEXT: movl %edx, %eax
+; X86-NEXT: leal -12(%ebp), %esp
; X86-NEXT: popl %esi
; X86-NEXT: popl %edi
; X86-NEXT: popl %ebx
@@ -639,55 +649,59 @@ define i128 @abd_minmax_i128(i128 %a, i128 %b) nounwind {
; X86-LABEL: abd_minmax_i128:
; X86: # %bb.0:
; X86-NEXT: pushl %ebp
+; X86-NEXT: movl %esp, %ebp
; X86-NEXT: pushl %ebx
; X86-NEXT: pushl %edi
; X86-NEXT: pushl %esi
-; X86-NEXT: pushl %eax
-; X86-NEXT: movl {{[0-9]+}}(%esp), %ebp
-; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
-; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT: movl {{[0-9]+}}(%esp), %ebx
-; X86-NEXT: movl {{[0-9]+}}(%esp), %esi
-; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
-; X86-NEXT: cmpl %eax, %esi
-; X86-NEXT: sbbl %ebx, %ecx
-; X86-NEXT: movl %edx, %ecx
-; X86-NEXT: sbbl %ebp, %ecx
-; X86-NEXT: movl {{[0-9]+}}(%esp), %edi
-; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
-; X86-NEXT: movl %edx, %ecx
-; X86-NEXT: sbbl %edi, %ecx
-; X86-NEXT: movl %edi, %ecx
-; X86-NEXT: cmovll %edx, %ecx
-; X86-NEXT: movl %ecx, (%esp) # 4-byte Spill
-; X86-NEXT: cmovll {{[0-9]+}}(%esp), %ebp
-; X86-NEXT: movl %ebx, %ecx
-; X86-NEXT: cmovll {{[0-9]+}}(%esp), %ecx
-; X86-NEXT: movl %eax, %edx
-; X86-NEXT: cmovll %esi, %edx
-; X86-NEXT: cmpl %esi, %eax
-; X86-NEXT: movl %ebx, %esi
-; X86-NEXT: sbbl {{[0-9]+}}(%esp), %esi
-; X86-NEXT: movl {{[0-9]+}}(%esp), %esi
-; X86-NEXT: sbbl {{[0-9]+}}(%esp), %esi
-; X86-NEXT: movl %edi, %esi
-; X86-NEXT: sbbl {{[0-9]+}}(%esp), %esi
-; X86-NEXT: cmovll {{[0-9]+}}(%esp), %edi
-; X86-NEXT: movl {{[0-9]+}}(%esp), %esi
-; X86-NEXT: cmovll {{[0-9]+}}(%esp), %esi
-; X86-NEXT: cmovll {{[0-9]+}}(%esp), %ebx
-; X86-NEXT: cmovll {{[0-9]+}}(%esp), %eax
-; X86-NEXT: subl %eax, %edx
-; X86-NEXT: sbbl %ebx, %ecx
-; X86-NEXT: sbbl %esi, %ebp
-; X86-NEXT: movl (%esp), %esi # 4-byte Reload
-; X86-NEXT: sbbl %edi, %esi
-; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT: movl %edx, (%eax)
-; X86-NEXT: movl %ecx, 4(%eax)
-; X86-NEXT: movl %ebp, 8(%eax)
-; X86-NEXT: movl %esi, 12(%eax)
-; X86-NEXT: addl $4, %esp
+; X86-NEXT: andl $-16, %esp
+; X86-NEXT: subl $16, %esp
+; X86-NEXT: movl 40(%ebp), %esi
+; X86-NEXT: movl 24(%ebp), %edi
+; X86-NEXT: movl 28(%ebp), %eax
+; X86-NEXT: cmpl %esi, %edi
+; X86-NEXT: sbbl 44(%ebp), %eax
+; X86-NEXT: movl 48(%ebp), %edx
+; X86-NEXT: movl 32(%ebp), %eax
+; X86-NEXT: sbbl %edx, %eax
+; X86-NEXT: movl 52(%ebp), %ebx
+; X86-NEXT: movl 36(%ebp), %ecx
+; X86-NEXT: movl %ecx, %eax
+; X86-NEXT: sbbl %ebx, %eax
+; X86-NEXT: movl %ebx, %eax
+; X86-NEXT: cmovll %ecx, %eax
+; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT: movl %edx, %eax
+; X86-NEXT: cmovll 32(%ebp), %eax
+; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT: movl 44(%ebp), %eax
+; X86-NEXT: cmovll 28(%ebp), %eax
+; X86-NEXT: movl %esi, %ecx
+; X86-NEXT: cmovll %edi, %ecx
+; X86-NEXT: cmpl %edi, %esi
+; X86-NEXT: movl 44(%ebp), %edi
+; X86-NEXT: sbbl 28(%ebp), %edi
+; X86-NEXT: movl %edx, %edi
+; X86-NEXT: sbbl 32(%ebp), %edi
+; X86-NEXT: movl %ebx, %edi
+; X86-NEXT: sbbl 36(%ebp), %edi
+; X86-NEXT: cmovll 36(%ebp), %ebx
+; X86-NEXT: cmovll 32(%ebp), %edx
+; X86-NEXT: movl 44(%ebp), %edi
+; X86-NEXT: cmovll 28(%ebp), %edi
+; X86-NEXT: cmovll 24(%ebp), %esi
+; X86-NEXT: subl %esi, %ecx
+; X86-NEXT: sbbl %edi, %eax
+; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %edi # 4-byte Reload
+; X86-NEXT: sbbl %edx, %edi
+; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %esi # 4-byte Reload
+; X86-NEXT: sbbl %ebx, %esi
+; X86-NEXT: movl 8(%ebp), %edx
+; X86-NEXT: movl %ecx, (%edx)
+; X86-NEXT: movl %eax, 4(%edx)
+; X86-NEXT: movl %edi, 8(%edx)
+; X86-NEXT: movl %esi, 12(%edx)
+; X86-NEXT: movl %edx, %eax
+; X86-NEXT: leal -12(%ebp), %esp
; X86-NEXT: popl %esi
; X86-NEXT: popl %edi
; X86-NEXT: popl %ebx
@@ -848,37 +862,41 @@ define i128 @abd_cmp_i128(i128 %a, i128 %b) nounwind {
; X86-LABEL: abd_cmp_i128:
; X86: # %bb.0:
; X86-NEXT: pushl %ebp
+; X86-NEXT: movl %esp, %ebp
; X86-NEXT: pushl %ebx
; X86-NEXT: pushl %edi
; X86-NEXT: pushl %esi
-; X86-NEXT: pushl %eax
-; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
-; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
-; X86-NEXT: movl {{[0-9]+}}(%esp), %esi
-; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT: movl {{[0-9]+}}(%esp), %ebx
-; X86-NEXT: subl %edx, %eax
-; X86-NEXT: movl %eax, (%esp) # 4-byte Spill
-; X86-NEXT: sbbl %esi, %ebx
-; X86-NEXT: movl {{[0-9]+}}(%esp), %ebp
-; X86-NEXT: sbbl %ecx, %ebp
-; X86-NEXT: movl {{[0-9]+}}(%esp), %edi
-; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT: sbbl %edi, %eax
-; X86-NEXT: subl {{[0-9]+}}(%esp), %edx
-; X86-NEXT: sbbl {{[0-9]+}}(%esp), %esi
-; X86-NEXT: sbbl {{[0-9]+}}(%esp), %ecx
-; X86-NEXT: sbbl {{[0-9]+}}(%esp), %edi
-; X86-NEXT: cmovgel (%esp), %edx # 4-byte Folded Reload
-; X86-NEXT: cmovgel %ebx, %esi
-; X86-NEXT: cmovgel %ebp, %ecx
-; X86-NEXT: cmovgel %eax, %edi
-; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT: movl %edi, 12(%eax)
-; X86-NEXT: movl %ecx, 8(%eax)
-; X86-NEXT: movl %esi, 4(%eax)
-; X86-NEXT: movl %edx, (%eax)
-; X86-NEXT: addl $4, %esp
+; X86-NEXT: andl $-16, %esp
+; X86-NEXT: subl $16, %esp
+; X86-NEXT: movl 24(%ebp), %ecx
+; X86-NEXT: movl 28(%ebp), %edx
+; X86-NEXT: movl 40(%ebp), %eax
+; X86-NEXT: movl 44(%ebp), %esi
+; X86-NEXT: subl %ecx, %eax
+; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT: movl %esi, %eax
+; X86-NEXT: sbbl %edx, %eax
+; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT: movl 32(%ebp), %esi
+; X86-NEXT: movl 48(%ebp), %edi
+; X86-NEXT: sbbl %esi, %edi
+; X86-NEXT: movl 36(%ebp), %ebx
+; X86-NEXT: movl 52(%ebp), %eax
+; X86-NEXT: sbbl %ebx, %eax
+; X86-NEXT: subl 40(%ebp), %ecx
+; X86-NEXT: sbbl 44(%ebp), %edx
+; X86-NEXT: sbbl 48(%ebp), %esi
+; X86-NEXT: sbbl 52(%ebp), %ebx
+; X86-NEXT: cmovgel {{[-0-9]+}}(%e{{[sb]}}p), %ecx # 4-byte Folded Reload
+; X86-NEXT: cmovgel {{[-0-9]+}}(%e{{[sb]}}p), %edx # 4-byte Folded Reload
+; X86-NEXT: cmovgel %edi, %esi
+; X86-NEXT: cmovgel %eax, %ebx
+; X86-NEXT: movl 8(%ebp), %eax
+; X86-NEXT: movl %ebx, 12(%eax)
+; X86-NEXT: movl %esi, 8(%eax)
+; X86-NEXT: movl %edx, 4(%eax)
+; X86-NEXT: movl %ecx, (%eax)
+; X86-NEXT: leal -12(%ebp), %esp
; X86-NEXT: popl %esi
; X86-NEXT: popl %edi
; X86-NEXT: popl %ebx
@@ -1118,35 +1136,39 @@ define i128 @abd_subnsw_i128(i128 %a, i128 %b) nounwind {
; X86-LABEL: abd_subnsw_i128:
; X86: # %bb.0:
; X86-NEXT: pushl %ebp
+; X86-NEXT: movl %esp, %ebp
; X86-NEXT: pushl %ebx
; X86-NEXT: pushl %edi
; X86-NEXT: pushl %esi
-; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
-; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
-; X86-NEXT: movl {{[0-9]+}}(%esp), %esi
-; X86-NEXT: movl {{[0-9]+}}(%esp), %edi
-; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT: subl {{[0-9]+}}(%esp), %edi
-; X86-NEXT: sbbl {{[0-9]+}}(%esp), %esi
-; X86-NEXT: sbbl {{[0-9]+}}(%esp), %edx
-; X86-NEXT: sbbl {{[0-9]+}}(%esp), %ecx
-; X86-NEXT: movl %ecx, %ebx
-; X86-NEXT: sarl $31, %ebx
-; X86-NEXT: xorl %ebx, %ecx
-; X86-NEXT: xorl %ebx, %edx
-; X86-NEXT: xorl %ebx, %esi
-; X86-NEXT: xorl %ebx, %edi
-; X86-NEXT: movl %ebx, %ebp
-; X86-NEXT: subl %edi, %ebp
-; X86-NEXT: movl %ebx, %edi
-; X86-NEXT: sbbl %esi, %edi
-; X86-NEXT: movl %ebx, %esi
+; X86-NEXT: andl $-16, %esp
+; X86-NEXT: subl $16, %esp
+; X86-NEXT: movl 36(%ebp), %eax
+; X86-NEXT: movl 32(%ebp), %ecx
+; X86-NEXT: movl 28(%ebp), %edx
+; X86-NEXT: movl 24(%ebp), %esi
+; X86-NEXT: subl 40(%ebp), %esi
+; X86-NEXT: sbbl 44(%ebp), %edx
+; X86-NEXT: sbbl 48(%ebp), %ecx
+; X86-NEXT: sbbl 52(%ebp), %eax
+; X86-NEXT: movl %eax, %edi
+; X86-NEXT: sarl $31, %edi
+; X86-NEXT: xorl %edi, %eax
+; X86-NEXT: xorl %edi, %ecx
+; X86-NEXT: xorl %edi, %edx
+; X86-NEXT: xorl %edi, %esi
+; X86-NEXT: movl %edi, %ebx
+; X86-NEXT: subl %esi, %ebx
+; X86-NEXT: movl %edi, %esi
; X86-NEXT: sbbl %edx, %esi
-; X86-NEXT: sbbl %ecx, %ebx
-; X86-NEXT: movl %ebp, (%eax)
-; X86-NEXT: movl %edi, 4(%eax)
-; X86-NEXT: movl %esi, 8(%eax)
-; X86-NEXT: movl %ebx, 12(%eax)
+; X86-NEXT: movl %edi, %edx
+; X86-NEXT: sbbl %ecx, %edx
+; X86-NEXT: sbbl %eax, %edi
+; X86-NEXT: movl 8(%ebp), %eax
+; X86-NEXT: movl %ebx, (%eax)
+; X86-NEXT: movl %esi, 4(%eax)
+; X86-NEXT: movl %edx, 8(%eax)
+; X86-NEXT: movl %edi, 12(%eax)
+; X86-NEXT: leal -12(%ebp), %esp
; X86-NEXT: popl %esi
; X86-NEXT: popl %edi
; X86-NEXT: popl %ebx
@@ -1175,35 +1197,39 @@ define i128 @abd_subnsw_i128_undef(i128 %a, i128 %b) nounwind {
; X86-LABEL: abd_subnsw_i128_undef:
; X86: # %bb.0:
; X86-NEXT: pushl %ebp
+; X86-NEXT: movl %esp, %ebp
; X86-NEXT: pushl %ebx
; X86-NEXT: pushl %edi
; X86-NEXT: pushl %esi
-; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
-; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
-; X86-NEXT: movl {{[0-9]+}}(%esp), %esi
-; X86-NEXT: movl {{[0-9]+}}(%esp), %edi
-; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT: subl {{[0-9]+}}(%esp), %edi
-; X86-NEXT: sbbl {{[0-9]+}}(%esp), %esi
-; X86-NEXT: sbbl {{[0-9]+}}(%esp), %edx
-; X86-NEXT: sbbl {{[0-9]+}}(%esp), %ecx
-; X86-NEXT: movl %ecx, %ebx
-; X86-NEXT: sarl $31, %ebx
-; X86-NEXT: xorl %ebx, %ecx
-; X86-NEXT: xorl %ebx, %edx
-; X86-NEXT: xorl %ebx, %esi
-; X86-NEXT: xorl %ebx, %edi
-; X86-NEXT: movl %ebx, %ebp
-; X86-NEXT: subl %edi, %ebp
-; X86-NEXT: movl %ebx, %edi
-; X86-NEXT: sbbl %esi, %edi
-; X86-NEXT: m...
[truncated]
|
bool X86TargetLowering::functionArgumentNeedsConsecutiveRegisters( | ||
Type *Ty, CallingConv::ID CallConv, bool isVarArg, | ||
const DataLayout &DL) const { | ||
// i128 split into i64 needs to be allocated to two consecutive registers, | ||
// or spilled to the stack as a whole. | ||
return Ty->isIntegerTy(128); | ||
// On x86-64 i128 is split into two i64s and needs to be allocated to two | ||
// consecutive registers, or spilled to the stack as a whole. On x86-32 i128 | ||
// is split to four i32s and never actually passed in registers, but we use | ||
// the consecutive register mark to match it in TableGen. | ||
if (Ty->isIntegerTy(128)) | ||
return true; | ||
|
||
// On x86-32, fp128 acts the same as i128. | ||
if (Subtarget.is32Bit() && Ty->isFP128Ty()) | ||
return true; | ||
|
||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this is the right approach; functionArgumentNeedsConsecutiveRegisters
doesn't really seem like the correct thing because the type will never be passed in regesters, but I can't come up with another way to "mark" a register set to indicate that it needs custom lowering. Is there a better way to do this?
Cc @nikic since I think you rewrote the x86-64 custom lowering a couple of times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably also match vector types somehow because _m64
, __m128
, __m256
, and __m512
are specified to have an alignment of 8, 16, 32, and 64 bytes, respectively.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, unfortunately the CC lowering is pretty limited in the information it can access, and backends have a lot of different hacks to work around that. I'd like to improve that, but this looks like an acceptable hack for now.
I also brought up discussion about specifying an ABI for __int128 https://groups.google.com/u/1/g/ia32-abi/c/TuMAt7mwbIU |
9191404
to
3daf079
Compare
Rebased to pick up a relevant test change. Also put up a separate PR with the pretest so that can get its own commit #148753. |
3daf079
to
ccdf8f7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM from my side.
Looks like this needs a rebase.
The i386 psABI specifies that `__float128` has 16 byte alignment and must be passed on the stack; however, LLVM currently stores it in a stack slot that has an offset of 4. Add a custom lowering to correct this alignment to 16-byte. i386 does not specify an `__int128`, but it seems reasonable to keep the same behavior as `__float128` so this is changed as well. Fixes: llvm#77401
ccdf8f7
to
ca1efdd
Compare
Thanks, rebased |
if (!ArgFlags.isInConsecutiveRegsLast()) | ||
return true; | ||
|
||
unsigned NumRegs = PendingMembers.size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this is an assertion-only variable. In non-assertion builds, this gives unused variable warnings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like somebody got this already fe1941967267e472
return true; | ||
|
||
unsigned NumRegs = PendingMembers.size(); | ||
assert(NumRegs == 4 && "Should have two parts"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert(NumRegs == 4 && "Should have two parts"); | |
assert(NumRegs == 4 && "Should have four parts"); |
Minor nit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix in #149386
@nikic is this something that would make sense to backport? I know ABI changes are typically a no, but this missed the cutoff by only a few days, and f128 is effectively completely broken without it (though i128 of course works fine). It would also be grouped with the windows f128 abi change 5ee1c0b. (fe19419 should be picked as well if so) |
Failed to cherry-pick: a78a0f8 https://github.com/llvm/llvm-project/actions/runs/16537791057 Please manually backport the fix and push it to your github fork. Once this is done, please create a pull request |
I'll open one |
…8092) The i386 psABI specifies that `__float128` has 16 byte alignment and must be passed on the stack; however, LLVM currently stores it in a stack slot that has an offset of 4. Add a custom lowering to correct this alignment to 16-byte. i386 does not specify an `__int128`, but it seems reasonable to keep the same behavior as `__float128` so this is changed as well. There also isn't a good way to distinguish whether a set of four registers came from an integer or a float. The main test demonstrating this change is `store_perturbed` in `llvm/test/CodeGen/X86/i128-fp128-abi.ll`. Referenced ABI: https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/uploads/14c05f1b1e156e0e46b61bfa7c1df1e2/intel386-psABI-2020-08-07.pdf Fixes: llvm#77401 (cherry picked from commit a78a0f8)
Created #150746 |
…8092) The i386 psABI specifies that `__float128` has 16 byte alignment and must be passed on the stack; however, LLVM currently stores it in a stack slot that has an offset of 4. Add a custom lowering to correct this alignment to 16-byte. i386 does not specify an `__int128`, but it seems reasonable to keep the same behavior as `__float128` so this is changed as well. There also isn't a good way to distinguish whether a set of four registers came from an integer or a float. The main test demonstrating this change is `store_perturbed` in `llvm/test/CodeGen/X86/i128-fp128-abi.ll`. Referenced ABI: https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/uploads/14c05f1b1e156e0e46b61bfa7c1df1e2/intel386-psABI-2020-08-07.pdf Fixes: llvm#77401 (cherry picked from commit a78a0f8)
…8092) The i386 psABI specifies that `__float128` has 16 byte alignment and must be passed on the stack; however, LLVM currently stores it in a stack slot that has an offset of 4. Add a custom lowering to correct this alignment to 16-byte. i386 does not specify an `__int128`, but it seems reasonable to keep the same behavior as `__float128` so this is changed as well. There also isn't a good way to distinguish whether a set of four registers came from an integer or a float. The main test demonstrating this change is `store_perturbed` in `llvm/test/CodeGen/X86/i128-fp128-abi.ll`. Referenced ABI: https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/uploads/14c05f1b1e156e0e46b61bfa7c1df1e2/intel386-psABI-2020-08-07.pdf Fixes: llvm#77401 (cherry picked from commit a78a0f8)
…8092) The i386 psABI specifies that `__float128` has 16 byte alignment and must be passed on the stack; however, LLVM currently stores it in a stack slot that has an offset of 4. Add a custom lowering to correct this alignment to 16-byte. i386 does not specify an `__int128`, but it seems reasonable to keep the same behavior as `__float128` so this is changed as well. There also isn't a good way to distinguish whether a set of four registers came from an integer or a float. The main test demonstrating this change is `store_perturbed` in `llvm/test/CodeGen/X86/i128-fp128-abi.ll`. Referenced ABI: https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/uploads/14c05f1b1e156e0e46b61bfa7c1df1e2/intel386-psABI-2020-08-07.pdf Fixes: llvm#77401 (cherry picked from commit a78a0f8)
LLVM21 fixed the new float types on a number of targets: * SystemZ gained f16 support llvm/llvm-project#109164 * Hexagon now uses soft f16 to avoid recursion bugs llvm/llvm-project#130977 * Mips now correctly handles f128 llvm/llvm-project#117525 * f128 is now correctly aligned when passing the stack on x86 llvm/llvm-project#138092 Thus, enable the types on relevant targets for LLVM > 21.0.0. NVPTX also gained handling of f128 as a storage type, but it lacks support for basic math operations so is still disabled here.
LLVM21 fixed the new float types on a number of targets: * SystemZ gained f16 support llvm/llvm-project#109164 * Hexagon now uses soft f16 to avoid recursion bugs llvm/llvm-project#130977 * Mips now correctly handles f128 (actually done in LLVM20) llvm/llvm-project#117525 * f128 is now correctly aligned when passing the stack on x86 llvm/llvm-project#138092 Thus, enable the types on relevant targets for LLVM > 21.0.0. NVPTX also gained handling of f128 as a storage type, but it lacks support for basic math operations so is still disabled here.
LLVM21 fixed the new float types on a number of targets: * SystemZ gained f16 support llvm/llvm-project#109164 * Hexagon now uses soft f16 to avoid recursion bugs llvm/llvm-project#130977 * Mips now correctly handles f128 (actually since LLVM20) llvm/llvm-project#117525 * f128 is now correctly aligned when passing the stack on x86 llvm/llvm-project#138092 Thus, enable the types on relevant targets for LLVM > 21.0.0. NVPTX also gained handling of f128 as a storage type, but it lacks support for basic math operations so is still disabled here.
Enable f16 and f128 on targets that were fixed in LLVM21 LLVM21 fixed the new float types on a number of targets: * SystemZ gained f16 support llvm/llvm-project#109164 * Hexagon now uses soft f16 to avoid recursion bugs llvm/llvm-project#130977 * Mips now correctly handles f128 (actually since LLVM20) llvm/llvm-project#117525 * f128 is now correctly aligned when passing the stack on x86 llvm/llvm-project#138092 Thus, enable the types on relevant targets for LLVM > 21.0.0. NVPTX also gained handling of f128 as a storage type, but it lacks support for basic math operations so is still disabled here. try-job: dist-i586-gnu-i586-i686-musl try-job: dist-i686-linux try-job: dist-i686-msvc try-job: dist-s390x-linux try-job: dist-various-1 try-job: dist-various-2 try-job: dist-x86_64-linux try-job: i686-gnu-1 try-job: i686-gnu-2 try-job: i686-msvc-1 try-job: i686-msvc-2 try-job: test-various
Enable f16 and f128 on targets that were fixed in LLVM21 LLVM21 fixed the new float types on a number of targets: * SystemZ gained f16 support llvm/llvm-project#109164 * Hexagon now uses soft f16 to avoid recursion bugs llvm/llvm-project#130977 * Mips now correctly handles f128 (actually since LLVM20) llvm/llvm-project#117525 * f128 is now correctly aligned when passing the stack on x86 llvm/llvm-project#138092 Thus, enable the types on relevant targets for LLVM > 21.0.0. NVPTX also gained handling of f128 as a storage type, but it lacks support for basic math operations so is still disabled here. try-job: dist-i586-gnu-i586-i686-musl try-job: dist-i686-linux try-job: dist-i686-msvc try-job: dist-s390x-linux try-job: dist-various-1 try-job: dist-various-2 try-job: dist-x86_64-linux try-job: i686-gnu-1 try-job: i686-gnu-2 try-job: i686-msvc-1 try-job: i686-msvc-2 try-job: test-various
Enable f16 and f128 on targets that were fixed in LLVM21 LLVM21 fixed the new float types on a number of targets: * SystemZ gained f16 support llvm/llvm-project#109164 * Hexagon now uses soft f16 to avoid recursion bugs llvm/llvm-project#130977 * Mips now correctly handles f128 (actually since LLVM20) llvm/llvm-project#117525 * f128 is now correctly aligned when passing the stack on x86 llvm/llvm-project#138092 Thus, enable the types on relevant targets for LLVM > 21.0.0. NVPTX also gained handling of f128 as a storage type, but it lacks support for basic math operations so is still disabled here. try-job: dist-i586-gnu-i586-i686-musl try-job: dist-i686-linux try-job: dist-i686-msvc try-job: dist-s390x-linux try-job: dist-various-1 try-job: dist-various-2 try-job: dist-x86_64-linux try-job: i686-gnu-1 try-job: i686-gnu-2 try-job: i686-msvc-1 try-job: i686-msvc-2 try-job: test-various
Rollup merge of #144987 - tgross35:llvm21-f16-f128, r=nikic Enable f16 and f128 on targets that were fixed in LLVM21 LLVM21 fixed the new float types on a number of targets: * SystemZ gained f16 support llvm/llvm-project#109164 * Hexagon now uses soft f16 to avoid recursion bugs llvm/llvm-project#130977 * Mips now correctly handles f128 (actually since LLVM20) llvm/llvm-project#117525 * f128 is now correctly aligned when passing the stack on x86 llvm/llvm-project#138092 Thus, enable the types on relevant targets for LLVM > 21.0.0. NVPTX also gained handling of f128 as a storage type, but it lacks support for basic math operations so is still disabled here. try-job: dist-i586-gnu-i586-i686-musl try-job: dist-i686-linux try-job: dist-i686-msvc try-job: dist-s390x-linux try-job: dist-various-1 try-job: dist-various-2 try-job: dist-x86_64-linux try-job: i686-gnu-1 try-job: i686-gnu-2 try-job: i686-msvc-1 try-job: i686-msvc-2 try-job: test-various
Enable f16 and f128 on targets that were fixed in LLVM21 LLVM21 fixed the new float types on a number of targets: * SystemZ gained f16 support llvm/llvm-project#109164 * Hexagon now uses soft f16 to avoid recursion bugs llvm/llvm-project#130977 * Mips now correctly handles f128 (actually since LLVM20) llvm/llvm-project#117525 * f128 is now correctly aligned when passing the stack on x86 llvm/llvm-project#138092 Thus, enable the types on relevant targets for LLVM > 21.0.0. NVPTX also gained handling of f128 as a storage type, but it lacks support for basic math operations so is still disabled here. try-job: dist-i586-gnu-i586-i686-musl try-job: dist-i686-linux try-job: dist-i686-msvc try-job: dist-s390x-linux try-job: dist-various-1 try-job: dist-various-2 try-job: dist-x86_64-linux try-job: i686-gnu-1 try-job: i686-gnu-2 try-job: i686-msvc-1 try-job: i686-msvc-2 try-job: test-various
The i386 psABI specifies that
__float128
has 16 byte alignment andmust be passed on the stack; however, LLVM currently stores it in a
stack slot that has an offset of 4. Add a custom lowering to correct
this alignment to 16-byte.
i386 does not specify an
__int128
, but it seems reasonable to keep thesame behavior as
__float128
so this is changed as well. There alsoisn't a good way to distinguish whether a set of four registers came
from an integer or a float.
The main test demonstrating this change is
store_perturbed
inllvm/test/CodeGen/X86/i128-fp128-abi.ll
.Referenced ABI: https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/uploads/14c05f1b1e156e0e46b61bfa7c1df1e2/intel386-psABI-2020-08-07.pdf
Fixes: #77401