
[X86] Align f128 and i128 to 16 bytes when passing on x86-32 #138092


Merged

merged 3 commits into llvm:main from tgross35:x86-32-f128-abi on Jul 17, 2025

Conversation

tgross35
Contributor

@tgross35 tgross35 commented May 1, 2025

The i386 psABI specifies that `__float128` has 16-byte alignment and
must be passed on the stack; however, LLVM currently stores it in a
stack slot that has an offset of 4. Add a custom lowering to correct
this alignment to 16 bytes.

i386 does not specify an `__int128`, but it seems reasonable to keep the
same behavior as `__float128`, so this is changed as well. There also
isn't a good way to distinguish whether a set of four registers came
from an integer or a float.

The main test demonstrating this change is `store_perturbed` in
`llvm/test/CodeGen/X86/i128-fp128-abi.ll`.

Referenced ABI: https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/uploads/14c05f1b1e156e0e46b61bfa7c1df1e2/intel386-psABI-2020-08-07.pdf
Fixes: #77401
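
A minimal C reproducer sketch (my own, not a test from this PR; the in-tree coverage is `store_perturbed` mentioned above) showing why the alignment matters: a preceding `int` argument leaves the outgoing argument area at offset 4, so a naive lowering would hand `__float128` a 4-byte-aligned slot.

```c
/* Sketch only: build with something like `cc -m32 -O2 repro.c` on a
 * toolchain where __float128 is available for i386. */
#include <stdint.h>
#include <stdio.h>

__attribute__((noinline))
static void callee(int pad, __float128 x) {
    /* Per the i386 psABI, x must live in a 16-byte-aligned stack slot.
     * Note the compiler may home x into an aligned local copy, so the
     * codegen tests remain the authoritative check. */
    printf("&x = %p, 16-byte aligned: %s\n", (void *)&x,
           ((uintptr_t)&x % 16 == 0) ? "yes" : "no");
}

int main(void) {
    callee(1, (__float128)2.0);
    return 0;
}
```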

@tgross35 tgross35 changed the title Align f128 to 16 bytes when passing on x86 [X86] Align f128 to 16 bytes when passing on x86-32 May 1, 2025
@tgross35
Contributor Author

tgross35 commented May 1, 2025

I'm not sure why this seems to have no effect; the tests at llvm/test/CodeGen/X86/fp128-abi.ll should fail since they need to be updated, but there is no change in codegen. I wonder if this is getting fixed up somewhere.

@arsenm
Contributor

arsenm commented May 1, 2025

I'm not sure why this seems to have no effect; the tests at llvm/test/CodeGen/X86/fp128-abi.ll should fail since they need to be updated, but there is no change in codegen. I wonder if this is getting fixed up somewhere.

fp128 is not a legal type, so this won't reach here (this is a super frustrating detail of how calling convention handling is implemented; it would be easier if the backend directly consumed the original type).

@tgross35
Contributor Author

tgross35 commented May 1, 2025

Should it be made a legal type? I'm not sure what the exact definition of that is, but recent versions of the i386 ABI specify that it is returned, and (presumably, since I haven't seen otherwise) passed, in memory.

@arsenm
Contributor

arsenm commented May 1, 2025

Should it be made a legal type?

No. There are no f128 registers or operations.

@tgross35
Contributor Author

tgross35 commented May 1, 2025

How does stack alignment get handled for non-legal types, or where does this need to get fixed up? I’m surprised passing doesn’t use the type’s natural alignment.

@arsenm
Contributor

arsenm commented May 1, 2025

How does stack alignment get handled for non-legal types, or where does this need to get fixed up? I’m surprised passing doesn’t use the type’s natural alignment.

The type is broken up into legal pieces according to getRegisterTypeForCallingConv + getNumRegistersForCallingConv, and those pieces are processed by the call lowering. This code is entirely disconnected from the decision of where to pass the value, whether in a register or at some stack offset.
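
A toy model of that flow and of what this patch's custom routine ends up doing (my own naming throughout; `allocate_stack` stands in for `CCState::AllocateStack`, and the offsets mirror the `convertToMem` calls in the diff): the four pending i32 parts are held back until the last one arrives, then one 16-byte slot with 16-byte alignment is reserved and each part gets a fixed offset within it.

```c
/* Toy model only; not LLVM code. */
#include <stdio.h>

static unsigned NextStackOffset; /* bytes of outgoing-arg area used so far */

/* Mirrors the shape of CCState::AllocateStack(Size, Align):
 * round up to the requested (power-of-two) alignment, then reserve. */
static unsigned allocate_stack(unsigned Size, unsigned Alignment) {
    NextStackOffset = (NextStackOffset + Alignment - 1) & ~(Alignment - 1);
    unsigned Offset = NextStackOffset;
    NextStackOffset += Size;
    return Offset;
}

/* Once all four i32 parts of an fp128/i128 are pending, place them in one
 * 16-byte-aligned slot instead of four independently allocated 4-byte slots. */
static void assign_fp128_parts(unsigned PartOffsets[4]) {
    unsigned Base = allocate_stack(16, 16);
    for (int I = 0; I < 4; ++I)
        PartOffsets[I] = Base + 4u * (unsigned)I;
}

int main(void) {
    allocate_stack(4, 4); /* e.g. a preceding int argument at offset 0 */
    unsigned Offsets[4];
    assign_fp128_parts(Offsets);
    for (int I = 0; I < 4; ++I)
        printf("i32 part %d -> offset %u\n", I, Offsets[I]);
    /* Prints offsets 16, 20, 24, 28: the fp128 skips the 4..15 range that
     * the old, merely 4-byte-aligned lowering would have used. */
    return 0;
}
```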


github-actions bot commented Jul 11, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@tgross35 tgross35 changed the title [X86] Align f128 to 16 bytes when passing on x86-32 [X86] Align f128 and i128 to 16 bytes when passing on x86-32 Jul 11, 2025
@tgross35 tgross35 marked this pull request as ready for review July 11, 2025 12:24
@llvmbot
Member

llvmbot commented Jul 11, 2025

@llvm/pr-subscribers-backend-x86

Author: Trevor Gross (tgross35)

Changes

The i386 psABI specifies that `__float128` has 16-byte alignment and
must be passed on the stack; however, LLVM currently stores it in a
stack slot that has an offset of 4. Add a custom lowering to correct
this alignment to 16 bytes.

i386 does not specify an `__int128`, but it seems reasonable to keep the
same behavior as `__float128`, so this is changed as well.

Fixes: #77401


Patch is 551.12 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/138092.diff

52 Files Affected:

  • (modified) llvm/docs/ReleaseNotes.md (+2)
  • (modified) llvm/lib/Target/X86/X86CallingConv.cpp (+32)
  • (modified) llvm/lib/Target/X86/X86CallingConv.td (+5)
  • (modified) llvm/lib/Target/X86/X86ISelLoweringCall.cpp (+12-3)
  • (modified) llvm/test/CodeGen/X86/abds-neg.ll (+218-192)
  • (modified) llvm/test/CodeGen/X86/abds.ll (+208-182)
  • (modified) llvm/test/CodeGen/X86/abdu-neg.ll (+149-133)
  • (modified) llvm/test/CodeGen/X86/abdu.ll (+120-105)
  • (modified) llvm/test/CodeGen/X86/abs.ll (+32-23)
  • (modified) llvm/test/CodeGen/X86/add-sub-bool.ll (+15-10)
  • (modified) llvm/test/CodeGen/X86/all-ones-vector.ll (+6-6)
  • (modified) llvm/test/CodeGen/X86/arg-copy-elide.ll (+4-4)
  • (modified) llvm/test/CodeGen/X86/avx512fp16-cvt.ll (+29-13)
  • (modified) llvm/test/CodeGen/X86/bitselect.ll (+29-26)
  • (modified) llvm/test/CodeGen/X86/bsf.ll (+78-66)
  • (modified) llvm/test/CodeGen/X86/bsr.ll (+82-76)
  • (modified) llvm/test/CodeGen/X86/bswap-wide-int.ll (+20-10)
  • (modified) llvm/test/CodeGen/X86/div-rem-pair-recomposition-signed.ll (+18-18)
  • (modified) llvm/test/CodeGen/X86/div-rem-pair-recomposition-unsigned.ll (+47-47)
  • (modified) llvm/test/CodeGen/X86/fp128-cast-strict.ll (+52-40)
  • (modified) llvm/test/CodeGen/X86/fp128-cast.ll (+71-54)
  • (modified) llvm/test/CodeGen/X86/fp128-libcalls-strict.ll (+1260-800)
  • (modified) llvm/test/CodeGen/X86/fp128-libcalls.ll (+1121-652)
  • (modified) llvm/test/CodeGen/X86/fshl.ll (+104-81)
  • (modified) llvm/test/CodeGen/X86/fshr.ll (+91-79)
  • (modified) llvm/test/CodeGen/X86/funnel-shift.ll (+44-30)
  • (modified) llvm/test/CodeGen/X86/i128-add.ll (+14-9)
  • (modified) llvm/test/CodeGen/X86/i128-fp128-abi.ll (+471-235)
  • (modified) llvm/test/CodeGen/X86/i128-sdiv.ll (+348-27)
  • (modified) llvm/test/CodeGen/X86/i128-udiv.ll (+590-7)
  • (modified) llvm/test/CodeGen/X86/iabs.ll (+23-20)
  • (modified) llvm/test/CodeGen/X86/icmp-shift-opt.ll (+69-33)
  • (modified) llvm/test/CodeGen/X86/mul128.ll (+46-51)
  • (modified) llvm/test/CodeGen/X86/neg-abs.ll (+32-23)
  • (modified) llvm/test/CodeGen/X86/popcnt.ll (+275-210)
  • (modified) llvm/test/CodeGen/X86/pr46004.ll (+19)
  • (modified) llvm/test/CodeGen/X86/scalar-fp-to-i32.ll (+54-22)
  • (modified) llvm/test/CodeGen/X86/scalar-fp-to-i64.ll (+54-22)
  • (modified) llvm/test/CodeGen/X86/scmp.ll (+21-18)
  • (modified) llvm/test/CodeGen/X86/sdiv_fix.ll (+50-49)
  • (modified) llvm/test/CodeGen/X86/sdiv_fix_sat.ll (+222-218)
  • (modified) llvm/test/CodeGen/X86/shift-combine.ll (+12-2)
  • (modified) llvm/test/CodeGen/X86/shift-i128.ll (+52-20)
  • (modified) llvm/test/CodeGen/X86/smax.ll (+42-36)
  • (modified) llvm/test/CodeGen/X86/smin.ll (+43-38)
  • (modified) llvm/test/CodeGen/X86/ucmp.ll (+19-15)
  • (modified) llvm/test/CodeGen/X86/udiv_fix.ll (+15-13)
  • (modified) llvm/test/CodeGen/X86/udiv_fix_sat.ll (+15-13)
  • (modified) llvm/test/CodeGen/X86/umax.ll (+72-63)
  • (modified) llvm/test/CodeGen/X86/umin.ll (+43-38)
  • (modified) llvm/test/CodeGen/X86/umulo-128-legalisation-lowering.ll (+3-3)
  • (modified) llvm/test/CodeGen/X86/wide-integer-cmp.ll (+8-6)
diff --git a/llvm/docs/ReleaseNotes.md b/llvm/docs/ReleaseNotes.md
index daf822388a2ff..e91460d3a551c 100644
--- a/llvm/docs/ReleaseNotes.md
+++ b/llvm/docs/ReleaseNotes.md
@@ -228,6 +228,8 @@ Changes to the X86 Backend
 --------------------------
 
 * `fp128` will now use `*f128` libcalls on 32-bit GNU targets as well.
+* On x86-32, `fp128` and `i128` are now passed with the expected 16-byte stack
+  alignment.
 
 Changes to the OCaml bindings
 -----------------------------
diff --git a/llvm/lib/Target/X86/X86CallingConv.cpp b/llvm/lib/Target/X86/X86CallingConv.cpp
index 0b4c63f7a81f7..eb39259f7166b 100644
--- a/llvm/lib/Target/X86/X86CallingConv.cpp
+++ b/llvm/lib/Target/X86/X86CallingConv.cpp
@@ -374,5 +374,37 @@ static bool CC_X86_64_I128(unsigned &ValNo, MVT &ValVT, MVT &LocVT,
   return true;
 }
 
+/// Special handling for i128 and fp128: on x86-32, i128 and fp128 get legalized
+/// as four i32s, but fp128 must be passed on the stack with 16-byte alignment.
+/// Technically only fp128 has a specified ABI, but it makes sense to handle
+/// i128 the same until we hear differently.
+static bool CC_X86_32_I128_FP128(unsigned &ValNo, MVT &ValVT, MVT &LocVT,
+                                 CCValAssign::LocInfo &LocInfo,
+                                 ISD::ArgFlagsTy &ArgFlags, CCState &State) {
+  assert(ValVT == MVT::i32 && "Should have i32 parts");
+  SmallVectorImpl<CCValAssign> &PendingMembers = State.getPendingLocs();
+  PendingMembers.push_back(
+      CCValAssign::getPending(ValNo, ValVT, LocVT, LocInfo));
+
+  if (!ArgFlags.isInConsecutiveRegsLast())
+    return true;
+
+  unsigned NumRegs = PendingMembers.size();
+  assert(NumRegs == 4 && "Should have two parts");
+
+  int64_t Offset = State.AllocateStack(16, Align(16));
+  PendingMembers[0].convertToMem(Offset);
+  PendingMembers[1].convertToMem(Offset + 4);
+  PendingMembers[2].convertToMem(Offset + 8);
+  PendingMembers[3].convertToMem(Offset + 12);
+
+  State.addLoc(PendingMembers[0]);
+  State.addLoc(PendingMembers[1]);
+  State.addLoc(PendingMembers[2]);
+  State.addLoc(PendingMembers[3]);
+  PendingMembers.clear();
+  return true;
+}
+
 // Provides entry points of CC_X86 and RetCC_X86.
 #include "X86GenCallingConv.inc"
diff --git a/llvm/lib/Target/X86/X86CallingConv.td b/llvm/lib/Target/X86/X86CallingConv.td
index 823e0caa02262..f020e0b55141c 100644
--- a/llvm/lib/Target/X86/X86CallingConv.td
+++ b/llvm/lib/Target/X86/X86CallingConv.td
@@ -859,6 +859,11 @@ def CC_X86_32_C : CallingConv<[
   // The 'nest' parameter, if any, is passed in ECX.
   CCIfNest<CCAssignToReg<[ECX]>>,
 
+  // i128 and fp128 need to be passed on the stack with a higher alignment than
+  // their legal types. Handle this with a custom function.
+  CCIfType<[i32],
+           CCIfConsecutiveRegs<CCCustom<"CC_X86_32_I128_FP128">>>,
+
   // On swifttailcc pass swiftself in ECX.
   CCIfCC<"CallingConv::SwiftTail",
          CCIfSwiftSelf<CCIfType<[i32], CCAssignToReg<[ECX]>>>>,
diff --git a/llvm/lib/Target/X86/X86ISelLoweringCall.cpp b/llvm/lib/Target/X86/X86ISelLoweringCall.cpp
index 9ad355311527b..b4639ac2577e8 100644
--- a/llvm/lib/Target/X86/X86ISelLoweringCall.cpp
+++ b/llvm/lib/Target/X86/X86ISelLoweringCall.cpp
@@ -237,9 +237,18 @@ EVT X86TargetLowering::getSetCCResultType(const DataLayout &DL,
 bool X86TargetLowering::functionArgumentNeedsConsecutiveRegisters(
     Type *Ty, CallingConv::ID CallConv, bool isVarArg,
     const DataLayout &DL) const {
-  // i128 split into i64 needs to be allocated to two consecutive registers,
-  // or spilled to the stack as a whole.
-  return Ty->isIntegerTy(128);
+  // On x86-64 i128 is split into two i64s and needs to be allocated to two
+  // consecutive registers, or spilled to the stack as a whole. On x86-32 i128
+  // is split to four i32s and never actually passed in registers, but we use
+  // the consecutive register mark to match it in TableGen.
+  if (Ty->isIntegerTy(128))
+    return true;
+
+  // On x86-32, fp128 acts the same as i128.
+  if (Subtarget.is32Bit() && Ty->isFP128Ty())
+    return true;
+
+  return false;
 }
 
 /// Helper for getByValTypeAlignment to determine
diff --git a/llvm/test/CodeGen/X86/abds-neg.ll b/llvm/test/CodeGen/X86/abds-neg.ll
index f6d66ab47ce05..2911edfbfd409 100644
--- a/llvm/test/CodeGen/X86/abds-neg.ll
+++ b/llvm/test/CodeGen/X86/abds-neg.ll
@@ -367,44 +367,49 @@ define i128 @abd_ext_i128(i128 %a, i128 %b) nounwind {
 ; X86-LABEL: abd_ext_i128:
 ; X86:       # %bb.0:
 ; X86-NEXT:    pushl %ebp
+; X86-NEXT:    movl %esp, %ebp
 ; X86-NEXT:    pushl %ebx
 ; X86-NEXT:    pushl %edi
 ; X86-NEXT:    pushl %esi
-; X86-NEXT:    pushl %eax
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %ebp
-; X86-NEXT:    subl %ecx, %eax
-; X86-NEXT:    movl %eax, (%esp) # 4-byte Spill
-; X86-NEXT:    sbbl %edx, %ebp
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %ebx
-; X86-NEXT:    sbbl %edi, %ebx
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %esi
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT:    sbbl %esi, %eax
-; X86-NEXT:    subl {{[0-9]+}}(%esp), %ecx
-; X86-NEXT:    sbbl {{[0-9]+}}(%esp), %edx
-; X86-NEXT:    sbbl {{[0-9]+}}(%esp), %edi
-; X86-NEXT:    sbbl {{[0-9]+}}(%esp), %esi
-; X86-NEXT:    cmovll %eax, %esi
-; X86-NEXT:    cmovll %ebx, %edi
-; X86-NEXT:    cmovll %ebp, %edx
-; X86-NEXT:    cmovll (%esp), %ecx # 4-byte Folded Reload
-; X86-NEXT:    xorl %ebx, %ebx
+; X86-NEXT:    andl $-16, %esp
+; X86-NEXT:    subl $16, %esp
+; X86-NEXT:    movl 40(%ebp), %ecx
+; X86-NEXT:    movl 44(%ebp), %eax
+; X86-NEXT:    movl 24(%ebp), %edx
+; X86-NEXT:    movl 28(%ebp), %esi
+; X86-NEXT:    subl %ecx, %edx
+; X86-NEXT:    movl %edx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT:    movl %esi, %edx
+; X86-NEXT:    sbbl %eax, %edx
+; X86-NEXT:    movl %edx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT:    movl 48(%ebp), %edx
+; X86-NEXT:    movl 32(%ebp), %ebx
+; X86-NEXT:    sbbl %edx, %ebx
+; X86-NEXT:    movl 52(%ebp), %esi
+; X86-NEXT:    movl 36(%ebp), %edi
+; X86-NEXT:    sbbl %esi, %edi
+; X86-NEXT:    subl 24(%ebp), %ecx
+; X86-NEXT:    sbbl 28(%ebp), %eax
+; X86-NEXT:    sbbl 32(%ebp), %edx
+; X86-NEXT:    sbbl 36(%ebp), %esi
+; X86-NEXT:    cmovll %edi, %esi
+; X86-NEXT:    cmovll %ebx, %edx
+; X86-NEXT:    cmovll {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Folded Reload
+; X86-NEXT:    cmovll {{[-0-9]+}}(%e{{[sb]}}p), %ecx # 4-byte Folded Reload
+; X86-NEXT:    xorl %edi, %edi
 ; X86-NEXT:    negl %ecx
-; X86-NEXT:    movl $0, %ebp
-; X86-NEXT:    sbbl %edx, %ebp
-; X86-NEXT:    movl $0, %edx
-; X86-NEXT:    sbbl %edi, %edx
-; X86-NEXT:    sbbl %esi, %ebx
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT:    movl %ecx, (%eax)
-; X86-NEXT:    movl %ebp, 4(%eax)
-; X86-NEXT:    movl %edx, 8(%eax)
-; X86-NEXT:    movl %ebx, 12(%eax)
-; X86-NEXT:    addl $4, %esp
+; X86-NEXT:    movl $0, %ebx
+; X86-NEXT:    sbbl %eax, %ebx
+; X86-NEXT:    movl $0, %eax
+; X86-NEXT:    sbbl %edx, %eax
+; X86-NEXT:    sbbl %esi, %edi
+; X86-NEXT:    movl 8(%ebp), %edx
+; X86-NEXT:    movl %ecx, (%edx)
+; X86-NEXT:    movl %ebx, 4(%edx)
+; X86-NEXT:    movl %eax, 8(%edx)
+; X86-NEXT:    movl %edi, 12(%edx)
+; X86-NEXT:    movl %edx, %eax
+; X86-NEXT:    leal -12(%ebp), %esp
 ; X86-NEXT:    popl %esi
 ; X86-NEXT:    popl %edi
 ; X86-NEXT:    popl %ebx
@@ -438,44 +443,49 @@ define i128 @abd_ext_i128_undef(i128 %a, i128 %b) nounwind {
 ; X86-LABEL: abd_ext_i128_undef:
 ; X86:       # %bb.0:
 ; X86-NEXT:    pushl %ebp
+; X86-NEXT:    movl %esp, %ebp
 ; X86-NEXT:    pushl %ebx
 ; X86-NEXT:    pushl %edi
 ; X86-NEXT:    pushl %esi
-; X86-NEXT:    pushl %eax
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %ebp
-; X86-NEXT:    subl %ecx, %eax
-; X86-NEXT:    movl %eax, (%esp) # 4-byte Spill
-; X86-NEXT:    sbbl %edx, %ebp
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %ebx
-; X86-NEXT:    sbbl %edi, %ebx
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %esi
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT:    sbbl %esi, %eax
-; X86-NEXT:    subl {{[0-9]+}}(%esp), %ecx
-; X86-NEXT:    sbbl {{[0-9]+}}(%esp), %edx
-; X86-NEXT:    sbbl {{[0-9]+}}(%esp), %edi
-; X86-NEXT:    sbbl {{[0-9]+}}(%esp), %esi
-; X86-NEXT:    cmovll %eax, %esi
-; X86-NEXT:    cmovll %ebx, %edi
-; X86-NEXT:    cmovll %ebp, %edx
-; X86-NEXT:    cmovll (%esp), %ecx # 4-byte Folded Reload
-; X86-NEXT:    xorl %ebx, %ebx
+; X86-NEXT:    andl $-16, %esp
+; X86-NEXT:    subl $16, %esp
+; X86-NEXT:    movl 40(%ebp), %ecx
+; X86-NEXT:    movl 44(%ebp), %eax
+; X86-NEXT:    movl 24(%ebp), %edx
+; X86-NEXT:    movl 28(%ebp), %esi
+; X86-NEXT:    subl %ecx, %edx
+; X86-NEXT:    movl %edx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT:    movl %esi, %edx
+; X86-NEXT:    sbbl %eax, %edx
+; X86-NEXT:    movl %edx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT:    movl 48(%ebp), %edx
+; X86-NEXT:    movl 32(%ebp), %ebx
+; X86-NEXT:    sbbl %edx, %ebx
+; X86-NEXT:    movl 52(%ebp), %esi
+; X86-NEXT:    movl 36(%ebp), %edi
+; X86-NEXT:    sbbl %esi, %edi
+; X86-NEXT:    subl 24(%ebp), %ecx
+; X86-NEXT:    sbbl 28(%ebp), %eax
+; X86-NEXT:    sbbl 32(%ebp), %edx
+; X86-NEXT:    sbbl 36(%ebp), %esi
+; X86-NEXT:    cmovll %edi, %esi
+; X86-NEXT:    cmovll %ebx, %edx
+; X86-NEXT:    cmovll {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Folded Reload
+; X86-NEXT:    cmovll {{[-0-9]+}}(%e{{[sb]}}p), %ecx # 4-byte Folded Reload
+; X86-NEXT:    xorl %edi, %edi
 ; X86-NEXT:    negl %ecx
-; X86-NEXT:    movl $0, %ebp
-; X86-NEXT:    sbbl %edx, %ebp
-; X86-NEXT:    movl $0, %edx
-; X86-NEXT:    sbbl %edi, %edx
-; X86-NEXT:    sbbl %esi, %ebx
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT:    movl %ecx, (%eax)
-; X86-NEXT:    movl %ebp, 4(%eax)
-; X86-NEXT:    movl %edx, 8(%eax)
-; X86-NEXT:    movl %ebx, 12(%eax)
-; X86-NEXT:    addl $4, %esp
+; X86-NEXT:    movl $0, %ebx
+; X86-NEXT:    sbbl %eax, %ebx
+; X86-NEXT:    movl $0, %eax
+; X86-NEXT:    sbbl %edx, %eax
+; X86-NEXT:    sbbl %esi, %edi
+; X86-NEXT:    movl 8(%ebp), %edx
+; X86-NEXT:    movl %ecx, (%edx)
+; X86-NEXT:    movl %ebx, 4(%edx)
+; X86-NEXT:    movl %eax, 8(%edx)
+; X86-NEXT:    movl %edi, 12(%edx)
+; X86-NEXT:    movl %edx, %eax
+; X86-NEXT:    leal -12(%ebp), %esp
 ; X86-NEXT:    popl %esi
 ; X86-NEXT:    popl %edi
 ; X86-NEXT:    popl %ebx
@@ -639,55 +649,59 @@ define i128 @abd_minmax_i128(i128 %a, i128 %b) nounwind {
 ; X86-LABEL: abd_minmax_i128:
 ; X86:       # %bb.0:
 ; X86-NEXT:    pushl %ebp
+; X86-NEXT:    movl %esp, %ebp
 ; X86-NEXT:    pushl %ebx
 ; X86-NEXT:    pushl %edi
 ; X86-NEXT:    pushl %esi
-; X86-NEXT:    pushl %eax
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %ebp
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %ebx
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %esi
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
-; X86-NEXT:    cmpl %eax, %esi
-; X86-NEXT:    sbbl %ebx, %ecx
-; X86-NEXT:    movl %edx, %ecx
-; X86-NEXT:    sbbl %ebp, %ecx
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
-; X86-NEXT:    movl %edx, %ecx
-; X86-NEXT:    sbbl %edi, %ecx
-; X86-NEXT:    movl %edi, %ecx
-; X86-NEXT:    cmovll %edx, %ecx
-; X86-NEXT:    movl %ecx, (%esp) # 4-byte Spill
-; X86-NEXT:    cmovll {{[0-9]+}}(%esp), %ebp
-; X86-NEXT:    movl %ebx, %ecx
-; X86-NEXT:    cmovll {{[0-9]+}}(%esp), %ecx
-; X86-NEXT:    movl %eax, %edx
-; X86-NEXT:    cmovll %esi, %edx
-; X86-NEXT:    cmpl %esi, %eax
-; X86-NEXT:    movl %ebx, %esi
-; X86-NEXT:    sbbl {{[0-9]+}}(%esp), %esi
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %esi
-; X86-NEXT:    sbbl {{[0-9]+}}(%esp), %esi
-; X86-NEXT:    movl %edi, %esi
-; X86-NEXT:    sbbl {{[0-9]+}}(%esp), %esi
-; X86-NEXT:    cmovll {{[0-9]+}}(%esp), %edi
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %esi
-; X86-NEXT:    cmovll {{[0-9]+}}(%esp), %esi
-; X86-NEXT:    cmovll {{[0-9]+}}(%esp), %ebx
-; X86-NEXT:    cmovll {{[0-9]+}}(%esp), %eax
-; X86-NEXT:    subl %eax, %edx
-; X86-NEXT:    sbbl %ebx, %ecx
-; X86-NEXT:    sbbl %esi, %ebp
-; X86-NEXT:    movl (%esp), %esi # 4-byte Reload
-; X86-NEXT:    sbbl %edi, %esi
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT:    movl %edx, (%eax)
-; X86-NEXT:    movl %ecx, 4(%eax)
-; X86-NEXT:    movl %ebp, 8(%eax)
-; X86-NEXT:    movl %esi, 12(%eax)
-; X86-NEXT:    addl $4, %esp
+; X86-NEXT:    andl $-16, %esp
+; X86-NEXT:    subl $16, %esp
+; X86-NEXT:    movl 40(%ebp), %esi
+; X86-NEXT:    movl 24(%ebp), %edi
+; X86-NEXT:    movl 28(%ebp), %eax
+; X86-NEXT:    cmpl %esi, %edi
+; X86-NEXT:    sbbl 44(%ebp), %eax
+; X86-NEXT:    movl 48(%ebp), %edx
+; X86-NEXT:    movl 32(%ebp), %eax
+; X86-NEXT:    sbbl %edx, %eax
+; X86-NEXT:    movl 52(%ebp), %ebx
+; X86-NEXT:    movl 36(%ebp), %ecx
+; X86-NEXT:    movl %ecx, %eax
+; X86-NEXT:    sbbl %ebx, %eax
+; X86-NEXT:    movl %ebx, %eax
+; X86-NEXT:    cmovll %ecx, %eax
+; X86-NEXT:    movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT:    movl %edx, %eax
+; X86-NEXT:    cmovll 32(%ebp), %eax
+; X86-NEXT:    movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT:    movl 44(%ebp), %eax
+; X86-NEXT:    cmovll 28(%ebp), %eax
+; X86-NEXT:    movl %esi, %ecx
+; X86-NEXT:    cmovll %edi, %ecx
+; X86-NEXT:    cmpl %edi, %esi
+; X86-NEXT:    movl 44(%ebp), %edi
+; X86-NEXT:    sbbl 28(%ebp), %edi
+; X86-NEXT:    movl %edx, %edi
+; X86-NEXT:    sbbl 32(%ebp), %edi
+; X86-NEXT:    movl %ebx, %edi
+; X86-NEXT:    sbbl 36(%ebp), %edi
+; X86-NEXT:    cmovll 36(%ebp), %ebx
+; X86-NEXT:    cmovll 32(%ebp), %edx
+; X86-NEXT:    movl 44(%ebp), %edi
+; X86-NEXT:    cmovll 28(%ebp), %edi
+; X86-NEXT:    cmovll 24(%ebp), %esi
+; X86-NEXT:    subl %esi, %ecx
+; X86-NEXT:    sbbl %edi, %eax
+; X86-NEXT:    movl {{[-0-9]+}}(%e{{[sb]}}p), %edi # 4-byte Reload
+; X86-NEXT:    sbbl %edx, %edi
+; X86-NEXT:    movl {{[-0-9]+}}(%e{{[sb]}}p), %esi # 4-byte Reload
+; X86-NEXT:    sbbl %ebx, %esi
+; X86-NEXT:    movl 8(%ebp), %edx
+; X86-NEXT:    movl %ecx, (%edx)
+; X86-NEXT:    movl %eax, 4(%edx)
+; X86-NEXT:    movl %edi, 8(%edx)
+; X86-NEXT:    movl %esi, 12(%edx)
+; X86-NEXT:    movl %edx, %eax
+; X86-NEXT:    leal -12(%ebp), %esp
 ; X86-NEXT:    popl %esi
 ; X86-NEXT:    popl %edi
 ; X86-NEXT:    popl %ebx
@@ -848,37 +862,41 @@ define i128 @abd_cmp_i128(i128 %a, i128 %b) nounwind {
 ; X86-LABEL: abd_cmp_i128:
 ; X86:       # %bb.0:
 ; X86-NEXT:    pushl %ebp
+; X86-NEXT:    movl %esp, %ebp
 ; X86-NEXT:    pushl %ebx
 ; X86-NEXT:    pushl %edi
 ; X86-NEXT:    pushl %esi
-; X86-NEXT:    pushl %eax
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %esi
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %ebx
-; X86-NEXT:    subl %edx, %eax
-; X86-NEXT:    movl %eax, (%esp) # 4-byte Spill
-; X86-NEXT:    sbbl %esi, %ebx
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %ebp
-; X86-NEXT:    sbbl %ecx, %ebp
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT:    sbbl %edi, %eax
-; X86-NEXT:    subl {{[0-9]+}}(%esp), %edx
-; X86-NEXT:    sbbl {{[0-9]+}}(%esp), %esi
-; X86-NEXT:    sbbl {{[0-9]+}}(%esp), %ecx
-; X86-NEXT:    sbbl {{[0-9]+}}(%esp), %edi
-; X86-NEXT:    cmovgel (%esp), %edx # 4-byte Folded Reload
-; X86-NEXT:    cmovgel %ebx, %esi
-; X86-NEXT:    cmovgel %ebp, %ecx
-; X86-NEXT:    cmovgel %eax, %edi
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT:    movl %edi, 12(%eax)
-; X86-NEXT:    movl %ecx, 8(%eax)
-; X86-NEXT:    movl %esi, 4(%eax)
-; X86-NEXT:    movl %edx, (%eax)
-; X86-NEXT:    addl $4, %esp
+; X86-NEXT:    andl $-16, %esp
+; X86-NEXT:    subl $16, %esp
+; X86-NEXT:    movl 24(%ebp), %ecx
+; X86-NEXT:    movl 28(%ebp), %edx
+; X86-NEXT:    movl 40(%ebp), %eax
+; X86-NEXT:    movl 44(%ebp), %esi
+; X86-NEXT:    subl %ecx, %eax
+; X86-NEXT:    movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT:    movl %esi, %eax
+; X86-NEXT:    sbbl %edx, %eax
+; X86-NEXT:    movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
+; X86-NEXT:    movl 32(%ebp), %esi
+; X86-NEXT:    movl 48(%ebp), %edi
+; X86-NEXT:    sbbl %esi, %edi
+; X86-NEXT:    movl 36(%ebp), %ebx
+; X86-NEXT:    movl 52(%ebp), %eax
+; X86-NEXT:    sbbl %ebx, %eax
+; X86-NEXT:    subl 40(%ebp), %ecx
+; X86-NEXT:    sbbl 44(%ebp), %edx
+; X86-NEXT:    sbbl 48(%ebp), %esi
+; X86-NEXT:    sbbl 52(%ebp), %ebx
+; X86-NEXT:    cmovgel {{[-0-9]+}}(%e{{[sb]}}p), %ecx # 4-byte Folded Reload
+; X86-NEXT:    cmovgel {{[-0-9]+}}(%e{{[sb]}}p), %edx # 4-byte Folded Reload
+; X86-NEXT:    cmovgel %edi, %esi
+; X86-NEXT:    cmovgel %eax, %ebx
+; X86-NEXT:    movl 8(%ebp), %eax
+; X86-NEXT:    movl %ebx, 12(%eax)
+; X86-NEXT:    movl %esi, 8(%eax)
+; X86-NEXT:    movl %edx, 4(%eax)
+; X86-NEXT:    movl %ecx, (%eax)
+; X86-NEXT:    leal -12(%ebp), %esp
 ; X86-NEXT:    popl %esi
 ; X86-NEXT:    popl %edi
 ; X86-NEXT:    popl %ebx
@@ -1118,35 +1136,39 @@ define i128 @abd_subnsw_i128(i128 %a, i128 %b) nounwind {
 ; X86-LABEL: abd_subnsw_i128:
 ; X86:       # %bb.0:
 ; X86-NEXT:    pushl %ebp
+; X86-NEXT:    movl %esp, %ebp
 ; X86-NEXT:    pushl %ebx
 ; X86-NEXT:    pushl %edi
 ; X86-NEXT:    pushl %esi
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %esi
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT:    subl {{[0-9]+}}(%esp), %edi
-; X86-NEXT:    sbbl {{[0-9]+}}(%esp), %esi
-; X86-NEXT:    sbbl {{[0-9]+}}(%esp), %edx
-; X86-NEXT:    sbbl {{[0-9]+}}(%esp), %ecx
-; X86-NEXT:    movl %ecx, %ebx
-; X86-NEXT:    sarl $31, %ebx
-; X86-NEXT:    xorl %ebx, %ecx
-; X86-NEXT:    xorl %ebx, %edx
-; X86-NEXT:    xorl %ebx, %esi
-; X86-NEXT:    xorl %ebx, %edi
-; X86-NEXT:    movl %ebx, %ebp
-; X86-NEXT:    subl %edi, %ebp
-; X86-NEXT:    movl %ebx, %edi
-; X86-NEXT:    sbbl %esi, %edi
-; X86-NEXT:    movl %ebx, %esi
+; X86-NEXT:    andl $-16, %esp
+; X86-NEXT:    subl $16, %esp
+; X86-NEXT:    movl 36(%ebp), %eax
+; X86-NEXT:    movl 32(%ebp), %ecx
+; X86-NEXT:    movl 28(%ebp), %edx
+; X86-NEXT:    movl 24(%ebp), %esi
+; X86-NEXT:    subl 40(%ebp), %esi
+; X86-NEXT:    sbbl 44(%ebp), %edx
+; X86-NEXT:    sbbl 48(%ebp), %ecx
+; X86-NEXT:    sbbl 52(%ebp), %eax
+; X86-NEXT:    movl %eax, %edi
+; X86-NEXT:    sarl $31, %edi
+; X86-NEXT:    xorl %edi, %eax
+; X86-NEXT:    xorl %edi, %ecx
+; X86-NEXT:    xorl %edi, %edx
+; X86-NEXT:    xorl %edi, %esi
+; X86-NEXT:    movl %edi, %ebx
+; X86-NEXT:    subl %esi, %ebx
+; X86-NEXT:    movl %edi, %esi
 ; X86-NEXT:    sbbl %edx, %esi
-; X86-NEXT:    sbbl %ecx, %ebx
-; X86-NEXT:    movl %ebp, (%eax)
-; X86-NEXT:    movl %edi, 4(%eax)
-; X86-NEXT:    movl %esi, 8(%eax)
-; X86-NEXT:    movl %ebx, 12(%eax)
+; X86-NEXT:    movl %edi, %edx
+; X86-NEXT:    sbbl %ecx, %edx
+; X86-NEXT:    sbbl %eax, %edi
+; X86-NEXT:    movl 8(%ebp), %eax
+; X86-NEXT:    movl %ebx, (%eax)
+; X86-NEXT:    movl %esi, 4(%eax)
+; X86-NEXT:    movl %edx, 8(%eax)
+; X86-NEXT:    movl %edi, 12(%eax)
+; X86-NEXT:    leal -12(%ebp), %esp
 ; X86-NEXT:    popl %esi
 ; X86-NEXT:    popl %edi
 ; X86-NEXT:    popl %ebx
@@ -1175,35 +1197,39 @@ define i128 @abd_subnsw_i128_undef(i128 %a, i128 %b) nounwind {
 ; X86-LABEL: abd_subnsw_i128_undef:
 ; X86:       # %bb.0:
 ; X86-NEXT:    pushl %ebp
+; X86-NEXT:    movl %esp, %ebp
 ; X86-NEXT:    pushl %ebx
 ; X86-NEXT:    pushl %edi
 ; X86-NEXT:    pushl %esi
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %edx
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %esi
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT:    subl {{[0-9]+}}(%esp), %edi
-; X86-NEXT:    sbbl {{[0-9]+}}(%esp), %esi
-; X86-NEXT:    sbbl {{[0-9]+}}(%esp), %edx
-; X86-NEXT:    sbbl {{[0-9]+}}(%esp), %ecx
-; X86-NEXT:    movl %ecx, %ebx
-; X86-NEXT:    sarl $31, %ebx
-; X86-NEXT:    xorl %ebx, %ecx
-; X86-NEXT:    xorl %ebx, %edx
-; X86-NEXT:    xorl %ebx, %esi
-; X86-NEXT:    xorl %ebx, %edi
-; X86-NEXT:    movl %ebx, %ebp
-; X86-NEXT:    subl %edi, %ebp
-; X86-NEXT:    movl %ebx, %edi
-; X86-NEXT:    sbbl %esi, %edi
-; X86-NEXT:    m...
[truncated]

Comment on lines 237 to +251

 bool X86TargetLowering::functionArgumentNeedsConsecutiveRegisters(
     Type *Ty, CallingConv::ID CallConv, bool isVarArg,
     const DataLayout &DL) const {
-  // i128 split into i64 needs to be allocated to two consecutive registers,
-  // or spilled to the stack as a whole.
-  return Ty->isIntegerTy(128);
+  // On x86-64 i128 is split into two i64s and needs to be allocated to two
+  // consecutive registers, or spilled to the stack as a whole. On x86-32 i128
+  // is split to four i32s and never actually passed in registers, but we use
+  // the consecutive register mark to match it in TableGen.
+  if (Ty->isIntegerTy(128))
+    return true;
+
+  // On x86-32, fp128 acts the same as i128.
+  if (Subtarget.is32Bit() && Ty->isFP128Ty())
+    return true;
+
+  return false;
Contributor Author

I'm not sure if this is the right approach; functionArgumentNeedsConsecutiveRegisters doesn't really seem like the correct thing because the type will never be passed in registers, but I can't come up with another way to "mark" a register set to indicate that it needs custom lowering. Is there a better way to do this?

Cc @nikic since I think you rewrote the x86-64 custom lowering a couple of times.

Contributor Author

This should probably also match vector types somehow because `__m64`, `__m128`, `__m256`, and `__m512` are specified to have an alignment of 8, 16, 32, and 64 bytes, respectively.
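
For reference, a quick check of those alignments (my own snippet, assuming a GCC or Clang toolchain; the vector typedefs come from `immintrin.h` and are usable for `_Alignof` regardless of which ISA extensions are enabled):

```c
#include <immintrin.h>
#include <stdio.h>

int main(void) {
    /* Alignments specified by the psABI: 8, 16, 32, and 64 bytes. */
    printf("__m64  : %zu\n", _Alignof(__m64));
    printf("__m128 : %zu\n", _Alignof(__m128));
    printf("__m256 : %zu\n", _Alignof(__m256));
    printf("__m512 : %zu\n", _Alignof(__m512));
    return 0;
}
```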

Contributor

Yeah, unfortunately the CC lowering is pretty limited in the information it can access, and backends have a lot of different hacks to work around that. I'd like to improve that, but this looks like an acceptable hack for now.

@tgross35 tgross35 requested a review from phoebewang July 11, 2025 21:26
@tgross35
Contributor Author

I also started a discussion about specifying an ABI for `__int128`: https://groups.google.com/u/1/g/ia32-abi/c/TuMAt7mwbIU

@tgross35
Contributor Author

Rebased to pick up a relevant test change. Also put up a separate PR with the pretest so that it can get its own commit: #148753.

Contributor

@nikic nikic left a comment


LGTM from my side.

Looks like this needs a rebase.

The i386 psABI specifies that `__float128` has 16 byte alignment and
must be passed on the stack; however, LLVM currently stores it in a
stack slot that has an offset of 4. Add a custom lowering to correct
this alignment to 16-byte.

i386 does not specify an `__int128`, but it seems reasonable to keep the
same behavior as `__float128` so this is changed as well.

Fixes: llvm#77401
@tgross35
Contributor Author

Thanks, rebased

@nikic nikic merged commit a78a0f8 into llvm:main Jul 17, 2025
10 checks passed
@tgross35 tgross35 deleted the x86-32-f128-abi branch July 17, 2025 09:32
  if (!ArgFlags.isInConsecutiveRegsLast())
    return true;

  unsigned NumRegs = PendingMembers.size();
Contributor

Seems like this is an assertion-only variable. In non-assertion builds, this gives unused variable warnings.
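
The pattern in question, sketched in C (my own example; I haven't checked which form the follow-up commit fe1941967267e472 actually used): with `NDEBUG` the assert expands to nothing and the named temporary becomes unused.

```c
#include <assert.h>
#include <stddef.h>

size_t pending_size(void) { return 4; } /* stand-in for PendingMembers.size() */

void before(void) {
    size_t NumRegs = pending_size();
    /* assert() compiles away under -DNDEBUG, leaving NumRegs unused and
     * triggering -Wunused-variable in non-assertion builds. */
    assert(NumRegs == 4 && "Should have four parts");
}

void after(void) {
    /* One common fix: fold the call into the assert itself, so no named
     * temporary is left behind when asserts are disabled. */
    assert(pending_size() == 4 && "Should have four parts");
}

int main(void) {
    before();
    after();
    return 0;
}
```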

Contributor Author

It looks like somebody already fixed this in fe1941967267e472.

    return true;

  unsigned NumRegs = PendingMembers.size();
  assert(NumRegs == 4 && "Should have two parts");
Contributor

Suggested change
-  assert(NumRegs == 4 && "Should have two parts");
+  assert(NumRegs == 4 && "Should have four parts");

Minor nit.

Contributor Author

Fix in #149386

@tgross35
Contributor Author

tgross35 commented Jul 26, 2025

@nikic is this something that would make sense to backport? I know ABI changes are typically a no, but this missed the cutoff by only a few days, and f128 is effectively completely broken without it (though i128 of course works fine). It would also be grouped with the Windows f128 ABI change 5ee1c0b.

(fe19419 should be picked as well if so)

@nikic
Contributor

nikic commented Jul 26, 2025

/cherry-pick a78a0f8 fe19419

@llvmbot
Member

llvmbot commented Jul 26, 2025

Failed to cherry-pick: a78a0f8

https://github.com/llvm/llvm-project/actions/runs/16537791057

Please manually backport the fix and push it to your github fork. Once this is done, please create a pull request

@tgross35
Contributor Author

I'll open one

tgross35 added a commit to tgross35/llvm-project that referenced this pull request Jul 26, 2025
…8092)

The i386 psABI specifies that `__float128` has 16 byte alignment and
must be passed on the stack; however, LLVM currently stores it in a
stack slot that has an offset of 4. Add a custom lowering to correct
this alignment to 16-byte.

i386 does not specify an `__int128`, but it seems reasonable to keep the
same behavior as `__float128` so this is changed as well. There also
isn't a good way to distinguish whether a set of four registers came
from an integer or a float.

The main test demonstrating this change is `store_perturbed` in
`llvm/test/CodeGen/X86/i128-fp128-abi.ll`.

Referenced ABI:
https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/uploads/14c05f1b1e156e0e46b61bfa7c1df1e2/intel386-psABI-2020-08-07.pdf
Fixes: llvm#77401

(cherry picked from commit a78a0f8)
@tgross35
Contributor Author

Created #150746

@nikic nikic moved this from Needs Triage to Done in LLVM Release Status Jul 26, 2025
tru pushed a commit to tgross35/llvm-project that referenced this pull request Jul 28, 2025
…8092)

The i386 psABI specifies that `__float128` has 16 byte alignment and
must be passed on the stack; however, LLVM currently stores it in a
stack slot that has an offset of 4. Add a custom lowering to correct
this alignment to 16-byte.

i386 does not specify an `__int128`, but it seems reasonable to keep the
same behavior as `__float128` so this is changed as well. There also
isn't a good way to distinguish whether a set of four registers came
from an integer or a float.

The main test demonstrating this change is `store_perturbed` in
`llvm/test/CodeGen/X86/i128-fp128-abi.ll`.

Referenced ABI:
https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/uploads/14c05f1b1e156e0e46b61bfa7c1df1e2/intel386-psABI-2020-08-07.pdf
Fixes: llvm#77401

(cherry picked from commit a78a0f8)
tgross35 added a commit to tgross35/rust that referenced this pull request Aug 7, 2025
LLVM21 fixed the new float types on a number of targets:

* SystemZ gained f16 support
  llvm/llvm-project#109164
* Hexagon now uses soft f16 to avoid recursion bugs
  llvm/llvm-project#130977
* Mips now correctly handles f128
  llvm/llvm-project#117525
* f128 is now correctly aligned when passing the stack on x86
  llvm/llvm-project#138092

Thus, enable the types on relevant targets for LLVM > 21.0.0.

NVPTX also gained handling of f128 as a storage type, but it lacks
support for basic math operations so is still disabled here.
tgross35 added a commit to tgross35/rust that referenced this pull request Aug 8, 2025
Enable f16 and f128 on targets that were fixed in LLVM21

LLVM21 fixed the new float types on a number of targets:

* SystemZ gained f16 support  llvm/llvm-project#109164
* Hexagon now uses soft f16 to avoid recursion bugs  llvm/llvm-project#130977
* Mips now correctly handles f128 (actually since LLVM20) llvm/llvm-project#117525
* f128 is now correctly aligned when passing the stack on x86  llvm/llvm-project#138092

Thus, enable the types on relevant targets for LLVM > 21.0.0.

NVPTX also gained handling of f128 as a storage type, but it lacks support for basic math operations so is still disabled here.

try-job: dist-i586-gnu-i586-i686-musl
try-job: dist-i686-linux
try-job: dist-i686-msvc
try-job: dist-s390x-linux
try-job: dist-various-1
try-job: dist-various-2
try-job: dist-x86_64-linux
try-job: i686-gnu-1
try-job: i686-gnu-2
try-job: i686-msvc-1
try-job: i686-msvc-2
try-job: test-various
rust-timer added a commit to rust-lang/rust that referenced this pull request Aug 9, 2025
Rollup merge of #144987 - tgross35:llvm21-f16-f128, r=nikic

Enable f16 and f128 on targets that were fixed in LLVM21

LLVM21 fixed the new float types on a number of targets:

* SystemZ gained f16 support  llvm/llvm-project#109164
* Hexagon now uses soft f16 to avoid recursion bugs  llvm/llvm-project#130977
* Mips now correctly handles f128 (actually since LLVM20) llvm/llvm-project#117525
* f128 is now correctly aligned when passing the stack on x86  llvm/llvm-project#138092

Thus, enable the types on relevant targets for LLVM > 21.0.0.

NVPTX also gained handling of f128 as a storage type, but it lacks support for basic math operations so is still disabled here.

try-job: dist-i586-gnu-i586-i686-musl
try-job: dist-i686-linux
try-job: dist-i686-msvc
try-job: dist-s390x-linux
try-job: dist-various-1
try-job: dist-various-2
try-job: dist-x86_64-linux
try-job: i686-gnu-1
try-job: i686-gnu-2
try-job: i686-msvc-1
try-job: i686-msvc-2
try-job: test-various

Successfully merging this pull request may close these issues.

[X86][clang] 128bit floating-point operations in x86 machines