[AMDGPU] Fix a potential integer overflow in GCNRegPressure when true16 is enabled #144968
@llvm/pr-subscribers-backend-amdgpu

Author: Shilei Tian (shiltian)

Changes: Fixes SWDEV-537014.

Full diff: https://github.com/llvm/llvm-project/pull/144968.diff

2 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/GCNRegPressure.cpp b/llvm/lib/Target/AMDGPU/GCNRegPressure.cpp
index ce213b91b1f7e..7281524b4d53d 100644
--- a/llvm/lib/Target/AMDGPU/GCNRegPressure.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNRegPressure.cpp
@@ -66,7 +66,23 @@ void GCNRegPressure::inc(unsigned Reg,
Value[TupleIdx] += Sign * TRI->getRegClassWeight(RC).RegWeight;
}
// Pressure scales with number of new registers covered by the new mask.
- Sign *= SIRegisterInfo::getNumCoveredRegs(~PrevMask & NewMask);
+ // Note that, when true16 is enabled, we can no longer use the following
+ // code to calculate the difference in the number of 32-bit registers
+ // covered by the two masks:
+ //
+ //   Sign *= SIRegisterInfo::getNumCoveredRegs(~PrevMask & NewMask);
+ //
+ // The reason is that the combined mask `~PrevMask & NewMask` doesn't treat
+ // a 16-bit register use as a use of the whole 32-bit register.
+ //
+ // For example, assume PrevMask = 0b0010 and NewMask = 0b1111. The
+ // difference should be 1: even though PrevMask only uses half of a 32-bit
+ // register, we still need to count it as a whole. However,
+ // `~PrevMask & NewMask` gives us 0b1101, for which `getNumCoveredRegs`
+ // returns 2, and that can cause an integer overflow when Sign = -1.
+ Sign *= SIRegisterInfo::getNumCoveredRegs(NewMask) -
+ SIRegisterInfo::getNumCoveredRegs(PrevMask);
}
Value[RegKind] += Sign;
}
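To make the failure mode concrete, here is a standalone walkthrough of the miscount (a hand-computed sketch, not LLVM code; the covered-register counts for each mask are worked out by hand in the comments):

```cpp
#include <cstdio>

// Hand-computed sketch of the overflow with two-lane-bits-per-register
// masks (the true16 layout). "Covered registers" means the number of
// 32-bit registers a lane mask touches.
int main() {
  unsigned Pressure = 0;

  // Def with mask 0b1111: two 32-bit registers become live.
  Pressure += 2;

  // The live mask shrinks from 0b1111 to 0b0010. The correct delta is 1
  // (one register goes fully dead, the other stays half-live), but the old
  // formula counts the registers covered by ~0b0010 & 0b1111 == 0b1101,
  // which is 2.
  Pressure -= 2;

  // The remaining half-live register dies: one more register is released.
  Pressure -= 1;

  // Pressure is now (unsigned)-1 instead of 0: the overflow this patch
  // fixes. With the corrected delta (2 - 1 == 1), the count ends at 0.
  std::printf("Pressure = %u\n", Pressure);
}
```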
diff --git a/llvm/test/CodeGen/AMDGPU/gcn-reg-pressure-true16-integer-overflow.mir b/llvm/test/CodeGen/AMDGPU/gcn-reg-pressure-true16-integer-overflow.mir
new file mode 100644
index 0000000000000..9aac3d74eea4f
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/gcn-reg-pressure-true16-integer-overflow.mir
@@ -0,0 +1,70 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -x mir -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1102 -run-pass=machine-scheduler %s -o - | FileCheck %s
+
+--- |
+ declare void @llvm.amdgcn.s.waitcnt(i32 immarg)
+
+ declare <2 x i32> @llvm.amdgcn.raw.buffer.load.v2i32(<4 x i32>, i32, i32, i32 immarg)
+
+ define amdgpu_kernel void @foo(ptr %p) {
+ entry:
+ %foo.kernarg.segment = call nonnull align 16 dereferenceable(264) ptr addrspace(4) @llvm.amdgcn.kernarg.segment.ptr()
+ %p.kernarg.offset1 = bitcast ptr addrspace(4) %foo.kernarg.segment to ptr addrspace(4)
+ %p.load = load ptr, ptr addrspace(4) %p.kernarg.offset1, align 16
+ %call = tail call <2 x i32> @llvm.amdgcn.raw.buffer.load.v2i32(<4 x i32> zeroinitializer, i32 0, i32 0, i32 0)
+ %cast = bitcast <2 x i32> %call to <8 x i8>
+ %shuffle = shufflevector <8 x i8> zeroinitializer, <8 x i8> %cast, <2 x i32> <i32 3, i32 11>
+ %zext = zext <2 x i8> %shuffle to <2 x i16>
+ %shl = shl <2 x i16> %zext, splat (i16 8)
+ store <2 x i16> %shl, ptr %p.load, align 4
+ tail call void @llvm.amdgcn.s.waitcnt(i32 0)
+ ret void
+ }
+
+ declare noundef align 4 ptr addrspace(4) @llvm.amdgcn.kernarg.segment.ptr()
+...
+---
+name: foo
+tracksRegLiveness: true
+liveins:
+ - { reg: '$sgpr4_sgpr5', virtual-reg: '%3' }
+body: |
+ bb.0.entry:
+ liveins: $sgpr4_sgpr5
+
+ ; CHECK-LABEL: name: foo
+ ; CHECK: liveins: $sgpr4_sgpr5
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:sgpr_64(p4) = COPY $sgpr4_sgpr5
+ ; CHECK-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 0
+ ; CHECK-NEXT: undef [[COPY1:%[0-9]+]].sub0:sgpr_128 = COPY [[S_MOV_B32_]]
+ ; CHECK-NEXT: [[COPY1:%[0-9]+]].sub1:sgpr_128 = COPY [[S_MOV_B32_]]
+ ; CHECK-NEXT: [[COPY1:%[0-9]+]].sub2:sgpr_128 = COPY [[S_MOV_B32_]]
+ ; CHECK-NEXT: [[COPY1:%[0-9]+]].sub3:sgpr_128 = COPY [[S_MOV_B32_]]
+ ; CHECK-NEXT: [[BUFFER_LOAD_DWORDX2_OFFSET:%[0-9]+]]:vreg_64 = BUFFER_LOAD_DWORDX2_OFFSET [[COPY1]], 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s64), align 1, addrspace 8)
+ ; CHECK-NEXT: [[S_LOAD_DWORDX2_IMM:%[0-9]+]]:sreg_64_xexec = S_LOAD_DWORDX2_IMM [[COPY]](p4), 0, 0 :: (dereferenceable invariant load (s64) from %ir.p.kernarg.offset1, align 16, addrspace 4)
+ ; CHECK-NEXT: [[V_LSHRREV_B64_e64_:%[0-9]+]]:vreg_64 = V_LSHRREV_B64_e64 24, [[BUFFER_LOAD_DWORDX2_OFFSET]], implicit $exec
+ ; CHECK-NEXT: undef [[COPY2:%[0-9]+]].lo16:vgpr_32 = COPY [[V_LSHRREV_B64_e64_]].lo16
+ ; CHECK-NEXT: [[V_LSHLREV_B32_e64_:%[0-9]+]]:vgpr_32 = V_LSHLREV_B32_e64 16, [[COPY2]], implicit $exec
+ ; CHECK-NEXT: [[COPY3:%[0-9]+]]:vreg_64 = COPY [[S_LOAD_DWORDX2_IMM]]
+ ; CHECK-NEXT: [[V_PK_LSHLREV_B16_:%[0-9]+]]:vgpr_32 = V_PK_LSHLREV_B16 0, 8, 8, [[V_LSHLREV_B32_e64_]], 0, 0, 0, 0, 0, implicit $exec
+ ; CHECK-NEXT: FLAT_STORE_DWORD [[COPY3]], [[V_PK_LSHLREV_B16_]], 0, 0, implicit $exec, implicit $flat_scr :: (store (s32) into %ir.p.load)
+ ; CHECK-NEXT: S_WAITCNT 0
+ ; CHECK-NEXT: S_ENDPGM 0
+ %3:sgpr_64(p4) = COPY killed $sgpr4_sgpr5
+ %13:sreg_64_xexec = S_LOAD_DWORDX2_IMM killed %3(p4), 0, 0 :: (dereferenceable invariant load (s64) from %ir.p.kernarg.offset1, align 16, addrspace 4)
+ %14:sreg_32 = S_MOV_B32 0
+ undef %15.sub0:sgpr_128 = COPY %14
+ %15.sub1:sgpr_128 = COPY %14
+ %15.sub2:sgpr_128 = COPY %14
+ %15.sub3:sgpr_128 = COPY killed %14
+ %16:vreg_64 = BUFFER_LOAD_DWORDX2_OFFSET killed %15, 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s64), align 1, addrspace 8)
+ %26:vreg_64 = V_LSHRREV_B64_e64 24, killed %16, implicit $exec
+ undef %28.lo16:vgpr_32 = COPY killed %26.lo16
+ %30:vgpr_32 = V_LSHLREV_B32_e64 16, killed %28, implicit $exec
+ %24:vgpr_32 = V_PK_LSHLREV_B16 0, 8, 8, killed %30, 0, 0, 0, 0, 0, implicit $exec
+ %25:vreg_64 = COPY killed %13
+ FLAT_STORE_DWORD killed %25, killed %24, 0, 0, implicit $exec, implicit $flat_scr :: (store (s32) into %ir.p.load)
+ S_WAITCNT 0
+ S_ENDPGM 0
+...
Sign *= SIRegisterInfo::getNumCoveredRegs(NewMask) -
        SIRegisterInfo::getNumCoveredRegs(PrevMask);
Should add a register count difference utility in SIRegisterInfo; it can probably avoid doing the popcount twice.
Basically we will need to convert any pair of bits in a mask to `0b11` if it is `0b01` or `0b10`. After that, we can continue to use `~PrevMask & NewMask` and then call `getNumCoveredRegs` once (popcount once as well). However, I don't know how to do the preprocessing via bit manipulation. Any suggestions?
If we can't use bit manipulation, there is probably no point in adding a new utility function, since it would just do what we are doing here.
Taking inspiration from `getNumCoveredRegs`, I think `PrevMask | ((PrevMask & 0xAAAAAAAAAAAAAAAAULL) >> 1) | ((PrevMask & 0x5555555555555555ULL) << 1)` should work to transform `0b01`/`0b10` into `0b11`.
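A quick self-contained check of that transform (a sketch under the assumption that lane masks use two bits per 32-bit register; `widenHalves` and `numCoveredRegs` are hypothetical stand-ins, not LLVM APIs):

```cpp
#include <bit>
#include <cstdint>
#include <cstdio>

// Widen each two-bit lane pair so 0b01 or 0b10 becomes 0b11, i.e. a
// half-register use is treated as covering the whole 32-bit register.
static uint64_t widenHalves(uint64_t Mask) {
  return Mask | ((Mask & 0xAAAAAAAAAAAAAAAAULL) >> 1) |
         ((Mask & 0x5555555555555555ULL) << 1);
}

// Count the 32-bit registers covered by a mask (one pair of bits each).
static unsigned numCoveredRegs(uint64_t Mask) {
  return std::popcount(widenHalves(Mask) & 0x5555555555555555ULL);
}

int main() {
  uint64_t PrevMask = 0b0010, NewMask = 0b1111;
  // Naive combined mask over-counts: 0b1101 covers 2 registers.
  unsigned Naive = numCoveredRegs(~PrevMask & NewMask);
  // Widening PrevMask first gives 0b0011, so the delta is the expected 1.
  unsigned Widened = numCoveredRegs(~widenHalves(PrevMask) & NewMask);
  std::printf("naive = %u, widened = %u\n", Naive, Widened); // prints 2, 1
}
```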
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took an alternative approach, since both `getNumCoveredRegs` calls are already made at the beginning of this function anyway.
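Roughly, the shape of that alternative (a sketch only; the names and the stand-in counting helper below are assumptions, not the committed code):

```cpp
#include <bit>
#include <cstdint>

// Stand-in for SIRegisterInfo::getNumCoveredRegs (see the sketch above).
static unsigned getNumCoveredRegs(uint64_t Mask) {
  return std::popcount(
      (Mask | ((Mask & 0xAAAAAAAAAAAAAAAAULL) >> 1)) & 0x5555555555555555ULL);
}

// inc() already needs both covered-register counts for its early-exit
// check, so compute them once and derive the delta from the cached values
// instead of running a second pair of popcounts later.
static int coveredRegsDelta(uint64_t PrevMask, uint64_t NewMask) {
  unsigned PrevNum = getNumCoveredRegs(PrevMask);
  unsigned NewNum = getNumCoveredRegs(NewMask);
  if (NewNum == PrevNum)
    return 0; // no change in pressure; inc() would bail out here
  return static_cast<int>(NewNum) - static_cast<int>(PrevNum);
}
```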