[LV] Fix gap mask requirement for interleaved access #151105

Mel-Chen · 2025-07-29T08:41:49Z

When interleaved stores contain gaps, a mask is required to skip the gaps, regardless of whether scalar epilogues are allowed.
This patch corrects the condition under which a gap mask is needed, ensuring consistency between the legacy and VPlan-based cost models and avoiding assertion failures.

Related to #149981
Based on #151112
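
As a rough illustration of the corrected condition, the sketch below models it as a free function over a plain struct. `GroupInfo` and `needsMaskForGaps` are hypothetical names introduced only for this example, and the sketch assumes that a non-empty `StoredValues` in the patch simply means "this is a store group".

```cpp
#include <cassert>

// Toy stand-in for what is known about one interleave group.
// Hypothetical type for illustration; this is not LLVM's InterleaveGroup.
struct GroupInfo {
  unsigned Factor;             // number of slots in the interleaved pattern
  unsigned NumMembers;         // how many of those slots are actually accessed
  bool IsStore;                // true for a store group, false for a load group
  bool RequiresScalarEpilogue; // mirrors IG->requiresScalarEpilogue()
};

// Models the condition after the patch:
//  - the pre-existing case: an epilogue is required but not allowed, or
//  - the added case: a store group with gaps, whatever the epilogue setting.
inline bool needsMaskForGaps(const GroupInfo &G, bool ScalarEpilogueAllowed) {
  assert(G.NumMembers >= 1 && G.NumMembers <= G.Factor && "malformed group");
  const bool HasGaps = G.NumMembers < G.Factor;
  return (G.RequiresScalarEpilogue && !ScalarEpilogueAllowed) ||
         (G.IsStore && HasGaps);
}
```

Under these assumptions, a factor-2 store group with a single member needs a gap mask even when scalar epilogues are allowed, which is the case the old condition missed.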

llvmbot · 2025-07-29T08:42:20Z

@llvm/pr-subscribers-llvm-analysis
@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-backend-risc-v

Author: Mel Chen (Mel-Chen)

Full diff: https://github.com/llvm/llvm-project/pull/151105.diff

2 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+2-1)
(added) llvm/test/Transforms/LoopVectorize/RISCV/interleaved-store-with-gap.ll (+65)

diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 8de05c16041fa..45a3a52bf3b96 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -2493,7 +2493,8 @@ void VPlanTransforms::createInterleaveGroups(
       }
 
     bool NeedsMaskForGaps =
-        IG->requiresScalarEpilogue() && !ScalarEpilogueAllowed;
+        (IG->requiresScalarEpilogue() && !ScalarEpilogueAllowed) ||
+        (!StoredValues.empty() && (IG->getNumMembers() < IG->getFactor()));
 
     Instruction *IRInsertPos = IG->getInsertPos();
     auto *InsertPos =
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/interleaved-store-with-gap.ll b/llvm/test/Transforms/LoopVectorize/RISCV/interleaved-store-with-gap.ll
new file mode 100644
index 0000000000000..8f504cba4242f
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/interleaved-store-with-gap.ll
@@ -0,0 +1,65 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -mtriple=riscv64 -mattr=+v -passes=loop-vectorize -mattr=+v \
+; RUN: -scalable-vectorization=off -enable-masked-interleaved-mem-accesses \
+; RUN: -force-vector-interleave=1 -riscv-v-vector-bits-min=1024 -S < %s | FileCheck %s
+
+define void @store_factor_2_with_tail_gap(i64 %n, ptr %a) {
+; CHECK-LABEL: define void @store_factor_2_with_tail_gap(
+; CHECK-SAME: i64 [[N:%.*]], ptr [[A:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:  [[ENTRY:.*]]:
+; CHECK-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 16
+; CHECK-NEXT:    br i1 [[MIN_ITERS_CHECK]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; CHECK:       [[VECTOR_PH]]:
+; CHECK-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[N]], 16
+; CHECK-NEXT:    [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
+; CHECK-NEXT:    br label %[[VECTOR_BODY:.*]]
+; CHECK:       [[VECTOR_BODY]]:
+; CHECK-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; CHECK-NEXT:    [[VEC_IND:%.*]] = phi <16 x i64> [ <i64 0, i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7, i64 8, i64 9, i64 10, i64 11, i64 12, i64 13, i64 14, i64 15>, %[[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; CHECK-NEXT:    [[TMP0:%.*]] = shl nsw i64 [[INDEX]], 1
+; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP0]]
+; CHECK-NEXT:    [[TMP2:%.*]] = shufflevector <16 x i64> [[VEC_IND]], <16 x i64> poison, <32 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
+; CHECK-NEXT:    [[INTERLEAVED_VEC:%.*]] = shufflevector <32 x i64> [[TMP2]], <32 x i64> poison, <32 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 4, i32 20, i32 5, i32 21, i32 6, i32 22, i32 7, i32 23, i32 8, i32 24, i32 9, i32 25, i32 10, i32 26, i32 11, i32 27, i32 12, i32 28, i32 13, i32 29, i32 14, i32 30, i32 15, i32 31>
+; CHECK-NEXT:    call void @llvm.masked.store.v32i64.p0(<32 x i64> [[INTERLEAVED_VEC]], ptr [[TMP1]], i32 8, <32 x i1> <i1 true, i1 false, i1 true, i1 false, i1 true, i1 false, i1 true, i1 false, i1 true, i1 false, i1 true, i1 false, i1 true, i1 false, i1 true, i1 false, i1 true, i1 false, i1 true, i1 false, i1 true, i1 false, i1 true, i1 false, i1 true, i1 false, i1 true, i1 false, i1 true, i1 false, i1 true, i1 false>)
+; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
+; CHECK-NEXT:    [[VEC_IND_NEXT]] = add <16 x i64> [[VEC_IND]], splat (i64 16)
+; CHECK-NEXT:    [[TMP3:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; CHECK-NEXT:    br i1 [[TMP3]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK:       [[MIDDLE_BLOCK]]:
+; CHECK-NEXT:    [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
+; CHECK-NEXT:    br i1 [[CMP_N]], label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK:       [[SCALAR_PH]]:
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
+; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
+; CHECK:       [[FOR_BODY]]:
+; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
+; CHECK-NEXT:    [[TMP4:%.*]] = shl nsw i64 [[INDVARS_IV]], 1
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP4]]
+; CHECK-NEXT:    store i64 [[INDVARS_IV]], ptr [[ARRAYIDX]], align 8
+; CHECK-NEXT:    [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
+; CHECK-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
+; CHECK-NEXT:    br i1 [[EXITCOND_NOT]], label %[[EXIT]], label %[[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK:       [[EXIT]]:
+; CHECK-NEXT:    ret void
+;
+entry:
+  br label %for.body
+
+for.body:
+  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
+  %0 = shl nsw i64 %indvars.iv, 1
+  %arrayidx = getelementptr inbounds i64, ptr %a, i64 %0
+  store i64 %indvars.iv, ptr %arrayidx, align 8
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+  %exitcond.not = icmp eq i64 %indvars.iv.next, %n
+  br i1 %exitcond.not, label %exit, label %for.body
+
+exit:
+  ret void
+}
+;.
+; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]]}
+; CHECK: [[META1]] = !{!"llvm.loop.isvectorized", i32 1}
+; CHECK: [[META2]] = !{!"llvm.loop.unroll.runtime.disable"}
+; CHECK: [[LOOP3]] = distinct !{[[LOOP3]], [[META2]], [[META1]]}
+;.

fhahn · 2025-07-29T08:53:50Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

@@ -2493,7 +2493,8 @@ void VPlanTransforms::createInterleaveGroups(
      }

    bool NeedsMaskForGaps =
-        IG->requiresScalarEpilogue() && !ScalarEpilogueAllowed;
+        (IG->requiresScalarEpilogue() && !ScalarEpilogueAllowed) ||
+        (!StoredValues.empty() && (IG->getNumMembers() < IG->getFactor()));


Better make the check for gaps part of InterleaveGroup and use it in both places?

I agree. #151112
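
To make the suggestion in this thread concrete, here is a minimal sketch of a gap check that lives on the group itself, so that every caller asks the same question. The commit list on this PR mentions an `isFull` helper; the class below is a hypothetical toy model (`ToyInterleaveGroup` is not an LLVM type) and makes no claim about how the real helper is implemented.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy model of an interleave group; not LLVM's InterleaveGroup.
class ToyInterleaveGroup {
  std::vector<bool> Present; // Present[i] is true if slot i has a member
  unsigned NumMembers = 0;

public:
  explicit ToyInterleaveGroup(unsigned Factor) : Present(Factor, false) {
    assert(Factor >= 1 && "factor must be positive");
  }

  // Record that slot Index is accessed.
  // Returns false if Index is out of range or the slot is already occupied.
  bool insertMember(unsigned Index) {
    if (Index >= Present.size() || Present[Index])
      return false;
    Present[Index] = true;
    ++NumMembers;
    return true;
  }

  unsigned getFactor() const { return static_cast<unsigned>(Present.size()); }
  unsigned getNumMembers() const { return NumMembers; }

  // A group is full when every slot has a member, i.e. it has no gaps.
  bool isFull() const { return NumMembers == getFactor(); }
};
```

With a single predicate like this, the cost model and the transform can both test `!isFull()` instead of each comparing member count against factor separately.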

lukel97

LGTM. I guess this became out of sync in 67278b8?

lukel97 · 2025-07-29T10:54:50Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

@@ -2493,7 +2493,8 @@ void VPlanTransforms::createInterleaveGroups(
      }

    bool NeedsMaskForGaps =
-        IG->requiresScalarEpilogue() && !ScalarEpilogueAllowed;
+        (IG->requiresScalarEpilogue() && !ScalarEpilogueAllowed) ||
+        (!StoredValues.empty() && (IG->getNumMembers() < IG->getFactor()));


I think we could also do this check in VPInterleaveRecipe::execute itself, but to keep it in sync with UseMaskForGaps from LoopVectorizationCostModel::getInterleaveGroupCost, it's probably better to just do it here.

fhahn · 2025-07-29T10:57:54Z

llvm/test/Transforms/LoopVectorize/RISCV/interleaved-store-with-gap.ll

+; RUN: -scalable-vectorization=off -enable-masked-interleaved-mem-accesses \
+; RUN: -force-vector-interleave=1 -riscv-v-vector-bits-min=1024 -S < %s | FileCheck %s
+
+define void @store_factor_2_with_tail_gap(i64 %n, ptr %a) {


Can we also have a target-independent test that checks for the mask?

There is no change in the generated IR. In fact, VPInterleaveRecipe::execute does not produce incorrect IR. Even when NeedsMaskForGaps is set incorrectly, we still correctly generate the gap mask for interleaved stores with gaps.

    // Vectorize the interleaved store group.
    Value *MaskForGaps =
        createBitMaskForGaps(State.Builder, State.VF.getKnownMinValue(), *Group);
    assert((!MaskForGaps || !State.VF.isScalable()) &&
           "masking gaps for scalable vectors is not yet supported.");
    ArrayRef<VPValue *> StoredValues = getStoredValues();
    // Collect the stored vector from each member.
    SmallVector<Value *, 4> StoredVecs;
    unsigned StoredIdx = 0;
    for (unsigned i = 0; i < InterleaveFactor; i++) {
      assert((Group->getMember(i) || MaskForGaps) &&
             "Fail to get a member from an interleaved store group");
      Instruction *Member = Group->getMember(i);
      ...

The issue only affects VPInterleaveRecipe::computeCost.
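
For readers unfamiliar with the mask itself, the alternating `<i1 true, i1 false, ...>` operand in the test's CHECK lines follows a simple pattern: for each of the VF lanes, emit one bit per slot of the group, set only where the slot has a member. The sketch below reproduces that pattern with plain standard-library types. `gapMask` is a hypothetical helper for illustration only; it models the idea and is not how LLVM's `createBitMaskForGaps` is written.

```cpp
#include <cassert>
#include <vector>

// Build a per-element mask for an interleaved access of VF * Factor elements.
// MemberPresent[j] says whether slot j of the group has a member, so
// MemberPresent.size() plays the role of the interleave factor.
// Hypothetical helper; names and types are assumptions for this sketch.
std::vector<bool> gapMask(unsigned VF, const std::vector<bool> &MemberPresent) {
  assert(VF >= 1 && !MemberPresent.empty() && "need at least one lane and slot");
  std::vector<bool> Mask;
  Mask.reserve(VF * MemberPresent.size());
  // Elements are laid out lane by lane, with all slots of a lane adjacent.
  for (unsigned Lane = 0; Lane < VF; ++Lane)
    for (bool Present : MemberPresent)
      Mask.push_back(Present);
  return Mask;
}
```

Calling `gapMask(16, {true, false})` gives a 32-element mask that alternates true and false, matching the shape of the `<32 x i1>` operand of the masked store in the test for VF = 16 and a factor-2 group with only its first slot populated.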

Mel-Chen requested review from asb, fhahn, preames, lukel97 and alexey-bataev July 29, 2025 08:41

llvmbot added the backend:RISC-V, vectorizers, and llvm:transforms labels Jul 29, 2025

fhahn reviewed Jul 29, 2025

Mel-Chen added 2 commits July 29, 2025 02:19

nfc, isFull for group

6993d3c

fix

51b6f48

lukel97 approved these changes Jul 29, 2025

fhahn reviewed Jul 29, 2025

Mel-Chen added 3 commits July 30, 2025 00:16

apply isFull

68f81ee

harden assertion

0cccb2d

apply --check-globals=none

5bfcb30

Mel-Chen force-pushed the fix-store-gap branch from 04d7890 to 5bfcb30 Compare July 30, 2025 07:56

llvmbot added the llvm:analysis label Jul 30, 2025
