[LV] Don't consider VPValues without underlying value as generating vectors #150992
Conversation
@llvm/pr-subscribers-backend-systemz @llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-vectorizers

Author: Luke Lau (lukel97)

Changes

In some cases during tail folding, the plan won't actually vectorize any values from the original IR, but it may still vectorize e.g. a VPWidenCanonicalIVRecipe because it is used to generate the active lane mask. That alone is enough for willGenerateVectors to consider the VPlan as generating vectors, even though everything useful is scalarized.

This patch fixes this by first checking that recipes have an underlying value set. Because some vectorized recipes may be hoisted into the preheader via LICM, the search now also starts in the preheader to prevent regressions. This in turn means the original scalar loop is scanned as well, so VPIRInstructionSC needs to be handled.

Full diff: https://github.com/llvm/llvm-project/pull/150992.diff

2 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 7b7efb8c6309e..ca6d05f0b39a5 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -4172,7 +4172,7 @@ static bool willGenerateVectors(VPlan &Plan, ElementCount VF,
// Set of already visited types.
DenseSet<Type *> Visited;
for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
- vp_depth_first_shallow(Plan.getVectorLoopRegion()->getEntry()))) {
+ vp_depth_first_deep(Plan.getVectorPreheader()))) {
for (VPRecipeBase &R : *VPBB) {
if (EphemeralRecipes.contains(&R))
continue;
@@ -4192,6 +4192,7 @@ static bool willGenerateVectors(VPlan &Plan, ElementCount VF,
case VPDef::VPEVLBasedIVPHISC:
case VPDef::VPPredInstPHISC:
case VPDef::VPBranchOnMaskSC:
+ case VPDef::VPIRInstructionSC:
continue;
case VPDef::VPReductionSC:
case VPDef::VPActiveLaneMaskPHISC:
@@ -4247,6 +4248,9 @@ static bool willGenerateVectors(VPlan &Plan, ElementCount VF,
// operand.
VPValue *ToCheck =
R.getNumDefinedValues() >= 1 ? R.getVPValue(0) : R.getOperand(1);
+ // Don't consider values that didn't come from the original scalar IR.
+ if (!ToCheck->getUnderlyingValue())
+ continue;
Type *ScalarTy = TypeInfo.inferScalarType(ToCheck);
if (!Visited.insert({ScalarTy}).second)
continue;
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/induction-costs-sve.ll b/llvm/test/Transforms/LoopVectorize/AArch64/induction-costs-sve.ll
index 8b354d91909b1..c5a02f1572634 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/induction-costs-sve.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/induction-costs-sve.ll
@@ -790,65 +790,16 @@ define void @exit_cond_zext_iv(ptr %dst, i64 %N) {
; PRED-LABEL: define void @exit_cond_zext_iv(
; PRED-SAME: ptr [[DST:%.*]], i64 [[N:%.*]]) {
; PRED-NEXT: [[ENTRY:.*]]:
-; PRED-NEXT: [[UMAX1:%.*]] = call i64 @llvm.umax.i64(i64 [[N]], i64 1)
-; PRED-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_SCEVCHECK:.*]]
-; PRED: [[VECTOR_SCEVCHECK]]:
-; PRED-NEXT: [[UMAX:%.*]] = call i64 @llvm.umax.i64(i64 [[N]], i64 1)
-; PRED-NEXT: [[TMP0:%.*]] = add i64 [[UMAX]], -1
-; PRED-NEXT: [[TMP2:%.*]] = trunc i64 [[TMP0]] to i32
-; PRED-NEXT: [[TMP3:%.*]] = add i32 1, [[TMP2]]
-; PRED-NEXT: [[TMP4:%.*]] = icmp ult i32 [[TMP3]], 1
-; PRED-NEXT: [[TMP5:%.*]] = icmp ugt i64 [[TMP0]], 4294967295
-; PRED-NEXT: [[TMP6:%.*]] = or i1 [[TMP4]], [[TMP5]]
-; PRED-NEXT: br i1 [[TMP6]], label %[[SCALAR_PH]], label %[[VECTOR_PH:.*]]
-; PRED: [[VECTOR_PH]]:
-; PRED-NEXT: [[N_RND_UP:%.*]] = add i64 [[UMAX1]], 1
-; PRED-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], 2
-; PRED-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
-; PRED-NEXT: [[TRIP_COUNT_MINUS_1:%.*]] = sub i64 [[UMAX1]], 1
-; PRED-NEXT: [[BROADCAST_SPLATINSERT2:%.*]] = insertelement <2 x i64> poison, i64 [[TRIP_COUNT_MINUS_1]], i64 0
-; PRED-NEXT: [[BROADCAST_SPLAT3:%.*]] = shufflevector <2 x i64> [[BROADCAST_SPLATINSERT2]], <2 x i64> poison, <2 x i32> zeroinitializer
-; PRED-NEXT: br label %[[VECTOR_BODY:.*]]
-; PRED: [[VECTOR_BODY]]:
-; PRED-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[PRED_STORE_CONTINUE5:.*]] ]
-; PRED-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i64> poison, i64 [[INDEX]], i64 0
-; PRED-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i64> [[BROADCAST_SPLATINSERT]], <2 x i64> poison, <2 x i32> zeroinitializer
-; PRED-NEXT: [[VEC_IV:%.*]] = add <2 x i64> [[BROADCAST_SPLAT]], <i64 0, i64 1>
-; PRED-NEXT: [[TMP7:%.*]] = icmp ule <2 x i64> [[VEC_IV]], [[BROADCAST_SPLAT3]]
-; PRED-NEXT: [[TMP8:%.*]] = extractelement <2 x i1> [[TMP7]], i32 0
-; PRED-NEXT: br i1 [[TMP8]], label %[[PRED_STORE_IF:.*]], label %[[PRED_STORE_CONTINUE:.*]]
-; PRED: [[PRED_STORE_IF]]:
-; PRED-NEXT: [[TMP9:%.*]] = add i64 [[INDEX]], 0
-; PRED-NEXT: [[TMP10:%.*]] = getelementptr { [100 x i32], i32, i32 }, ptr [[DST]], i64 [[TMP9]], i32 2
-; PRED-NEXT: store i32 0, ptr [[TMP10]], align 8
-; PRED-NEXT: br label %[[PRED_STORE_CONTINUE]]
-; PRED: [[PRED_STORE_CONTINUE]]:
-; PRED-NEXT: [[TMP11:%.*]] = extractelement <2 x i1> [[TMP7]], i32 1
-; PRED-NEXT: br i1 [[TMP11]], label %[[PRED_STORE_IF4:.*]], label %[[PRED_STORE_CONTINUE5]]
-; PRED: [[PRED_STORE_IF4]]:
-; PRED-NEXT: [[TMP12:%.*]] = add i64 [[INDEX]], 1
-; PRED-NEXT: [[TMP13:%.*]] = getelementptr { [100 x i32], i32, i32 }, ptr [[DST]], i64 [[TMP12]], i32 2
-; PRED-NEXT: store i32 0, ptr [[TMP13]], align 8
-; PRED-NEXT: br label %[[PRED_STORE_CONTINUE5]]
-; PRED: [[PRED_STORE_CONTINUE5]]:
-; PRED-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 2
-; PRED-NEXT: [[TMP14:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; PRED-NEXT: br i1 [[TMP14]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]
-; PRED: [[MIDDLE_BLOCK]]:
-; PRED-NEXT: br label %[[EXIT:.*]]
-; PRED: [[SCALAR_PH]]:
-; PRED-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_SCEVCHECK]] ]
-; PRED-NEXT: [[BC_RESUME_VAL6:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_SCEVCHECK]] ]
; PRED-NEXT: br label %[[LOOP:.*]]
; PRED: [[LOOP]]:
-; PRED-NEXT: [[IV_1:%.*]] = phi i32 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[IV_1_NEXT:%.*]], %[[LOOP]] ]
-; PRED-NEXT: [[IV_CONV:%.*]] = phi i64 [ [[BC_RESUME_VAL6]], %[[SCALAR_PH]] ], [ [[IV_EXT:%.*]], %[[LOOP]] ]
+; PRED-NEXT: [[IV_1:%.*]] = phi i32 [ 0, %[[ENTRY]] ], [ [[IV_1_NEXT:%.*]], %[[LOOP]] ]
+; PRED-NEXT: [[IV_CONV:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ [[IV_EXT:%.*]], %[[LOOP]] ]
; PRED-NEXT: [[GEP:%.*]] = getelementptr { [100 x i32], i32, i32 }, ptr [[DST]], i64 [[IV_CONV]], i32 2
; PRED-NEXT: store i32 0, ptr [[GEP]], align 8
; PRED-NEXT: [[IV_1_NEXT]] = add i32 [[IV_1]], 1
; PRED-NEXT: [[IV_EXT]] = zext i32 [[IV_1_NEXT]] to i64
; PRED-NEXT: [[C:%.*]] = icmp ult i64 [[IV_EXT]], [[N]]
-; PRED-NEXT: br i1 [[C]], label %[[LOOP]], label %[[EXIT]], !llvm.loop [[LOOP11:![0-9]+]]
+; PRED-NEXT: br i1 [[C]], label %[[LOOP]], label %[[EXIT:.*]]
; PRED: [[EXIT]]:
; PRED-NEXT: ret void
;
I think this makes sense to me, but I'm afraid I have to leave this up to @fhahn for a sanity-check.
@@ -4247,6 +4248,9 @@ static bool willGenerateVectors(VPlan &Plan, ElementCount VF,
// operand.
VPValue *ToCheck =
R.getNumDefinedValues() >= 1 ? R.getVPValue(0) : R.getOperand(1);
+ // Don't consider values that didn't come from the original scalar IR.
+ if (!ToCheck->getUnderlyingValue())
Use isLiveIn and getLiveInIRValue for clarity?
I wonder if getLiveInIRValue is necessary: perhaps checking isLiveIn is sufficient?
I think live-ins are different: they're just LLVM IR values plumbed directly into VPlan, like constants, loop-invariant values, etc.
IIUC, getUnderlyingValue should return the original scalar IR value that a widened recipe came from, which might be loop-variant.
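To illustrate the distinction under discussion, here is a minimal sketch in the style of the patched check (the two checks are alternatives, shown side by side, and are not equivalent):

// Variant in this PR: skip any value with no originating scalar instruction.
// The result of a synthesized recipe such as VPWidenCanonicalIVRecipe has no
// underlying value, so it is skipped.
if (!ToCheck->getUnderlyingValue())
  continue;

// Suggested alternative: skip only live-ins, i.e. IR values plumbed directly
// into the VPlan (constants, loop-invariant operands). This filters less: a
// synthesized recipe's result is defined by a recipe, so isLiveIn() is false,
// even though it has no underlying value either.
if (ToCheck->isLiveIn())
  continue;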
@@ -790,65 +790,16 @@ define void @exit_cond_zext_iv(ptr %dst, i64 %N) {
; PRED-LABEL: define void @exit_cond_zext_iv(
; PRED-SAME: ptr [[DST:%.*]], i64 [[N:%.*]]) {
; PRED-NEXT: [[ENTRY:.*]]:
-; PRED-NEXT: [[UMAX1:%.*]] = call i64 @llvm.umax.i64(i64 [[N]], i64 1)
Looks like @fhahn added this test to gain coverage of loops with casted indices. See 6084dcb. Presumably the reason we're not vectorising this loop any more is that for fixed-width vectors there are no vector operations (because the store is scalarised?) and for scalable vectors the cost is too high. It would be good if you could rewrite this test to avoid having to use scatter instructions in the loop, i.e. by modifying this
%gep = getelementptr {[100 x i32], i32, i32}, ptr %dst, i64 %iv.conv, i32 2
so that we're doing contiguous stores. Then you'll presumably need to add a new dedicated test to demonstrate the effect of the code change in this PR.
Good point, I ended up also needing to add +sve to this test, because without it the masked load always gets scalarized. I've added a new dedicated test in llvm/test/Transforms/LoopVectorize/AArch64/no-vector-instructions.ll
Force-pushed from 3595042 to 320fe44.
@@ -4172,7 +4172,7 @@ static bool willGenerateVectors(VPlan &Plan, ElementCount VF,
// Set of already visited types.
DenseSet<Type *> Visited;
for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
- vp_depth_first_shallow(Plan.getVectorLoopRegion()->getEntry()))) {
+ vp_depth_first_deep(Plan.getVectorPreheader()))) {
Hm, intuitively it doesn't seem right to check anything outside the vector region.
My understanding was that the vector preheader is outside of the vector region? So this will traverse through vector.ph -> vector loop region -> middle.block -> scalar.ph -> etc.
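Roughly, the change in traversal can be pictured like this (a sketch of the standard VPlan skeleton; block names are illustrative):

// Before: only the VPBasicBlocks directly inside the vector loop region,
// without entering nested (e.g. replicate) regions:
//   vp_depth_first_shallow(Plan.getVectorLoopRegion()->getEntry())
//
// After: everything reachable from the vector preheader, recursing into
// nested regions and continuing past the region exit:
//   vp_depth_first_deep(Plan.getVectorPreheader())
//     vector.ph -> vector loop region -> middle.block -> scalar.ph
//       -> original scalar loop
//
// The original scalar loop is modeled with VPIRInstructions, which is why
// VPDef::VPIRInstructionSC now needs its own case in the switch.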
Closing for now because I have too many reviews open, and this PR isn't actually that important. Just wanted to post it here for reference!