[LV] Don't consider VPValues without underlying value as generating vectors #150992
Conversation
@llvm/pr-subscribers-backend-systemz @llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-vectorizers

Author: Luke Lau (lukel97)

Changes

In some cases during tail folding, the plan won't actually vectorize any values from the original IR, but it may still vectorize e.g. a VPWidenCanonicalIVRecipe because it is used to generate the active lane mask. That alone is enough for willGenerateVectors to consider the VPlan as generating vectors, even though everything useful is scalarized.

This patch fixes this by first checking that recipes have an underlying value set. Because some vectorized recipes may be hoisted into the preheader via LICM, the search now also starts in the preheader to prevent regressions. This in turn means the original scalar loop is scanned as well, so VPIRInstructionSC needs to be handled.

Full diff: https://github.com/llvm/llvm-project/pull/150992.diff

2 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 7b7efb8c6309e..ca6d05f0b39a5 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -4172,7 +4172,7 @@ static bool willGenerateVectors(VPlan &Plan, ElementCount VF,
// Set of already visited types.
DenseSet<Type *> Visited;
for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
- vp_depth_first_shallow(Plan.getVectorLoopRegion()->getEntry()))) {
+ vp_depth_first_deep(Plan.getVectorPreheader()))) {
for (VPRecipeBase &R : *VPBB) {
if (EphemeralRecipes.contains(&R))
continue;
@@ -4192,6 +4192,7 @@ static bool willGenerateVectors(VPlan &Plan, ElementCount VF,
case VPDef::VPEVLBasedIVPHISC:
case VPDef::VPPredInstPHISC:
case VPDef::VPBranchOnMaskSC:
+ case VPDef::VPIRInstructionSC:
continue;
case VPDef::VPReductionSC:
case VPDef::VPActiveLaneMaskPHISC:
@@ -4247,6 +4248,9 @@ static bool willGenerateVectors(VPlan &Plan, ElementCount VF,
// operand.
VPValue *ToCheck =
R.getNumDefinedValues() >= 1 ? R.getVPValue(0) : R.getOperand(1);
+ // Don't consider values that didn't come from the original scalar IR.
+ if (!ToCheck->getUnderlyingValue())
+ continue;
Type *ScalarTy = TypeInfo.inferScalarType(ToCheck);
if (!Visited.insert({ScalarTy}).second)
continue;
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/induction-costs-sve.ll b/llvm/test/Transforms/LoopVectorize/AArch64/induction-costs-sve.ll
index 8b354d91909b1..c5a02f1572634 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/induction-costs-sve.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/induction-costs-sve.ll
@@ -790,65 +790,16 @@ define void @exit_cond_zext_iv(ptr %dst, i64 %N) {
; PRED-LABEL: define void @exit_cond_zext_iv(
; PRED-SAME: ptr [[DST:%.*]], i64 [[N:%.*]]) {
; PRED-NEXT: [[ENTRY:.*]]:
-; PRED-NEXT: [[UMAX1:%.*]] = call i64 @llvm.umax.i64(i64 [[N]], i64 1)
-; PRED-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_SCEVCHECK:.*]]
-; PRED: [[VECTOR_SCEVCHECK]]:
-; PRED-NEXT: [[UMAX:%.*]] = call i64 @llvm.umax.i64(i64 [[N]], i64 1)
-; PRED-NEXT: [[TMP0:%.*]] = add i64 [[UMAX]], -1
-; PRED-NEXT: [[TMP2:%.*]] = trunc i64 [[TMP0]] to i32
-; PRED-NEXT: [[TMP3:%.*]] = add i32 1, [[TMP2]]
-; PRED-NEXT: [[TMP4:%.*]] = icmp ult i32 [[TMP3]], 1
-; PRED-NEXT: [[TMP5:%.*]] = icmp ugt i64 [[TMP0]], 4294967295
-; PRED-NEXT: [[TMP6:%.*]] = or i1 [[TMP4]], [[TMP5]]
-; PRED-NEXT: br i1 [[TMP6]], label %[[SCALAR_PH]], label %[[VECTOR_PH:.*]]
-; PRED: [[VECTOR_PH]]:
-; PRED-NEXT: [[N_RND_UP:%.*]] = add i64 [[UMAX1]], 1
-; PRED-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], 2
-; PRED-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
-; PRED-NEXT: [[TRIP_COUNT_MINUS_1:%.*]] = sub i64 [[UMAX1]], 1
-; PRED-NEXT: [[BROADCAST_SPLATINSERT2:%.*]] = insertelement <2 x i64> poison, i64 [[TRIP_COUNT_MINUS_1]], i64 0
-; PRED-NEXT: [[BROADCAST_SPLAT3:%.*]] = shufflevector <2 x i64> [[BROADCAST_SPLATINSERT2]], <2 x i64> poison, <2 x i32> zeroinitializer
-; PRED-NEXT: br label %[[VECTOR_BODY:.*]]
-; PRED: [[VECTOR_BODY]]:
-; PRED-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[PRED_STORE_CONTINUE5:.*]] ]
-; PRED-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i64> poison, i64 [[INDEX]], i64 0
-; PRED-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i64> [[BROADCAST_SPLATINSERT]], <2 x i64> poison, <2 x i32> zeroinitializer
-; PRED-NEXT: [[VEC_IV:%.*]] = add <2 x i64> [[BROADCAST_SPLAT]], <i64 0, i64 1>
-; PRED-NEXT: [[TMP7:%.*]] = icmp ule <2 x i64> [[VEC_IV]], [[BROADCAST_SPLAT3]]
-; PRED-NEXT: [[TMP8:%.*]] = extractelement <2 x i1> [[TMP7]], i32 0
-; PRED-NEXT: br i1 [[TMP8]], label %[[PRED_STORE_IF:.*]], label %[[PRED_STORE_CONTINUE:.*]]
-; PRED: [[PRED_STORE_IF]]:
-; PRED-NEXT: [[TMP9:%.*]] = add i64 [[INDEX]], 0
-; PRED-NEXT: [[TMP10:%.*]] = getelementptr { [100 x i32], i32, i32 }, ptr [[DST]], i64 [[TMP9]], i32 2
-; PRED-NEXT: store i32 0, ptr [[TMP10]], align 8
-; PRED-NEXT: br label %[[PRED_STORE_CONTINUE]]
-; PRED: [[PRED_STORE_CONTINUE]]:
-; PRED-NEXT: [[TMP11:%.*]] = extractelement <2 x i1> [[TMP7]], i32 1
-; PRED-NEXT: br i1 [[TMP11]], label %[[PRED_STORE_IF4:.*]], label %[[PRED_STORE_CONTINUE5]]
-; PRED: [[PRED_STORE_IF4]]:
-; PRED-NEXT: [[TMP12:%.*]] = add i64 [[INDEX]], 1
-; PRED-NEXT: [[TMP13:%.*]] = getelementptr { [100 x i32], i32, i32 }, ptr [[DST]], i64 [[TMP12]], i32 2
-; PRED-NEXT: store i32 0, ptr [[TMP13]], align 8
-; PRED-NEXT: br label %[[PRED_STORE_CONTINUE5]]
-; PRED: [[PRED_STORE_CONTINUE5]]:
-; PRED-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 2
-; PRED-NEXT: [[TMP14:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; PRED-NEXT: br i1 [[TMP14]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]
-; PRED: [[MIDDLE_BLOCK]]:
-; PRED-NEXT: br label %[[EXIT:.*]]
-; PRED: [[SCALAR_PH]]:
-; PRED-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_SCEVCHECK]] ]
-; PRED-NEXT: [[BC_RESUME_VAL6:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_SCEVCHECK]] ]
; PRED-NEXT: br label %[[LOOP:.*]]
; PRED: [[LOOP]]:
-; PRED-NEXT: [[IV_1:%.*]] = phi i32 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[IV_1_NEXT:%.*]], %[[LOOP]] ]
-; PRED-NEXT: [[IV_CONV:%.*]] = phi i64 [ [[BC_RESUME_VAL6]], %[[SCALAR_PH]] ], [ [[IV_EXT:%.*]], %[[LOOP]] ]
+; PRED-NEXT: [[IV_1:%.*]] = phi i32 [ 0, %[[ENTRY]] ], [ [[IV_1_NEXT:%.*]], %[[LOOP]] ]
+; PRED-NEXT: [[IV_CONV:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ [[IV_EXT:%.*]], %[[LOOP]] ]
; PRED-NEXT: [[GEP:%.*]] = getelementptr { [100 x i32], i32, i32 }, ptr [[DST]], i64 [[IV_CONV]], i32 2
; PRED-NEXT: store i32 0, ptr [[GEP]], align 8
; PRED-NEXT: [[IV_1_NEXT]] = add i32 [[IV_1]], 1
; PRED-NEXT: [[IV_EXT]] = zext i32 [[IV_1_NEXT]] to i64
; PRED-NEXT: [[C:%.*]] = icmp ult i64 [[IV_EXT]], [[N]]
-; PRED-NEXT: br i1 [[C]], label %[[LOOP]], label %[[EXIT]], !llvm.loop [[LOOP11:![0-9]+]]
+; PRED-NEXT: br i1 [[C]], label %[[LOOP]], label %[[EXIT:.*]]
; PRED: [[EXIT]]:
; PRED-NEXT: ret void
;
I think this makes sense to me, but I'm afraid I have to leave this up to @fhahn for a sanity-check.
@@ -4247,6 +4248,9 @@ static bool willGenerateVectors(VPlan &Plan, ElementCount VF,
// operand.
VPValue *ToCheck =
R.getNumDefinedValues() >= 1 ? R.getVPValue(0) : R.getOperand(1);
+ // Don't consider values that didn't come from the original scalar IR.
+ if (!ToCheck->getUnderlyingValue())
Use isLiveIn and getLiveInIRValue for clarity?
I wonder if getLiveInIRValue is necessary: perhaps checking isLiveIn is sufficient?
I think live-ins are different: they're just LLVM IR values plumbed directly into VPlan, like constants, loop-invariant values, etc.
IIUC, getUnderlyingValue should return the original scalar IR value that a widened recipe came from, which might be loop-variant.
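To illustrate the distinction under discussion, here is a minimal sketch in the style of the patched check (the two checks are alternatives, shown side by side, and are not equivalent):

// Variant in this PR: skip any value with no originating scalar instruction.
// The result of a synthesized recipe such as VPWidenCanonicalIVRecipe has no
// underlying value, so it is skipped.
if (!ToCheck->getUnderlyingValue())
  continue;

// Suggested alternative: skip only live-ins, i.e. IR values plumbed directly
// into the VPlan (constants, loop-invariant operands). This filters less: a
// synthesized recipe's result is defined by a recipe, so isLiveIn() is false,
// even though it has no underlying value either.
if (ToCheck->isLiveIn())
  continue;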
@@ -790,65 +790,16 @@ define void @exit_cond_zext_iv(ptr %dst, i64 %N) {
; PRED-LABEL: define void @exit_cond_zext_iv(
; PRED-SAME: ptr [[DST:%.*]], i64 [[N:%.*]]) {
; PRED-NEXT: [[ENTRY:.*]]:
-; PRED-NEXT: [[UMAX1:%.*]] = call i64 @llvm.umax.i64(i64 [[N]], i64 1)
Looks like @fhahn added this test to gain coverage of loops with casted indices. See 6084dcb. Presumably the reason we're not vectorising this loop any more is that for fixed-width vectors there are no vector operations (because the store is scalarised?) and for scalable vectors the cost is too high. It would be good if you could rewrite this test to avoid having to use scatter instructions in the loop, i.e. by modifying this
%gep = getelementptr {[100 x i32], i32, i32}, ptr %dst, i64 %iv.conv, i32 2
so that we're doing contiguous stores. Then you'll presumably need to add a new dedicated test to demonstrate the effect of the code change in this PR.
Good point, I ended up also needing to add +sve to this test, because without it the masked load always gets scalarized. I've added a new dedicated test in llvm/test/Transforms/LoopVectorize/AArch64/no-vector-instructions.ll
Force-pushed from 3595042 to 320fe44.
@@ -4172,7 +4172,7 @@ static bool willGenerateVectors(VPlan &Plan, ElementCount VF,
// Set of already visited types.
DenseSet<Type *> Visited;
for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
- vp_depth_first_shallow(Plan.getVectorLoopRegion()->getEntry()))) {
+ vp_depth_first_deep(Plan.getVectorPreheader()))) {
Hm, intuitively it doesn't seem right to check anything outside the vector region.
My understanding was that the vector preheader is outside of the vector region? So this will traverse through vector.ph -> vector loop region -> middle.block -> scalar.ph -> etc.
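Roughly, the change in traversal can be pictured like this (a sketch of the standard VPlan skeleton; block names are illustrative):

// Before: only the VPBasicBlocks directly inside the vector loop region,
// without entering nested (e.g. replicate) regions:
//   vp_depth_first_shallow(Plan.getVectorLoopRegion()->getEntry())
//
// After: everything reachable from the vector preheader, recursing into
// nested regions and continuing past the region exit:
//   vp_depth_first_deep(Plan.getVectorPreheader())
//     vector.ph -> vector loop region -> middle.block -> scalar.ph
//       -> original scalar loop
//
// The original scalar loop is modeled with VPIRInstructions, which is why
// VPDef::VPIRInstructionSC now needs its own case in the switch.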
Closing for now because I have too many reviews open, and this PR isn't actually that important. Just wanted to post it here for reference!