[RISCV] Cost bf16/f16 vector non-unit memory accesses as legal without zvfhmin/zvfbfmin #150882
Conversation
When vectorizing with predication, some loops that were previously vectorized without zvfhmin/zvfbfmin will no longer be vectorized, because the masked load/store or gather/scatter cost returns illegal.
This is due to a discrepancy where for these costs we check isLegalElementTypeForRVV, but for regular memory accesses we don't. For bf16 and f16 vectors we don't actually need the extension support for loads and stores, so this adds a new function which takes this into account.
For regular memory accesses we should probably also return an invalid cost for, e.g., i64 elements on zve32x, but it doesn't look like we have tests for this yet.
We also should probably not be vectorizing these bf16/f16 loops to begin with if we don't have zvfhmin/zvfbfmin and zfhmin/zfbfmin. I think this is due to the scalar costs being too cheap. I've added tests for this in a100f63 to fix in another patch.
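Concretely, the costs in question are for calls like the one below, mirroring the patch's own tests (the wrapper function and its name are illustrative). Before this change, isLegalMaskedLoad rejected the bf16/f16 element type via isLegalElementTypeForRVV, so such calls fell back to expensive scalarization costs (e.g. 223 for <32 x half> in the masked_ldst.ll diff below):

declare <16 x bfloat> @llvm.masked.load.v16bf16.p0(ptr, i32 immarg, <16 x i1>, <16 x bfloat>)

define <16 x bfloat> @masked_load_bf16(ptr %p, <16 x i1> %m) {
  %v = call <16 x bfloat> @llvm.masked.load.v16bf16.p0(ptr %p, i32 2, <16 x i1> %m, <16 x bfloat> poison)
  ret <16 x bfloat> %v
}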
@llvm/pr-subscribers-llvm-analysis @llvm/pr-subscribers-backend-risc-v
Author: Luke Lau (lukel97)
Patch is 29.16 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/150882.diff
8 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 54845e5374131..607edd3d859f8 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -2739,6 +2739,27 @@ bool RISCVTargetLowering::isLegalElementTypeForRVV(EVT ScalarTy) const {
}
}
+bool RISCVTargetLowering::isLegalLoadStoreElementTypeForRVV(
+ EVT ScalarTy) const {
+ if (!ScalarTy.isSimple())
+ return false;
+ switch (ScalarTy.getSimpleVT().SimpleTy) {
+ case MVT::iPTR:
+ return Subtarget.is64Bit() ? Subtarget.hasVInstructionsI64() : true;
+ case MVT::i8:
+ case MVT::i16:
+ case MVT::i32:
+ case MVT::f16:
+ case MVT::bf16:
+ case MVT::f32:
+ return true;
+ case MVT::i64:
+ case MVT::f64:
+ return Subtarget.hasVInstructionsI64();
+ default:
+ return false;
+ }
+}
unsigned RISCVTargetLowering::combineRepeatedFPDivisors() const {
return NumRepeatedDivisors;
@@ -24239,7 +24260,7 @@ bool RISCVTargetLowering::isLegalStridedLoadStore(EVT DataType,
return false;
EVT ScalarType = DataType.getScalarType();
- if (!isLegalElementTypeForRVV(ScalarType))
+ if (!isLegalLoadStoreElementTypeForRVV(ScalarType))
return false;
if (!Subtarget.enableUnalignedVectorMem() &&
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.h b/llvm/lib/Target/RISCV/RISCVISelLowering.h
index ca70c46988b4e..a788c0b72184b 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.h
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.h
@@ -384,6 +384,7 @@ class RISCVTargetLowering : public TargetLowering {
bool shouldRemoveExtendFromGSIndex(SDValue Extend, EVT DataVT) const override;
bool isLegalElementTypeForRVV(EVT ScalarTy) const;
+ bool isLegalLoadStoreElementTypeForRVV(EVT ScalarTy) const;
bool shouldConvertFpToSat(unsigned Op, EVT FPVT, EVT VT) const override;
diff --git a/llvm/lib/Target/RISCV/RISCVInterleavedAccess.cpp b/llvm/lib/Target/RISCV/RISCVInterleavedAccess.cpp
index 30d8f850763a2..3cbe668b08244 100644
--- a/llvm/lib/Target/RISCV/RISCVInterleavedAccess.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInterleavedAccess.cpp
@@ -32,7 +32,7 @@ bool RISCVTargetLowering::isLegalInterleavedAccessType(
if (!isTypeLegal(VT))
return false;
- if (!isLegalElementTypeForRVV(VT.getScalarType()) ||
+ if (!isLegalLoadStoreElementTypeForRVV(VT.getScalarType()) ||
!allowsMemoryAccessForAlignment(VTy->getContext(), DL, VT, AddrSpace,
Alignment))
return false;
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
index d62d99cf31899..f0510ec65b9d4 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
@@ -265,7 +265,7 @@ class RISCVTTIImpl final : public BasicTTIImplBase<RISCVTTIImpl> {
if (!ST->enableUnalignedVectorMem() && Alignment < ElemType.getStoreSize())
return false;
- return TLI->isLegalElementTypeForRVV(ElemType);
+ return TLI->isLegalLoadStoreElementTypeForRVV(ElemType);
}
bool isLegalMaskedLoad(Type *DataType, Align Alignment,
@@ -297,7 +297,7 @@ class RISCVTTIImpl final : public BasicTTIImplBase<RISCVTTIImpl> {
if (!ST->enableUnalignedVectorMem() && Alignment < ElemType.getStoreSize())
return false;
- return TLI->isLegalElementTypeForRVV(ElemType);
+ return TLI->isLegalLoadStoreElementTypeForRVV(ElemType);
}
bool isLegalMaskedGather(Type *DataType, Align Alignment) const override {
diff --git a/llvm/test/Analysis/CostModel/RISCV/masked_ldst.ll b/llvm/test/Analysis/CostModel/RISCV/masked_ldst.ll
index 892277a2d5740..68c89c3f77b3f 100644
--- a/llvm/test/Analysis/CostModel/RISCV/masked_ldst.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/masked_ldst.ll
@@ -13,14 +13,14 @@ define void @fixed() {
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i32 = call <2 x i32> @llvm.masked.load.v2i32.p0(ptr undef, i32 8, <2 x i1> undef, <2 x i32> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i32 = call <4 x i32> @llvm.masked.load.v4i32.p0(ptr undef, i32 8, <4 x i1> undef, <4 x i32> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i64 = call <2 x i64> @llvm.masked.load.v2i64.p0(ptr undef, i32 8, <2 x i1> undef, <2 x i64> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %v2f16 = call <2 x half> @llvm.masked.load.v2f16.p0(ptr undef, i32 8, <2 x i1> undef, <2 x half> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 19 for instruction: %v4f16 = call <4 x half> @llvm.masked.load.v4f16.p0(ptr undef, i32 8, <4 x i1> undef, <4 x half> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 39 for instruction: %v8f16 = call <8 x half> @llvm.masked.load.v8f16.p0(ptr undef, i32 8, <8 x i1> undef, <8 x half> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v2f16 = call <2 x half> @llvm.masked.load.v2f16.p0(ptr undef, i32 8, <2 x i1> undef, <2 x half> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v4f16 = call <4 x half> @llvm.masked.load.v4f16.p0(ptr undef, i32 8, <4 x i1> undef, <4 x half> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v8f16 = call <8 x half> @llvm.masked.load.v8f16.p0(ptr undef, i32 8, <8 x i1> undef, <8 x half> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2f32 = call <2 x float> @llvm.masked.load.v2f32.p0(ptr undef, i32 8, <2 x i1> undef, <2 x float> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4f32 = call <4 x float> @llvm.masked.load.v4f32.p0(ptr undef, i32 8, <4 x i1> undef, <4 x float> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2f64 = call <2 x double> @llvm.masked.load.v2f64.p0(ptr undef, i32 8, <2 x i1> undef, <2 x double> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v4i64 = call <4 x i64> @llvm.masked.load.v4i64.p0(ptr undef, i32 8, <4 x i1> undef, <4 x i64> undef)
-; CHECK-NEXT: Cost Model: Found an estimated cost of 223 for instruction: %v32f16 = call <32 x half> @llvm.masked.load.v32f16.p0(ptr undef, i32 8, <32 x i1> undef, <32 x half> undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %v32f16 = call <32 x half> @llvm.masked.load.v32f16.p0(ptr undef, i32 8, <32 x i1> undef, <32 x half> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;
entry:
diff --git a/llvm/test/Transforms/InterleavedAccess/RISCV/interleaved-accesses.ll b/llvm/test/Transforms/InterleavedAccess/RISCV/interleaved-accesses.ll
index 672e94962da6d..b505917402e31 100644
--- a/llvm/test/Transforms/InterleavedAccess/RISCV/interleaved-accesses.ll
+++ b/llvm/test/Transforms/InterleavedAccess/RISCV/interleaved-accesses.ll
@@ -874,3 +874,79 @@ define void @load_factor2_fp128(ptr %ptr) {
%v1 = shufflevector <4 x fp128> %interleaved.vec, <4 x fp128> poison, <2 x i32> <i32 1, i32 3>
ret void
}
+
+define void @load_factor2_f32(ptr %ptr) {
+; RV32-LABEL: @load_factor2_f32(
+; RV32-NEXT: [[TMP1:%.*]] = call { <8 x float>, <8 x float> } @llvm.riscv.seg2.load.mask.v8f32.p0.i32(ptr [[PTR:%.*]], <8 x i1> splat (i1 true), i32 8)
+; RV32-NEXT: [[TMP2:%.*]] = extractvalue { <8 x float>, <8 x float> } [[TMP1]], 1
+; RV32-NEXT: [[TMP3:%.*]] = extractvalue { <8 x float>, <8 x float> } [[TMP1]], 0
+; RV32-NEXT: ret void
+;
+; RV64-LABEL: @load_factor2_f32(
+; RV64-NEXT: [[TMP1:%.*]] = call { <8 x float>, <8 x float> } @llvm.riscv.seg2.load.mask.v8f32.p0.i64(ptr [[PTR:%.*]], <8 x i1> splat (i1 true), i64 8)
+; RV64-NEXT: [[TMP2:%.*]] = extractvalue { <8 x float>, <8 x float> } [[TMP1]], 1
+; RV64-NEXT: [[TMP3:%.*]] = extractvalue { <8 x float>, <8 x float> } [[TMP1]], 0
+; RV64-NEXT: ret void
+;
+ %interleaved.vec = load <16 x float>, ptr %ptr
+ %v0 = shufflevector <16 x float> %interleaved.vec, <16 x float> poison, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
+ %v1 = shufflevector <16 x float> %interleaved.vec, <16 x float> poison, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>
+ ret void
+}
+
+define void @load_factor2_f64(ptr %ptr) {
+; RV32-LABEL: @load_factor2_f64(
+; RV32-NEXT: [[TMP1:%.*]] = call { <8 x double>, <8 x double> } @llvm.riscv.seg2.load.mask.v8f64.p0.i32(ptr [[PTR:%.*]], <8 x i1> splat (i1 true), i32 8)
+; RV32-NEXT: [[TMP2:%.*]] = extractvalue { <8 x double>, <8 x double> } [[TMP1]], 1
+; RV32-NEXT: [[TMP3:%.*]] = extractvalue { <8 x double>, <8 x double> } [[TMP1]], 0
+; RV32-NEXT: ret void
+;
+; RV64-LABEL: @load_factor2_f64(
+; RV64-NEXT: [[TMP1:%.*]] = call { <8 x double>, <8 x double> } @llvm.riscv.seg2.load.mask.v8f64.p0.i64(ptr [[PTR:%.*]], <8 x i1> splat (i1 true), i64 8)
+; RV64-NEXT: [[TMP2:%.*]] = extractvalue { <8 x double>, <8 x double> } [[TMP1]], 1
+; RV64-NEXT: [[TMP3:%.*]] = extractvalue { <8 x double>, <8 x double> } [[TMP1]], 0
+; RV64-NEXT: ret void
+;
+ %interleaved.vec = load <16 x double>, ptr %ptr
+ %v0 = shufflevector <16 x double> %interleaved.vec, <16 x double> poison, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
+ %v1 = shufflevector <16 x double> %interleaved.vec, <16 x double> poison, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>
+ ret void
+}
+
+define void @load_factor2_bf16(ptr %ptr) {
+; RV32-LABEL: @load_factor2_bf16(
+; RV32-NEXT: [[INTERLEAVED_VEC:%.*]] = load <16 x bfloat>, ptr [[PTR:%.*]], align 32
+; RV32-NEXT: [[V0:%.*]] = shufflevector <16 x bfloat> [[INTERLEAVED_VEC]], <16 x bfloat> poison, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
+; RV32-NEXT: [[V1:%.*]] = shufflevector <16 x bfloat> [[INTERLEAVED_VEC]], <16 x bfloat> poison, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>
+; RV32-NEXT: ret void
+;
+; RV64-LABEL: @load_factor2_bf16(
+; RV64-NEXT: [[INTERLEAVED_VEC:%.*]] = load <16 x bfloat>, ptr [[PTR:%.*]], align 32
+; RV64-NEXT: [[V0:%.*]] = shufflevector <16 x bfloat> [[INTERLEAVED_VEC]], <16 x bfloat> poison, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
+; RV64-NEXT: [[V1:%.*]] = shufflevector <16 x bfloat> [[INTERLEAVED_VEC]], <16 x bfloat> poison, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>
+; RV64-NEXT: ret void
+;
+ %interleaved.vec = load <16 x bfloat>, ptr %ptr
+ %v0 = shufflevector <16 x bfloat> %interleaved.vec, <16 x bfloat> poison, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
+ %v1 = shufflevector <16 x bfloat> %interleaved.vec, <16 x bfloat> poison, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>
+ ret void
+}
+
+define void @load_factor2_f16(ptr %ptr) {
+; RV32-LABEL: @load_factor2_f16(
+; RV32-NEXT: [[INTERLEAVED_VEC:%.*]] = load <16 x half>, ptr [[PTR:%.*]], align 32
+; RV32-NEXT: [[V0:%.*]] = shufflevector <16 x half> [[INTERLEAVED_VEC]], <16 x half> poison, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
+; RV32-NEXT: [[V1:%.*]] = shufflevector <16 x half> [[INTERLEAVED_VEC]], <16 x half> poison, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>
+; RV32-NEXT: ret void
+;
+; RV64-LABEL: @load_factor2_f16(
+; RV64-NEXT: [[INTERLEAVED_VEC:%.*]] = load <16 x half>, ptr [[PTR:%.*]], align 32
+; RV64-NEXT: [[V0:%.*]] = shufflevector <16 x half> [[INTERLEAVED_VEC]], <16 x half> poison, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
+; RV64-NEXT: [[V1:%.*]] = shufflevector <16 x half> [[INTERLEAVED_VEC]], <16 x half> poison, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>
+; RV64-NEXT: ret void
+;
+ %interleaved.vec = load <16 x half>, ptr %ptr
+ %v0 = shufflevector <16 x half> %interleaved.vec, <16 x half> poison, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
+ %v1 = shufflevector <16 x half> %interleaved.vec, <16 x half> poison, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>
+ ret void
+}
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/bf16.ll b/llvm/test/Transforms/LoopVectorize/RISCV/bf16.ll
index d5b25bfe349b9..03c6f089df9aa 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/bf16.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/bf16.ll
@@ -1,5 +1,6 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
; RUN: opt < %s -passes=loop-vectorize -mtriple riscv64 -mattr=+v -S | FileCheck %s -check-prefix=NO-ZVFBFMIN
+; RUN: opt < %s -passes=loop-vectorize -mtriple riscv64 -mattr=+v -S -prefer-predicate-over-epilogue=predicate-else-scalar-epilogue | FileCheck %s -check-prefix=NO-ZVFBFMIN-PREDICATED
; RUN: opt < %s -passes=loop-vectorize -mtriple riscv64 -mattr=+v,+zvfbfmin -S | FileCheck %s -check-prefix=ZVFBFMIN
define void @fadd(ptr noalias %a, ptr noalias %b, i64 %n) {
@@ -21,6 +22,52 @@ define void @fadd(ptr noalias %a, ptr noalias %b, i64 %n) {
; NO-ZVFBFMIN: [[EXIT]]:
; NO-ZVFBFMIN-NEXT: ret void
;
+; NO-ZVFBFMIN-PREDICATED-LABEL: define void @fadd(
+; NO-ZVFBFMIN-PREDICATED-SAME: ptr noalias [[A:%.*]], ptr noalias [[B:%.*]], i64 [[N:%.*]]) #[[ATTR0:[0-9]+]] {
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[ENTRY:.*]]:
+; NO-ZVFBFMIN-PREDICATED-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; NO-ZVFBFMIN-PREDICATED: [[VECTOR_PH]]:
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[N_RND_UP:%.*]] = add i64 [[N]], 15
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], 16
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[TRIP_COUNT_MINUS_1:%.*]] = sub i64 [[N]], 1
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <16 x i64> poison, i64 [[TRIP_COUNT_MINUS_1]], i64 0
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <16 x i64> [[BROADCAST_SPLATINSERT]], <16 x i64> poison, <16 x i32> zeroinitializer
+; NO-ZVFBFMIN-PREDICATED-NEXT: br label %[[VECTOR_BODY:.*]]
+; NO-ZVFBFMIN-PREDICATED: [[VECTOR_BODY]]:
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <16 x i64> poison, i64 [[INDEX]], i64 0
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <16 x i64> [[BROADCAST_SPLATINSERT1]], <16 x i64> poison, <16 x i32> zeroinitializer
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[VEC_IV:%.*]] = add <16 x i64> [[BROADCAST_SPLAT2]], <i64 0, i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7, i64 8, i64 9, i64 10, i64 11, i64 12, i64 13, i64 14, i64 15>
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[TMP0:%.*]] = icmp ule <16 x i64> [[VEC_IV]], [[BROADCAST_SPLAT]]
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[TMP1:%.*]] = getelementptr bfloat, ptr [[A]], i64 [[INDEX]]
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[TMP2:%.*]] = getelementptr bfloat, ptr [[B]], i64 [[INDEX]]
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <16 x bfloat> @llvm.masked.load.v16bf16.p0(ptr [[TMP1]], i32 2, <16 x i1> [[TMP0]], <16 x bfloat> poison)
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[WIDE_MASKED_LOAD3:%.*]] = call <16 x bfloat> @llvm.masked.load.v16bf16.p0(ptr [[TMP2]], i32 2, <16 x i1> [[TMP0]], <16 x bfloat> poison)
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[TMP3:%.*]] = fadd <16 x bfloat> [[WIDE_MASKED_LOAD]], [[WIDE_MASKED_LOAD3]]
+; NO-ZVFBFMIN-PREDICATED-NEXT: call void @llvm.masked.store.v16bf16.p0(<16 x bfloat> [[TMP3]], ptr [[TMP1]], i32 2, <16 x i1> [[TMP0]])
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 16
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[TMP4:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; NO-ZVFBFMIN-PREDICATED-NEXT: br i1 [[TMP4]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; NO-ZVFBFMIN-PREDICATED: [[MIDDLE_BLOCK]]:
+; NO-ZVFBFMIN-PREDICATED-NEXT: br label %[[EXIT:.*]]
+; NO-ZVFBFMIN-PREDICATED: [[SCALAR_PH]]:
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ]
+; NO-ZVFBFMIN-PREDICATED-NEXT: br label %[[LOOP:.*]]
+; NO-ZVFBFMIN-PREDICATED: [[LOOP]]:
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[I:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[I_NEXT:%.*]], %[[LOOP]] ]
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[A_GEP:%.*]] = getelementptr bfloat, ptr [[A]], i64 [[I]]
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[B_GEP:%.*]] = getelementptr bfloat, ptr [[B]], i64 [[I]]
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[X:%.*]] = load bfloat, ptr [[A_GEP]], align 2
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[Y:%.*]] = load bfloat, ptr [[B_GEP]], align 2
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[Z:%.*]] = fadd bfloat [[X]], [[Y]]
+; NO-ZVFBFMIN-PREDICATED-NEXT: store bfloat [[Z]], ptr [[A_GEP]], align 2
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[I_NEXT]] = add i64 [[I]], 1
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[DONE:%.*]] = icmp eq i64 [[I_NEXT]], [[N]]
+; NO-ZVFBFMIN-PREDICATED-NEXT: br i1 [[DONE]], label %[[EXIT]], label %[[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]
+; NO-ZVFBFMIN-PREDICATED: [[EXIT]]:
+; NO-ZVFBFMIN-PREDICATED-NEXT: ret void
+;
; ZVFBFMIN-LABEL: define void @fadd(
; ZVFBFMIN-SAME: ptr noalias [[A:%.*]], ptr noalias [[B:%.*]], i64 [[N:%.*]]) #[[ATTR0:[0-9]+]] {
; ZVFBFMIN-NEXT: [[ENTRY:.*]]:
@@ -133,6 +180,60 @@ define void @vfwmaccbf16.vv(ptr noalias %a, ptr noalias %b, ptr noalias %c, i64
; NO-ZVFBFMIN: [[EXIT]]:
; NO-ZVFBFMIN-NEXT: ret void
;
+; NO-ZVFBFMIN-PREDICATED-LABEL: define void @vfwmaccbf16.vv(
+; NO-ZVFBFMIN-PREDICATED-SAME: ptr noalias [[A:%.*]], ptr noalias [[B:%.*]], ptr noalias [[C:%.*]], i64 [[N:%.*]]) #[[ATTR0]] {
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[ENTRY:.*]]:
+; NO-ZVFBFMIN-PREDICATED-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; NO-ZVFBFMIN-PREDICATED: [[VECTOR_PH]]:
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[N_RND_UP:%.*]] = add i64 [[N]], 3
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], 4
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[TRIP_COUNT_MINUS_1:%.*]] = sub i64 [[N]], 1
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[TRIP_COUNT_MINUS_1]], i64 0
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i64> [[BROADCAST_SPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer
+; NO-ZVFBFMIN-PREDICATED-NEXT: br label %[[VECTOR_BODY:.*]]
+; NO-ZVFBFMIN-PREDICATED: [[VECTOR_BODY]]:
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[I:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <4 x i64> poison, i64 [[I]], i64 0
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <4 x i64> [[BROADCAST_SPLATINSERT1]], <4 x i64> poison, <4 x i32> zeroinitializer
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[VEC_IV:%.*]] = add <4 x i64> [[BROADCAST_SPLAT2]], <i64 0, i64 1, i64 2, i64 3>
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[TMP0:%.*]] = icmp ule <4 x i64> [[VEC_IV]], [[BROADCAST_SPLAT]]
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[A_GEP:%.*]] = getelementptr bfloat, ptr [[A]], i64 [[I]]
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[B_GEP:%.*]] = getelementptr bfloat, ptr [[B]], i64 [[I]]
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[C_GEP:%.*]] = getelementptr float, ptr [[C]], i64 [[I]]
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <4 x bfloat> @llvm.masked.load.v4bf16.p0(ptr [[A_GEP]], i32 2, <4 x i1> [[TMP0]], <4 x bfloat> poison)
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[WIDE_MASKED_LOAD3:%.*]] = call <4 x bfloat> @llvm.masked.load.v4bf16.p0(ptr [[B_GEP]], i32 2, <4 x i1> [[TMP0]], <4 x bfloat> poison)
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[WIDE_MASKED_LOAD4:%.*]] = call <4 x float> @llvm.masked.load.v4f32.p0(ptr [[C_GEP]], i32 4, <4 x i1> [[TMP0]], <4 x float> poison)
+; NO-ZVFBFMIN-PREDICATED-NEXT: [[TMP4:%.*]] = fpext <4 x bfloat> [[WIDE_MASKED_LOAD]] to <4 x float>
+...
[truncated]
@llvm/pr-subscribers-llvm-transforms
LGTM!

LGTM
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/33/builds/21089

LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/108/builds/15843
; NO-ZVFBFMIN-PREDICATED-LABEL: define void @fadd(
; NO-ZVFBFMIN-PREDICATED-SAME: ptr noalias [[A:%.*]], ptr noalias [[B:%.*]], i64 [[N:%.*]]) #[[ATTR0:[0-9]+]] {
; NO-ZVFBFMIN-PREDICATED-NEXT: [[ENTRY:.*]]:
; NO-ZVFBFMIN-PREDICATED-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
It looks like the test is failing on current main. It doesn't get vectorized because only scalar instructions will be generated. Is that expected?
The test should be fixed with 5f2092d; it was due to the interaction with #150908.

It shouldn't be vectorized, because all the actual recipes with underlying instructions get scalarized, since there's no vector bf16 support.

I think there's a separate issue with tail folding where things like VPWidenCanonicalIVRecipe can end up counting as "emitting vector instructions", despite not being in the original IR. I think we need to check getUnderlyingValue() in willGenerateVectors.
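A rough sketch of that idea (simplified; the actual recipe walk in LoopVectorize differs, so treat this as pseudocode against the VPlan API rather than the real fix):

// Inside the loop over recipes in willGenerateVectors: recipes introduced
// purely for tail folding (e.g. VPWidenCanonicalIVRecipe) have no underlying
// IR value, so they shouldn't count as "emitting vector instructions".
for (VPRecipeBase &R : *VPBB) {
  if (R.getNumDefinedValues() != 1)
    continue; // simplification for this sketch
  if (!R.getVPSingleValue()->getUnderlyingValue())
    continue; // no original IR instruction behind this recipe
  // ...otherwise, check whether this recipe produces a vector value at VF.
}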
I've thrown up #150992 as a possible fix.
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/16/builds/23408

LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/60/builds/34039

LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/56/builds/31935
@lukel97 do you think we should split this into two separate TTI hooks: one for "whether the data type is legal for load/store", the other for "whether masked.load/store operations are legal"?
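A hypothetical shape for that split, modeled on the hooks this patch touches (isLegalLoadStoreElementTypeForRVV exists in the patch; the second hook's name and signature are illustrative, not existing API):

// Hook 1: is the element type something RVV can load/store at all?
bool isLegalLoadStoreElementTypeForRVV(EVT ScalarTy) const;

// Hook 2 (hypothetical): can masked.load/store themselves be lowered for
// this data type? This would additionally require zvfhmin/zvfbfmin for
// f16/bf16 elements.
bool isLegalMaskedLoadStoreForRVV(EVT DataVT, Align Alignment) const;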
Oh, actually can we even lower a plain unmasked f16/bf16 load/store today without zvfhmin/zvfbfmin? If we can't because the vector type isn't legal, then maybe we should just revert this and make the unmasked loads and stores illegal.
I guess you're referring to plain, unmasked vector f16/bf16 load/store: the type legalizer only knows how to scalarize f16/bf16 fixed vectors, but not scalable vectors.
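To make that distinction concrete, a minimal pair (illustrative, not from the patch's tests; compiled with something like llc -mtriple=riscv64 -mattr=+v):

; Fixed-length: the type legalizer can scalarize this into element loads.
define <4 x bfloat> @fixed(ptr %p) {
  %v = load <4 x bfloat>, ptr %p
  ret <4 x bfloat> %v
}

; Scalable: the element count is unknown at compile time, so it cannot be
; scalarized, and without zvfbfmin the type has no legal lowering.
define <vscale x 4 x bfloat> @scalable(ptr %p) {
  %v = load <vscale x 4 x bfloat>, ptr %p
  ret <vscale x 4 x bfloat> %v
}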
I think the question now is, how did non-predicated vectorization -- which I assume uses unmasked load/store -- vectorize without zvfhmin/zvfbfmin?
Yeah I noticed that too, it must be because we check if the type is legal in RISCVInterleavedAccess.cpp:

bool RISCVTargetLowering::isLegalInterleavedAccessType(
VectorType *VTy, unsigned Factor, Align Alignment, unsigned AddrSpace,
const DataLayout &DL) const {
EVT VT = getValueType(DL, VTy);
// Don't lower vlseg/vsseg for vector types that can't be split.
if (!isTypeLegal(VT))
return false;
if (!isLegalLoadStoreElementTypeForRVV(VT.getScalarType()) ||
!allowsMemoryAccessForAlignment(VTy->getContext(), DL, VT, AddrSpace,
Alignment))
return false; |
Revert "[RISCV] Cost bf16/f16 vector non-unit memory accesses as legal without zvfhmin/zvfbfmin (#150882)"

This reverts commit fe4f6c1, but leaves the tests that were added.

The original commit mistakenly assumed that if regular bf16/f16 loads and stores could be lowered without zvfbfmin/zvfhmin, then so too could masked loads/stores and gathers/scatters. However, SelectionDAG can't actually type-legalize masked.load/stores, since that needs to be done in ScalarizeMaskedMemIntrinPass. This was causing crashes on IREE because we now returned true for isLegalMaskedLoadStore.

The original intent of this was to remove a discrepancy in the loop vectorizer tests whenever predication was enabled, but this has gone away after 92d0924. So I don't think we need to reapply this patch.
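For reference, that expansion happens at the IR level rather than in SelectionDAG, and can be exercised directly with opt (the pass name below is my best recollection of the new-pass-manager spelling, so treat it as an assumption):

opt -passes=scalarize-masked-mem-intrin -mtriple=riscv64 -mattr=+v -S input.ll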
@mshockwave After thinking about this for a bit, I don't think we should be returning legal at all for these accesses, even for fixed-length vectors, seeing as SelectionDAG can't type-legalize them. So I've just reverted the patch in 2a5ac19. I don't think we need to reapply it, since the actual test diff this was trying to fix was also separately fixed by 92d0924.
Thank you, this decision makes sense to me.