Backend
VL (Velox)
Bug description
How to reproduce this issue
Run the following unit test in GlutenInjectRuntimeFilterSuite:
test("xxx") {
  withSQLConf(
    SQLConf.RUNTIME_BLOOM_FILTER_APPLICATION_SIDE_SCAN_SIZE_THRESHOLD.key -> "3000",
    SQLConf.DYNAMIC_PARTITION_PRUNING_ENABLED.key -> "false",
    SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "2000"
  ) {
    withTable("bf5_text") {
      spark.range(10000).toDF("a5").selectExpr("rpad('bf5_id_' || a5, 1024, 'x') as a5")
        .write.format("text").saveAsTable("bf5_text")
      assertRewroteWithBloomFilter(
        "select * from bf5_text join bf2 on " +
          "bf5_text.a5 = bf2.c2 where bf2.a2 = 67")
    }
  }
}
How does this issue happen
- Stage 0: Gluten builds a native bloom filter from bf2 where bf2.a2 = 67.
- Stage 1: Gluten tries to offload the filter and table scan operators to the native backend, but falls back to vanilla Spark (Java) because the text data source is not supported.
- Still in stage 1: the fallback Spark filter tries to read the native bloom filter on the Java side and throws an exception:
Unexpected Bloom filter version number (16777217)
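To make the reported number easier to interpret, here is a minimal sketch that decodes 16777217 back into the four header bytes a big-endian int reader (such as java.io.DataInputStream.readInt) would have consumed, and compares them with the header for version 1, which is the value Spark's Java BloomFilterImpl expects here. The object and helper names (BloomFilterHeaderSketch, headerBytes, hex) are made up for illustration and are not part of Gluten or Spark. The sketch only inspects the integer from the error message; it does not assume anything about how the native side lays out its buffer.

```scala
import java.nio.{ByteBuffer, ByteOrder}

// Hypothetical helper for illustration only; not Gluten or Spark code.
object BloomFilterHeaderSketch {
  // Version number accepted by Spark's Java bloom filter reader (V1).
  val SparkV1: Int = 1

  // Recover the four bytes that a big-endian int reader consumed to produce `version`.
  def headerBytes(version: Int): Array[Byte] =
    ByteBuffer.allocate(4).order(ByteOrder.BIG_ENDIAN).putInt(version).array()

  // Render bytes as space-separated two-digit hex.
  def hex(bytes: Array[Byte]): String =
    bytes.map(b => f"${b & 0xff}%02x").mkString(" ")

  def main(args: Array[String]): Unit = {
    val reported = 16777217 // value from the exception message
    println(s"reported header: ${hex(headerBytes(reported))}") // 01 00 00 01
    println(s"Spark V1 header: ${hex(headerBytes(SparkV1))}")  // 00 00 00 01
  }
}
```

The reported value is 0x01000001, so the first four bytes of the buffer were 01 00 00 01 rather than 00 00 00 01. That is consistent with the Java reader being handed a buffer that was not written in Spark's own serialization format, which matches the fallback scenario described above.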
Our production case
Error plan:
Error stack:
java.io.IOException: Unexpected Bloom filter version number (16777217)
at org.apache.spark.util.sketch.BloomFilterImpl.readFrom0(BloomFilterImpl.java:251)
at org.apache.spark.util.sketch.BloomFilterImpl.readFrom(BloomFilterImpl.java:260)
at org.apache.spark.util.sketch.BloomFilterImpl.readFrom(BloomFilterImpl.java:266)
at org.apache.spark.util.sketch.BloomFilter.readFrom(BloomFilter.java:185)
at org.apache.spark.sql.catalyst.expressions.BloomFilterMightContain.deserialize(BloomFilterMightContain.scala:120)
at org.apache.spark.sql.catalyst.expressions.BloomFilterMightContain.bloomFilter$lzycompute(BloomFilterMightContain.scala:92)
at org.apache.spark.sql.catalyst.expressions.BloomFilterMightContain.bloomFilter(BloomFilterMightContain.scala:90)
at org.apache.spark.sql.catalyst.expressions.BloomFilterMightContain.doGenCode(BloomFilterMightContain.scala:105)
at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:207)
at scala.Option.getOrElse(Option.scala:201)
at org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:202)
at org.apache.spark.sql.execution.GeneratePredicateHelper.genPredicate$1(basicPhysicalOperators.scala:156)
at org.apache.spark.sql.execution.GeneratePredicateHelper.$anonfun$generatePredicateCode$4(basicPhysicalOperators.scala:200)
at scala.collection.immutable.List.map(List.scala:247)
at scala.collection.immutable.List.map(List.scala:79)
at org.apache.spark.sql.execution.GeneratePredicateHelper.generatePredicateCode(basicPhysicalOperators.scala:181)
at org.apache.spark.sql.execution.GeneratePredicateHelper.generatePredicateCode$(basicPhysicalOperators.scala:140)
at org.apache.spark.sql.execution.FilterExec.generatePredicateCode(basicPhysicalOperators.scala:220)
at org.apache.spark.sql.execution.FilterExec.doConsume(basicPhysicalOperators.scala:253)
at org.apache.spark.sql.execution.CodegenSupport.consume(WholeStageCodegenExec.scala:198)
at org.apache.spark.sql.execution.CodegenSupport.consume$(WholeStageCodegenExec.scala:153)
Gluten version
Gluten 1.5
Spark version
Spark 4.0
Spark configurations
None
System information
None
Relevant logs