feat: bloom filter pushdown by xanderbailey · Pull Request #2398 · apache/iceberg-rust

xanderbailey · 2026-05-02T13:21:10Z

Which issue does this PR close?

Closes #.

What changes are included in this PR?

Adds bloom filter pushdown for equality predicates during Parquet reads. When enabled, the reader loads bloom filters from row group column chunks and uses them to skip row groups that definitely don't contain the queried values.
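To illustrate the skipping rule, here is a minimal sketch that assumes a toy filter rather than Parquet's real split-block bloom filter. `ToyBloom` and `select_row_groups` are hypothetical names and are not part of this PR or the `parquet` crate. The sketch only shows the property the pushdown relies on: a filter can report "definitely absent" but never "definitely present".

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy bloom filter: a bit array probed by `k` seeded hashes.
/// This is a stand-in only; it keeps the "no false negatives" property
/// but does not follow Parquet's on-disk format or hash function.
struct ToyBloom {
    bits: Vec<bool>,
    k: u64,
}

impl ToyBloom {
    fn new(num_bits: usize, k: u64) -> Self {
        Self { bits: vec![false; num_bits], k }
    }

    /// Maps (seed, value) to one bit position.
    fn index(&self, value: &str, seed: u64) -> usize {
        let mut hasher = DefaultHasher::new();
        seed.hash(&mut hasher);
        value.hash(&mut hasher);
        (hasher.finish() % self.bits.len() as u64) as usize
    }

    fn insert(&mut self, value: &str) {
        for seed in 0..self.k {
            let i = self.index(value, seed);
            self.bits[i] = true;
        }
    }

    /// `false` means "definitely absent"; `true` means "possibly present".
    fn might_contain(&self, value: &str) -> bool {
        (0..self.k).all(|seed| self.bits[self.index(value, seed)])
    }
}

/// Returns the indices of row groups that still have to be read for
/// `column = value`. A row group without a filter (`None`) is always kept,
/// because nothing can be concluded about it.
fn select_row_groups(filters: &[Option<ToyBloom>], value: &str) -> Vec<usize> {
    filters
        .iter()
        .enumerate()
        .filter(|(_, f)| f.as_ref().map_or(true, |f| f.might_contain(value)))
        .map(|(i, _)| i)
        .collect()
}
```

A row group is dropped only when its filter answers `false`; a `true` answer may be a false positive, so the row group is still read and filtered row by row.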

Key points:

- New `bloom_filter_enabled` option on `TableScanBuilder` and `ArrowReaderBuilder` (off by default, since it requires extra I/O per column per row group)
- Only loads bloom filters for columns referenced in `eq` or `in` predicates; range predicates and other operators are ignored
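The column selection described above can be sketched as a walk over the predicate tree. The `Predicate` enum and `bloom_filter_columns` function below are hypothetical and much simpler than iceberg-rust's `BoundPredicate`. Skipping `NOT` subtrees is an assumption of this sketch, not something the PR description states.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified predicate tree (not iceberg-rust's `BoundPredicate`).
enum Predicate {
    Eq(String, String),      // column = literal
    In(String, Vec<String>), // column IN (literals)
    Lt(String, String),      // column < literal (a range predicate)
    And(Box<Predicate>, Box<Predicate>),
    Or(Box<Predicate>, Box<Predicate>),
    Not(Box<Predicate>),
}

/// Collects the columns whose bloom filters are worth loading.
/// Only `Eq` and `In` contribute; range predicates are ignored because a
/// bloom filter can only answer membership questions.
fn bloom_filter_columns(pred: &Predicate, out: &mut BTreeSet<String>) {
    match pred {
        Predicate::Eq(col, _) | Predicate::In(col, _) => {
            out.insert(col.clone());
        }
        Predicate::Lt(_, _) => {}
        Predicate::And(l, r) | Predicate::Or(l, r) => {
            bloom_filter_columns(l, out);
            bloom_filter_columns(r, out);
        }
        // Assumption of this sketch: under a negation a "possibly present"
        // answer cannot prune anything, so the subtree is skipped.
        Predicate::Not(_) => {}
    }
}
```

Collecting the columns up front keeps the extra I/O proportional to the predicate rather than to the table width, which is the cost the off-by-default option is guarding against.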

Are these changes tested?

- Unit tests covering the bloom filter evaluator: `eq`/`in` with present and absent values, AND/OR/NOT logic, all decimal physical types (INT32, INT64, FIXED_LEN_BYTE_ARRAY), negative values, missing bloom filters, etc.
- Integration tests that write multi-row-group Parquet files with bloom filters enabled and verify end-to-end row group pruning
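The AND/OR/NOT and missing-filter cases listed above follow from one conservative rule: return `false` only when a row group definitely cannot match. A minimal sketch follows, with hypothetical names (`Predicate`, `RowGroupFilters`, `might_match`) and an exact `HashSet` standing in for each bloom filter. It is not the evaluator from this PR.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified predicate tree (not iceberg-rust's `BoundPredicate`).
enum Predicate {
    Eq(String, String),
    In(String, Vec<String>),
    And(Box<Predicate>, Box<Predicate>),
    Or(Box<Predicate>, Box<Predicate>),
    Not(Box<Predicate>),
}

/// Stand-in for one row group's bloom filters, keyed by column name.
/// An exact set is used for simplicity; a real filter may also answer
/// "possibly present" for values that are absent.
type RowGroupFilters = HashMap<String, HashSet<String>>;

/// Returns `false` only when the row group definitely has no matching rows.
fn might_match(pred: &Predicate, filters: &RowGroupFilters) -> bool {
    match pred {
        // A missing filter gives no information, so the row group is kept.
        Predicate::Eq(col, v) => filters.get(col).map_or(true, |f| f.contains(v)),
        Predicate::In(col, vs) => filters
            .get(col)
            .map_or(true, |f| vs.iter().any(|v| f.contains(v))),
        // AND can prune if either side proves absence.
        Predicate::And(l, r) => might_match(l, filters) && might_match(r, filters),
        // OR can prune only if both sides prove absence.
        Predicate::Or(l, r) => might_match(l, filters) || might_match(r, filters),
        // "Possibly present" cannot be negated into "definitely absent".
        Predicate::Not(_) => true,
    }
}
```

Treating `NOT` as "might match" is the safe choice here: negating a "possibly present" answer would turn false positives into wrongly skipped row groups.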

xanderbailey · 2026-05-02T13:21:57Z

+    /// against them to filter out row groups that definitely don't match.
+    async fn filter_row_groups_by_bloom_filter(
+        predicate: &crate::expr::BoundPredicate,
+        builder: &mut ParquetRecordBatchStreamBuilder<ArrowFileReader>,


The `mut` reference is needed because `get_row_group_column_bloom_filter` requires it.

…-endian (#2397)

## Which issue does this PR close?

- Closes #. Found this whilst working on #2398

## What changes are included in this PR?

[Spec](https://iceberg.apache.org/spec/#binary-single-value-serialization) says `Int128` and `UInt128` are big-endian, not little-endian, and indeed we are using big-endian [here](https://github.com/apache/iceberg-rust/blob/c1538de36dd53e491299b62ad89286f2db496bc7/crates/iceberg/src/arrow/schema.rs#L761) for example. I think it's just the doc string which needs correcting.

## Are these changes tested?

xanderbailey added 3 commits May 2, 2026 13:54

feat: bloom filter pushdown

68b0a39

test

08b461d

avoid copy

d649891

xanderbailey added 3 commits May 2, 2026 14:24

new name

fbad184

fmt

9b14011

doc string

86c50b7

xanderbailey mentioned this pull request May 2, 2026

docs: Correct docs for Int128 and UInt128 being big-endian not little-endian #2397

Merged

xanderbailey wants to merge 6 commits into apache:main from xanderbailey:xb/bloom_filter_pushdown

1 participant
