[test] Add Rust integration coverage for RecordBatchLogReader bounded reads. by slfan1989 · Pull Request #559 · apache/fluss-rust

slfan1989 · 2026-05-23T05:48:16Z

Purpose

Linked issue: close #558

Add Rust-level integration coverage for RecordBatchLogReader bounded reads.

RecordBatchLogReader already has unit coverage for batch filtering and Python binding coverage for to_arrow_batch_reader guard/drop behavior, but it lacks Rust end-to-end integration coverage for:

new_until_offsets stopping semantics
new_until_latest with partitioned log tables

This PR adds those missing integration tests.

Brief change log

Added a Rust integration test for RecordBatchLogReader::new_until_offsets.
- Creates a log table.
- Appends records.
- Subscribes from a non-zero offset.
- Reads with an explicit stopping offset.
- Verifies records at or beyond the stop offset are not returned.
Added a Rust integration test for RecordBatchLogReader::new_until_latest on partitioned tables.
- Creates a partitioned log table.
- Adds US and EU partitions.
- Appends records to both partitions.
- Subscribes partition buckets.
- Verifies all records present at reader creation are returned.
Updated the comment in RecordBatchLogReader to point to the new Rust integration coverage.
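
The stop rule exercised by the first test can be modelled without a cluster. The sketch below is a toy in-memory model, not the fluss-rust API: `Record`, `read_until`, and `sample_log` are hypothetical names introduced only for illustration, and it assumes the stop offset is exclusive, as the change log above states.

```rust
/// A toy record: its log offset plus an `id` payload.
#[derive(Debug, Clone, PartialEq)]
struct Record {
    offset: i64,
    id: i32,
}

/// Returns the ids of records whose offsets fall in the half-open range
/// `[start_offset, stop_offset)`, so records at or beyond the stop offset
/// are never returned. (Hypothetical helper, not part of fluss-rust.)
fn read_until(log: &[Record], start_offset: i64, stop_offset: i64) -> Vec<i32> {
    log.iter()
        .filter(|r| r.offset >= start_offset && r.offset < stop_offset)
        .map(|r| r.id)
        .collect()
}

/// Builds a log of `n` records with offsets `0..n` and ids `1..=n`.
fn sample_log(n: i32) -> Vec<Record> {
    (0..n)
        .map(|i| Record { offset: i as i64, id: i + 1 })
        .collect()
}
```

With a six-record log, subscribing from offset 2 and stopping at offset 5 yields the records at offsets 2, 3, and 4 only; the real integration test checks the same property against a live log table.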

Tests

Verified locally.

API and Format

No API or storage format changes.

Documentation

No user-facing documentation changes.

slfan1989 · 2026-05-23T05:51:04Z

@fresh-borzoni @leekeiabstraction Could you please help review this PR when you have time?

fresh-borzoni

@slfan1989 Ty for the PR, great addition, LGTM overall, couple of minor comments 👍

Let's move this to a separate file for testing the RecordBatchLogReader feature and then add some additional scenarios:

- until_offsets_with_empty_range (stop == start -> no batches)
- until_offsets_past_end_of_log (graceful finish past actual end)
- until_offsets_multi_bucket (multiple buckets in the HashMap)

Also, extract_ids_from_batches overlaps with extract_ids in the test_project test; maybe we should factor this out to utils and reuse it.
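
To make the empty-range and multi-bucket scenarios concrete, here is a minimal sketch of per-bucket stop tracking. It is an assumption-laden model rather than the actual reader: `BoundedProgress`, `offer`, and `is_finished` are hypothetical names, and the only detail taken from this thread is that stop offsets are keyed by bucket in a `HashMap`.

```rust
use std::collections::HashMap;

/// Tracks, per bucket, the next offset to read and the exclusive stop offset.
/// (Hypothetical type for illustration only.)
struct BoundedProgress {
    // bucket id -> (next offset to read, exclusive stop offset)
    buckets: HashMap<i32, (i64, i64)>,
}

impl BoundedProgress {
    /// `starts` and `stops` are keyed by bucket id. A bucket with no start
    /// entry is assumed to begin at offset 0 in this sketch.
    fn new(starts: &HashMap<i32, i64>, stops: &HashMap<i32, i64>) -> Self {
        let buckets = stops
            .iter()
            .map(|(bucket, stop)| {
                let start = starts.get(bucket).copied().unwrap_or(0);
                (*bucket, (start, *stop))
            })
            .collect();
        Self { buckets }
    }

    /// Offers a record; returns true if it lies below the bucket's stop
    /// offset and should be handed to the caller.
    fn offer(&mut self, bucket: i32, offset: i64) -> bool {
        if let Some((next, stop)) = self.buckets.get_mut(&bucket) {
            if offset >= *next && offset < *stop {
                *next = offset + 1;
                return true;
            }
        }
        false
    }

    /// The read is complete once every bucket has reached its stop offset.
    /// A bucket with stop == start is complete from the outset.
    fn is_finished(&self) -> bool {
        self.buckets.values().all(|(next, stop)| next >= stop)
    }
}
```

In this model an empty range (stop == start) never accepts a record, and the read as a whole finishes only when the slowest bucket reaches its own stop offset.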

fresh-borzoni · 2026-05-23T23:07:41Z

@charlesdong1991 Can you also take a look, please?
Since this was initially your feature to propose :)

Copilot

Pull request overview

Adds Rust integration tests to cover RecordBatchLogReader bounded-read semantics end-to-end (closing #558), complementing existing unit/Python coverage.

Changes:

Added an integration test verifying RecordBatchLogReader::new_until_offsets stops before an explicit upper offset.
Added an integration test verifying RecordBatchLogReader::new_until_latest returns all records present at reader creation for partitioned log tables.
Updated an internal comment to point to the new Rust integration coverage location.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `crates/fluss/tests/integration/log_table.rs` | Adds two new Rust integration tests and a shared helper for validating bounded reader behavior. |
| `crates/fluss/src/client/table/reader.rs` | Updates an in-code comment to reference the new Rust integration coverage. |

+    fn extract_ids_from_batches(batches: &[ScanBatch]) -> Vec<i32> {
+        batches
+            .iter()
+            .flat_map(|b| {
+                let batch = b.batch();
+                (0..batch.num_rows()).map(move |i| {
+                    batch
+                        .column(0)
+                        .as_any()
+                        .downcast_ref::<Int32Array>()
+                        .expect("id column should be Int32")
+                        .value(i)
+                })
+            })
+            .collect()
+    }


slfan1989 · 2026-05-25T05:13:29Z

@fresh-borzoni Thank you for the review and suggestions!

Addressed most of the comments:

Moved the RecordBatchLogReader integration coverage into a dedicated test file: record_batch_log_reader.rs.
Added until_offsets_with_empty_range.
Added until_offsets_multi_bucket.
Moved extract_ids_from_batches into integration/utils.rs and reused it from both log_table.rs and record_batch_log_reader.rs.

I left until_offsets_past_end_of_log out for now because it requires a behavior/API decision.

The current new_until_offsets implementation waits until the requested stop offset is observed. If the stop offset is beyond the current log end, it keeps polling for future records rather than finishing.

Supporting graceful completion would require snapshot-style semantics by querying latest offsets at reader creation, which needs FlussAdmin and async work.

I think that should be handled separately from this test-only PR.

Thanks again!
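
The snapshot-style alternative described in the comment above amounts to clamping each requested stop offset to the latest offset observed at reader creation. The sketch below shows only that clamping step as a pure function. It is not how fluss-rust implements anything: `clamp_stop_offsets` is a hypothetical helper, and how the latest offsets would actually be fetched (the comment mentions FlussAdmin and async work) is deliberately left out.

```rust
use std::collections::HashMap;

/// Clamps each requested stop offset to the latest offset known at reader
/// creation, so a stop offset beyond the current log end cannot make the
/// reader wait for future records. Buckets with no known latest offset keep
/// their requested stop offset. (Hypothetical helper for illustration.)
fn clamp_stop_offsets(
    requested: &HashMap<i32, i64>,
    latest_at_creation: &HashMap<i32, i64>,
) -> HashMap<i32, i64> {
    requested
        .iter()
        .map(|(bucket, stop)| {
            let clamped = match latest_at_creation.get(bucket) {
                Some(latest) => (*stop).min(*latest),
                None => *stop,
            };
            (*bucket, clamped)
        })
        .collect()
}
```

Without such clamping, a requested stop offset of 100 on a bucket whose log currently ends at 5 keeps the reader polling, which is the behavior the comment describes for the current `new_until_offsets`.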

charlesdong1991

thanks for the PR, overall very nice, just minor comments!

charlesdong1991 · 2026-05-25T07:49:25Z

+    }
+
+    #[tokio::test]
+    async fn until_latest_reads_partitioned_table() {


query_latest_offsets should apply to both partitioned and non-partitioned tables. Can we check whether the non-partitioned case has an integration test and, if not, add one?

Thanks for pointing this out!

Added until_latest_reads_non_partitioned_table, so new_until_latest now has integration coverage for both non-partitioned and partitioned tables.

charlesdong1991 · 2026-05-25T07:53:12Z

+            .expect("Failed to list partition infos")
+        {
+            scanner
+                .subscribe_partition(partition.get_partition_id(), 0, EARLIEST_OFFSET)


I assume we use 0 here because the default bucket number is 1. Can you add an inline comment here for future reference, in case we want to have multiple buckets?

Added an inline comment explaining that bucket 0 is used because the table uses the default single-bucket layout, and that future multi-bucket coverage should subscribe all buckets per partition.

charlesdong1991 · 2026-05-25T07:59:15Z

+ */
+
+#[cfg(test)]
+mod reader_test {


until_offsets_past_end_of_log (graceful finish past actual end)

I think what @fresh-borzoni meant might be the scenario where stop_at is past the current end and future records arrive and cross it, which I think doesn't need an API change; it basically covers the wait-and-then-resume behaviour of the loop.

An API change is probably needed if we want immediate graceful completion against arbitrary offsets.

yes, I should have been more specific. Sorry for the confusion

Got it, that makes sense. I interpreted past_end_of_log as immediate completion when the requested stop offset is beyond the current log end, which would require querying/clamping latest offsets.

I'll add a test for the wait-and-resume scenario instead: create a reader with a stop offset past the current end, start collecting, append more records later so the log crosses the stop offset, and verify the reader resumes and finishes.
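
The wait-and-resume scenario agreed on above can be sketched with standard-library threads and a channel standing in for the log. Everything here is an assumption for illustration: `collect_until` and `wait_and_resume_demo` are hypothetical names, the channel is not how the real reader receives batches, and the timings are arbitrary.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Collects offsets from `rx` until the exclusive `stop_offset` is reached.
/// A timed-out poll is treated as "no data yet", so the loop keeps waiting.
fn collect_until(rx: &mpsc::Receiver<i64>, stop_offset: i64) -> Vec<i64> {
    let mut seen = Vec::new();
    let mut next = 0;
    while next < stop_offset {
        match rx.recv_timeout(Duration::from_millis(50)) {
            Ok(offset) if offset < stop_offset => {
                seen.push(offset);
                next = offset + 1;
            }
            // A record at or beyond the stop offset ends the read.
            Ok(_) => break,
            // Nothing new yet: wait, then resume polling.
            Err(mpsc::RecvTimeoutError::Timeout) => continue,
            // The writer went away before the stop offset was reached.
            Err(mpsc::RecvTimeoutError::Disconnected) => break,
        }
    }
    seen
}

/// Writes offsets 0..3 up front, pauses, then appends 3..6 so the log
/// crosses a stop offset of 5 only after the reader has started waiting.
fn wait_and_resume_demo() -> Vec<i64> {
    let (tx, rx) = mpsc::channel();
    for offset in 0..3 {
        tx.send(offset).expect("receiver should be alive");
    }
    let writer = thread::spawn(move || {
        thread::sleep(Duration::from_millis(200));
        for offset in 3..6 {
            // The reader may already have finished; ignore send errors.
            let _ = tx.send(offset);
        }
    });
    let seen = collect_until(&rx, 5);
    writer.join().expect("writer thread panicked");
    seen
}
```

The reader drains the first three records, idles through several empty polls, and finishes once the later appends carry it up to the stop offset, which is the property the planned integration test is meant to verify.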

fresh-borzoni · 2026-05-25T12:08:34Z

@slfan1989 Ty for the changes, LGTM
Can you rebase and resolve conflicts?

slfan1989 · 2026-05-25T14:07:48Z

@fresh-borzoni Thanks for the review!

I have rebased the branch on the latest main and resolved the conflicts. The updated changes have been pushed, and I’m waiting for CI to finish now.

fresh-borzoni

@slfan1989 LGTM 👍

@charlesdong1991 Do you have additional comments?

charlesdong1991

thanks for the changes, very nice! 👍

fresh-borzoni · 2026-05-25T17:15:07Z

@slfan1989 Thank you for the contribution, merged 👍

fresh-borzoni reviewed May 23, 2026

luoyuxia requested a review from Copilot May 24, 2026 06:41

Copilot started reviewing on behalf of luoyuxia May 24, 2026 06:41

Copilot AI reviewed May 24, 2026

charlesdong1991 reviewed May 25, 2026

slfan1989 force-pushed the fluss-558 branch from 70fad39 to edb7f39 May 25, 2026 13:49

slfan1989 changed the title ~~[FLUSS #558] [test] Add Rust integration coverage for RecordBatchLogReader bounded reads.~~ [test] Add Rust integration coverage for RecordBatchLogReader bounded reads. May 25, 2026

[test] Add Rust integration coverage for RecordBatchLogReader bounded reads. (9c28bc6)

slfan1989 force-pushed the fluss-558 branch from edb7f39 to 9c28bc6 May 25, 2026 13:57

fresh-borzoni approved these changes May 25, 2026

charlesdong1991 approved these changes May 25, 2026

fresh-borzoni merged commit 4a836ea into apache:main May 25, 2026
10 checks passed
