
[test] Add Rust integration coverage for RecordBatchLogReader bounded reads. #559

Merged
fresh-borzoni merged 1 commit into apache:main from slfan1989:fluss-558
May 25, 2026
Conversation

@slfan1989
Contributor

Purpose

Linked issue: close #558

Add Rust-level integration coverage for RecordBatchLogReader bounded reads.

RecordBatchLogReader already has unit coverage for batch filtering and Python binding coverage for to_arrow_batch_reader guard/drop behavior, but it lacks Rust end-to-end integration coverage for:

  • new_until_offsets stopping semantics
  • new_until_latest with partitioned log tables

This PR adds those missing integration tests.

Brief change log

  • Added a Rust integration test for RecordBatchLogReader::new_until_offsets.

    • Creates a log table.
    • Appends records.
    • Subscribes from a non-zero offset.
    • Reads with an explicit stopping offset.
    • Verifies records at or beyond the stop offset are not returned.
  • Added a Rust integration test for RecordBatchLogReader::new_until_latest on partitioned tables.

    • Creates a partitioned log table.
    • Adds US and EU partitions.
    • Appends records to both partitions.
    • Subscribes partition buckets.
    • Verifies all records present at reader creation are returned.
  • Updated the comment in RecordBatchLogReader to point to the new Rust integration coverage.
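
The stopping rule exercised by the first test can be sketched in isolation. This is a minimal in-memory model, not the Fluss API: `Record` and `read_until_offset` are hypothetical names introduced only for illustration, and the log is assumed to be a plain slice ordered by offset.

```rust
/// Toy log record for illustration: a log offset plus an `id` payload.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Record {
    offset: i64,
    id: i32,
}

/// Returns the records whose offsets fall in the half-open range [start, stop).
/// Records at or beyond `stop` are excluded, which mirrors the behavior the
/// integration test verifies for an explicit stopping offset.
fn read_until_offset(log: &[Record], start: i64, stop: i64) -> Vec<Record> {
    log.iter()
        .copied()
        .filter(|r| r.offset >= start && r.offset < stop)
        .collect()
}
```

For example, with six records at offsets 0 through 5, a start offset of 2 and a stop offset of 5 would yield the records at offsets 2, 3, and 4 only.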

Tests

Verified locally.

API and Format

No API or storage format changes.

Documentation

No user-facing documentation changes.

@slfan1989
Contributor Author

@fresh-borzoni @leekeiabstraction Could you please help review this PR when you have time?

Member

@fresh-borzoni fresh-borzoni left a comment


@slfan1989 Ty for the PR, great addition, LGTM overall, couple of minor comments 👍

Let's move this to a separate file for testing the RecordBatchLogReader feature and then add some additional scenarios:

  • until_offsets_with_empty_range (stop == start -> no batches)
  • until_offsets_past_end_of_log (graceful finish past actual end)
  • until_offsets_multi_bucket (multiple buckets in the HashMap)

Also, extract_ids_from_batches overlaps with extract_ids in the test_project test; maybe we should factor it out into utils and reuse it.
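
The empty-range and multi-bucket scenarios suggested above can be modeled with per-bucket offset maps. The sketch below is an assumption-laden toy, not the real reader: `read_buckets_until` is a hypothetical helper, each bucket is a plain `Vec<i32>` whose index doubles as the offset, and a stop offset past the end simply truncates here instead of waiting for more records.

```rust
use std::collections::{BTreeMap, HashMap};

/// Toy model: reads each bucket named in `stop` over [start, stop).
/// A bucket missing from `start` is read from offset 0.
fn read_buckets_until(
    buckets: &HashMap<i32, Vec<i32>>,
    start: &HashMap<i32, i64>,
    stop: &HashMap<i32, i64>,
) -> BTreeMap<i32, Vec<i32>> {
    let mut out = BTreeMap::new();
    for (&bucket, &stop_at) in stop {
        let records: &[i32] = buckets.get(&bucket).map(|v| v.as_slice()).unwrap_or(&[]);
        let len = records.len() as i64;
        let from = start.get(&bucket).copied().unwrap_or(0);
        // Clamp both bounds into the bucket so slicing cannot panic.
        let lo = from.clamp(0, len);
        let hi = stop_at.clamp(lo, len);
        out.insert(bucket, records[lo as usize..hi as usize].to_vec());
    }
    out
}
```

With stop equal to start for a bucket, the range is empty and that bucket contributes no records, while other buckets in the same map are still bounded independently.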

@fresh-borzoni
Member

@charlesdong1991 Can you also take a look, please?
Since this was initially your feature to propose :)


Copilot AI left a comment


Pull request overview

Adds Rust integration tests to cover RecordBatchLogReader bounded-read semantics end-to-end (closing #558), complementing existing unit/Python coverage.

Changes:

  • Added an integration test verifying RecordBatchLogReader::new_until_offsets stops before an explicit upper offset.
  • Added an integration test verifying RecordBatchLogReader::new_until_latest returns all records present at reader creation for partitioned log tables.
  • Updated an internal comment to point to the new Rust integration coverage location.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| crates/fluss/tests/integration/log_table.rs | Adds two new Rust integration tests and a shared helper for validating bounded reader behavior. |
| crates/fluss/src/client/table/reader.rs | Updates an in-code comment to reference the new Rust integration coverage. |


Comment on lines +455 to +470
fn extract_ids_from_batches(batches: &[ScanBatch]) -> Vec<i32> {
    batches
        .iter()
        .flat_map(|b| {
            let batch = b.batch();
            (0..batch.num_rows()).map(move |i| {
                batch
                    .column(0)
                    .as_any()
                    .downcast_ref::<Int32Array>()
                    .expect("id column should be Int32")
                    .value(i)
            })
        })
        .collect()
}
@slfan1989
Contributor Author


@fresh-borzoni Thank you for the review and suggestions!

Addressed most of the comments:

  • Moved the RecordBatchLogReader integration coverage into a dedicated test file: record_batch_log_reader.rs.
  • Added until_offsets_with_empty_range.
  • Added until_offsets_multi_bucket.
  • Moved extract_ids_from_batches into integration/utils.rs and reused it from both log_table.rs and record_batch_log_reader.rs.

I left until_offsets_past_end_of_log out for now because it requires a behavior/API decision.

The current new_until_offsets implementation waits until the requested stop offset is observed. If the stop offset is beyond the current log end, it keeps polling for future records rather than finishing.

Supporting graceful completion would require snapshot-style semantics by querying latest offsets at reader creation, which needs FlussAdmin and async work.

I think that should be handled separately from this test-only PR.
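
To make the distinction concrete, snapshot-style completion could be approximated by clamping each requested stop offset to the latest offset observed at reader creation. The sketch below is purely illustrative: `clamp_stop_offsets` is a hypothetical helper, bucket ids are assumed to be `i32`, and the `latest` map stands in for whatever a latest-offsets query would return.

```rust
use std::collections::HashMap;

/// Clamps each requested stop offset to the latest offset known at creation.
/// A bucket with no known latest offset keeps its requested stop offset.
fn clamp_stop_offsets(
    requested: &HashMap<i32, i64>,
    latest: &HashMap<i32, i64>,
) -> HashMap<i32, i64> {
    requested
        .iter()
        .map(|(&bucket, &stop)| {
            let end = latest.get(&bucket).copied().unwrap_or(stop);
            (bucket, stop.min(end))
        })
        .collect()
}
```

Under this model, a stop offset of 100 against a log that currently ends at 7 would become 7, so the read could finish without waiting for future records.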

Thanks again!

Contributor

@charlesdong1991 charlesdong1991 left a comment


thanks for the PR, overall very nice, just minor comments!

}

#[tokio::test]
async fn until_latest_reads_partitioned_table() {
Contributor


query_latest_offsets applies to both partitioned and non-partitioned tables. Can we check whether the non-partitioned case has an integration test and, if not, add one?

Contributor Author


Thanks for pointing this out!

Added until_latest_reads_non_partitioned_table, so new_until_latest now has integration coverage for both non-partitioned and partitioned tables.

.expect("Failed to list partition infos")
{
scanner
.subscribe_partition(partition.get_partition_id(), 0, EARLIEST_OFFSET)
Contributor


I assume we use 0 here because the default bucket number is 1. Can you add an inline comment here for future reference, in case we want multiple buckets?

Contributor Author


Added an inline comment explaining that bucket 0 is used because the table uses the default single-bucket layout, and that future multi-bucket coverage should subscribe all buckets per partition.
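
As a rough sketch of what subscribing all buckets per partition might involve, the helper below enumerates every (partition, bucket) pair. It is hypothetical and independent of the Fluss client: `subscription_targets` is an illustrative name, and the `i64` partition id and `i32` bucket id types are assumptions.

```rust
/// Enumerates every (partition_id, bucket) pair that a multi-bucket test
/// would need to subscribe. With `num_buckets == 1` this reduces to bucket 0
/// for each partition, matching the single-bucket layout.
fn subscription_targets(partition_ids: &[i64], num_buckets: i32) -> Vec<(i64, i32)> {
    partition_ids
        .iter()
        .flat_map(|&partition_id| (0..num_buckets).map(move |bucket| (partition_id, bucket)))
        .collect()
}
```

A test could then loop over the returned pairs and issue one subscription per pair rather than hard-coding bucket 0.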

*/

#[cfg(test)]
mod reader_test {
Contributor


until_offsets_past_end_of_log (graceful finish past actual end)

I think what @fresh-borzoni meant might be the scenario where stop_at is past the current end, and future records then arrive and cross it. I think that doesn't need an API change; it basically covers the wait-and-resume behaviour of the loop.

An API change is probably needed if we want immediate graceful completion against arbitrary offsets.

Member


yes, I should have been more specific. Sorry for the confusion

Contributor Author


Got it, that makes sense. I interpreted past_end_of_log as immediate completion when the requested stop offset is beyond the current log end, which would require querying/clamping latest offsets.

I'll add a test for the wait-and-resume scenario instead: create a reader with a stop offset past the current end, start collecting, append more records later so the log crosses the stop offset, and verify the reader resumes and finishes.
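
The wait-and-resume scenario can be illustrated with standard-library threads and a channel standing in for the log. This is only a model of the idea, not the planned test: `collect_until` and `wait_and_resume_demo` are hypothetical names, and a blocking `recv` plays the role of the reader's polling loop.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Collects offsets from `rx` until an offset at or beyond `stop` is observed.
/// Blocks while no record is available, like a reader waiting for new data.
fn collect_until(rx: mpsc::Receiver<i64>, stop: i64) -> Vec<i64> {
    let mut seen = Vec::new();
    while let Ok(offset) = rx.recv() {
        if offset >= stop {
            break;
        }
        seen.push(offset);
    }
    seen
}

/// Starts a reader whose stop offset (5) is past the current log end (3),
/// then appends more records so the log crosses the stop offset.
fn wait_and_resume_demo() -> Vec<i64> {
    let (tx, rx) = mpsc::channel();
    for offset in 0..3 {
        tx.send(offset).expect("receiver is alive");
    }
    let reader = thread::spawn(move || collect_until(rx, 5));
    // Give the reader time to drain the initial records and start waiting.
    thread::sleep(Duration::from_millis(50));
    // The final send (offset 5) is the one that lets the reader finish.
    for offset in 3..6 {
        tx.send(offset).expect("reader is still waiting");
    }
    reader.join().expect("reader thread panicked")
}
```

The reader returns only after the later appends arrive, and the record at the stop offset itself is not included in the result.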

@fresh-borzoni
Member

@slfan1989 Ty for the changes, LGTM
Can you rebase and resolve conflicts?

@slfan1989 slfan1989 changed the title from "[FLUSS #558] [test] Add Rust integration coverage for RecordBatchLogReader bounded reads." to "[test] Add Rust integration coverage for RecordBatchLogReader bounded reads." on May 25, 2026
@slfan1989
Contributor Author


@fresh-borzoni Thanks for the review!

I have rebased the branch on the latest main and resolved the conflicts. The updated changes have been pushed, and I’m waiting for CI to finish now.

Member

@fresh-borzoni fresh-borzoni left a comment


@slfan1989 LGTM 👍

@charlesdong1991 Do you have additional comments?

Contributor

@charlesdong1991 charlesdong1991 left a comment


thanks for the changes, very nice! 👍

@fresh-borzoni fresh-borzoni merged commit 4a836ea into apache:main May 25, 2026
10 checks passed
@fresh-borzoni
Member

@slfan1989 Thank you for the contribution, merged 👍


Development

Successfully merging this pull request may close these issues.

[test] Add Rust integration coverage for RecordBatchLogReader bounded reads

4 participants