feat: add table partition scanning by lemorage · Pull Request #222 · apache/fluss-rust

lemorage · 2026-01-29T03:05:25Z

Purpose

Linked issue: close #203

This PR implements support for scanning partitioned tables in the Fluss Rust client.

Brief change log

Update TableBucket::new() to accept a partition_id parameter, and update all call sites
Add filter_partition() to TableScan builder
Add subscribe_partition() to LogScanner/RecordBatchLogScanner

Tests

UT (Added):

subscribe_with_partition_creates_correct_table_bucket
subscribe_partition_overrides_stored_partition
subscribe_without_partition_uses_none

API and Format

New APIs:

TableScan.filter_partition(partition_id)
LogScanner.subscribe_partition(partition_id, bucket, offset)

Documentation

Copilot

Pull request overview

Implements initial support for scanning partitioned tables in the Fluss Rust client by threading partition_id through scan/bucket identification and adding partition-specific subscription APIs.

Changes:

Extend TableBucket to carry an optional partition_id and update call sites accordingly.
Add TableScan::filter_partition(...) and subscribe_partition(...) APIs on scanners to target a partition.
Update fetch request construction to include partition_id per bucket and add tests around partition subscription behavior.
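To illustrate the last item, here is a minimal sketch of how per-bucket fetch entries could carry `partition_id`, grouped by table ID. `FetchBucketEntry` and `build_fetch_entries` are hypothetical names, not the client's actual request types. The simplified `TableBucket` uses `i64` table/partition IDs and an `i32` bucket as assumptions, and a `BTreeMap` is used only to make the iteration order deterministic:

```rust
use std::collections::BTreeMap;

// Simplified bucket identity (assumed field types); the real TableBucket
// lives in crates/fluss/src/metadata/table.rs.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct TableBucket {
    pub table_id: i64,
    pub partition_id: Option<i64>,
    pub bucket: i32,
}

// Hypothetical per-bucket entry of a fetch request.
#[derive(Debug, PartialEq)]
pub struct FetchBucketEntry {
    pub partition_id: Option<i64>,
    pub bucket_id: i32,
    pub fetch_offset: i64,
}

// Group fetch positions by table ID, copying each bucket's partition ID
// into its entry instead of dropping it.
pub fn build_fetch_entries(
    positions: &BTreeMap<TableBucket, i64>,
) -> BTreeMap<i64, Vec<FetchBucketEntry>> {
    let mut by_table: BTreeMap<i64, Vec<FetchBucketEntry>> = BTreeMap::new();
    for (tb, &offset) in positions {
        by_table.entry(tb.table_id).or_default().push(FetchBucketEntry {
            partition_id: tb.partition_id,
            bucket_id: tb.bucket,
            fetch_offset: offset,
        });
    }
    by_table
}
```

Two buckets with the same bucket number but different partitions end up as two distinct entries, which is the point of sending `partition_id` per bucket.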

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.


File	Description
crates/fluss/src/metadata/table.rs	Extends `TableBucket` constructor to accept `partition_id`.
crates/fluss/src/client/table/scanner.rs	Adds partition filtering/subscription APIs and includes `partition_id` in fetch requests.
crates/fluss/src/cluster/cluster.rs	Updates `TableBucket::new` call sites (currently still uses `partition_id=None`).
crates/fluss/src/client/write/sender.rs	Updates `TableBucket::new` call sites for write response handling.
crates/fluss/src/client/table/lookup.rs	Updates `TableBucket::new` call site for lookup leader resolution.
crates/fluss/src/client/table/remote_log.rs	Updates test helper to construct `TableBucket` with `partition_id=None`.
crates/fluss/src/client/table/log_fetch_buffer.rs	Updates tests to use new `TableBucket::new` signature.
crates/fluss/src/client/metadata.rs	Updates tests to use new `TableBucket::new` signature.
crates/fluss/src/client/admin.rs	Updates `TableBucket::new` call sites when materializing snapshot/leader mappings.
crates/fluss/src/util/mod.rs	Updates unit tests to the new `TableBucket::new` signature.
crates/fluss/src/record/mod.rs	Updates unit tests to the new `TableBucket::new` signature.
crates/fluss/src/test_utils.rs	Updates test cluster builder to use `partition_id=None`.
crates/fluss/tests/integration/table.rs	Updates integration test to use new `TableBucket::new` signature.
bindings/python/src/metadata.rs	Updates Python binding conversion to core `TableBucket` with `partition_id=None`.


Copilot · 2026-01-29T04:09:01Z

+    pub fn new(table_id: TableId, partition_id: Option<PartitionId>, bucket: BucketId) -> Self {
+        Self {
            table_id,


Changing TableBucket::new to require partition_id is a breaking change for external callers. If this type is part of the public client API, consider keeping the old new(table_id, bucket) (defaulting partition_id=None) and adding a new constructor like new_with_partition(...) to preserve backwards compatibility.

Suggested change

-    pub fn new(table_id: TableId, partition_id: Option<PartitionId>, bucket: BucketId) -> Self {
-        Self {
-            table_id,
+    pub fn new(table_id: TableId, bucket: BucketId) -> Self {
+        Self {
+            table_id,
+            partition_id: None,
+            bucket,
+        }
+    }
+
+    pub fn new_with_partition(
+        table_id: TableId,
+        partition_id: Option<PartitionId>,
+        bucket: BucketId,
+    ) -> Self {
+        Self {
+            table_id,
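A minimal sketch of the backward-compatible pair of constructors suggested above, assuming `TableId`/`PartitionId` are `i64` and `BucketId` is `i32` (stand-in aliases, not necessarily the crate's definitions) and a simplified `TableBucket` with a hypothetical `partition_id()` accessor. The old two-argument `new` delegates with `None`, so existing callers keep compiling:

```rust
// Assumed aliases for illustration only.
pub type TableId = i64;
pub type PartitionId = i64;
pub type BucketId = i32;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct TableBucket {
    table_id: TableId,
    partition_id: Option<PartitionId>,
    bucket: BucketId,
}

impl TableBucket {
    /// Original signature, kept for existing callers: a non-partitioned bucket.
    pub fn new(table_id: TableId, bucket: BucketId) -> Self {
        Self::new_with_partition(table_id, None, bucket)
    }

    /// Constructor for buckets that may belong to a partition.
    pub fn new_with_partition(
        table_id: TableId,
        partition_id: Option<PartitionId>,
        bucket: BucketId,
    ) -> Self {
        Self { table_id, partition_id, bucket }
    }

    /// Hypothetical accessor used to check the stored partition.
    pub fn partition_id(&self) -> Option<PartitionId> {
        self.partition_id
    }
}
```

The trade-off is two constructors to maintain; the PR diff above instead changes `new` in place and updates every call site.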

Copilot · 2026-01-29T04:09:01Z

    async fn check_and_update_metadata(&self) -> Result<()> {
-        let need_update = self
+        // Collect buckets that are missing leader information
+        let buckets_needing_leader: Vec<TableBucket> = self
            .fetchable_buckets()
-            .iter()
-            .any(|bucket| self.get_table_bucket_leader(bucket).is_none());
+            .into_iter()
+            .filter(|bucket| self.get_table_bucket_leader(bucket).is_none())
+            .collect();


buckets_needing_leader is only used to check is_empty(), so collecting into a Vec adds an avoidable allocation. Consider reverting to an iterator any(...) boolean, or keep the Vec only if you’ll use it later (e.g., to drive a partition-aware metadata refresh).
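The allocation-free form suggested here can be sketched as follows. Buckets and leaders are simplified to plain IDs and a `HashMap`; the helper `needs_metadata_update` and these types are hypothetical, not the scanner's real API. `Iterator::any` stops at the first bucket without a known leader and allocates nothing:

```rust
use std::collections::HashMap;

pub type BucketId = i32;
pub type LeaderId = i32;

/// Returns true as soon as one fetchable bucket has no known leader.
/// Unlike collecting into a Vec only to call is_empty(), `any`
/// short-circuits and performs no allocation.
pub fn needs_metadata_update(
    fetchable: &[BucketId],
    leaders: &HashMap<BucketId, LeaderId>,
) -> bool {
    fetchable.iter().any(|b| !leaders.contains_key(b))
}
```

Keeping the `Vec` is only worthwhile if the collected buckets are later used, for example to drive a narrower, partition-aware refresh.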

Copilot · 2026-01-29T04:09:02Z

+        // Non-partitioned table: standard metadata refresh
        self.metadata
            .update_tables_metadata(&HashSet::from([&self.table_path]))
            .await


Both the is_partitioned branch above and the non-partitioned path here execute essentially the same update_tables_metadata(...).await.or_else(...) logic. Consider factoring the retrying metadata refresh into a shared helper (or unifying the branches) to avoid drift and simplify future partition-aware enhancements.


luoyuxia

@lemorage Thanks for your PR. But since an integration test is required, I think this PR is blocked by #202.

lemorage · 2026-01-29T13:50:53Z

Thanks! I'll take a look!

leekeiabstraction

Thank you very much for the PR. I've left comments, PTAL!

leekeiabstraction · 2026-01-30T19:43:39Z

        })
    }
+
+    pub fn filter_partition(mut self, partition_id: PartitionId) -> Self {


Is this API available on the Java side?

This doesn't seem necessary: on the Java side, a partitioned table is scanned by passing the partition ID when subscribing, and calling subscribe without a partition ID on a partitioned table throws an exception.

https://github.com/apache/fluss/blob/71b625ff0c1638539f6089eb727a698f080f92b4/fluss-client/src/main/java/org/apache/fluss/client/table/scanner/log/LogScannerImpl.java#L192-L198

leekeiabstraction · 2026-01-30T19:44:25Z

    conn: &'a FlussConnection,
    table_info: TableInfo,
    metadata: Arc<Metadata>,
+    partition_id: Option<PartitionId>,


TableScan shouldn't be partition aware.

leekeiabstraction · 2026-01-30T19:48:30Z

@@ -337,7 +349,7 @@ impl LogScannerInner {
    }

    async fn subscribe(&self, bucket: i32, offset: i64) -> Result<()> {


Let's follow the Java-side logic here and return an error if this is a partitioned table.

https://github.com/apache/fluss/blob/71b625ff0c1638539f6089eb727a698f080f92b4/fluss-client/src/main/java/org/apache/fluss/client/table/scanner/log/LogScannerImpl.java#L175-L180
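A simplified sketch of the Java-style check requested in this thread and the previous two: `subscribe(bucket, offset)` is rejected on a partitioned table, and the partition ID is supplied per call to `subscribe_partition` rather than stored on `TableScan`. The `LogScanner` struct, its `is_partitioned` flag, the `String` error type, and the `offset_of` helper are stand-ins for illustration. The reverse check in `subscribe_partition` is an assumption added for symmetry, not taken from the linked lines:

```rust
use std::collections::HashMap;

pub type PartitionId = i64;
pub type BucketId = i32;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct TableBucket {
    pub table_id: i64,
    pub partition_id: Option<PartitionId>,
    pub bucket: BucketId,
}

/// Hypothetical, simplified scanner state: subscribed buckets and their offsets.
pub struct LogScanner {
    table_id: i64,
    is_partitioned: bool,
    subscriptions: HashMap<TableBucket, i64>,
}

impl LogScanner {
    pub fn new(table_id: i64, is_partitioned: bool) -> Self {
        Self { table_id, is_partitioned, subscriptions: HashMap::new() }
    }

    /// Non-partitioned tables only, mirroring the linked Java check.
    pub fn subscribe(&mut self, bucket: BucketId, offset: i64) -> Result<(), String> {
        if self.is_partitioned {
            return Err("partitioned table: use subscribe_partition instead".to_string());
        }
        let tb = TableBucket { table_id: self.table_id, partition_id: None, bucket };
        self.subscriptions.insert(tb, offset);
        Ok(())
    }

    /// Partitioned tables only (assumed symmetric check).
    pub fn subscribe_partition(
        &mut self,
        partition_id: PartitionId,
        bucket: BucketId,
        offset: i64,
    ) -> Result<(), String> {
        if !self.is_partitioned {
            return Err("not a partitioned table: use subscribe instead".to_string());
        }
        let tb = TableBucket { table_id: self.table_id, partition_id: Some(partition_id), bucket };
        self.subscriptions.insert(tb, offset);
        Ok(())
    }

    /// Offset recorded for a bucket, if subscribed.
    pub fn offset_of(&self, tb: &TableBucket) -> Option<i64> {
        self.subscriptions.get(tb).copied()
    }
}
```

With the partition supplied at subscribe time, `TableScan` needs no partition state, which matches the review comment that it should not be partition aware.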

leekeiabstraction · 2026-01-30T19:51:23Z

    async fn check_and_update_metadata(&self) -> Result<()> {
-        let need_update = self
+        // Collect buckets that are missing leader information
+        let buckets_needing_leader: Vec<TableBucket> = self
            .fetchable_buckets()
-            .iter()
-            .any(|bucket| self.get_table_bucket_leader(bucket).is_none());
+            .into_iter()
+            .filter(|bucket| self.get_table_bucket_leader(bucket).is_none())
+            .collect();


leekeiabstraction · 2026-01-30T19:53:54Z

+        // Non-partitioned table: standard metadata refresh
        self.metadata
            .update_tables_metadata(&HashSet::from([&self.table_path]))
            .await


+1 Did you intend to update metadata only for those with missing leader? Note, we should follow Java side logic where possible

leekeiabstraction · 2026-01-30T19:57:19Z


        for bucket_resp in response.buckets_resp() {
-            let tb = TableBucket::new(table_id, bucket_resp.bucket_id());
+            let tb = TableBucket::new(table_id, None, bucket_resp.bucket_id());


Should partition id be extracted from response?

See https://github.com/apache/fluss/blob/71b625ff0c1638539f6089eb727a698f080f92b4/fluss-client/src/main/java/org/apache/fluss/client/write/Sender.java#L460-L462
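A sketch of building each `TableBucket` from the per-bucket response instead of hard-coding `None`, assuming the response entry exposes an optional partition ID (as the linked Java `Sender` lines appear to read). `BucketResp` and `buckets_from_response` are hypothetical stand-ins for the generated response type and the loop in `sender.rs`:

```rust
pub type PartitionId = i64;
pub type BucketId = i32;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct TableBucket {
    pub table_id: i64,
    pub partition_id: Option<PartitionId>,
    pub bucket: BucketId,
}

/// Hypothetical per-bucket response entry; assumed to carry an optional partition ID.
pub struct BucketResp {
    partition_id: Option<PartitionId>,
    bucket_id: BucketId,
}

impl BucketResp {
    pub fn partition_id(&self) -> Option<PartitionId> {
        self.partition_id
    }
    pub fn bucket_id(&self) -> BucketId {
        self.bucket_id
    }
}

/// Rebuild each TableBucket from the response, keeping its partition ID.
pub fn buckets_from_response(table_id: i64, resps: &[BucketResp]) -> Vec<TableBucket> {
    resps
        .iter()
        .map(|r| TableBucket { table_id, partition_id: r.partition_id(), bucket: r.bucket_id() })
        .collect()
}
```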

leekeiabstraction · 2026-01-30T19:59:15Z


        for bucket_id in buckets {
-            let table_bucket = TableBucket::new(table_id, *bucket_id);
+            let table_bucket = TableBucket::new(table_id, None, *bucket_id);


The partition ID is one of the function's arguments; pass that in instead of None.
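The fix amounts to threading the caller's `partition_id` into every bucket built in the loop. A minimal sketch with a hypothetical `collect_buckets` helper and simplified types; `Option<i64>` is `Copy`, so the same value can be reused on each iteration:

```rust
pub type PartitionId = i64;
pub type BucketId = i32;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct TableBucket {
    pub table_id: i64,
    pub partition_id: Option<PartitionId>,
    pub bucket: BucketId,
}

// Hypothetical helper mirroring the loop under review: the caller's
// partition_id is passed through instead of being replaced with None.
pub fn collect_buckets(
    table_id: i64,
    partition_id: Option<PartitionId>,
    buckets: &[BucketId],
) -> Vec<TableBucket> {
    let mut out = Vec::with_capacity(buckets.len());
    for bucket_id in buckets {
        out.push(TableBucket { table_id, partition_id, bucket: *bucket_id });
    }
    out
}
```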

leekeiabstraction · 2026-01-30T20:06:34Z

    ) -> Result<TableBucket> {
        let table_info = self.get_table(table_path)?;
-        Ok(TableBucket::new(table_info.table_id, bucket_id))
+        Ok(TableBucket::new(table_info.table_id, None, bucket_id))


I do not think we can use None here: this function is used to group requests for the same bucket (including partitioned buckets). If we override the partition with None here, the accumulator will look up the leader of a non-existent bucket for a partitioned table.
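Why `None` breaks the grouping can be seen with a `HashMap` keyed by the full bucket identity. In this sketch the types, `group_by_bucket`, and the leader map are illustrative, not the accumulator's real structures. Because the derived `Hash`/`Eq` include `partition_id`, a bucket rebuilt with `None` is a different key from the partitioned bucket, so it lands in its own group and its leader lookup misses:

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct TableBucket {
    pub table_id: i64,
    pub partition_id: Option<i64>,
    pub bucket: i32,
}

// Hypothetical batching step: records destined for the same bucket are grouped together.
pub fn group_by_bucket(records: &[(TableBucket, &str)]) -> HashMap<TableBucket, Vec<String>> {
    let mut groups: HashMap<TableBucket, Vec<String>> = HashMap::new();
    for (tb, rec) in records {
        groups.entry(*tb).or_default().push(rec.to_string());
    }
    groups
}

// Leader lookup keyed by the full identity, partition included.
pub fn leader_for(leaders: &HashMap<TableBucket, i32>, tb: &TableBucket) -> Option<i32> {
    leaders.get(tb).copied()
}
```

For example, with leaders known only for `(table 1, partition Some(5), bucket 0)`, looking up `(table 1, None, bucket 0)` returns `None`.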

luoyuxia · 2026-01-30T23:55:25Z

@lemorage Since #228 is merged, you can now add the integration test for scanning partitioned tables.

lemorage · 2026-02-01T14:45:33Z

Working on it! Thanks folks!

leekeiabstraction · 2026-01-31T10:38:50Z

    /// Convert to core TableBucket (internal use)
    pub fn to_core(&self) -> fcore::metadata::TableBucket {
-        fcore::metadata::TableBucket::new(self.table_id, self.bucket)
+        fcore::metadata::TableBucket::new(self.table_id, None, self.bucket)


This should take self.partition_id as well
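A sketch of the conversion carrying the partition through, assuming the binding-side struct already stores a `partition_id: Option<i64>` field (implied by the review comment). The `fcore` module here is a local stand-in for the core crate, not its real definition:

```rust
// Local stand-in for the core crate's metadata module.
pub mod fcore {
    pub mod metadata {
        #[derive(Debug, Clone, Copy, PartialEq, Eq)]
        pub struct TableBucket {
            pub table_id: i64,
            pub partition_id: Option<i64>,
            pub bucket: i32,
        }

        impl TableBucket {
            pub fn new(table_id: i64, partition_id: Option<i64>, bucket: i32) -> Self {
                Self { table_id, partition_id, bucket }
            }
        }
    }
}

// Simplified binding-side type; assumed to already store partition_id.
pub struct TableBucket {
    pub table_id: i64,
    pub partition_id: Option<i64>,
    pub bucket: i32,
}

impl TableBucket {
    /// Convert to core TableBucket (internal use), keeping the partition.
    pub fn to_core(&self) -> fcore::metadata::TableBucket {
        fcore::metadata::TableBucket::new(self.table_id, self.partition_id, self.bucket)
    }
}
```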

luoyuxia · 2026-02-03T12:43:25Z

@lemorage Hi, is there any progress on this pr?

Copilot

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.


luoyuxia · 2026-02-04T03:58:44Z

@lemorage Thanks for the PR, and thanks @leekeiabstraction for the review. I'm going to merge this. Feel free to leave comments on this PR, and I'll address them in a follow-up PR.

lemorage · 2026-02-04T10:14:23Z

I'm so sorry for the delayed follow-up. I did some work on my local branch but got held up by other things, and haven't cleaned it up and pushed it yet. @luoyuxia Thank you so much for your rapid work on the rest. If there's anything further I need to do on my side, do let me know. Thank you all for your patience with my long delay.

luoyuxia · 2026-02-04T11:15:17Z


No worries at all! Since we have an upcoming release deadline, I went ahead and handled the remaining parts to keep us on track. Your base pull request is much appreciated, and we’d love to have more of your contributions in the future!

luoyuxia requested a review from Copilot January 29, 2026 04:05

luoyuxia assigned lemorage Jan 29, 2026

Copilot started reviewing on behalf of luoyuxia January 29, 2026 04:05

Copilot AI reviewed Jan 29, 2026

luoyuxia reviewed Jan 29, 2026

leekeiabstraction reviewed Jan 30, 2026

leekeiabstraction reviewed Feb 2, 2026

lemorage and others added 3 commits February 4, 2026 10:43

feat: support partition scanning

8296c32

test: add unit tests for partition scanning

0c3bfb6

rebase main branch

82f56ed

luoyuxia force-pushed the feat branch from 3f0734e to 82f56ed February 4, 2026 02:46

luoyuxia requested a review from Copilot February 4, 2026 03:44

Copilot started reviewing on behalf of luoyuxia February 4, 2026 03:44

Copilot AI reviewed Feb 4, 2026


add yuxia modification

0a7f9a0

luoyuxia force-pushed the feat branch from c4d0512 to 0a7f9a0 February 4, 2026 03:57

luoyuxia merged commit 7d794f7 into apache:main Feb 4, 2026
13 checks passed
