
Support Cpp bindings #83

Merged
luoyuxia merged 10 commits into apache:main from zhaohaidao:cppbinding on Dec 13, 2025

@zhaohaidao
Contributor

Purpose

Linked issue: close #67

Brief change log

Tests

API and Format

Documentation

@zhaohaidao zhaohaidao changed the title from "(WIP)Support Cpp bindings" to "Support Cpp bindings" on Dec 8, 2025
@zhaohaidao
Contributor Author

@luoyuxia Hi Yuxia, PTAL when you have time.


Copilot AI left a comment


Pull request overview

This PR adds C++ language bindings for the Fluss client library, enabling C++ applications to interact with Fluss tables. The implementation uses the cxx library to create a safe FFI bridge between Rust and C++, exposing core functionality including connection management, admin operations, table operations, append writers, and log scanners.

Key changes:

  • Implements FFI bridge layer using cxx for type-safe Rust-C++ interop
  • Adds C++ wrapper classes with RAII resource management for Connection, Admin, Table, AppendWriter, and LogScanner
  • Includes comprehensive example demonstrating table creation, data insertion, scanning, and column projection
  • Modifies Config struct to manually implement Default trait to avoid conflicts with clap's Parser derive

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 13 comments.

Summary per file:

  • bindings/cpp/src/lib.rs: Rust FFI implementation defining the bridge interface and core bindings logic
  • bindings/cpp/src/types.rs: Type conversion utilities between FFI types and Fluss core types
  • bindings/cpp/src/connection.cpp: C++ implementation of the Connection class
  • bindings/cpp/src/admin.cpp: C++ implementation of the Admin class for table management
  • bindings/cpp/src/table.cpp: C++ implementation of the Table, AppendWriter, and LogScanner classes
  • bindings/cpp/src/ffi_converter.hpp: Helper utilities for converting between C++ and FFI types
  • bindings/cpp/include/fluss.hpp: Public C++ API header with all class definitions and types
  • bindings/cpp/examples/example.cpp: Comprehensive usage example demonstrating all features
  • bindings/cpp/build.rs: Build script for cxx bridge code generation
  • bindings/cpp/Cargo.toml: Rust package configuration for the C++ bindings
  • bindings/cpp/CMakeLists.txt: CMake build configuration
  • bindings/cpp/.clang-format: Code formatting configuration
  • bindings/cpp/.gitignore: Git ignore rules for build artifacts
  • crates/fluss/src/config.rs: Manual Default implementation to avoid clap derive conflicts
  • Cargo.toml: Workspace updated to include the cpp bindings


if (!Available()) {
return utils::make_error(1, "Connection not available");
}

Copilot AI Dec 9, 2025

Potential memory leak: If out.table_ is already non-null when GetTable is called, the existing resource will be overwritten without being freed, causing a memory leak. Consider calling out.Destroy() before assigning the new pointer, or check and free the existing resource first.

Suggested change
// Free any existing resource before overwriting out.table_
out.Destroy();
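A minimal sketch of the release-before-reassign pattern behind this suggestion, assuming a move-only wrapper that owns one opaque pointer. `RawConn`, `raw_connect`, `raw_free` and the `g_live_conns` counter are hypothetical stand-ins for the cxx-generated bridge, not the binding's real API; the same shape would apply to `out.table_` in `GetTable` and `out.conn_` in `Connect`.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-ins for the opaque Rust-owned handle and its
// create/free functions (the real bridge names are not shown in this PR).
struct RawConn { int id; };
inline int g_live_conns = 0;  // live-handle counter, for illustration only

inline RawConn* raw_connect(int id) { ++g_live_conns; return new RawConn{id}; }
inline void raw_free(RawConn* c) {
    if (c != nullptr) { --g_live_conns; delete c; }
}

class Connection {
public:
    Connection() = default;
    ~Connection() { Destroy(); }

    // Move-only: exactly one wrapper owns the raw pointer at a time.
    Connection(const Connection&) = delete;
    Connection& operator=(const Connection&) = delete;
    Connection(Connection&& other) noexcept
        : conn_(std::exchange(other.conn_, nullptr)) {}
    Connection& operator=(Connection&& other) noexcept {
        if (this != &other) {
            Destroy();  // release what we held before taking the other's handle
            conn_ = std::exchange(other.conn_, nullptr);
        }
        return *this;
    }

    bool Available() const { return conn_ != nullptr; }

    // Free the current handle (if any) and return to the empty state.
    void Destroy() {
        raw_free(conn_);
        conn_ = nullptr;
    }

    // Out-parameter factory: acquire the new handle first, then release
    // whatever `out` held, so repeated calls on the same object never leak.
    static void Connect(int id, Connection& out) {
        RawConn* fresh = raw_connect(id);
        out.Destroy();
        out.conn_ = fresh;
    }

private:
    RawConn* conn_ = nullptr;
};
```

Acquiring the new handle before calling `Destroy()` means that if acquisition throws, `out` keeps its previous resource instead of being left empty.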

}

Result Connection::Connect(const std::string& bootstrap_server, Connection& out) {
try {
Copilot AI Dec 9, 2025

Potential memory leak: If out.conn_ is already non-null when Connect is called, the existing resource will be overwritten without being freed, causing a memory leak. Consider calling out.Destroy() before assigning the new pointer, or check and free the existing resource first.

Suggested change
try {
    out.Destroy();

Comment thread bindings/cpp/src/table.cpp
Comment thread bindings/cpp/src/types.rs Outdated
datum.i64_val = array.value(row_id);
datum
}
_ => panic!("Unsupported Time64 unit for column {}", i),
Copilot AI Dec 9, 2025

Panics can cause undefined behavior when crossing FFI boundaries. Instead of panic!, consider returning an error Result or using a default/fallback value. The panic will not be caught by C++ exception handlers and will abort the process.
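On the C++ side, the wrapper can only turn a failure into a `Result` if it arrives as an exception. As far as we understand cxx, a Rust function declared as returning `Result<T>` surfaces an `Err` in C++ as a thrown `rust::Error` (a `std::exception` subclass), while a panic aborts the process as noted above, which is why returning `Result` from the Rust side matters. A minimal sketch of the catch-and-convert step, assuming a simplified `Result` struct (the real one in fluss.hpp may differ); `CallGuarded` and `DecodeTime64` are hypothetical, and a `std::runtime_error` stands in for `rust::Error` so the snippet builds without cxx.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>
#include <utility>

// Simplified error carrier; assumed shape, not the binding's actual type.
struct Result {
    int error_code = 0;  // 0 means success
    std::string error_message;
    bool Ok() const { return error_code == 0; }
};

// Runs `body` and converts any C++ exception into an error Result.
// An Err returned through cxx would be caught by the std::exception handler;
// a Rust panic never reaches this point, because the process aborts first.
template <typename F>
Result CallGuarded(F&& body) {
    try {
        std::forward<F>(body)();
        return {};
    } catch (const std::exception& e) {
        return {1, e.what()};
    } catch (...) {
        return {1, "unknown error"};
    }
}

// Hypothetical FFI call that reports an unsupported unit by throwing,
// standing in for a Rust function that returns Err instead of panicking.
inline long long DecodeTime64(bool supported_unit) {
    if (!supported_unit) {
        throw std::runtime_error("Unsupported Time64 unit");
    }
    return 42;
}
```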

Comment thread bindings/cpp/src/types.rs Outdated
}
_ => panic!("Unsupported Time64 unit for column {}", i),
},
other => panic!("Unsupported Arrow data type for column {}: {:?}", i, other),
Copilot AI Dec 9, 2025

Panics can cause undefined behavior when crossing FFI boundaries. Instead of panic!, consider returning an error Result or using a default/fallback value. The panic will not be caught by C++ exception handlers and will abort the process.

Comment thread bindings/cpp/src/table.cpp
Comment thread bindings/cpp/src/table.cpp
Comment thread bindings/cpp/src/types.rs Outdated
Comment thread bindings/cpp/src/types.rs
Comment on lines +350 to +466
.expect("LargeUtf8 column expected");
let mut datum = new_datum(DATUM_TYPE_STRING);
datum.string_val = array.value(row_id).to_string();
datum
}
ArrowDataType::Binary => {
let mut datum = new_datum(DATUM_TYPE_BYTES);
datum.bytes_val = row.get_bytes(i);
datum
}
ArrowDataType::FixedSizeBinary(len) => {
let mut datum = new_datum(DATUM_TYPE_BYTES);
datum.bytes_val = row.get_binary(i, *len as usize);
datum
}
ArrowDataType::LargeBinary => {
let array = record_batch
.column(i)
.as_any()
.downcast_ref::<LargeBinaryArray>()
.expect("LargeBinary column expected");
let mut datum = new_datum(DATUM_TYPE_BYTES);
datum.bytes_val = array.value(row_id).to_vec();
datum
}
ArrowDataType::Date32 => {
let array = record_batch
.column(i)
.as_any()
.downcast_ref::<Date32Array>()
.expect("Date32 column expected");
let mut datum = new_datum(DATUM_TYPE_INT32);
datum.i32_val = array.value(row_id);
datum
}
ArrowDataType::Timestamp(unit, _) => match unit {
TimeUnit::Second => {
let array = record_batch
.column(i)
.as_any()
.downcast_ref::<TimestampSecondArray>()
.expect("Timestamp(second) column expected");
let mut datum = new_datum(DATUM_TYPE_INT64);
datum.i64_val = array.value(row_id);
datum
}
TimeUnit::Millisecond => {
let array = record_batch
.column(i)
.as_any()
.downcast_ref::<TimestampMillisecondArray>()
.expect("Timestamp(millisecond) column expected");
let mut datum = new_datum(DATUM_TYPE_INT64);
datum.i64_val = array.value(row_id);
datum
}
TimeUnit::Microsecond => {
let array = record_batch
.column(i)
.as_any()
.downcast_ref::<TimestampMicrosecondArray>()
.expect("Timestamp(microsecond) column expected");
let mut datum = new_datum(DATUM_TYPE_INT64);
datum.i64_val = array.value(row_id);
datum
}
TimeUnit::Nanosecond => {
let array = record_batch
.column(i)
.as_any()
.downcast_ref::<TimestampNanosecondArray>()
.expect("Timestamp(nanosecond) column expected");
let mut datum = new_datum(DATUM_TYPE_INT64);
datum.i64_val = array.value(row_id);
datum
}
},
ArrowDataType::Time32(unit) => match unit {
TimeUnit::Second => {
let array = record_batch
.column(i)
.as_any()
.downcast_ref::<Time32SecondArray>()
.expect("Time32(second) column expected");
let mut datum = new_datum(DATUM_TYPE_INT32);
datum.i32_val = array.value(row_id);
datum
}
TimeUnit::Millisecond => {
let array = record_batch
.column(i)
.as_any()
.downcast_ref::<Time32MillisecondArray>()
.expect("Time32(millisecond) column expected");
let mut datum = new_datum(DATUM_TYPE_INT32);
datum.i32_val = array.value(row_id);
datum
}
_ => panic!("Unsupported Time32 unit for column {}", i),
},
ArrowDataType::Time64(unit) => match unit {
TimeUnit::Microsecond => {
let array = record_batch
.column(i)
.as_any()
.downcast_ref::<Time64MicrosecondArray>()
.expect("Time64(microsecond) column expected");
let mut datum = new_datum(DATUM_TYPE_INT64);
datum.i64_val = array.value(row_id);
datum
}
TimeUnit::Nanosecond => {
let array = record_batch
.column(i)
.as_any()
.downcast_ref::<Time64NanosecondArray>()
.expect("Time64(nanosecond) column expected");
Copilot AI Dec 9, 2025

The expect() calls throughout this function can panic if the downcasts fail, which will cause undefined behavior when crossing FFI boundaries. Consider using pattern matching with proper error handling instead of expect(), or return a Result type that can be properly handled by the C++ caller.

Comment thread bindings/cpp/src/lib.rs
Contributor

@luoyuxia luoyuxia left a comment


@zhaohaidao Thanks for the pr. Left some comments. PTAL

auto descriptor = fluss::TableDescriptor::NewBuilder()
.SetSchema(schema)
.SetBucketCount(1)
.SetProperty("table.log.arrow.compression.type", "NONE")
Contributor


Curious why the compression type is set to NONE. Is there a bug when compression is enabled?

Contributor Author


Yes. When compression and column pruning are enabled simultaneously, an "out of range" error occurs. I verified this with both the Java and Rust clients, and both had this problem. I was probably using Fluss 0.6 or 0.7 at the time; I'm not sure whether the latest version has fixed it. I mentioned this issue in #57.

Contributor


Thanks for the explanation. But this looks weird to me, since compression and column pruning are both enabled in our internal production environment and no exception happens. I'm afraid it's a special case for the Rust client. Could you please check it when you have some time? If it is a Fluss issue, could you please create an issue in the Fluss repo?

Contributor Author

No problem. According to our plan, our internal use cases also require both compression and column pruning to be enabled simultaneously, so we will follow up on resolving this issue soon.

Comment thread bindings/cpp/include/fluss.hpp Outdated
std::string string_val;
std::vector<uint8_t> bytes_val;

static Datum Null() { return Datum(); }
Contributor


nit:

static Datum Null() { return {}; }

?

Schema schema;
};

struct Datum {
Contributor


It seems that even for a bool value, Datum occupies more bytes than it needs, right?
We can consider optimizing it in a future version. Two thoughts here:

  • use C++ std::variant
  • have the Rust side emit an Arrow record batch, and have the C++ side wrap the record batch to provide a row API
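A minimal sketch of the first idea, assuming a `Datum` with one member per type in the spirit of the excerpts in this thread (`i32_val`, `i64_val`, `f32_val`, `f64_val`, `string_val` and `bytes_val` appear here; `type` and `bool_val` are guesses). With `std::variant`, only the active alternative is stored, and `std::monostate` can play the role of `Datum::Null()`. `FatDatum`, `VariantDatum` and `IsNull` are hypothetical names. Exact sizes are implementation-defined, so the check below only compares the two layouts.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

// "One field per type" layout: every member is present in every value.
struct FatDatum {
    int32_t type = 0;
    bool bool_val = false;
    int32_t i32_val = 0;
    int64_t i64_val = 0;
    float f32_val = 0.0f;
    double f64_val = 0.0;
    std::string string_val;
    std::vector<uint8_t> bytes_val;
};

// Variant layout: storage for the largest alternative plus a type index.
// std::monostate (the default alternative) represents a null value.
using VariantDatum = std::variant<std::monostate, bool, int32_t, int64_t,
                                  float, double, std::string,
                                  std::vector<uint8_t>>;

inline bool IsNull(const VariantDatum& d) {
    return std::holds_alternative<std::monostate>(d);
}
```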

Contributor Author


OK.
I've created a separate issue to document potential bottlenecks caused by copying. What do you think? I've seen you mention copying as potentially optimizable in several places.

Contributor Author


#87

Contributor


Makes sense to me. Let's build it first and improve it step by step.

datum.i64_val = ffi_datum.i64_val;
datum.f32_val = ffi_datum.f32_val;
datum.f64_val = ffi_datum.f64_val;
datum.string_val = std::string(ffi_datum.string_val);
Contributor


It seems a string copy is needed here?

Contributor


I'm not sure whether there is a more efficient way. If it's complex, maybe leave a TODO to mark it. It'll remind us if we find any bottleneck.
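One way the copy could be avoided, sketched under assumptions: if the decoded strings are kept alive by an owner such as the scanned batch, row accessors can return `std::string_view` into that storage and copy only on request. `StringColumn` is a hypothetical type, not part of the binding, and the approach trades the copy for a lifetime rule: a view must not outlive its owner.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical owner of the decoded string values of one column in a batch.
class StringColumn {
public:
    explicit StringColumn(std::vector<std::string> values)
        : values_(std::move(values)) {}

    // Zero-copy accessor: borrows from values_, valid only while *this lives.
    std::string_view View(std::size_t row) const { return values_.at(row); }

    // Owning accessor: allocates and copies, safe to keep indefinitely.
    std::string Copy(std::size_t row) const { return values_.at(row); }

    std::size_t Size() const { return values_.size(); }

private:
    std::vector<std::string> values_;
};
```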

Comment thread bindings/cpp/src/types.rs Outdated
Comment thread bindings/cpp/src/types.rs
}
ArrowDataType::Utf8 => {
let mut datum = new_datum(DATUM_TYPE_STRING);
datum.string_val = row.get_string(i).to_string();
Contributor


It also needs a string copy. Not sure whether it's easy to avoid this copy. We can leave a TODO to remind us that a string copy happens here.

Comment thread bindings/cpp/src/types.rs
Comment thread bindings/cpp/src/types.rs Outdated
Comment thread bindings/cpp/src/types.rs Outdated
Contributor

@luoyuxia luoyuxia left a comment


@zhaohaidao Thanks for the pr. LGTM

@luoyuxia
Contributor

@zhaohaidao You can use

cargo clippy --all-targets --fix --allow-dirty --allow-staged

to make clippy happy

@luoyuxia luoyuxia merged commit a34f1c1 into apache:main Dec 13, 2025
13 checks passed
@zhaohaidao
Contributor Author

Thanks for the suggestion, the Rust toolchain is indeed very useful.


Development

Successfully merging this pull request may close these issues.

Introduce cpp binding

3 participants