98 Fix bug where column prune won't work when compression is enabled#108
Conversation
…st client not honoring compression configuration
|
@luoyuxia @zhaohaidao Would appreciate a review here. |
There was a problem hiding this comment.
Pull request overview
This PR fixes a bug where column pruning fails after the Rust client writes to a log table. The root cause was that the Rust client wasn't honoring compression configuration when writing Arrow IPC data, causing parsing errors on read. The fix ensures that the Arrow writer respects the compression type specified in table properties.
- Added compression configuration support by parsing table properties for compression type and level
- Updated
MemoryLogRecordsArrowBuilderto accept and useArrowCompressionInfowhen creating the Arrow IPCStreamWriter - Integrated compression configuration into the write path from table config through accumulator to the Arrow writer
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| crates/fluss/src/compression/mod.rs | New module declaration for compression support |
| crates/fluss/src/compression/arrow_compression.rs | Implements ArrowCompressionType enum and ArrowCompressionInfo struct with parsing from table properties and validation |
| crates/fluss/src/lib.rs | Adds compression module to library exports |
| crates/fluss/src/metadata/table.rs | Adds get_arrow_compression_info() method to TableConfig for retrieving compression settings |
| crates/fluss/src/record/arrow.rs | Updates MemoryLogRecordsArrowBuilder to accept compression info and configures StreamWriter with appropriate IpcWriteOptions |
| crates/fluss/src/client/write/batch.rs | Updates ArrowLogWriteBatch::new() to accept and pass compression info to the arrow builder |
| crates/fluss/src/client/write/accumulator.rs | Retrieves compression info from table config and passes it when creating new write batches |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
luoyuxia
left a comment
There was a problem hiding this comment.
@leekeiabstraction Thanks for the pr. Currently, all test is set table.log.arrow.compression.type to none, could you please also remove this to verify your fix works.
luoyuxia
left a comment
There was a problem hiding this comment.
@leekeiabstraction Thanks. LGTM
|
Raised #109 to track ZSTD compression level support |
Sorry, I just saw this notification. Thank you so much for fixing it. |
Purpose
Linked issue: close #98
Fix bug where column prune is broken after rust client writes to log table due to parsing error caused by rust client not honoring compression configuration.
ZSTD Compression level config and other compression related concerns has not been hooked up, I suggest that we raise a separate issue/PR to follow that up.
Brief change log
Ensure that record batch writing honors compression type by instantiating ArrowWriter with compression type specified in table properties,
Tests
Added UTs for ArrowCompressionInfo/Type.
Manual testings verifying pruned scan on (zstd, lz4_frame, none compressed) records work.