-
Notifications
You must be signed in to change notification settings - Fork 961
Support multi-threaded writing of Parquet files with modular encryption #7818
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
ee38fb5
to
cd5196f
Compare
cd5196f
to
c2ac2d9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I like the approach of exposing the ArrowRowGroupWriterFactory
to handle passing through the file encryption properties. We should probably also update the documentation on ArrowColumnWriter
to encourage users to use this API, and/or add documentation to get_column_writers
to encourage users to use this approach for compatibility with encryption.
There seem to be a lot of extra things that have been exposed publicly in this PR that don't need to be though, which makes it hard to follow what changes are actually needed. We should minimize how much is public so it's easier for users to follow documentation and see which APIs they should be using, and so that internal details can be changed later.
Co-authored-by: Adam Reeve <[email protected]>
86ddf45
to
b35d078
Compare
b35d078
to
2d42ca0
Compare
Thanks for the quick review @adamreeve ! |
bc293de
to
4009f42
Compare
4009f42
to
b57a5d4
Compare
Closes #7359.
Rationale for this change
This is to enable concurrent column writing with encryption downstream (e.g. with datafusion). See #7359 for more.
See https://github.com/apache/arrow-rs/pull/7111/files#r2015196618
What changes are included in this PR?
ArrowRowGroupWriterFactory.create_row_group_writer
now passes aFileEncryptor
toArrowColumnWriterFactory
that can later be used to write columns concurrently.pub
.Are these changes tested?
Yes.
Are there any user-facing changes?
ArrowRowGroupWriterFactory
,ArrowRowGroupWriter
,ArrowRowGroupWriterFactory::new
,ArrowRowGroupWriterFactory.create_row_group_writer
, .. are nowpub
. Shouldn't be breaking but perhaps needs review.