Find a way to communicate the ordering of a file back with the existing listing table implementation #13891
Comments
take |
One way to do this would be to write DataFusion-specific metadata into the files (e.g. add something to https://docs.rs/parquet/latest/parquet/file/properties/struct.WriterProperties.html#method.key_value_metadata). It would be great if we could avoid making something DataFusion-specific. Maybe someone could do some research and find out how other systems handle this. |
There is a field for sort columns, but it seems to be row-group-level metadata, so when we set the sort columns for Parquet, they are applied to the row-group-level metadata.
@alamb This is a good idea for file-level metadata storage. I am wondering whether we also need to add the sort columns to the Parquet file metadata, besides the row-group-level metadata, so that we can use them in DataFusion? |
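To make the file-level idea concrete, here is a minimal sketch of how a sort order could be turned into a single string value and read back. It deliberately uses only the standard library: the resulting string is what would be stored as one key/value entry through the `key_value_metadata` writer property linked above. The `SortKey` type, the `example.sort_order` key name, and the text encoding are all hypothetical illustrations, not an existing DataFusion or Parquet convention.

```rust
/// One sort key: a column name plus its direction and null placement.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SortKey {
    pub column: String,
    pub descending: bool,
    pub nulls_first: bool,
}

/// Hypothetical metadata key under which the encoded ordering would be stored.
pub const SORT_ORDER_KEY: &str = "example.sort_order";

/// Encode sort keys as a comma-separated list such as
/// "a ASC NULLS LAST,b DESC NULLS FIRST".
/// Assumes column names contain no whitespace or commas.
pub fn encode_sort_order(keys: &[SortKey]) -> String {
    keys.iter()
        .map(|k| {
            format!(
                "{} {} {}",
                k.column,
                if k.descending { "DESC" } else { "ASC" },
                if k.nulls_first { "NULLS FIRST" } else { "NULLS LAST" }
            )
        })
        .collect::<Vec<_>>()
        .join(",")
}

/// Decode a value produced by `encode_sort_order`.
/// Returns `None` if any entry is malformed, so a reader can fall back
/// to treating the file as unordered.
pub fn decode_sort_order(value: &str) -> Option<Vec<SortKey>> {
    if value.trim().is_empty() {
        return Some(Vec::new());
    }
    value
        .split(',')
        .map(|part| {
            let tokens: Vec<&str> = part.split_whitespace().collect();
            match tokens.as_slice() {
                [column, dir, "NULLS", placement] => {
                    let descending = match *dir {
                        "ASC" => false,
                        "DESC" => true,
                        _ => return None,
                    };
                    let nulls_first = match *placement {
                        "FIRST" => true,
                        "LAST" => false,
                        _ => return None,
                    };
                    Some(SortKey {
                        column: column.to_string(),
                        descending,
                        nulls_first,
                    })
                }
                _ => None,
            }
        })
        .collect()
}
```

A reader such as the listing table could look up the key in the file's key/value metadata and only claim an ordering when decoding succeeds. A real design would need to handle quoted column names and ideally reuse a convention other systems already understand.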
Copying from #13933 (comment): So this would look something like:
|
Already updated the PR, thanks! |
Is your feature request related to a problem or challenge?
This is the follow-up for:
#13874 (review)
We added support (order by / sort) for DataFrameWriteOptions, but when a user tries to query a table whose files are already ordered, we can't get that ordering information from the table.
We need to find a way to communicate the ordering of a file back with the existing listing table implementation.
Describe the solution you'd like
It is also conceivable that DataFusion itself could write custom metadata with the ordering into Parquet and other formats that support custom metadata, but that seems like something we could use Iceberg and other table formats for.
Describe alternatives you've considered
No response
Additional context
No response