refactor: Move hive partitioning outside of readers #20203
base: main
This is a very early-stage PR exploring whether it is possible to take hive partitioning out of the readers. This will also make hive partitioning completely file-type agnostic. Currently, this PR does not do anything unless `POLARS_NEW_HIVE` is set to `1`.

Here is a rough todo list.

- [x] Row Indexes
- [x] Include File Paths
- [x] Projection Pushdown
- [ ] Slicing
- [ ] Hive Predicates
- [ ] Lazy Loading of Non-Hive data
- [ ] Allow Missing Columns
- [ ] Other file types besides Parquet
- [ ] New streaming engine sink
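For readers unfamiliar with the idea, here is a minimal sketch of what file-type-agnostic hive handling could look like: the partition values come from `key=value` directory segments of the path alone, so nothing depends on the file format. The helper names `new_hive_enabled` and `hive_parts` are hypothetical and are not taken from this PR; only the `POLARS_NEW_HIVE` variable is.

```rust
use std::env;

/// Hypothetical gate: true only when `POLARS_NEW_HIVE` is set to `1`.
fn new_hive_enabled() -> bool {
    env::var("POLARS_NEW_HIVE").as_deref() == Ok("1")
}

/// Hypothetical helper: collect `key=value` directory segments from a path.
/// The last segment is treated as the file name and is ignored, so the
/// result does not depend on the file type.
fn hive_parts(path: &str) -> Vec<(String, String)> {
    let mut segments: Vec<&str> = path.split('/').collect();
    segments.pop(); // drop the file name

    segments
        .iter()
        .filter_map(|segment| {
            let (key, value) = segment.split_once('=')?;
            if key.is_empty() {
                return None; // a segment like "=x" is not a partition
            }
            Some((key.to_string(), value.to_string()))
        })
        .collect()
}
```

With this sketch, `hive_parts("data/year=2024/month=01/part-0.parquet")` yields the pairs `("year", "2024")` and `("month", "01")`, and the same call works unchanged for a `.csv` path.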
Codecov Report

Attention: Patch coverage is

Additional details and impacted files

| Coverage Diff | main | #20203 | +/- |
|---|---|---|---|
| Coverage | 79.63% | 79.56% | -0.08% |
| Files | 1564 | 1566 | +2 |
| Lines | 217804 | 218364 | +560 |
| Branches | 2477 | 2475 | -2 |
| Hits | 173439 | 173731 | +292 |
| Misses | 43796 | 44066 | +270 |
| Partials | 569 | 567 | -2 |
    }

    pub struct HiveExec {
        sources: ScanSources,
I think we can accept a generic here: one that takes the filtered out paths and creates a `Box<dyn Executor>`. Then this node doesn't have to know anything about the actual implementations of parquet, csv, etc. The planner deals with that.
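A rough sketch of that suggestion, under stated assumptions: the `Executor` trait below is a simplified stand-in (the real Polars trait has a different signature), and `make_reader`, `EchoReader`, and the `keep` predicate are hypothetical names used only for illustration. The point is that `HiveExec` only filters paths and delegates to whatever executor the planner-supplied factory builds.

```rust
/// Simplified stand-in for an executor trait (assumption, not the Polars API).
trait Executor {
    fn execute(&mut self) -> Vec<String>;
}

/// Hive node that is generic over a factory turning paths into an executor.
struct HiveExec<F>
where
    F: Fn(Vec<String>) -> Box<dyn Executor>,
{
    paths: Vec<String>,
    make_reader: F, // supplied by the planner; hides parquet/csv details
}

impl<F> HiveExec<F>
where
    F: Fn(Vec<String>) -> Box<dyn Executor>,
{
    /// Keep only the paths accepted by `keep`, then delegate to the reader.
    fn execute(&self, keep: impl Fn(&str) -> bool) -> Vec<String> {
        let kept: Vec<String> = self
            .paths
            .iter()
            .filter(|path| keep(path.as_str()))
            .cloned()
            .collect();
        let mut reader = (self.make_reader)(kept);
        reader.execute()
    }
}

/// Hypothetical reader that just reports the paths it was given.
struct EchoReader {
    paths: Vec<String>,
}

impl Executor for EchoReader {
    fn execute(&mut self) -> Vec<String> {
        self.paths.clone()
    }
}
```

In this shape, supporting another file type means passing a different factory closure; `HiveExec` itself stays unchanged.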
bd71732 to 99d6a93
ping @ritchie46, @nameexhaustion