Commit 73cb7e8
committed
refactor(physical-plan): extract make_group_column factory + tighten Time variants + dhat harness
Preparation PR for #22682 / #22715 (full GroupValuesColumn type
coverage). No new builders; this PR only refactors the dispatch so
subsequent PRs that add FixedSizeList / Struct / List support can plug
into one well-defined factory instead of growing the inline match in
`GroupValuesColumn::intern`. Includes two adjacent fixes adriangb
asked for in the upstream PR thread.
## What changed
1. **Factory extraction.** The inline match that maps each schema
field to a `Box<dyn GroupColumn>` builder moves out of
`GroupValuesColumn::intern` into a free function
`make_group_column(field: &Field) -> Result<Box<dyn GroupColumn>>`.
`intern` is now a one-liner loop that delegates per-field. Future
nested-type specializations recursively call this factory for child
field construction without enumerating every combination inline.
2. **Time32 / Time64 supported_type alignment.** Previously
`supported_type` matched `Time32(_)` (admitting the invalid
Microsecond / Nanosecond combinations) and did not match `Time64(_)`
at all, while the dispatcher accepted `Time32(Second / Millisecond)`
and `Time64(Microsecond / Nanosecond)`. Tighten `supported_type` to
the exact set the dispatcher constructs. The dispatcher's
wildcard arms for invalid Time variants now return
`not_impl_err` instead of silently doing nothing, so a schema that
somehow reaches the dispatcher fails loudly rather than producing
an empty builder vector.
3. **supported_type ↔ make_group_column consistency fuzz.** New unit
test `supported_type_and_make_group_column_stay_in_sync` iterates a
representative set of 20 supported and 6 unsupported `DataType`s
and asserts the biconditional. Pins the alignment so future
contributors who add a type to one side without the other trip a
unit test immediately.
4. **dhat heap regression harness behind `dhat-heap` feature.** Adds
an optional `dhat` dependency, a new `dhat-heap` feature in
`physical-plan/Cargo.toml`, a feature-gated global allocator in
`lib.rs`, and a `dhat_tests` submodule that wraps a
representative `GroupValuesColumn::intern` workload in
`dhat::Profiler` and asserts peak heap stays under a budget.
Default `cargo test` does not compile or run dhat. Subsequent
type-support PRs should lower the budget to reflect their
savings.
## Testing
- `cargo test -p datafusion-physical-plan --lib aggregates::group_values`
passes 30 tests (27 existing + 3 new: the consistency fuzz, the
mixed-schema rejection, and the NotImpl propagation).
- `cargo test -p datafusion-physical-plan --lib aggregates` passes
105 tests (no regression in the broader aggregate suite).
- `cargo test -p datafusion-physical-plan --features dhat-heap
aggregates::group_values::multi_group_by::dhat_tests` passes 1 test
with peak heap well under the 1 MiB budget on a 1k-row workload.
- `cargo clippy -p datafusion-physical-plan --lib --tests --features
dhat-heap -- -D warnings` is clean.
## What this PR is NOT doing
- No new `GroupColumn` builders. `FixedSizeList`, `Struct`, `List`,
`LargeList` come in subsequent PRs of the #22682 sequence.
- No new types in `supported_type`'s allow-list (the Time tightening
removes invalid combinations rather than adding new ones).
- No behavior change for any schema that already used
`GroupValuesColumn` on main: the factory is the same dispatch logic
pulled into a function, and the Time changes only affect schemas
that are semantically invalid Arrow types anyway.
This is PR 1 of the 5-PR split discussed at
#22706 (comment).1 parent f033199 commit 73cb7e8
4 files changed
Lines changed: 456 additions & 174 deletions
File tree
- datafusion/physical-plan
- src
- aggregates/group_values/multi_group_by
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
42 | 42 | | |
43 | 43 | | |
44 | 44 | | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
45 | 50 | | |
46 | 51 | | |
47 | 52 | | |
| |||
76 | 81 | | |
77 | 82 | | |
78 | 83 | | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
79 | 87 | | |
80 | 88 | | |
81 | 89 | | |
| |||
0 commit comments