Open
Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Arrow has added support for run-end encoded (REE) arrays (apache/arrow#14176). Similar to dictionary arrays, REE arrays allow repeated values to be encoded in a space-efficient manner that also allows fast processing.
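For readers unfamiliar with the layout, here is a minimal sketch of the idea in plain Rust. It is not the arrow-rs API: the `RunEndEncoded` struct, `encode`, and `get` are hypothetical names used only for illustration, and the sketch assumes non-null `i32` values and ignores offsets.

```rust
/// Hypothetical, simplified run-end encoded column of i32 values
/// (illustration only, not an arrow-rs type).
/// `run_ends[i]` is the exclusive logical end index of run `i`,
/// and `values[i]` is the value repeated throughout that run.
#[derive(Debug, PartialEq)]
struct RunEndEncoded {
    run_ends: Vec<i32>,
    values: Vec<i32>,
}

/// Collapse consecutive equal values into runs.
fn encode(input: &[i32]) -> RunEndEncoded {
    let mut run_ends: Vec<i32> = Vec::new();
    let mut values: Vec<i32> = Vec::new();
    for (i, &v) in input.iter().enumerate() {
        if values.last() == Some(&v) {
            // Same value as the current run: extend it by moving its end.
            *run_ends.last_mut().unwrap() = (i + 1) as i32;
        } else {
            // New value: start a new run that ends just after this element.
            run_ends.push((i + 1) as i32);
            values.push(v);
        }
    }
    RunEndEncoded { run_ends, values }
}

impl RunEndEncoded {
    /// Logical length is the last run end (0 if there are no runs).
    fn len(&self) -> usize {
        self.run_ends.last().map_or(0, |&end| end as usize)
    }

    /// Look up a logical index by binary searching the run ends.
    fn get(&self, index: usize) -> Option<i32> {
        if index >= self.len() {
            return None;
        }
        // First run whose end is strictly greater than `index`.
        let physical = self.run_ends.partition_point(|&end| end as usize <= index);
        Some(self.values[physical])
    }
}
```

Under these assumptions, encoding `[7, 7, 7, 2, 2, 9]` stores three run ends and three values instead of six values, and a random access costs a binary search over the run ends rather than a direct index.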
Describe the solution you'd like
Implement REE in arrow-rs. Some likely candidate tasks:
- Support in DataType
- Support in ArrayData
- New REE array
- Support REE in IPC
- Support REE in cast kernels
- Support REE in compute kernels
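As a rough illustration of why native kernel support matters, a kernel can work on runs directly instead of first expanding the array. The sketch below is plain Rust with a hypothetical `sum_ree` helper, not an arrow-rs kernel; it assumes no nulls, no slice offset, and no overflow.

```rust
/// Sum a run-end encoded column without expanding it (illustrative sketch).
/// `run_ends` holds exclusive logical end indices; `values` holds one value per run.
fn sum_ree(run_ends: &[i32], values: &[i64]) -> i64 {
    let mut prev_end: i64 = 0;
    let mut total: i64 = 0;
    for (&end, &value) in run_ends.iter().zip(values) {
        // Each run contributes its value once per logical element it covers.
        let run_len = end as i64 - prev_end;
        total += value * run_len;
        prev_end = end as i64;
    }
    total
}
```

The loop does work proportional to the number of runs rather than the number of logical rows, which is where the potential speedup comes from when runs are long.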
Describe alternatives you've considered
Remaining tasks:
- arrow-row: Add support for REE #7649
- arrow-select: Implement concat for `RunArray`s #7487
- arrow-data: Add REE support for `build_extend` and `build_extend_nulls` #7671
- Implement `PartialEq` for RunArray #7691
- Reduce repetition in tests for arrow-row/src/run.rs #7692
- Improve performance of RunArray --> Row conversion #7693
- Potential Optimization for interleave/take on RunEndEncoded arrays #7710
- [Draft] Implemented casting for RunEnd Encoding #7713
Additional context
Among other things, @brancz is working to improve aggregation performance in DataFusion using RunArrays, see