Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Parquet writing performance is not very good. The `arrow_writer` microbenchmark shows a throughput of around 200MiB/s for a batch of primitives. For a column-oriented format that seems rather low, but a profiling run shows no single obvious bottleneck.
Describe the solution you'd like
Investigate and improve the performance.
- Optimize counting of values and nulls
- Avoid asserts called in a loop in `BitWriter::put_value`
- Avoid bounds checks in `flush_bit_packed_run`
- Optimize iteration in `LevelInfoBuilder::write_leaf`
- Avoid cloning null buffer or recalculating logical nulls
- Do not collect `non_null_indices` and gather these into a new `Vec` for non-nullable arrays
- Optimize writing bit-packed runs (the bit width is 1 for levels most of the time, and every run except the last writes exactly 8 values)
- Change `get_min_max` to check logical/converted types outside of the loop
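For counting nulls, one possible direction (an assumption about where the counts come from, not a description of the current code) is to popcount an Arrow-style validity bitmap a byte at a time instead of testing every slot. `null_count` is a hypothetical helper; it assumes the Arrow convention of "bit set = valid", least-significant bit first:

```rust
/// Hypothetical helper: counts null slots among the first `len` slots of an
/// Arrow-style validity bitmap (bit set = valid, least-significant bit first).
fn null_count(validity: &[u8], len: usize) -> usize {
    let full_bytes = len / 8;
    let rem_bits = len % 8;
    // Popcount whole bytes at once instead of testing each bit individually.
    let mut valid: usize = validity[..full_bytes]
        .iter()
        .map(|b| b.count_ones() as usize)
        .sum();
    if rem_bits > 0 {
        // Ignore bits beyond `len` in the final, partially used byte.
        let mask = (1u8 << rem_bits) - 1;
        valid += (validity[full_bytes] & mask).count_ones() as usize;
    }
    len - valid
}
```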
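The idea behind hoisting asserts out of `BitWriter::put_value` can be sketched with a simplified, hypothetical writer (`SimpleBitWriter`, `put_values` and `flush` are illustrative names, not the parquet crate's API). It validates the bit width once per batch and keeps only a `debug_assert!` per value:

```rust
/// Hypothetical, simplified bit writer; not the parquet crate's `BitWriter`.
struct SimpleBitWriter {
    buffer: Vec<u8>,
    buffered: u64,     // bits not yet flushed to `buffer`
    bit_offset: usize, // number of valid bits in `buffered` (always < 64)
}

impl SimpleBitWriter {
    fn new() -> Self {
        Self { buffer: Vec::new(), buffered: 0, bit_offset: 0 }
    }

    /// Appends each value using `num_bits` bits, least-significant bit first.
    /// The width check runs once per batch; the per-value check is a
    /// `debug_assert!`, so it is compiled out of release builds.
    fn put_values(&mut self, values: &[u64], num_bits: usize) {
        assert!((1..=64).contains(&num_bits), "num_bits must be in 1..=64");
        for &v in values {
            debug_assert!(num_bits == 64 || v >> num_bits == 0, "value wider than num_bits");
            let old_offset = self.bit_offset;
            self.buffered |= v << old_offset;
            self.bit_offset += num_bits;
            if self.bit_offset >= 64 {
                // The 64-bit word is full: emit it and carry over the bits of `v` that did not fit.
                self.buffer.extend_from_slice(&self.buffered.to_le_bytes());
                self.bit_offset -= 64;
                self.buffered = if self.bit_offset == 0 { 0 } else { v >> (64 - old_offset) };
            }
        }
    }

    /// Writes out any buffered bits, padding the last byte with zeros.
    fn flush(mut self) -> Vec<u8> {
        let num_bytes = (self.bit_offset + 7) / 8;
        self.buffer.extend_from_slice(&self.buffered.to_le_bytes()[..num_bytes]);
        self.buffer
    }
}
```

Batching the values gives the compiler a single loop with no panicking branches in release builds, which is the effect the item above is after.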
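For the common case of bit width 1, a bit-packed group of 8 levels is exactly one byte. A minimal sketch of a specialized path, assuming levels are `i16` values of 0 or 1 and the LSB-first bit order of Parquet's RLE/bit-packed hybrid encoding (`pack_width1` is a hypothetical name):

```rust
/// Packs up to 8 levels of bit width 1 into one byte, least-significant bit first.
fn pack_group_width1(group: &[i16]) -> u8 {
    debug_assert!(group.len() <= 8);
    let mut byte = 0u8;
    for (i, &level) in group.iter().enumerate() {
        byte |= ((level & 1) as u8) << i;
    }
    byte
}

/// Hypothetical fast path: packs width-1 levels into bytes, 8 levels per byte.
/// `chunks(8)` yields full groups plus possibly one shorter final group, whose
/// unused high bits stay zero as padding. Iterating over chunks avoids indexing
/// into `levels`, so there are no per-element bounds checks.
fn pack_width1(levels: &[i16], out: &mut Vec<u8>) {
    for group in levels.chunks(8) {
        out.push(pack_group_width1(group));
    }
}
```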
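Checking the logical/converted type once, outside the loop, could look roughly like the sketch below. `SortOrder` is a hypothetical stand-in for whatever `get_min_max` derives from the column type (for example, unsigned `UINT_32` data stored physically as `INT32`):

```rust
/// Hypothetical stand-in for the comparison order derived once from the
/// column's logical/converted type.
#[derive(Clone, Copy)]
enum SortOrder {
    Signed,
    Unsigned,
}

fn min_max_i32(values: &[i32], order: SortOrder) -> Option<(i32, i32)> {
    // Branch on the type once; each arm calls a separately monomorphized loop.
    match order {
        SortOrder::Signed => min_max_by_key(values, |v| v as i64),
        SortOrder::Unsigned => min_max_by_key(values, |v| v as u32 as i64),
    }
}

/// Tight min/max loop with no type checks inside it.
fn min_max_by_key<K: Ord>(values: &[i32], key: impl Fn(i32) -> K) -> Option<(i32, i32)> {
    let (&first, rest) = values.split_first()?;
    let (mut min, mut max) = (first, first);
    for &v in rest {
        if key(v) < key(min) {
            min = v;
        }
        if key(v) > key(max) {
            max = v;
        }
    }
    Some((min, max))
}
```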
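For non-nullable arrays, the values can be consumed as a contiguous slice instead of collecting `non_null_indices` and gathering into a new `Vec`. A simplified sketch, assuming a `bool` slice in place of a real null buffer and a sum in place of the actual encoding step (`sum_non_null` is hypothetical):

```rust
/// Hypothetical stand-in for writing a leaf column: sums the non-null values.
fn sum_non_null(values: &[i64], validity: Option<&[bool]>) -> i64 {
    match validity {
        // Non-nullable: every slot is valid, so use the slice directly.
        // No index vector is built and no gather copy is made.
        None => values.iter().sum(),
        // Nullable: only here do we need to select the valid positions.
        Some(valid) => values
            .iter()
            .zip(valid)
            .filter(|(_, ok)| **ok)
            .map(|(v, _)| v)
            .sum(),
    }
}
```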
Describe alternatives you've considered
Additional context