Description
The guide at https://awkward-array.org/how-to-convert-buffers.html recommends using ak.to_buffers to write HDF5 files. However, the output files can easily become unnecessarily large.
Consider the following example:
import numpy as np
import awkward as ak
arr = ak.Array({"x": np.random.rand(1000)})
mask = [0, 2]
arr = arr[mask]
form, length, container = ak.to_buffers(arr)
Here container, which is what gets saved to the file, holds an array of 1000 numbers even though we only want 2 of them. The number doesn't have to be 1000; it can be much larger.
What I think would be very nice here is an option to have the container be restricted to only the data that is necessary. This could even be an additional function, condensing an awkward array so that it is compact in memory.
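Conceptually, the condensing step I have in mind is just projecting the data through the selection index so that only the referenced elements remain. The following NumPy-only sketch illustrates the idea; the function name compact is hypothetical and this is not how Awkward Array implements anything internally.

```python
import numpy as np

def compact(index, content):
    """Copy only the elements of `content` referenced by `index`
    into a new, contiguous buffer (hypothetical helper)."""
    return np.ascontiguousarray(content[index])

content = np.random.rand(1000)   # the full underlying buffer
index = np.array([0, 2])         # the two entries we actually want

compacted = compact(index, content)

print(content.nbytes, "bytes before")
print(compacted.nbytes, "bytes after")
```

For a record array, the same projection would have to be applied to the buffer of every field.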
I know that flattening can have a similar effect, but it doesn't work on arrays with records. Surprisingly, doing something like ak.from_arrow(ak.to_arrow(arr)) has the desired effect on the array. However, this seems to be a very crude workaround.