Arrow adoption #2
-
As discussed in the Teams meeting, providing documentation in the power-grid-model-io project is a good enough first step. Issue PowerGridModel/power-grid-model-io#190 has been created and will be picked up next sprint.
-
Support for columnar data formats (PowerGridModel/power-grid-model#548) is now in preview. That should also provide a better fit for data in Arrow format. We intend to do a minor version bump (to the Official support for PyArrow data format would be a new feature in the
-
We are close to the release that fully supports the columnar data format in PGM. Meanwhile, sharing an update:
-
Recent versions of dataframe libraries such as pandas and polars in Python support Arrow as the underlying memory backend. Arrow is quickly becoming the de facto standard for columnar storage. It is an in-memory columnar data format that can improve the performance of data processing and analysis.
By using Arrow, we can avoid the cost of serialization and deserialization when passing data between different parts of our system, and take advantage of optimizations that are specific to columnar data. Many other data processing and analysis tools already support Arrow, so adding Arrow integration to PGM would make it easier for users to combine this library with other tools. This memory can even be shared between different languages. Since Alliander still uses R a lot, that could make PGM easier to adopt from R too.
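To make the "no serialization" point concrete, here is a minimal sketch of the underlying idea using only the Python standard library. It is not Arrow and not the PGM API: the column names (`node_ids`, `u_rated`) and values are made up for illustration, and `array`/`memoryview` merely stand in for a columnar buffer that a second consumer reads without copying.

```python
from array import array

# Hypothetical node data stored column by column: one typed, contiguous
# buffer per attribute instead of a list of per-row records.
node_ids = array("i", [1, 2, 3])
u_rated = array("d", [10500.0, 10500.0, 400.0])

# A memoryview hands the same buffer to another consumer without copying
# or serializing it (the stand-in here for sharing a columnar buffer).
shared = memoryview(u_rated)

# A write through the original column is visible through the view,
# because both names refer to one block of memory.
u_rated[2] = 420.0
view_sees_update = shared[2] == 420.0

# Only an explicit conversion creates an independent copy.
copied = shared.tolist()
u_rated[0] = 0.0
copy_is_independent = copied[0] == 10500.0 and shared[0] == 0.0
```

With Arrow, the same principle applies across libraries and languages, because they agree on one memory layout for these column buffers.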
I'd love to hear your thoughts on this. Do you think it makes sense to add Arrow integration to PGM? Are there any potential downsides or challenges that I'm not aware of? Please let me know your opinions and feedback.