Ibis benchmarking: DuckDB, DataFusion, Polars – Ibis #10179
Answered by lostmygithubaccount, Sep 20, 2024
Did I miss the part where you "write the results of the execution time in a parquet file"? If not, which system was faster (sum of all the queries for a given engine)?
Hi @alberttwong, the results of the TPC-H queries are written out to Parquet files and then discarded (to ensure the results are materialized uniformly), but those files do not contain the runtimes.
The runtimes are stored as JSON, compacted into Parquet files, and uploaded to a public GCS bucket so you can perform your own analysis. The results are a bit old at this point; I plan on improving the benchmarking (e.g. capturing memory usage) and running it on newer versions soon.
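For readers who want to collect comparable numbers themselves, here is a minimal sketch of recording per-query runtimes as JSON. It is not the actual Ibis benchmark code: the `run_and_record` and `to_json_lines` helpers and the record schema (`system`, `query`, `scale_factor`, `seconds`, `success`) are hypothetical, and the compaction into Parquet and the upload to GCS are left out. The `run_query` callable is assumed to both execute the query and materialize its result (for example, by writing it to a throwaway Parquet file), so the timing covers full execution.

```python
import json
import time
from typing import Callable


def run_and_record(system: str, query: str, scale_factor: int,
                   run_query: Callable[[], object]) -> dict:
    """Time one query and return a JSON-serializable runtime record.

    Hypothetical helper: `run_query` is assumed to execute the query AND
    materialize its result, so every engine is timed doing the same work.
    """
    start = time.perf_counter()
    try:
        run_query()
        success = True
    except Exception:
        # Record failures instead of dropping them, so a later comparison
        # can tell "slow" apart from "did not finish".
        success = False
    elapsed = time.perf_counter() - start
    return {
        "system": system,
        "query": query,
        "scale_factor": scale_factor,
        "seconds": elapsed,
        "success": success,
    }


def to_json_lines(records: list) -> str:
    """Serialize records as newline-delimited JSON, one object per line."""
    return "".join(json.dumps(record) + "\n" for record in records)
```

Keeping one flat record per (system, query, scale factor) makes it easy to load everything into a single table later, whatever file format the records end up in.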
It's also not necessarily straightforward to compare the systems: sometimes a query fails on one system but not on the others, and some systems do better at certain scale factors. In general, Polars is the best when data size is small relative to R…
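One way to answer "which system was faster in total" despite failures is to sum runtimes only over the queries that every system completed at a given scale factor. The sketch below assumes the same hypothetical record schema as above (`system`, `query`, `scale_factor`, `seconds`, `success`) and one record per (system, query, scale factor); it is an illustration of that comparison rule, not how the Ibis team analyzes its results.

```python
from collections import defaultdict


def totals_on_common_queries(records: list, scale_factor: int) -> dict:
    """Sum runtimes per system over queries that ALL systems completed.

    Assumes one record per (system, query, scale_factor) using the
    hypothetical schema {"system", "query", "scale_factor", "seconds", "success"}.
    """
    at_sf = [r for r in records if r["scale_factor"] == scale_factor]
    systems = {r["system"] for r in at_sf}

    # For each query, collect the systems that ran it successfully.
    succeeded_on = defaultdict(set)
    for r in at_sf:
        if r["success"]:
            succeeded_on[r["query"]].add(r["system"])

    # Keep only queries that succeeded on every system at this scale factor.
    common = {q for q, ok in succeeded_on.items() if ok == systems}

    totals = {s: 0.0 for s in systems}
    for r in at_sf:
        if r["success"] and r["query"] in common:
            totals[r["system"]] += r["seconds"]
    return totals
```

Dropping queries that failed anywhere keeps the totals comparable but hides the failures themselves, so it is worth reporting the number of failed queries per system alongside the totals, and repeating the comparison per scale factor rather than pooling them.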