Update results to Datafusion 46 #353
CARGO_PROFILE_RELEASE_LTO=true RUSTFLAGS="-C codegen-units=1" cargo build --release
cd arrow-datafusion/
git checkout 46.0.0
CARGO_PROFILE_RELEASE_LTO=true RUSTFLAGS="-C codegen-units=1" cargo build --release --package datafusion-cli --bin datafusion-cli
Specifying package seems to help narrow the build scope by a bit
"tags": ["Rust", "column-oriented", "embedded", "stateless"],
"load_time": 0,
"data_size": 14779976446,
I think this was just copy-pasted from the single file, and the data isn't actually the same size.
@@ -31,7 +31,7 @@ cat queries.sql | while read -r query; do
# 2. each query contains a "Query took xxx seconds", we just grep these 2 lines
# 3. use sed to take the second line
# 4. use awk to take the number we want
-RES=`datafusion-cli -f $CREATE_SQL_FILE /tmp/query.sql 2>&1 | grep "Elapsed" |sed -n 2p | awk '{ print $2 }'`
+RES=$(datafusion-cli -f $CREATE_SQL_FILE /tmp/query.sql 2>&1 | grep "Elapsed" |sed -n 2p | awk '{ print $2 }')
this was weird, but the backticks just didn't work? this seems to work well
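For anyone reading along, here is a minimal sketch of what that pipeline extracts. The `sample_output` function is a hypothetical stand-in for the real `datafusion-cli` call, and its timings are made up; it only assumes the CLI prints one "Elapsed" line for the CREATE statement and one for the query.

```shell
# Hypothetical stand-in for datafusion-cli output (not real results):
# the first "Elapsed" line belongs to the CREATE statement,
# the second one to the benchmark query.
sample_output() {
    printf '%s\n' \
        '0 row(s) fetched.' \
        'Elapsed 0.041 seconds.' \
        '' \
        '1 row(s) fetched.' \
        'Elapsed 1.234 seconds.'
}

# Keep only the "Elapsed" lines, take the second one,
# and print its second whitespace-separated field (the number).
RES=$(sample_output | grep "Elapsed" | sed -n 2p | awk '{ print $2 }')
echo "$RES"
```

With `$(...)` the command text is taken as written, which also makes the pipeline easier to nest or quote than the backtick form.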
@pmcgleenon do you mean just doing:
diff --git a/datafusion/queries.sql b/datafusion/queries.sql
-SELECT REGEXP_REPLACE("Referer", '^https?://(?:www\\.)?([^/]+)/.*$', '\\1') AS k, AVG(length("Referer")) AS l, COUNT(*) AS c, MIN("Referer") FROM hits WHERE "Referer" <> '' GROUP BY k HAVING COUNT(*) > 100000 ORDER BY l DESC LIMIT 25;
+SELECT REGEXP_REPLACE("Referer", '^https?://(?:www\.)?([^/]+)/.*$', '\1') AS k, AVG(length("Referer")) AS l, COUNT(*) AS c, MIN("Referer") FROM hits WHERE "Referer" <> '' GROUP BY k HAVING COUNT(*) > 100000 ORDER BY l DESC LIMIT 25;
Yes, exactly that. The bug in the CLI that required the additional escape characters has been fixed, so they can be safely removed now.
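As a rough illustration of what the single-escaped pattern is meant to extract, here is a sketch using `sed -E` rather than DataFusion's regex engine. Two things are assumptions: the helper name `extract_host` is hypothetical, and because POSIX extended regex has no `(?:...)` non-capturing group, the `www.` group is capturing here and the back-reference becomes `\2` instead of `\1`.

```shell
# Hypothetical helper approximating the query's REGEXP_REPLACE with sed -E.
# (www\.)? is capturing in ERE, so the host ends up in group 2.
extract_host() {
    printf '%s\n' "$1" | sed -E 's#^https?://(www\.)?([^/]+)/.*$#\2#'
}

extract_host 'https://www.example.com/a/b'
extract_host 'http://example.org/'
```

Inside the single quotes the shell passes `\.` and `\2` through untouched, so one backslash is enough, which mirrors the unescaped form of the query.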
thanks for the heads up! fixed it and reran everything.
Reran the benchmark for datafusion 46.0.0.
I ran into some minor issues with the scripts and fixed all of them. I also updated the README to reflect that, since it was initially written, the tests have moved from Amazon Linux 2 to Ubuntu (which seems to be the right move in the spirit of ClickBench).