Skip to content

Update results to Datafusion 46 #353

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 21, 2025

Conversation

AdamGS
Copy link
Contributor

@AdamGS AdamGS commented Apr 17, 2025

Reran the benchmark for datafusion 46.0.0.
I ran into some minor issues with the scripts, so I fixed all the issues I ran into, including the README reflect the fact that since it was initially written tests moved to Ubuntu from Amazon Linux 2 (which seems to be the right move in the spirit of ClickBench)

CARGO_PROFILE_RELEASE_LTO=true RUSTFLAGS="-C codegen-units=1" cargo build --release
cd arrow-datafusion/
git checkout 46.0.0
CARGO_PROFILE_RELEASE_LTO=true RUSTFLAGS="-C codegen-units=1" cargo build --release --package datafusion-cli --bin datafusion-cli
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifying package seems to help narrow the build scope by a bit

"tags": ["Rust", "column-oriented", "embedded", "stateless"],
"load_time": 0,
"data_size": 14779976446,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was just copy-pasted from the single file, and the data isn't actually the same size.

@@ -31,7 +31,7 @@ cat queries.sql | while read -r query; do
# 2. each query contains a "Query took xxx seconds", we just grep these 2 lines
# 3. use sed to take the second line
# 4. use awk to take the number we want
RES=`datafusion-cli -f $CREATE_SQL_FILE /tmp/query.sql 2>&1 | grep "Elapsed" |sed -n 2p | awk '{ print $2 }'`
RES=$(datafusion-cli -f $CREATE_SQL_FILE /tmp/query.sql 2>&1 | grep "Elapsed" |sed -n 2p | awk '{ print $2 }')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was weird, but the backticks just didn't work? this seems to work well

@AdamGS AdamGS changed the title Update results to datafusion 46 Update results to Datafusion 46 Apr 17, 2025
@pmcgleenon
Copy link
Contributor

@AdamGS I believe we can now remove the double escaping as per this note

@AdamGS AdamGS closed this Apr 19, 2025
@AdamGS AdamGS reopened this Apr 19, 2025
@AdamGS
Copy link
Contributor Author

AdamGS commented Apr 19, 2025

@pmcgleenon do you mean just doing

diff --git a/datafusion/queries.sql b/datafusion/queries.sql

-SELECT REGEXP_REPLACE("Referer", '^https?://(?:www\\.)?([^/]+)/.*$', '\\1') AS k, AVG(length("Referer")) AS l, COUNT(*) AS c, MIN("Referer") FROM hits WHERE "Referer" <> '' GROUP BY k HAVING COUNT(*) > 100000 ORDER BY l DESC LIMIT 25;
+SELECT REGEXP_REPLACE("Referer", '^https?://(?:www\.)?([^/]+)/.*$', '\1') AS k, AVG(length("Referer")) AS l, COUNT(*) AS c, MIN("Referer") FROM hits WHERE "Referer" <> '' GROUP BY k HAVING COUNT(*) > 100000 ORDER BY l DESC LIMIT 25;

@pmcgleenon
Copy link
Contributor

@pmcgleenon do you mean just doing

diff --git a/datafusion/queries.sql b/datafusion/queries.sql

-SELECT REGEXP_REPLACE("Referer", '^https?://(?:www\\.)?([^/]+)/.*$', '\\1') AS k, AVG(length("Referer")) AS l, COUNT(*) AS c, MIN("Referer") FROM hits WHERE "Referer" <> '' GROUP BY k HAVING COUNT(*) > 100000 ORDER BY l DESC LIMIT 25;
+SELECT REGEXP_REPLACE("Referer", '^https?://(?:www\.)?([^/]+)/.*$', '\1') AS k, AVG(length("Referer")) AS l, COUNT(*) AS c, MIN("Referer") FROM hits WHERE "Referer" <> '' GROUP BY k HAVING COUNT(*) > 100000 ORDER BY l DESC LIMIT 25;

Yes exactly that - the bug in the CLI that required adding the additional escape characters has been fixed. So they can be safely removed now

@AdamGS
Copy link
Contributor Author

AdamGS commented Apr 19, 2025

thanks for the heads up! fixed it and reran everything.

@rschu1ze rschu1ze merged commit 47429fb into ClickHouse:main Apr 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants