Help with Cluster optimization for a 150 node Trino 457 cluster running on r6g.16xlarge #24817
soham-dasgupta
started this conversation in
General
Replies: 0 comments
Hi Team, I am looking for some help optimizing our Trino (457) cluster running on emr-7.6.0. Our main use case is to read from Glue backed by S3, transform the data, and write it back to S3 using CTAS. I am trying to benchmark the cluster setup below using a CTAS query that joins two tables: one has 61 billion records and the other has 58 billion records.
Here is the coordinator config:
Here is the catalog configuration:
I am benchmarking the cluster against this fairly complex query to find out which levers I can pull to optimize it.
Link to query: https://pastecode.io/s/xfeai33m
Link to query plan: https://pastecode.io/s/rj9nx0hh
Count of rows
stg_dim_ad_group 61153245481
fact_sa_ad_group_dly 58372216116
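For a sense of scale, here is a rough back-of-envelope sketch in Python based on the row counts above. It assumes all 150 nodes act as workers and that each r6g.16xlarge provides 64 vCPUs and 512 GiB of memory; neither the worker count nor the per-node figures are stated in this post, and the function name `cluster_summary` is just illustrative.

```python
# Row counts quoted in the post.
ROWS = {
    "stg_dim_ad_group": 61_153_245_481,
    "fact_sa_ad_group_dly": 58_372_216_116,
}

# Assumptions (not stated in the post): all 150 nodes are workers, and
# each r6g.16xlarge has 64 vCPUs and 512 GiB of memory.
NODES = 150
VCPUS_PER_NODE = 64
MEM_GIB_PER_NODE = 512


def cluster_summary(rows=ROWS, nodes=NODES):
    """Return aggregate input rows and raw cluster capacity."""
    total_rows = sum(rows.values())
    return {
        "total_rows": total_rows,
        "rows_per_node": total_rows / nodes,
        "total_vcpus": nodes * VCPUS_PER_NODE,
        "total_mem_gib": nodes * MEM_GIB_PER_NODE,
    }


if __name__ == "__main__":
    for name, value in cluster_summary().items():
        print(f"{name}: {value:,.0f}")
```

These are raw hardware totals only; the memory actually available to queries depends on the JVM heap and Trino memory settings, which are not shown here.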