Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize coalesce of join keys #23603

Open
kaikalur opened this issue Sep 6, 2024 · 1 comment
Open

Optimize coalesce of join keys #23603

kaikalur opened this issue Sep 6, 2024 · 1 comment

Comments

@kaikalur
Copy link
Contributor

kaikalur commented Sep 6, 2024

We have seen queries like:

COALESCE(l.id, r.id) FROM l INNER/LEFT JOIN r ON l.id=r.id

Here for both inner ad left joins we can simplify it to l.id since it's a join key. Similarly for the converse for right join. We should apply this rule before join reordering

@aaneja
Copy link
Contributor

aaneja commented Sep 23, 2024

This already works for INNER JOINs -

presto:tiny> explain select COALESCE(l.nationkey, r.regionkey) FROM nation l inner join region r on l.nationkey=r.regionkey;
                                                                                                                                                                                    Query Plan                                                                                          >
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------->
 - Output[PlanNodeId 9][_col0] => [nationkey:bigint]                                                                                                                                                                                                                                    >
         Estimates: {source: CostBasedSourceInfo, rows: 5 (45B), cpu: 630.00, memory: 45.00, network: 90.00}                                                                                                                                                                            >
         _col0 := nationkey (1:16)                                                                                                                                                                                                                                                      >
     - RemoteStreamingExchange[PlanNodeId 308][GATHER] => [nationkey:bigint]                                                                                                                                                                                                            >
             Estimates: {source: CostBasedSourceInfo, rows: 5 (45B), cpu: 630.00, memory: 45.00, network: 90.00}                                                                                                                                                                        >
         - InnerJoin[PlanNodeId 286][("nationkey" = "regionkey_0")] => [nationkey:bigint]                                                                                                                                                                                               >
                 Estimates: {source: CostBasedSourceInfo, rows: 5 (45B), cpu: 630.00, memory: 45.00, network: 45.00}                                                                                                                                                                    >
                 Distribution: REPLICATED                                                                                                                                                                                                                                               >
             - TableScan[PlanNodeId 0][TableHandle {connectorId='tpch', connectorHandle='nation:sf0.01', layout='Optional[nation:sf0.01]'}] => [nationkey:bigint]                                                                                                                       >
                     Estimates: {source: CostBasedSourceInfo, rows: 25 (225B), cpu: 225.00, memory: 0.00, network: 0.00}                                                                                                                                                                >
                     nationkey := tpch:nationkey (1:56)                                                                                                                                                                                                                                 >
             - LocalExchange[PlanNodeId 334][HASH] (regionkey_0) => [regionkey_0:bigint]                                                                                                                                                                                                >
                     Estimates: {source: CostBasedSourceInfo, rows: 5 (45B), cpu: 90.00, memory: 0.00, network: 45.00}                                                                                                                                                                  >
                 - RemoteStreamingExchange[PlanNodeId 307][REPLICATE] => [regionkey_0:bigint]                                                                                                                                                                                           >
                         Estimates: {source: CostBasedSourceInfo, rows: 5 (45B), cpu: 45.00, memory: 0.00, network: 45.00}                                                                                                                                                              >
                     - TableScan[PlanNodeId 1][TableHandle {connectorId='tpch', connectorHandle='region:sf0.01', layout='Optional[region:sf0.01]'}] => [regionkey_0:bigint]                                                                                                             >
                             Estimates: {source: CostBasedSourceInfo, rows: 5 (45B), cpu: 45.00, memory: 0.00, network: 0.00}                                                                                                                                                           >
                             regionkey_0 := tpch:regionkey (1:76) 

For outer joins, we would need to set the joins_not_null_inference_strategy to a value other than NONE, so that we add NOT NULL filters on the outer side of the join.
Once this is set, we could optimize out the COALESCE

@auden-woolfson auden-woolfson self-assigned this Oct 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: 🆕 Unprioritized
Development

No branches or pull requests

3 participants