@feiniaofeiafei
Contributor

picked from #58372

…che#58372)

Related PR: apache#41731

Problem Summary:

The optimizer cannot derive predicates across multiple LEFT JOINs. For
example, given a filter on the leftmost table in a chain of LEFT JOINs,
the optimizer should be able to derive predicates on the rightmost
table, but it currently fails to do so.

create table t1(a int, b int);
create table t2(a int, b int);
create table t3(a int, b int);

insert into t1 values(1,2);
insert into t2 values(1,2);
insert into t3 values(1,2);
insert into t3 values(null,2);

explain logical plan
select * from t1 left join t2 on t1.a=t2.a left join t3 on t2.a=t3.a where t1.a=1;

LogicalResultSink[110] ( outputExprs=[a#0, b#1, a#2, b#3, a#4, b#5] )
+--LogicalProject[109] ( distinct=false, projects=[a#0, b#1, a#2, b#3, a#4, b#5] )
   +--LogicalJoin[108] ( type=LEFT_OUTER_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(a#2 = a#4)], otherJoinConjuncts=[], markJoinConjuncts=[] )
      |--LogicalProject[105] ( distinct=false, projects=[a#0, b#1, a#2, b#3] )
      |  +--LogicalJoin[104] ( type=LEFT_OUTER_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(a#0 = a#2)], otherJoinConjuncts=[], markJoinConjuncts=[] )
      |     |--LogicalFilter[101] ( predicates=(a#0 = 1) )
      |     |  +--LogicalOlapScan ( qualified=internal.maldb.t1, indexName=<index_not_selected>, selectedIndexId=1764043369852, preAgg=ON, operativeCol=[a#0], virtualColumns=[] )
      |     +--LogicalFilter[103] ( predicates=(a#2 = 1) )
      |        +--LogicalOlapScan ( qualified=internal.maldb.t2, indexName=<index_not_selected>, selectedIndexId=1764043369875, preAgg=ON, operativeCol=[a#2], virtualColumns=[] )
      +--LogicalOlapScan ( qualified=internal.maldb.t3, indexName=<index_not_selected>, selectedIndexId=1764043369898, preAgg=ON, operativeCol=[a#4], virtualColumns=[] )

The optimizer should derive t3.a=1 from t1.a=1 and the join conditions,
but it currently doesn't.
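
To make the expected derivation concrete, the check below runs the query above twice: once as written, and once with the filters `a = 1` applied by hand to t2 and t3. It is only a sanity check under stated assumptions: it uses SQLite through Python's standard `sqlite3` module rather than Doris, the aliases `s2` and `s3` and the helper `run` are hypothetical names, and the rows with `a = 2` are not in this PR (they are added so that the filters actually exclude something).

```python
import sqlite3

# Tables and rows from the PR description, plus extra a=2 rows (an assumption,
# not part of the PR) so that the filters have something to remove.
SETUP = """
create table t1(a int, b int);
create table t2(a int, b int);
create table t3(a int, b int);
insert into t1 values(1,2);
insert into t2 values(1,2);
insert into t3 values(1,2);
insert into t3 values(null,2);
insert into t1 values(2,3);
insert into t2 values(2,3);
insert into t3 values(2,3);
"""

# The query from the PR description.
ORIGINAL = """
select * from t1 left join t2 on t1.a=t2.a left join t3 on t2.a=t3.a where t1.a=1
"""

# The same query with the derived filters written out by hand.
REWRITTEN = """
select * from t1
left join (select * from t2 where a=1) as s2 on t1.a=s2.a
left join (select * from t3 where a=1) as s3 on s2.a=s3.a
where t1.a=1
"""


def run(query):
    """Run a query against a fresh in-memory database and return its rows in a stable order."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        rows = conn.execute(query).fetchall()
    finally:
        conn.close()
    # Sort by repr so rows containing None can still be ordered.
    return sorted(rows, key=repr)


print(run(ORIGINAL) == run(REWRITTEN))  # True
```

Both forms return the single row `(1, 2, 1, 2, 1, 2)`, which is why the filter on t3 is safe to add to the plan.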

The root cause is that the PullUpPredicates rule doesn't properly handle
predicate pull-up from the right side of LEFT JOINs. This PR fixes this
by generating null-tolerant predicates when pulling up from a LEFT JOIN's
right table and strengthening them when possible based on upper-level
join conditions.

After this PR:

LogicalResultSink[110] ( outputExprs=[a#0, b#1, a#2, b#3, a#4, b#5] )
+--LogicalProject[109] ( distinct=false, projects=[a#0, b#1, a#2, b#3, a#4, b#5] )
   +--LogicalJoin[108] ( type=LEFT_OUTER_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(a#2 = a#4)], otherJoinConjuncts=[], markJoinConjuncts=[] )
      |--LogicalProject[105] ( distinct=false, projects=[a#0, b#1, a#2, b#3] )
      |  +--LogicalJoin[104] ( type=LEFT_OUTER_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(a#0 = a#2)], otherJoinConjuncts=[], markJoinConjuncts=[] )
      |     |--LogicalFilter[101] ( predicates=(a#0 = 1) )
      |     |  +--LogicalOlapScan ( qualified=internal.maldb.t1, indexName=<index_not_selected>, selectedIndexId=1764043369852, preAgg=ON, operativeCol=[a#0], virtualColumns=[] )
      |     +--LogicalFilter[103] ( predicates=(a#2 = 1) )
      |        +--LogicalOlapScan ( qualified=internal.maldb.t2, indexName=<index_not_selected>, selectedIndexId=1764043369875, preAgg=ON, operativeCol=[a#2], virtualColumns=[] )
      +--LogicalFilter[107] ( predicates=(a#4 = 1) )
         +--LogicalOlapScan ( qualified=internal.maldb.t3, indexName=<index_not_selected>, selectedIndexId=1764043369898, preAgg=ON, operativeCol=[a#4], virtualColumns=[] )
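
The two steps described above (pull up a null-tolerant predicate, then strengthen it using the upper join condition) can be sketched as follows. This is an illustrative model only, not the Doris implementation: the classes `Eq`, `NullTolerantEq`, `JoinEq` and both functions are hypothetical names, and representing "null-tolerant" as `slot = value OR slot IS NULL` is an assumption made for the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Eq:
    """slot = value"""
    slot: str
    value: int


@dataclass(frozen=True)
class NullTolerantEq:
    """slot = value OR slot IS NULL (assumed shape of a null-tolerant predicate)"""
    slot: str
    value: int


@dataclass(frozen=True)
class JoinEq:
    """left = right, an equi-join condition of the upper join"""
    left: str
    right: str


def pull_up_from_left_join_right(pred: Eq) -> NullTolerantEq:
    # Above a LEFT JOIN, right-side columns are NULL for unmatched rows,
    # so the filter only holds in its null-tolerant form.
    return NullTolerantEq(pred.slot, pred.value)


def derive_for_upper_right(pred: NullTolerantEq, cond: JoinEq) -> Optional[Eq]:
    # The upper join condition must reference the predicate's slot.
    if cond.left != pred.slot:
        return None
    # An equality never matches when pred.slot is NULL, so for matching rows
    # the predicate strengthens to slot = value, which transfers to cond.right.
    return Eq(cond.right, pred.value)


t2_filter = Eq("t2.a", 1)
pulled = pull_up_from_left_join_right(t2_filter)
derived = derive_for_upper_right(pulled, JoinEq("t2.a", "t3.a"))
print(derived)  # Eq(slot='t3.a', value=1)
```

In the plan above, this corresponds to the new `LogicalFilter[107] ( predicates=(a#4 = 1) )` on t3.
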
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@feiniaofeiafei
Contributor Author

run buildall

@feiniaofeiafei
Contributor Author

run buildall

@feiniaofeiafei
Contributor Author

run buildall

@morrySnow merged commit d604b45 into apache:branch-3.1 on Dec 4, 2025
22 checks passed