-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Add configuration for eliminating sort in subquery #15993
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few observation notes.
datafusion/sql/src/relation/mod.rs
Outdated
fn optimize_subquery_sort(plan: LogicalPlan) -> Result<Transformed<LogicalPlan>> { | ||
fn optimize_subquery_sort( | ||
plan: LogicalPlan, | ||
enalbe_eliminate: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be
enable_eliminate: bool
datafusion/sql/src/relation/mod.rs
Outdated
@@ -241,7 +246,7 @@ fn optimize_subquery_sort(plan: LogicalPlan) -> Result<Transformed<LogicalPlan>> | |||
} | |||
match c { | |||
LogicalPlan::Sort(s) => { | |||
if !has_limit { | |||
if !has_limit && enalbe_eliminate { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
enable_eliminate
@@ -3385,6 +3387,7 @@ fn ident_normalization_parser_options_ident_normalization() -> ParserOptions { | |||
map_varchar_to_utf8view: false, | |||
enable_options_value_normalization: false, | |||
collect_spans: false, | |||
enable_eliminate_subquery_sort: true, | |||
} | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be a good idea to add tests for
enable_eliminate_subquery_sort: false
too?
Hi @irenjj. Thank you working on this, but did you try to set that config to false and run the tests? I think people should see here how the plans are changed if we do so (perhaps you can open another PR for that -- I don't know if there is an easier way) As I said in the discussion, I afraid our physical rules are not ready for this. we might either miss some sort eliminations, or possibly eliminate some subquery sorts even if config says opposite. Therefore; I was suggesting enabling this option from lower level to higher. If people still consent this, I couldn't say more but we should add some warnings at least. |
Yep, I've tried to set config to false and it can forbid eliminating sort in subquery. I'll add a test.
This is a very good approach. We can implement it later to replace the current approach of adding configuration |
Thanks @kosiew and @berkaysynnada for your review, PTAL👀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @irenjj
Please include also a test with CTE
with t as (select c1 from sink_table order by c2 nulls last) select * from t
@irenjj I've set the default value of this config to "false", and auto-complete the slt tests. This is one of the changes: ![]() as I said, even if the logical planner doesn't remove the Sort, it's still eliminated in the physical planning stage. I also thought it's so because of the count(), but if I replace it by another aggregate fn, a SortExec still doesn't emerge in the physical plan (I also consider subquery brings unique results, so remove the inner group by also to be sure). TLDR, we cannot disable subquery sort elimination by disabling it from the logical world only. |
That's right, we need to process this more elegantly. |
Which issue does this PR close?
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?