Skip to content

Add configuration for eliminating sort in subquery #15993

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from

Conversation

irenjj
Copy link
Contributor

@irenjj irenjj commented May 8, 2025

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added sql SQL Planner core Core DataFusion crate common Related to common crate labels May 8, 2025
@github-actions github-actions bot added documentation Improvements or additions to documentation sqllogictest SQL Logic Tests (.slt) labels May 9, 2025
@irenjj irenjj marked this pull request as ready for review May 9, 2025 03:14
Copy link
Contributor

@kosiew kosiew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few observation notes.

fn optimize_subquery_sort(plan: LogicalPlan) -> Result<Transformed<LogicalPlan>> {
fn optimize_subquery_sort(
plan: LogicalPlan,
enalbe_eliminate: bool,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be

enable_eliminate: bool

@@ -241,7 +246,7 @@ fn optimize_subquery_sort(plan: LogicalPlan) -> Result<Transformed<LogicalPlan>>
}
match c {
LogicalPlan::Sort(s) => {
if !has_limit {
if !has_limit && enalbe_eliminate {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enable_eliminate

@@ -3385,6 +3387,7 @@ fn ident_normalization_parser_options_ident_normalization() -> ParserOptions {
map_varchar_to_utf8view: false,
enable_options_value_normalization: false,
collect_spans: false,
enable_eliminate_subquery_sort: true,
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be a good idea to add tests for

enable_eliminate_subquery_sort: false

too?

@berkaysynnada
Copy link
Contributor

Hi @irenjj. Thank you working on this, but did you try to set that config to false and run the tests?

I think people should see here how the plans are changed if we do so (perhaps you can open another PR for that -- I don't know if there is an easier way)

As I said in the discussion, I afraid our physical rules are not ready for this. we might either miss some sort eliminations, or possibly eliminate some subquery sorts even if config says opposite. Therefore; I was suggesting enabling this option from lower level to higher. If people still consent this, I couldn't say more but we should add some warnings at least.

@irenjj
Copy link
Contributor Author

irenjj commented May 9, 2025

Hi @irenjj. Thank you working on this, but did you try to set that config to false and run the tests?

I think people should see here how the plans are changed if we do so (perhaps you can open another PR for that -- I don't know if there is an easier way)

As I said in the discussion, I afraid our physical rules are not ready for this. we might either miss some sort eliminations, or possibly eliminate some subquery sorts even if config says opposite. Therefore; I was suggesting enabling this option from lower level to higher. If people still consent this, I couldn't say more but we should add some warnings at least.

Yep, I've tried to set config to false and it can forbid eliminating sort in subquery. I'll add a test.

If I approach this case practically, when "order by" clauses are given in subqueries: these are converted into SortExecs at somewhere in the plan. However, in enforce_sorting, we don't track ordering requirements through SortExecs directly (otherwise, we wouldn't be able to eliminate truly necessary SortExecs). Instead, we track the requirement by inserting OutputRequirementExec at the top of the plan, which corresponds to the global ordering - that is the ordering expected when an explicit ORDER BY is given in the outermost query.

If we decide to introduce a new config for this setting, we first need to improve the optimizer phase. Specifically, OutputRequirementExecs could be inserted into intermediate nodes in the plan, meaning the subplan beneath must guarantee to bring the required ordering at that point.
So, TLDR, unless we adapt the current enforce_sorting rule accordingly, we risk losing the ability to eliminate unnecessary SortExecs in some cases.

This is a very good approach. We can implement it later to replace the current approach of adding configuration

@irenjj
Copy link
Contributor Author

irenjj commented May 9, 2025

Thanks @kosiew and @berkaysynnada for your review, PTAL👀

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @irenjj
Please include also a test with CTE

with t as (select c1 from sink_table order by c2 nulls last) select * from t

@berkaysynnada
Copy link
Contributor

@irenjj I've set the default value of this config to "false", and auto-complete the slt tests. This is one of the changes:

image

as I said, even if the logical planner doesn't remove the Sort, it's still eliminated in the physical planning stage. I also thought it's so because of the count(), but if I replace it by another aggregate fn, a SortExec still doesn't emerge in the physical plan (I also consider subquery brings unique results, so remove the inner group by also to be sure). TLDR, we cannot disable subquery sort elimination by disabling it from the logical world only.

@irenjj
Copy link
Contributor Author

irenjj commented May 11, 2025

@irenjj I've set the default value of this config to "false", and auto-complete the slt tests. This is one of the changes:

image as I said, even if the logical planner doesn't remove the Sort, it's still eliminated in the physical planning stage. I also thought it's so because of the count(), but if I replace it by another aggregate fn, a SortExec still doesn't emerge in the physical plan (I also consider subquery brings unique results, so remove the inner group by also to be sure). TLDR, we cannot disable subquery sort elimination by disabling it from the logical world only.

That's right, we need to process this more elegantly.

@irenjj irenjj closed this May 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
common Related to common crate core Core DataFusion crate documentation Improvements or additions to documentation sql SQL Planner sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants