perf(optimizer): generic in-place mutable rewrite to obsolete per-rule fast-paths#22728
perf(optimizer): generic in-place mutable rewrite to obsolete per-rule fast-paths#22728zhuqi-lucas wants to merge 4 commits into
Conversation
Generalizes the in-place `Arc::make_mut` traversal adriangb landed in apache#22298 so all logical-optimizer rules — not just the no-subqueries fast-path — get the cheaper recursion. Closes apache#22616. Two pieces: 1. `map_children_and_subqueries_mut` — extends `map_children_mut` with a subquery-plan descent. Direct children stay on the in-place path; subqueries borrow the plan briefly via `mem::take` to reach the ownership-based `LogicalPlan::map_subqueries`, whose per-node `has_subquery_expressions` fast-path keeps subquery-free nodes allocation-free. With this in place, `rewrite_plan_in_place` no longer needs the `plan_has_subqueries` gate the driver was using — plans with subqueries can now share the in-place recursion. Removes `plan_has_subqueries`, the `TreeNodeRewriter`-based `Rewriter` shim, and the `has_subqueries ? owned : in_place` branch in `Optimizer::optimize_internal`. 2. `rewrite_children_in_place` — in-place equivalent of the `map_uncorrelated_subqueries` + `map_children` pattern that rules with `apply_order = None` use to drive their own recursion. `EliminateCrossJoin` and `CommonSubexprEliminate` are the two such rules in-tree; both convert. The net effect: when a rule returns `Transformed::no` for a child, the parent `LogicalPlan` variant is no longer destructured-and-rebuilt at every recursive layer — the child's `Arc` is reused via `Arc::make_mut`. Profile-guided: sql_planner profiling on `logical_plan_tpch_all` / `optimizer_tpch_all` showed ~17% of active CPU in `Vec/Box/Arc TreeNodeContainer` walking; this path is what generates that load. Net diff: -28 lines (the `Rewriter` shim + `plan_has_subqueries` together outweigh the two new helpers). 711 optimizer unit tests still pass. SLT broad sweep — subquery / cse / joins / predicates / aggregate / tpch (16 files) — all pass. `cargo clippy -p datafusion-optimizer --all-targets -- -D warnings` clean.
The new `map_children_and_subqueries_mut` / `rewrite_children_in_place` helpers were only covered through the existing 711 optimizer tests + SLT sweep — fine for behaviour but not great for pinning each helper's contract on its own. Five targeted tests: - `map_children_and_subqueries_mut_no_children` — no children + no subqueries returns `Ok(false)` and never invokes `f`. - `map_children_and_subqueries_mut_walks_direct_children` — `Filter`'s `TableScan` input is visited exactly once. - `map_children_and_subqueries_mut_descends_into_subquery` — pins the subquery descent that `map_children_mut` alone is missing: `Filter(IN (SELECT ...))` visits the outer `TableScan` *and* the subquery plan inside the predicate. - `map_children_and_subqueries_mut_reports_changes` — closure returning `true` propagates as `changed == true`. - `rewrite_children_in_place_drives_rule_recursion` — bridges `&mut LogicalPlan` callback to ownership-based `OptimizerRule::rewrite` API and visits each child exactly once with a recording rule.
3ae2c6d to
cc54f98
Compare
|
run benchmark sql_planner |
|
🤖 Criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing perf/in-place-mutable-rewrite (cc54f98) to 7bf54db (merge-base) diff File an issue against this benchmark runner |
|
🤖 Criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
…sion Per-node has_subquery_expressions walks were dominating optimizer cost on subquery-free plans (the common case for join-heavy workloads). Bench showed optimizer_join_chain_4 +12%, optimizer_tpch_all +3%, even though logical_plan_tpch_all (full pipeline) was -5%. Restore the one-shot whole-plan plan_has_subqueries check at the driver, propagate has_subqueries: bool through rewrite_plan_in_place, and dispatch to map_children_mut (no subquery descent) when false. For the ApplyOrder::None path, also add a per-node fast-path inside map_children_and_subqueries_mut to skip the mem::take + map_subqueries roundtrip when the node carries no subquery expressions. Make LogicalPlan::has_subquery_expressions pub so the optimizer can reuse it without duplicating the walk. Also fixes the cargo doc -D warnings CI break (unresolved intra-doc link to TreeNode::rewrite).
|
run benchmark sql_planner |
|
🤖 Criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing perf/in-place-mutable-rewrite (02748ae) to e71bd56 (merge-base) diff File an issue against this benchmark runner |
|
🤖 Criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
Which issue does this PR close?
Closes #22616.
Rationale for this change
#22298 added an in-place
Arc::make_muttraversal for the optimizer driver, but only used it when the plan had no subqueries — plans with subqueries fell back to the ownership-based path that destructures and rebuilds eachLogicalPlanvariant at every recursive layer. TwoApplyOrder::Nonerules (EliminateCrossJoin,CommonSubexprEliminate) also never got the in-place treatment.Profiling
sql_planner(pre-#22298) showed a non-trivial share of CPU in those ownership-based tree walks. Most of that has already been collected by #22298 — this PR closes the remaining two gaps.Scope
This PR converts:
rewrite_plan_in_place) — subquery-bearing plans now go through the in-place path too, no more fallback to owned recursion.EliminateCrossJoinandCommonSubexprEliminate— the twoApplyOrder::Nonerules with the simplemap_uncorrelated_subqueries+map_childrenpattern.Not converted (left for future work — each has its own non-uniform recursion shape that doesn't fit the generic helper):
OptimizeProjections— required-columns context plumbed top-down.RewriteSetComparison— restructures expressions, introduces aliases.UnionsToFilter— shape-dependent rewrites on Union nodes.Where the gain actually comes from
Of the two
ApplyOrder::Nonerules converted:CommonSubexprEliminateis where most of the value lands. It has no plan-level fast-path: every plan walks from root to leaf throughrewrite_children, including on plans where the rule's own targeted optimization has nothing to do at any node (Join,Repartition,Union,TableScan,Limit, ...). The in-place conversion saves the per-layer destructure/rebuild on every such walk.EliminateCrossJoinis marginal in practice: theplan_has_joinsfast-path landed in perf(optimizer): EliminateCrossJoin fast-path for join-free plans #22612 short-circuits the rule entirely on join-free plans, and on plans with joins the rule's main flattening logic usesArc::unwrap_or_clonedirectly rather thanrewrite_children.rewrite_childrenonly fires on the wrapping non-join layers (e.g. an outerProjectionabove a join chain) — small but non-zero.The third source of gain is the TopDown/BottomUp path: previously, any plan with even one subquery expression fell back to owned recursion for all ~17 rules in the pipeline. Now those plans stay on the in-place path.
What changes are included in this PR?
Two in-place helpers so all converted rules share the
Arc::make_mutpath:map_children_and_subqueries_mut— extendsmap_children_mutto also descend into subquery plans referenced from expressions (Expr::ScalarSubquery,InSubquery,Exists,SetComparison).rewrite_children_in_place— in-place equivalent of themap_uncorrelated_subqueries+map_childrenpattern.EliminateCrossJoinandCommonSubexprEliminateconvert to it.When a rule returns
Transformed::no, the parentLogicalPlanvariant is no longer destructured-and-rebuilt at every recursive layer.Subquery descent is gated by a one-shot whole-plan
plan_has_subqueriescheck per rule application: subquery-free plans (the common case for join-heavy workloads) fall through to plainmap_children_mutand pay zero subquery overhead — same shape as the #22298 fast path. Plans with subqueries get the new unified in-place recursion. TheApplyOrder::Nonepath additionally has a per-node fast-path insidemap_children_and_subqueries_mutto skip themem::take+map_subqueriesroundtrip when a node carries no subquery expressions.Are these changes tested?
subquery/cse/joins/predicates/aggregate/tpch(16 files) — all pass.Are there any user-facing changes?
No. Pure perf, no new APIs / config knobs.
LogicalPlan::has_subquery_expressionsis nowpubso the optimizer can call it without duplicating the walker.