Improve simplify_expressions rule #15735

Merged: 3 commits, Apr 19, 2025

xudong963
Member

Which issue does this PR close?

  • Closes #.

Rationale for this change

Before:

[2025-04-16T10:30:23Z DEBUG datafusion_optimizer::utils] Optimizer input (pass 0):
    Projection: t.a
      Filter: CAST(t.a AS Int64) BETWEEN Int64(3) AND Int64(3)
        TableScan: t
    
[2025-04-16T10:30:23Z DEBUG datafusion_optimizer::optimizer] Plan unchanged by optimizer rule 'eliminate_nested_union' (pass 0)
[2025-04-16T10:30:23Z DEBUG datafusion_optimizer::utils] simplify_expressions:
    Projection: t.a
      Filter: t.a = Int32(3)
        TableScan: t
    
...
[2025-04-16T10:30:23Z DEBUG datafusion_optimizer::utils] push_down_filter:
    Projection: t.a
      Filter: t.a = Int32(3)
        TableScan: t
    
...
[2025-04-16T10:30:23Z DEBUG datafusion_optimizer::utils] optimize_projections:
    Filter: t.a = Int32(3)
      TableScan: t projection=[a]
    
[2025-04-16T10:30:23Z DEBUG datafusion_optimizer::utils] Optimized plan (pass 0):
    Filter: t.a = Int32(3)
      TableScan: t projection=[a]
    
[2025-04-16T10:30:23Z DEBUG datafusion_optimizer::utils] Optimizer input (pass 1):
    Filter: t.a = Int32(3)
      TableScan: t projection=[a]
    
[2025-04-16T10:30:23Z DEBUG datafusion_optimizer::optimizer] Plan unchanged by optimizer rule 'eliminate_nested_union' (pass 1)
[2025-04-16T10:30:23Z DEBUG datafusion_optimizer::utils] simplify_expressions:
    Filter: t.a = Int32(3)
      TableScan: t projection=[a]
    
...

Now:

[2025-04-16T10:46:09Z DEBUG datafusion_optimizer::utils] Optimizer input (pass 0):
    Projection: t.a
      Filter: CAST(t.a AS Int64) BETWEEN Int64(3) AND Int64(3)
        TableScan: t
    
[2025-04-16T10:46:09Z DEBUG datafusion_optimizer::optimizer] Plan unchanged by optimizer rule 'eliminate_nested_union' (pass 0)
[2025-04-16T10:46:09Z DEBUG datafusion_optimizer::utils] simplify_expressions:
    Projection: t.a
      Filter: t.a = Int32(3)
        TableScan: t
    
...
[2025-04-16T10:46:09Z DEBUG datafusion_optimizer::utils] push_down_filter:
    Projection: t.a
      Filter: t.a = Int32(3)
        TableScan: t
    
...
[2025-04-16T10:46:09Z DEBUG datafusion_optimizer::utils] optimize_projections:
    Filter: t.a = Int32(3)
      TableScan: t projection=[a]
    
[2025-04-16T10:46:09Z DEBUG datafusion_optimizer::utils] Optimized plan (pass 0):
    Filter: t.a = Int32(3)
      TableScan: t projection=[a]
    
[2025-04-16T10:46:09Z DEBUG datafusion_optimizer::utils] Optimizer input (pass 1):
    Filter: t.a = Int32(3)
      TableScan: t projection=[a]
    
[2025-04-16T10:46:09Z DEBUG datafusion_optimizer::optimizer] Plan unchanged by optimizer rule 'eliminate_nested_union' (pass 1)
[2025-04-16T10:46:09Z DEBUG datafusion_optimizer::optimizer] Plan unchanged by optimizer rule 'simplify_expressions' (pass 1)

What changes are included in this PR?

Check whether the expression was actually simplified; if it was not, return Transformed::no so the optimizer can report the plan as unchanged.
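
As a rough illustration of that idea, here is a minimal, self-contained sketch. It is not DataFusion's code: the Transformed struct is a reduced stand-in for DataFusion's type of the same name, and Expr, simplify_filter, and run_pass are hypothetical names invented for this example. The toy rule returns Transformed::no when nothing was rewritten, so the driver can log "Plan unchanged" on the second pass, as in the "Now" log above.

```rust
/// Reduced stand-in (assumption) for DataFusion's `Transformed` wrapper.
#[derive(Debug)]
struct Transformed<T> {
    data: T,
    transformed: bool,
}

impl<T> Transformed<T> {
    fn yes(data: T) -> Self {
        Self { data, transformed: true }
    }
    fn no(data: T) -> Self {
        Self { data, transformed: false }
    }
}

/// Toy expression type, hypothetical and far smaller than the real `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Eq(Box<Expr>, Box<Expr>),
    Between { expr: Box<Expr>, low: i64, high: i64 },
}

/// Toy rule: rewrite `x BETWEEN n AND n` to `x = n`.
/// Anything else is handed back untouched and flagged as not transformed.
fn simplify_filter(expr: Expr) -> Transformed<Expr> {
    match expr {
        Expr::Between { expr, low, high } if low == high => {
            Transformed::yes(Expr::Eq(expr, Box::new(Expr::Literal(low))))
        }
        other => Transformed::no(other),
    }
}

/// Toy driver: runs the rule once and produces the log line for that pass.
fn run_pass(expr: Expr, pass: usize) -> (Expr, String) {
    let result = simplify_filter(expr);
    let log = if result.transformed {
        format!("simplify_expressions: {:?}", result.data)
    } else {
        format!("Plan unchanged by optimizer rule 'simplify_expressions' (pass {pass})")
    };
    (result.data, log)
}
```

Running run_pass twice on a `BETWEEN 3 AND 3` filter rewrites it on pass 0 and reports the plan as unchanged on pass 1.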

Are these changes tested?

By existing tests.

The specific effect is difficult to verify with a test, so I included the debug logs above to demonstrate it.

Are there any user-facing changes?

@github-actions github-actions bot added the optimizer Optimizer rules label Apr 16, 2025
@jayzhan211
Contributor

jayzhan211 commented Apr 16, 2025

I think what we need to do is return Transformed from simplify_with_cycle_count. Your change introduces an additional clone.

pub fn simplify_with_cycle_count(&self, mut expr: Expr) -> Result<(Expr, u32)> {
    let mut simplifier = Simplifier::new(&self.info);
    let mut const_evaluator = ConstEvaluator::try_new(self.info.execution_props())?;
    let mut shorten_in_list_simplifier = ShortenInListSimplifier::new();
    let mut guarantee_rewriter = GuaranteeRewriter::new(&self.guarantees);

    if self.canonicalize {
        expr = expr.rewrite(&mut Canonicalizer::new()).data()?
    }

    // Evaluating constants can enable new simplifications and
    // simplifications can enable new constant evaluation
    // see `Self::with_max_cycles`
    let mut num_cycles = 0;
    loop {
        let Transformed {
            data, transformed, ..
        } = expr
            .rewrite(&mut const_evaluator)?
            .transform_data(|expr| expr.rewrite(&mut simplifier))?
            .transform_data(|expr| expr.rewrite(&mut guarantee_rewriter))?;
        expr = data;
        num_cycles += 1;
        if !transformed || num_cycles >= self.max_simplifier_cycles {
            break;
        }
    }

    // shorten inlist should be started after other inlist rules are applied
    expr = expr.rewrite(&mut shorten_in_list_simplifier).data()?;
    Ok((expr, num_cycles))
}
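
One way to read this suggestion, sketched as a self-contained toy rather than the actual patch: the loop already receives a per-cycle transformed flag, so it can accumulate that flag and return it alongside the cycle count, and the caller never has to clone the input to compare before and after. Everything here is an assumption made for illustration: the tiny Expr enum, the rewrite_once rule (constant folding of additions), the reduced Transformed struct, and the free-function form of simplify_with_cycle_count_transformed.

```rust
/// Reduced stand-in (assumption) for DataFusion's `Transformed` wrapper.
#[derive(Debug)]
struct Transformed<T> {
    data: T,
    transformed: bool,
}

/// Toy expression type, hypothetical.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Lit(i64),
    Col(&'static str),
    Add(Box<Expr>, Box<Expr>),
}

/// One bottom-up rewrite pass: folds `Lit + Lit` and drops `+ 0`.
fn rewrite_once(expr: Expr) -> Transformed<Expr> {
    match expr {
        Expr::Add(l, r) => {
            let l = rewrite_once(*l);
            let r = rewrite_once(*r);
            // Remember whether either child changed, even if this node does not.
            let changed = l.transformed || r.transformed;
            match (l.data, r.data) {
                (Expr::Lit(a), Expr::Lit(b)) => Transformed {
                    data: Expr::Lit(a + b),
                    transformed: true,
                },
                (e, Expr::Lit(0)) | (Expr::Lit(0), e) => Transformed {
                    data: e,
                    transformed: true,
                },
                (l, r) => Transformed {
                    data: Expr::Add(Box::new(l), Box::new(r)),
                    transformed: changed,
                },
            }
        }
        leaf => Transformed { data: leaf, transformed: false },
    }
}

/// Repeats `rewrite_once` until a cycle changes nothing or `max_cycles` is hit.
/// The per-cycle flags are OR-ed together, so the result says whether the
/// expression changed at all, with no clone-and-compare needed.
fn simplify_with_cycle_count_transformed(
    mut expr: Expr,
    max_cycles: u32,
) -> (Transformed<Expr>, u32) {
    let mut any_change = false;
    let mut num_cycles = 0;
    loop {
        let Transformed { data, transformed } = rewrite_once(expr);
        expr = data;
        any_change |= transformed;
        num_cycles += 1;
        if !transformed || num_cycles >= max_cycles {
            break;
        }
    }
    (Transformed { data: expr, transformed: any_change }, num_cycles)
}
```

Note that the final flag must be the accumulated one: the last cycle of a successful simplification typically reports no change, because it is the cycle that detects the fixed point.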

@xudong963
Member Author

> I think what we need to do is return Transformed from simplify_with_cycle_count. Your change introduces an additional clone.

Sounds good, I will give it a try.

@github-actions github-actions bot added the core Core DataFusion crate label Apr 17, 2025
@xudong963 xudong963 force-pushed the improve_simplify_expr branch from ecb150c to 101b80a Compare April 17, 2025 03:34
@xudong963 xudong963 added the api change Changes the API exposed to users of the crate label Apr 17, 2025
@@ -188,7 +188,7 @@ impl<S: SimplifyInfo> ExprSimplifier<S> {
     /// assert_eq!(expr, b_lt_2);
     /// ```
     pub fn simplify(&self, expr: Expr) -> Result<Expr> {
-        Ok(self.simplify_with_cycle_count(expr)?.0)
+        Ok(self.simplify_with_cycle_count_transformed(expr)?.0.data)
Member Author

@xudong963 xudong963 Apr 17, 2025

I only changed the simplify_with_cycle_count API and kept the simplify API as it is, to avoid two API changes.

Contributor

Maybe after we merge this one in, it would be worth considering changing the simplify API as well -- but I agree that would be good as a follow-on PR.
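
The compatibility choice discussed in this thread can be sketched as follows. This is a toy under stated assumptions, not the real ExprSimplifier: expressions are plain strings, "simplifying" just trims whitespace, the error type is String, and the Transformed struct is a reduced stand-in. Only the shape matters: the richer method returns the changed flag and cycle count, while simplify keeps its old signature and discards both.

```rust
/// Reduced stand-in (assumption) for DataFusion's `Transformed` wrapper.
struct Transformed<T> {
    data: T,
    transformed: bool,
}

/// Toy simplifier (hypothetical): expressions are strings here.
struct ExprSimplifier;

impl ExprSimplifier {
    /// Richer API: returns the result, whether it changed, and a cycle count.
    fn simplify_with_cycle_count_transformed(
        &self,
        expr: String,
    ) -> Result<(Transformed<String>, u32), String> {
        let trimmed = expr.trim().to_string();
        let transformed = trimmed != expr;
        // A single "cycle" is enough for this toy rule.
        Ok((Transformed { data: trimmed, transformed }, 1))
    }

    /// Existing API: same signature as before, implemented as a thin wrapper
    /// that drops the flag and the cycle count.
    fn simplify(&self, expr: String) -> Result<String, String> {
        Ok(self.simplify_with_cycle_count_transformed(expr)?.0.data)
    }
}
```

Callers that only need the simplified expression keep calling simplify unchanged; callers such as an optimizer rule, which need to know whether anything changed, can use the richer method.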

Contributor

@alamb alamb left a comment

Thank you @xudong963 -- this looks great to me. I'll try and run some planning benchmarks on this too

cc @erratic-pattern as you wrote some of this code initially

@alamb
Contributor

alamb commented Apr 17, 2025

BTW I tried to run the planning benchmarks to see if this made things better, but sadly I found a bug:

@alamb alamb merged commit 1d3c592 into apache:main Apr 19, 2025
27 checks passed
@alamb
Contributor

alamb commented Apr 19, 2025

Thanks @xudong963

xudong963 added a commit to polygon-io/arrow-datafusion that referenced this pull request Apr 23, 2025
* Improve simplify_expressions rule

* address comments

* address comments
nirnayroy pushed a commit to nirnayroy/datafusion that referenced this pull request May 2, 2025
* Improve simplify_expressions rule

* address comments

* address comments
xudong963 added a commit to polygon-io/arrow-datafusion that referenced this pull request May 7, 2025
* Improve simplify_expressions rule

* address comments

* address comments