fix(substrait): Do not add implicit groupBy expressions when building logical plans from Substrait #14553
…. Do not implicitly add any expressions when building the LogicalPlan.
a4030e9 to cc0fee8
        self._aggregate(group_expr, aggr_expr, false)
    }

    fn _aggregate(
Super new to Rust -- is this an okay / conventional way to name private helpers?
I don't think there's a need for `_` since the function is already private (by virtue of not being `pub fn`). Something like `aggregate_inner` is used quite a lot, I think.
Alternatively, given that the LogicalPlanBuilder method for aggregate doesn't do that much, we could also just inline it into the Substrait consumer. That way it's not changing the LogicalPlanBuilder API, which might be easier.
Or maybe this whole `add_group_by_exprs_from_dependencies` thing should move from the plan builder into the analyzer/optimizer? Intuitively it feels like the constructed logical plan shouldn't do this kind of magic, but the analyzer/optimizer can if it makes things faster to execute. But that might be a bigger undertaking, so I'd be quite fine with this PR or the alternative above first.
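As a rough illustration of the naming convention discussed above, here is a minimal sketch of two public entry points delegating to a private `aggregate_inner` helper. Everything in it is hypothetical: `ToyBuilder`, `aggregate_literal`, and `add_dependents` are simplified stand-ins invented for this example, not DataFusion's actual types or signatures, and grouping expressions are modeled as plain strings.

```rust
/// Hypothetical, simplified stand-in for a plan builder (not DataFusion's type).
#[derive(Debug, Clone, PartialEq)]
pub struct ToyBuilder {
    /// Functional dependencies as (key column, dependent column) pairs.
    dependencies: Vec<(String, String)>,
}

impl ToyBuilder {
    pub fn new(dependencies: Vec<(String, String)>) -> Self {
        Self { dependencies }
    }

    /// Entry point that expands the group-by list using functional dependencies.
    pub fn aggregate(&self, group_expr: Vec<String>) -> Vec<String> {
        self.aggregate_inner(group_expr, true)
    }

    /// Entry point for callers that want the group-by list taken literally.
    pub fn aggregate_literal(&self, group_expr: Vec<String>) -> Vec<String> {
        self.aggregate_inner(group_expr, false)
    }

    // No `pub`, so this helper is already private to the module;
    // a leading underscore is not needed to signal that.
    fn aggregate_inner(
        &self,
        group_expr: Vec<String>,
        include_implicit_group_by_exprs: bool,
    ) -> Vec<String> {
        if include_implicit_group_by_exprs {
            self.add_dependents(group_expr)
        } else {
            group_expr
        }
    }

    // Appends each dependent column whose key is already grouped on.
    fn add_dependents(&self, mut group_expr: Vec<String>) -> Vec<String> {
        for (key, dependent) in &self.dependencies {
            if group_expr.contains(key) && !group_expr.contains(dependent) {
                group_expr.push(dependent.clone());
            }
        }
        group_expr
    }
}
```

With a single dependency from `id` to `name`, `aggregate` on `["id"]` would return `["id", "name"]`, while `aggregate_literal` would return `["id"]` unchanged.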
@@ -300,6 +300,17 @@ async fn aggregate_grouping_rollup() -> Result<()> {
    ).await
}

#[tokio::test]
async fn multilayer_aggregate() -> Result<()> {
Was this also succeeding before? (I'd guess so, since it would add the extra group-bys but take them into account while producing the plan. Is that right?)
        if include_implicit_group_by_exprs {
            group_expr =
                add_group_by_exprs_from_dependencies(group_expr, self.plan.schema())?;
        }
nit: style-wise I'd prefer keeping `group_expr` non-mut and doing something like:
-        if include_implicit_group_by_exprs {
-            group_expr =
-                add_group_by_exprs_from_dependencies(group_expr, self.plan.schema())?;
-        }
+        let group_expr = if include_implicit_group_by_exprs {
+            add_group_by_exprs_from_dependencies(group_expr, self.plan.schema())?
+        } else {
+            group_expr
+        };
Thanks, seems like a clear enough bug, appreciate both the report and the PR to fix it!
Which issue does this PR close?
Closes #14348
Rationale for this change
Substrait plans are intended to be interpreted literally. When you see plan nodes like:
The output mapping (e.g. [0, 3]) contains ordinals representing the offset of the target expression(s) within the [input, output] list. If the DataFusion LogicalPlanBuilder introduces additional input expressions, this violates the plan's intent and produces incorrect output mappings. Please see the issue for a concrete example.
What changes are included in this PR?
In the Substrait path, do not add additional grouping expressions derived from functional dependencies.
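To make the ordinal problem concrete, here is a small sketch of how an output mapping selects columns by position, and how one extra grouping expression shifts the result. The column names (`a`, `b`, `c`, `d`, `sum_x`) and the helper `apply_output_mapping` are assumptions made up for this example; they are not taken from the linked issue or from DataFusion's Substrait consumer.

```rust
/// Hypothetical helper: pick columns by ordinal, as an output mapping does.
/// Returns `None` if any ordinal is out of range.
pub fn apply_output_mapping(columns: &[&str], mapping: &[usize]) -> Option<Vec<String>> {
    mapping
        .iter()
        .map(|&i| columns.get(i).map(|c| c.to_string()))
        .collect()
}

/// Column list as the plan producer intended it: three grouping
/// expressions followed by one measure (names are illustrative).
pub fn intended_columns() -> Vec<&'static str> {
    vec!["a", "b", "c", "sum_x"]
}

/// The same list after a consumer implicitly appends a grouping
/// expression `d`: the measure moves from ordinal 3 to ordinal 4.
pub fn columns_with_implicit_group_by() -> Vec<&'static str> {
    vec!["a", "b", "c", "d", "sum_x"]
}
```

Applying the mapping `[0, 3]` to the intended list selects `a` and `sum_x`. Applying the same mapping to the list with the implicit expression selects `a` and `d` instead, which is the kind of silent mismatch this change avoids on the Substrait path.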
Are these changes tested?
Added a multilayer aggregation Substrait example. The first aggregation produces a unique column with a functional dependency. Despite this, the second aggregation must not introduce any additional grouping expressions.
There should be no changes in the non-Substrait path.
Are there any user-facing changes?
No.