-
Couldn't load subscription status.
- Fork 13.8k
[FLINK-38511][table] Join that consumes cdc source without delete may be converted to delta join #27111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-38511][table] Join that consumes cdc source without delete may be converted to delta join #27111
Conversation
939312f to
1092702
Compare
1092702 to
c65f4d2
Compare
… be converted to delta join
c65f4d2 to
3d28928
Compare
| val expected = List("+I[1.0, 1, 2022-02-02T02:02:02, 1, 1.0, 2022-02-02T02:02:22]") | ||
|
|
||
| // could not optimize into delta join because join keys do not contain indexes strictly | ||
| assertThatThrownBy( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed these expected failed tests because these exceptions have been tested in DeltaJoinTest. And I hope the IT tests can purely test the dataset, not for plan.
| private static final String FIELD_NAME_RIGHT_JOIN_KEYS = "rightJoinKeys"; | ||
|
|
||
| private static final String FIELD_NAME_LOOKUP_RIGHT_TABLE_JOIN_SPEC = | ||
| "lookupRightTableJoinSpec"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand why the DELTA_JOIN_TRANSFORMATION here doesn't use deltaJoin but to use delta-join.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I'm not entirely sure either. 🧐 I referenced CommonExecLookupJoin and CommonExecSink, and I noticed that the transformations use "-" as separators, while the field names for JSON are in camel case. So, I decided to follow that convention as well.
...src/main/java/org/apache/flink/table/planner/plan/nodes/exec/stream/StreamExecDeltaJoin.java
Show resolved
Hide resolved
| return false; | ||
| } | ||
|
|
||
| // if this join output cdc records, the non-equiv condition must be applied on upsert key |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this join output cdc records and has non-equiv condition, upsert key must contain non-equiv condition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides, I don't quite understand why there is such a restriction here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You raised a very important issue. You can refer FlinkChangelogModeInferenceProgram.scala to understand the problem better. The essence of the issue is whether we can allow the upstream to omit sending Update Before(UB) when a filter is present. We can break down the filtering condition into two scenarios: 1. Filtering only on upsert key columns, and 2. Filtering on non-upsert keys. The distinction between these two cases is that in scenario 1, we can enable further optimization for the upstream and avoid sending UB.
Currently, in the FlinkChangelogModeInferenceProgram, we only check the filter on the Calc node. However, there are similar filters present on the join node (which are divided into join keys and non-equivalent join conditions), and this aspect has been overlooked. To prevent expanding the scope of this pull request, I only ensure that a delta join will not be optimized here (since Delta Join cannot consume UB messages) without altering the existing logic for streaming join.
As for why we don't need to address the join key, that has already been handled here: FlinkChangelogModeInferenceProgram.scala.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, the comment here should emphasize that the columns in the non-equivalent condition must all come from the same set of upsert keys. Therefore, I will modify it to:
If this join outputs cdc records and has non-equiv condition, the reference columns in the non-equiv condition must come from the same set of upsert keys.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I did not reuse the logic of isNonUpsertKeyCondition from FlinkChangelogModeInferenceProgram is that it only considers a single upsert key (as seen in this line). In this case, any set of upsert keys can be applicable.
| } | ||
|
|
||
| @Test | ||
| def testFilterOnNonUpsertKeysAfterJoinWithCdcSourceWithoutDelete(): Unit = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we set deltaJoin to force and use assertThatThrownBy to display the error reason in the test for failing to convert to deltaJoin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, we are unable to provide detailed error logs to explain why optimization to Delta Join is not supported. For more information, please refer to FLINK-37954. As for now, I think it's acceptable to display the original Join in the plan instead of the DeltaJoin. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I think it's indeed worth further improvement
|
Thanks for advancing the feature. Let me leave some comments |
6cf7fdf to
cc55f42
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing the comment. It LGTM.
What is the purpose of the change
Join with the following pattern can be converted to delta join:
Brief change log
Verifying this change
Tests are added to verify this pr
Does this pull request potentially affect one of the following parts:
@Public(Evolving): noDocumentation