@xuyangzhong
Contributor

@xuyangzhong xuyangzhong commented Oct 14, 2025

What is the purpose of the change

Add restore tests for delta join.

Brief change log

  • add index for SourceTestStep
  • add registeredDataForFullStage in TestValuesTableFactory
  • add restore tests for delta join

Verifying this change

New and existing tests verify this PR.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented?

@flinkbot
Collaborator

flinkbot commented Oct 14, 2025

CI report:


@xuyangzhong xuyangzhong force-pushed the delta_join_restore_test branch from b590638 to d899471 Compare October 27, 2025 12:54
@xuyangzhong xuyangzhong marked this pull request as ready for review October 27, 2025 12:55
@xuyangzhong xuyangzhong force-pushed the delta_join_restore_test branch 2 times, most recently from f187951 to 7026c3c Compare October 30, 2025 02:10
@xuyangzhong xuyangzhong force-pushed the delta_join_restore_test branch from 7026c3c to d2c0332 Compare October 30, 2025 06:57
data.addAll(expectedAfterRestore);
}

Map<List<Object>, Row> deduplicatedMap = new HashMap<>();
Contributor

The logic in this section, which uses deduplicatedFieldIndices to modify expectedBeforeRestore and expectedAfterRestore, could also be implemented as a function called from getExpectedBeforeRestoreAsStrings and getExpectedAfterRestoreAsStrings. It's up to you, of course, but I personally find the latter easier to understand.

Contributor Author

Currently, expectedBeforeRestoreStrings and expectedAfterRestoreStrings hold rows that have already been converted to strings and have lost their schema, so they cannot be deduplicated by column. (One possible workaround would be to strip the leading and trailing parentheses and split each string on commas.)
However, the tests do not need deduplication for the string case yet, so I haven't implemented it.
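
For readers following the thread, here is a minimal sketch of the by-column deduplication being discussed. It is not the code from this PR: rows are modeled as `List<Object>` rather than Flink's `Row`, the class name `DedupSketch` and the method `deduplicate` are hypothetical, and it assumes that a later row with the same key replaces an earlier one, which the thread does not state. It uses a `LinkedHashMap` instead of the `HashMap` in the diff only to keep the output order predictable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the test helper; rows are List<Object>, not Flink's Row. */
class DedupSketch {

    /**
     * Keeps one row per key, where the key is the list of values at keyIndices.
     * Assumption: a later row with the same key replaces the earlier one.
     */
    static List<List<Object>> deduplicate(List<List<Object>> rows, int[] keyIndices) {
        Map<List<Object>, List<Object>> byKey = new LinkedHashMap<>();
        for (List<Object> row : rows) {
            // Build the composite key from the selected columns.
            List<Object> key = new ArrayList<>(keyIndices.length);
            for (int index : keyIndices) {
                key.add(row.get(index));
            }
            // put() overwrites the value for an existing key but keeps its position.
            byKey.put(key, row);
        }
        return new ArrayList<>(byKey.values());
    }

    public static void main(String[] args) {
        List<List<Object>> rows = Arrays.asList(
                Arrays.<Object>asList(1, "a", 10),
                Arrays.<Object>asList(2, "b", 20),
                Arrays.<Object>asList(1, "a", 30));
        // Deduplicate on the first two columns.
        System.out.println(deduplicate(rows, new int[] {0, 1}));
    }
}
```

Once a row has been flattened to a string, the column boundaries needed to build `key` are gone, which is the limitation described above.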

// The difference between `registeredDataForFullStage` and `registeredData` is that
// `registeredData` is used for data delivered from the source to downstream, while the rows in
// `registeredDataForFullStage` will not be sent to downstream and are only used for lookup.
private static final Map<String, Collection<Row>> registeredDataForFullStage = new HashMap<>();
Contributor

I'm curious why it's named registeredDataForFullStage instead of something like ForCurrentStage or ForCurrentTable.

Contributor Author

I have renamed it to registeredConsumedData.

Unlike registeredRowData, registeredConsumedData is not re-consumed by the source, so its rows are never sent to downstream operators.
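
To illustrate the distinction, here is a rough sketch of two registries where only one feeds the scan path. It is not the `TestValuesTableFactory` implementation: the class `TwoRegistrySketch` and its methods are hypothetical, rows are plain `List<Object>` values, and it assumes that lookups see both the consumed rows and the emitted rows, which the thread does not confirm.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical miniature of the two registries; not the real factory code. */
class TwoRegistrySketch {
    // Rows the source emits to downstream operators.
    private final Map<String, List<List<Object>>> registeredRowData = new HashMap<>();
    // Rows treated as already consumed: available to lookups, never emitted.
    private final Map<String, List<List<Object>>> registeredConsumedData = new HashMap<>();

    void registerRowData(String dataId, List<List<Object>> rows) {
        registeredRowData.put(dataId, new ArrayList<>(rows));
    }

    void registerConsumedData(String dataId, List<List<Object>> rows) {
        registeredConsumedData.put(dataId, new ArrayList<>(rows));
    }

    /** Scan path: only rows from registeredRowData are sent downstream. */
    List<List<Object>> scan(String dataId) {
        List<List<Object>> rows = registeredRowData.get(dataId);
        return rows == null
                ? Collections.<List<Object>>emptyList()
                : Collections.unmodifiableList(rows);
    }

    /** Lookup path. Assumption: it sees consumed rows as well as emitted rows. */
    List<List<Object>> lookup(String dataId, int keyIndex, Object key) {
        List<List<Object>> candidates = new ArrayList<>();
        List<List<Object>> consumed = registeredConsumedData.get(dataId);
        if (consumed != null) {
            candidates.addAll(consumed);
        }
        candidates.addAll(scan(dataId));

        List<List<Object>> hits = new ArrayList<>();
        for (List<Object> row : candidates) {
            if (Objects.equals(row.get(keyIndex), key)) {
                hits.add(row);
            }
        }
        return hits;
    }
}
```

In a restore test this separation would let rows that were consumed before the savepoint remain joinable through lookup without being emitted a second time after the restore.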

@Au-Miner
Contributor

Thanks for advancing the feature. Let me leave some comments!

Contributor

@Au-Miner Au-Miner left a comment

LGTM, thanks.

@github-actions github-actions bot added the community-reviewed PR has been reviewed by the community. label Oct 31, 2025
3 participants