
fix: Update the filterInput when flatten the right projections #11982

Open: wants to merge 1 commit into main

@wankunde commented Dec 29, 2024

What changes were proposed in this pull request?

Bug fix for MergeJoin when the join has a non-null filter. For example, in TPC-DS query 95, the subquery ws_wh contains the join filter ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk:

SELECT
    ws1.ws_order_number,
    ws1.ws_warehouse_sk wh1,
    ws2.ws_warehouse_sk wh2
  FROM web_sales ws1, web_sales ws2
  WHERE ws1.ws_order_number = ws2.ws_order_number
    AND ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk

How to reproduce this issue:

The issue occurs in TPC-DS query 95 when the right-side matches span more than one batch and those batches are DictionaryVectors.
When addOutputRow() copies rows from those batches, the sort-merge join (SMJ) copies only the dictionary indices.
The values are then decoded against the last DictionaryVector, which yields incorrect values for rows that came from earlier batches.
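A minimal, self-contained C++ model of the failure described above, assuming a dictionary column is just a dictionary plus an index array. The DictColumn and RightRow types and both functions are hypothetical illustrations, not Velox APIs. Copying only the indices and decoding them against the last batch's dictionary produces wrong values for rows from earlier batches; decoding each row against its own batch's dictionary does not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified dictionary-encoded column:
// the logical value of row i is dictionary[indices[i]].
struct DictColumn {
  std::vector<int64_t> dictionary;
  std::vector<int32_t> indices;

  int64_t valueAt(std::size_t row) const {
    assert(row < indices.size());
    return dictionary[indices[row]];
  }
};

// A right-side match: which batch it came from and the row within that batch.
struct RightRow {
  std::size_t batch;
  std::size_t row;
};

// Buggy model: copy only the indices, then decode all of them against the
// dictionary of the batch holding the last match.
std::vector<int64_t> decodeIndicesOnly(const std::vector<DictColumn>& batches,
                                       const std::vector<RightRow>& matches) {
  std::vector<int64_t> out;
  if (matches.empty()) {
    return out;
  }
  std::vector<int32_t> copiedIndices;
  for (const auto& m : matches) {
    copiedIndices.push_back(batches[m.batch].indices[m.row]);
  }
  const auto& lastDictionary = batches[matches.back().batch].dictionary;
  for (int32_t idx : copiedIndices) {
    out.push_back(lastDictionary[idx]);
  }
  return out;
}

// Correct model: decode (flatten) each row against its own batch's dictionary.
std::vector<int64_t> decodePerBatch(const std::vector<DictColumn>& batches,
                                    const std::vector<RightRow>& matches) {
  std::vector<int64_t> out;
  for (const auto& m : matches) {
    out.push_back(batches[m.batch].valueAt(m.row));
  }
  return out;
}
```

With two batches `{{10, 20}, {0, 1}}` and `{{30, 40}, {1, 0}}` and matches covering all four rows, `decodeIndicesOnly` returns 30, 40, 40, 30 while `decodePerBatch` returns 10, 20, 40, 30: the two rows from the first batch are decoded against the wrong dictionary.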


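// For each identity projection, copy the single value at sourceIndex of the
// source column (inputChannel) into targetIndex of the target column (outputChannel).
void copyRow(
    const RowVectorPtr& source,
    vector_size_t sourceIndex,
    const RowVectorPtr& target,
    vector_size_t targetIndex,
    const std::vector<IdentityProjection>& projections) {
  for (const auto& projection : projections) {
    const auto& sourceChild = source->childAt(projection.inputChannel);
    const auto& targetChild = target->childAt(projection.outputChannel);
    // BaseVector::copy(source, targetIndex, sourceIndex, count)
    targetChild->copy(sourceChild.get(), targetIndex, sourceIndex, 1);
  }
}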
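To make the projection mapping in copyRow concrete, here is a sketch that mirrors its loop over flat int64_t columns. Batch, Column, and this IdentityProjection struct are simplified, hypothetical stand-ins rather than Velox's types, and the sketch ignores encodings such as dictionaries, which is where the PR reports the problem.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical simplified types: a batch is one flat Column per channel.
using Column = std::vector<int64_t>;
using Batch = std::vector<Column>;

struct IdentityProjection {
  std::size_t inputChannel;
  std::size_t outputChannel;
};

// Copies the value at sourceIndex of each projected input column into
// targetIndex of the mapped output column (one row, i.e. count == 1).
void copyRow(const Batch& source, std::size_t sourceIndex, Batch& target,
             std::size_t targetIndex,
             const std::vector<IdentityProjection>& projections) {
  for (const auto& projection : projections) {
    const Column& sourceChild = source.at(projection.inputChannel);
    Column& targetChild = target.at(projection.outputChannel);
    targetChild.at(targetIndex) = sourceChild.at(sourceIndex);
  }
}
```

Calling `copyRow(source, 2, target, 1, {{0, 2}, {1, 0}})` on a two-column source writes `source[0][2]` into `target[2][1]` and `source[1][2]` into `target[0][1]`, leaving unmapped output columns untouched.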

Why are the changes needed?

Fix a data correctness issue: MergeJoin with a join filter can return incorrect results.

How was this patch tested?

Tested with TPC-DS query 95 and the following local query:

SELECT
    ws1.ws_order_number,
    ws1.ws_warehouse_sk wh1,
    ws2.ws_warehouse_sk wh2,
    ws2.ws_warehouse_sk - ws1.ws_warehouse_sk as diff
  FROM web_sales ws1, web_sales ws2
  WHERE ws1.ws_order_number = ws2.ws_order_number
    AND ws1.ws_order_number is not null
    AND ws2.ws_order_number is not null
    AND ws1.ws_warehouse_sk - ws2.ws_warehouse_sk = 0;
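For checking engine output on small inputs, a naive nested-loop reference for the local test query above can serve as an oracle. The WebSale struct and referenceSelfJoin are hypothetical; NULLs are modeled with std::optional and treated as in SQL (a NULL warehouse_sk on either side makes the difference NULL, so the pair is filtered out).

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <tuple>
#include <vector>

// Hypothetical row type holding the two web_sales columns the query uses.
struct WebSale {
  std::optional<int64_t> orderNumber;
  std::optional<int64_t> warehouseSk;
};

// (ws_order_number, wh1, wh2, diff)
using OutRow = std::tuple<int64_t, int64_t, int64_t, int64_t>;

// Naive nested-loop self-join implementing the test query's predicates.
std::vector<OutRow> referenceSelfJoin(const std::vector<WebSale>& sales) {
  std::vector<OutRow> out;
  for (const auto& ws1 : sales) {
    for (const auto& ws2 : sales) {
      // ws_order_number IS NOT NULL on both sides, and equal.
      if (!ws1.orderNumber || !ws2.orderNumber) continue;
      if (*ws1.orderNumber != *ws2.orderNumber) continue;
      // A NULL warehouse_sk makes the difference NULL, which the filter rejects.
      if (!ws1.warehouseSk || !ws2.warehouseSk) continue;
      if (*ws1.warehouseSk - *ws2.warehouseSk != 0) continue;
      out.emplace_back(*ws1.orderNumber, *ws1.warehouseSk, *ws2.warehouseSk,
                       *ws2.warehouseSk - *ws1.warehouseSk);
    }
  }
  return out;
}
```

Because the filter keeps only pairs with equal warehouse_sk values, every correct output row has diff = 0. A nonzero diff in the engine's result would indicate that the filter saw different values than the ones projected into the output, which appears to be why the diff column is included.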

@facebook-github-bot added the CLA Signed label Dec 29, 2024

netlify bot commented Dec 29, 2024

Deploy Preview for meta-velox canceled.

🔨 Latest commit: c9c651a
🔍 Latest deploy log: https://app.netlify.com/sites/meta-velox/deploys/67728f55e95d5b00080ad29d

@wankunde wankunde changed the title Force output rows for new right batch Force output rows for new right batch in MergeJoin Dec 29, 2024
@wankunde wankunde changed the title Force output rows for new right batch in MergeJoin fix: Force output rows for new right batch in MergeJoin Dec 29, 2024
@JkSelf (Collaborator) commented Dec 30, 2024

@wankunde Thank you for addressing this issue. It is a known problem and can be resolved in #11771

@wankunde (Author) commented:

Hi @JkSelf, #11771 only fixes semi and anti joins (using JoinTracker); inner joins with a filter are still not fixed.
In my case, the filter exprSet is evaluated on the wrong input DictionaryVector.
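One way to read the fix named in the PR title: if the right-side output columns are replaced with flattened copies but the filter's input still references the old columns, the filter is evaluated on stale values. The sketch below models that with shared pointers; JoinOutput, FilterInput, flattenRight, and evalNotEqual are hypothetical and do not reflect Velox's actual MergeJoin code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical model: columns are shared, so the join output and the filter
// input can reference the same column object.
using ColumnPtr = std::shared_ptr<std::vector<int64_t>>;

struct JoinOutput {
  ColumnPtr leftWh;   // ws1.ws_warehouse_sk
  ColumnPtr rightWh;  // ws2.ws_warehouse_sk
};

struct FilterInput {
  ColumnPtr leftWh;
  ColumnPtr rightWh;
};

// Evaluates ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk for each row.
std::vector<bool> evalNotEqual(const FilterInput& in) {
  assert(in.leftWh->size() == in.rightWh->size());
  std::vector<bool> result;
  for (std::size_t i = 0; i < in.leftWh->size(); ++i) {
    result.push_back((*in.leftWh)[i] != (*in.rightWh)[i]);
  }
  return result;
}

// Replaces the right output column with a flattened copy. Only when
// refreshFilterInput is true does the filter input see the new column.
void flattenRight(JoinOutput& out, FilterInput& filterInput,
                  const std::vector<int64_t>& flattened,
                  bool refreshFilterInput) {
  out.rightWh = std::make_shared<std::vector<int64_t>>(flattened);
  if (refreshFilterInput) {
    filterInput.rightWh = out.rightWh;
  }
}
```

With left values {10, 10}, stale right values {10, 20}, and flattened right values {20, 10}, the unrefreshed filter input yields {false, true} while the refreshed one yields {true, false}, matching the flattened output.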

@wankunde wankunde changed the title fix: Force output rows for new right batch in MergeJoin fix: Update the filterInput when flatten the right projections Dec 30, 2024