HIVE-29627 IS_NULL selectivity improvement with stats #6553
Open
cyanzheng2926 wants to merge 6 commits into
Pull request overview
Improves Hive’s Calcite-based filter selectivity estimation for IS NULL predicates by using existing column null-count statistics when available, falling back to the default selectivity when stats are missing/estimated.
Changes:
- Add dedicated `IS_NULL` selectivity handling in `FilterSelectivityEstimator` that uses `numNulls` stats for `HiveTableScan`.
- Add helper logic to detect missing/estimated column stats and fall back to `DEFAULT_COMPARISON_SELECTIVITY`.
- Add unit tests covering `IS NULL` selectivity with valid stats, missing stats, and estimated-stats fallback.
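
The approach described in the list above can be illustrated with a minimal, standalone sketch. It is not the code from this PR: the class `IsNullSelectivitySketch`, the nested `ColumnStats` type, and the constant `FALLBACK_SELECTIVITY` are hypothetical names, and the fallback value is a placeholder standing in for `DEFAULT_COMPARISON_SELECTIVITY`, whose actual value is not stated here. The sketch assumes the selectivity of `IS NULL` is the null count divided by the row count, and that missing or estimated stats trigger the fallback.

```java
class IsNullSelectivitySketch {

    // Placeholder standing in for DEFAULT_COMPARISON_SELECTIVITY (assumed value).
    static final double FALLBACK_SELECTIVITY = 0.5;

    /** Hypothetical, simplified view of per-column statistics. */
    static final class ColumnStats {
        final long numNulls;      // number of NULL values in the column
        final boolean estimated;  // true if the stats were guessed rather than collected

        ColumnStats(long numNulls, boolean estimated) {
            this.numNulls = numNulls;
            this.estimated = estimated;
        }
    }

    /**
     * Estimates the fraction of rows for which "col IS NULL" holds.
     * Falls back to the default when stats are missing, estimated,
     * or the row count is not usable.
     */
    static double isNullSelectivity(ColumnStats stats, long rowCount) {
        if (stats == null || stats.estimated || stats.numNulls < 0 || rowCount <= 0) {
            return FALLBACK_SELECTIVITY;
        }
        double fraction = (double) stats.numNulls / rowCount;
        // Defensive clamp (a choice of this sketch) in case the stats are stale.
        return Math.max(0.0, Math.min(1.0, fraction));
    }

    public static void main(String[] args) {
        // Valid stats: 250 nulls out of 1000 rows.
        System.out.println(isNullSelectivity(new ColumnStats(250, false), 1000));
        // Missing stats: falls back to the default.
        System.out.println(isNullSelectivity(null, 1000));
        // Estimated stats: also falls back.
        System.out.println(isNullSelectivity(new ColumnStats(250, true), 1000));
    }
}
```

The three calls in `main` mirror the three test scenarios named in the last bullet: valid stats, missing stats, and estimated stats.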
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/FilterSelectivityEstimator.java` | Adds IS_NULL selectivity computation using column null statistics plus a missing/estimated-stats fallback helper. |
| `ql/src/test/org/apache/hadoop/hive/ql/optimizer/calcite/stats/TestFilterSelectivityEstimator.java` | Adds unit tests validating IS NULL selectivity behavior under different stats availability scenarios. |
…ange & refactored based on commit review & added null test



What changes were proposed in this pull request?
This PR addresses issues.apache.org/jira/browse/HIVE-29627, which improves IS_NULL selectivity estimation by using existing column statistics.
Why are the changes needed?
IS_NULL predicates currently fall through to the default path in the switch statement. With this change, their selectivity is estimated from column statistics instead.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Added unit tests in TestFilterSelectivityEstimator.java.