Specialize filterCompetitiveHits when there are exactly 2 clauses #14827

Open
wants to merge 1 commit into
base: main

HUSTERGS
Contributor

Description

This PR proposes specializing filterCompetitiveHits for the case of exactly 2 scorers, in order to reduce floating-point calculations and potential function calls.
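
To make the idea concrete, here is a minimal sketch of what such a specialization could look like. It is not the diff from this PR: the class and method names are hypothetical, plain arrays stand in for Lucene's buffer type, and the widening factor in the generic path is an assumption modeled on a standard relative error bound for floating-point sums. The premise is that with only two values the sum does not depend on summation order, so the two-clause path can compare the sum directly.

```java
// Hypothetical, self-contained sketch; none of these names come from Lucene.
final class TwoClauseFilterSketch {

  // Generic path: for more than 2 scorers, widen the partial sum by an
  // assumed summation-error factor before comparing against the threshold.
  static int filterGeneric(
      int[] docs,
      float[] scores,
      int size,
      double maxRemainingScore,
      float minCompetitiveScore,
      int numScorers) {
    int newSize = 0;
    for (int i = 0; i < size; i++) {
      double sum = scores[i] + maxRemainingScore;
      double bound =
          numScorers <= 2
              ? sum
              : (1.0 + 2 * (numScorers - 1) * Math.scalb(1.0, -52)) * sum;
      if ((float) bound >= minCompetitiveScore) {
        // Compact the surviving hit towards the front of the arrays.
        docs[newSize] = docs[i];
        scores[newSize] = scores[i];
        newSize++;
      }
    }
    return newSize;
  }

  // Specialized path for exactly 2 scorers: no widening factor and no
  // per-hit branch on the number of scorers.
  static int filterTwoClauses(
      int[] docs,
      float[] scores,
      int size,
      double maxRemainingScore,
      float minCompetitiveScore) {
    int newSize = 0;
    for (int i = 0; i < size; i++) {
      if ((float) (scores[i] + maxRemainingScore) >= minCompetitiveScore) {
        docs[newSize] = docs[i];
        scores[newSize] = scores[i];
        newSize++;
      }
    }
    return newSize;
  }
}
```

Both methods return the number of hits kept and leave the survivors in the first slots of `docs` and `scores`. Under the stated assumption, calling `filterGeneric` with `numScorers = 2` and calling `filterTwoClauses` keep the same hits; the specialized version simply does less work per hit.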

Luceneutil results on wikimediumall with searchConcurrency=0, taskCountPerCat=5, taskRepeatCount=50, after 20 iterations:

                            Task    QPS baseline      StdDev   QPS my_modified_version      StdDev                Pct diff p-value
                      OrHighRare      116.59      (3.1%)      116.96      (2.7%)    0.3% (  -5% -    6%) 0.734
                       OrHighMed       87.75      (2.4%)       88.83      (2.6%)    1.2% (  -3% -    6%) 0.116
                      AndHighMed       67.91      (2.3%)       69.17      (2.2%)    1.9% (  -2% -    6%) 0.009
                     AndHighHigh       27.96      (1.4%)       28.63      (2.0%)    2.4% (  -1% -    5%) 0.000
                      OrHighHigh       26.16      (1.6%)       26.97      (1.6%)    3.1% (   0% -    6%) 0.000

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.

@HUSTERGS
Contributor Author

For what it's worth, the motivation for this PR is that filterCompetitiveHits occupied about 13% of the flamegraph for the OrHighHigh query:
[image]

Also, filterCompetitiveHits calls MathUtil.sumUpperBound in a loop, which seems to repeatedly calculate MathUtil.sumRelativeErrorBound(numValues) even though numValues is constant within the loop. I tried hoisting this calculation out of the loop, but it showed no performance difference; maybe filterCompetitiveHits is no longer the bottleneck when numValues > 2.
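
For readers unfamiliar with the pattern, the hoisting experiment described above can be sketched as follows. This is an illustration under assumptions, not Lucene's code: `relativeErrorBound` and `upperBound` are local stand-ins with an assumed formula, and the counting methods are hypothetical. The point is only that the factor depends on `numValues` alone, so it can be computed once before the loop.

```java
// Hypothetical, self-contained sketch; the helpers below are local stand-ins.
final class HoistedBoundSketch {

  // Assumed relative error bound for summing numValues doubles.
  static double relativeErrorBound(int numValues) {
    if (numValues <= 1) {
      return 0;
    }
    double u = Math.scalb(1.0, -52); // assumed unit roundoff for doubles
    return (numValues - 1) * u;
  }

  // Assumed upper bound of a sum, independent of summation order.
  static double upperBound(double sum, int numValues) {
    if (numValues <= 2) {
      return sum;
    }
    return (1.0 + 2 * relativeErrorBound(numValues)) * sum;
  }

  // Naive version: the error bound is recomputed for every element.
  static int countCompetitiveNaive(double[] sums, int numValues, double minScore) {
    int count = 0;
    for (double s : sums) {
      if (upperBound(s, numValues) >= minScore) {
        count++;
      }
    }
    return count;
  }

  // Hoisted version: the loop-invariant factor is computed once up front.
  static int countCompetitiveHoisted(double[] sums, int numValues, double minScore) {
    double factor = numValues <= 2 ? 1.0 : 1.0 + 2 * relativeErrorBound(numValues);
    int count = 0;
    for (double s : sums) {
      if (factor * s >= minScore) {
        count++;
      }
    }
    return count;
  }
}
```

The two versions compute the same comparison for every element, so they return the same count. A JIT compiler may already perform this hoisting on its own, which would be consistent with the observation that the manual change made no measurable difference.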
