Speed up top-k retrieval on filtered conjunctions. #13994

Open
wants to merge 4 commits into
base: main
jpountz
Contributor

@jpountz jpountz commented Nov 14, 2024

A while back we added an optimized bulk scorer that implements block-max AND, which yielded a good speedup on nightly benchmarks; see annotation `FP` at https://benchmarks.mikemccandless.com/AndHighHigh.html. With this PR, filtered conjunctions now also run through this optimized bulk scorer, thanks to two changes:

  • It flattens inner conjunctions, so a query initially written as something like `+(+term1 +term2) #filter` is rewritten to `+term1 +term2 #filter`.
  • It evaluates queries that mix MUST and FILTER clauses through `BlockMaxConjunctionBulkScorer` by treating FILTER clauses as scoring clauses that produce a score of 0.
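
To make the second point concrete, here is a minimal, self-contained sketch of the idea of treating a FILTER clause as a scoring clause whose score is always 0. It is not Lucene code: the `Clause` record, its fields, and the `score` helper are hypothetical names invented for illustration, and the real `BlockMaxConjunctionBulkScorer` works on postings iterators and block-level max scores rather than arrays.

```java
import java.util.Arrays;
import java.util.List;

public class FilterAsZeroScoreSketch {

    /** Hypothetical clause: sorted matching doc IDs, a constant score, and a FILTER flag. */
    record Clause(int[] sortedDocs, float score, boolean isFilter) {
        boolean matches(int doc) {
            return Arrays.binarySearch(sortedDocs, doc) >= 0;
        }

        // A FILTER clause takes part in matching but always contributes 0 to the score.
        float contribution() {
            return isFilter ? 0f : score;
        }
    }

    /** Returns the summed score if every clause matches doc, or null if any clause does not. */
    static Float score(List<Clause> clauses, int doc) {
        float sum = 0f;
        for (Clause clause : clauses) {
            if (!clause.matches(doc)) {
                return null;
            }
            sum += clause.contribution();
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Clause> clauses = List.of(
            new Clause(new int[] {1, 3, 5}, 1.5f, false),  // +term1
            new Clause(new int[] {3, 5, 8}, 2.0f, false),  // +term2
            new Clause(new int[] {3, 9}, 7.0f, true));     // #filter
        System.out.println(score(clauses, 3)); // matches all three clauses
        System.out.println(score(clauses, 5)); // rejected by the filter
    }
}
```

Because the filter goes through the same code path as the scoring clauses, a single conjunction loop can handle both kinds of clause; the filter only narrows the set of matching documents.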

@jpountz jpountz added this to the 10.1.0 milestone Nov 14, 2024
@jpountz
Contributor Author

jpountz commented Nov 14, 2024

I ran luceneutil on wikibigall and the new filtered tasks from wikinightly.tasks (https://github.com/mikemccand/luceneutil/blob/main/tasks/wikinightly.tasks#L355-L390):

                            Task    QPS baseline      StdDev   QPS my_modified_version      StdDev                Pct diff p-value
              FilteredOrHighHigh       47.29      (2.0%)       46.59      (3.1%)   -1.5% (  -6% -    3%) 0.070
               FilteredOrHighMed      103.10      (1.5%)      102.05      (2.9%)   -1.0% (  -5% -    3%) 0.165
                    FilteredTerm      157.16      (1.9%)      155.86      (2.4%)   -0.8% (  -5% -    3%) 0.225
                        PKLookup      267.05      (3.6%)      265.36      (4.1%)   -0.6% (  -8% -    7%) 0.603
                  FilteredPhrase       24.52      (2.1%)       24.55      (2.7%)    0.1% (  -4% -    5%) 0.894
              FilteredAndHighMed       99.47      (1.2%)      131.63      (2.9%)   32.3% (  27% -   36%) 0.000
             FilteredAndHighHigh       44.55      (1.5%)       66.41      (1.7%)   49.1% (  45% -   53%) 0.000

Member

@benwtrent benwtrent left a comment

The inlining and optimization make sense to me. Minor thoughts around naming things differently so that the flattening logic is easier to follow.

Comment on lines 566 to 589
for (BooleanClause clause : clauses) {
  if (clause.isRequired() && clause.query() instanceof BooleanQuery innerQuery) {
    if (innerQuery.getMinimumNumberShouldMatch() == 0
        && innerQuery.getClauses(Occur.SHOULD).isEmpty()) {
      actuallyRewritten = true;
      for (BooleanClause innerClause : innerQuery) {
        Occur occur = innerClause.occur();
        if (occur == Occur.FILTER
            || occur == Occur.MUST_NOT
            || clause.occur() == Occur.MUST) {
          builder.add(innerClause);
        } else {
          assert clause.occur() == Occur.FILTER && occur == Occur.MUST;
          // In this case we need to change the occur of the inner query from MUST to FILTER.
          builder.add(innerClause.query(), Occur.FILTER);
        }
      }
    } else {
      builder.add(clause);
    }
  } else {
    builder.add(clause);
  }
}
Member

Holy smokes I had to read this like 5 times to fully grok what it is doing.

Could you rename occur to innerClauseOccur and clause to outerClause?
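
For readers following along, here is a rough, self-contained sketch of the flattening loop with the suggested `outerClause` / `innerClauseOccur` names. It does not use the Lucene classes: `Occur`, `Query`, `Clause`, and `flatten` below are hypothetical stand-ins defined only for this example, and the `getMinimumNumberShouldMatch() == 0` check from the snippet under review is left out because the toy query type has no such setting.

```java
import java.util.ArrayList;
import java.util.List;

public class FlattenSketch {

    enum Occur { MUST, FILTER, SHOULD, MUST_NOT }

    /** Hypothetical query node: a leaf term (clauses == null) or a boolean query. */
    record Query(String term, List<Clause> clauses) {
        static Query term(String t) { return new Query(t, null); }
        static Query bool(Clause... cs) { return new Query(null, List.of(cs)); }
        boolean isBoolean() { return clauses != null; }
    }

    record Clause(Query query, Occur occur) {
        boolean isRequired() { return occur == Occur.MUST || occur == Occur.FILTER; }
    }

    /** Inlines required inner boolean queries that have no SHOULD clauses. */
    static List<Clause> flatten(List<Clause> outerClauses) {
        List<Clause> flattened = new ArrayList<>();
        for (Clause outerClause : outerClauses) {
            Query innerQuery = outerClause.query();
            boolean canInline = outerClause.isRequired()
                && innerQuery.isBoolean()
                && innerQuery.clauses().stream().noneMatch(c -> c.occur() == Occur.SHOULD);
            if (!canInline) {
                flattened.add(outerClause);
                continue;
            }
            for (Clause innerClause : innerQuery.clauses()) {
                Occur innerClauseOccur = innerClause.occur();
                if (innerClauseOccur == Occur.FILTER
                    || innerClauseOccur == Occur.MUST_NOT
                    || outerClause.occur() == Occur.MUST) {
                    // The inner clause keeps its own occur.
                    flattened.add(innerClause);
                } else {
                    // Outer FILTER around an inner MUST: the inner clause must not score.
                    flattened.add(new Clause(innerClause.query(), Occur.FILTER));
                }
            }
        }
        return flattened;
    }

    public static void main(String[] args) {
        // +(+term1 +term2) #filter
        Query inner = Query.bool(
            new Clause(Query.term("term1"), Occur.MUST),
            new Clause(Query.term("term2"), Occur.MUST));
        List<Clause> outer = List.of(
            new Clause(inner, Occur.MUST),
            new Clause(Query.term("filter"), Occur.FILTER));
        for (Clause c : flatten(outer)) {
            System.out.println(c.occur() + " " + c.query().term());
        }
    }
}
```

The early `continue` is a stylistic choice made for this sketch to avoid the two duplicated `else` branches; it is not part of the suggestion above.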

@benwtrent
Member

@jpountz don't forget CHANGES ;) but this lgtm
