@fulmicoton
Collaborator

If a posting list's doc freq equals the segment reader's max_doc, and scoring does not matter, we can replace its scorer with an AllScorer.

In turn, in a boolean query, we can discard AllScorers and empty scorers to speed up the request.
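
To illustrate the first rule, here is a minimal sketch of the decision, not the code from this PR. `TermScorerChoice` and `choose_term_scorer` are hypothetical names, and treating a doc freq of 0 as an empty scorer is an assumption on my part.

```rust
/// Hypothetical stand-in for the scorer a term query would pick.
/// This enum is NOT part of tantivy; it only models the decision.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum TermScorerChoice {
    /// The term matches no document in the segment (assumed: doc_freq == 0).
    Empty,
    /// The posting list is saturated and scores are not needed.
    All,
    /// General case: iterate over the posting list.
    Postings,
}

/// Picks a scorer kind from the term's doc freq and the segment's max_doc.
fn choose_term_scorer(doc_freq: u32, max_doc: u32, scoring_enabled: bool) -> TermScorerChoice {
    if doc_freq == 0 {
        TermScorerChoice::Empty
    } else if !scoring_enabled && doc_freq == max_doc {
        // Every document contains the term, and nobody will read the score.
        TermScorerChoice::All
    } else {
        TermScorerChoice::Postings
    }
}
```

With scoring enabled the saturated case still needs the posting list, since term frequencies feed the score.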

@fulmicoton fulmicoton requested review from PSeitz and trinity-1686a and removed request for trinity-1686a November 25, 2025 13:43
#[derive(Default, Copy, Clone, Debug)]
struct AllAndEmptyScorerCounts {
all_count: usize,
empty_count: usize,
Collaborator

Suggested change
- empty_count: usize,
+ num_empty_scorer: usize,


#[derive(Default, Copy, Clone, Debug)]
struct AllAndEmptyScorerCounts {
all_count: usize,
Collaborator

Suggested change
- all_count: usize,
+ num_all_scorer: usize,

index_writer.add_document(doc!(text_field=>"hello happy"))?;
index_writer.commit()?;
let searcher = index.reader()?.searcher();
let term_a: Box<dyn Query> = Box::new(TermQuery::new(
Collaborator

Suggested change
- let term_a: Box<dyn Query> = Box::new(TermQuery::new(
+ let hit_all_term: Box<dyn Query> = Box::new(TermQuery::new(

Term::from_field_text(text_field, "hello"),
IndexRecordOption::Basic,
));
let term_b: Box<dyn Query> = Box::new(TermQuery::new(
Collaborator

Suggested change
- let term_b: Box<dyn Query> = Box::new(TermQuery::new(
+ let hit_some_term: Box<dyn Query> = Box::new(TermQuery::new(

Term::from_field_text(text_field, "happy"),
IndexRecordOption::Basic,
));
let term_c: Box<dyn Query> = Box::new(TermQuery::new(
Collaborator

Suggested change
- let term_c: Box<dyn Query> = Box::new(TermQuery::new(
+ let hit_none_term: Box<dyn Query> = Box::new(TermQuery::new(

mut boost: Score,
) -> crate::Result<SpecializedScorer> {
if !self.scoring_enabled {
boost = 1.0f32;
Collaborator

Why do we need this?

@fulmicoton-dd fulmicoton-dd force-pushed the paul.masurel/optimization-all-empty-scorer branch from a7212d5 to dc4c218 Compare November 25, 2025 14:34
scoring_enabled: bool,
}

pub(crate) enum SpecializedScorer {
Collaborator

SpecializedTermScorer?
We already have a SpecializedScorer in boolean_weight.rs, so reusing the name here is confusing.

Collaborator Author

I renamed it to something explicit, removed it from the pub(crate) API, and made it private.

let positive_scorer = match (should_scorers, must_scorers) {
(ShouldScorersCombinationMethod::Ignored, must_scorers) => {
let boxed_scorer: Box<dyn Scorer> = if must_scorers.is_empty() {
if must_special_scorer_counts.all_count + should_special_scorer_counts.all_count
Collaborator

A comment would be good here
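
For context on what such a comment might explain, here is a rough sketch of how counts of all/empty scorers could drive the simplification when scoring is disabled. It is an illustration under assumptions, not the code under review: `Clause`, `SpecialCounts`, `strip_special`, `simplify_must` and `simplify_should` are hypothetical names, and only the field names follow the suggestions earlier in this thread.

```rust
/// Hypothetical model of a boolean sub-scorer (not a tantivy type).
#[derive(Debug, PartialEq, Eq, Clone)]
enum Clause {
    Empty,
    All,
    Other(&'static str),
}

/// Hypothetical counterpart of the counts struct discussed in the review.
#[derive(Default, Copy, Clone, Debug, PartialEq, Eq)]
struct SpecialCounts {
    num_all_scorer: usize,
    num_empty_scorer: usize,
}

/// Removes All and Empty clauses and reports how many of each were removed.
fn strip_special(clauses: Vec<Clause>) -> (Vec<Clause>, SpecialCounts) {
    let mut counts = SpecialCounts::default();
    let mut remaining = Vec::new();
    for clause in clauses {
        match clause {
            Clause::All => counts.num_all_scorer += 1,
            Clause::Empty => counts.num_empty_scorer += 1,
            other => remaining.push(other),
        }
    }
    (remaining, counts)
}

/// Intersection (MUST): All is the identity element, Empty absorbs everything.
fn simplify_must(clauses: Vec<Clause>) -> Vec<Clause> {
    let (remaining, counts) = strip_special(clauses);
    if counts.num_empty_scorer > 0 {
        return vec![Clause::Empty];
    }
    if remaining.is_empty() && counts.num_all_scorer > 0 {
        return vec![Clause::All];
    }
    remaining
}

/// Union (SHOULD), scoring disabled: Empty is the identity, All absorbs everything.
fn simplify_should(clauses: Vec<Clause>) -> Vec<Clause> {
    let (remaining, counts) = strip_special(clauses);
    if counts.num_all_scorer > 0 {
        return vec![Clause::All];
    }
    if remaining.is_empty() && counts.num_empty_scorer > 0 {
        return vec![Clause::Empty];
    }
    remaining
}
```

The SHOULD shortcut only holds when scores are ignored; with scoring enabled, an all-matching SHOULD clause still contributes to ranking.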

(CombinationMethod::Ignored, Some(must_scorers)) => {
SpecializedScorer::Other(intersect_scorers(must_scorers, num_docs))

let positive_scorer = match (should_scorers, must_scorers) {
Collaborator

Suggested change
- let positive_scorer = match (should_scorers, must_scorers) {
+ let include_scorer = match (should_scorers, must_scorers) {

@fulmicoton-dd fulmicoton-dd force-pushed the paul.masurel/optimization-all-empty-scorer branch 3 times, most recently from 195c5f1 to 814ec2f Compare November 26, 2025 09:11
@fulmicoton-dd fulmicoton-dd force-pushed the paul.masurel/optimization-all-empty-scorer branch from 814ec2f to e864338 Compare November 26, 2025 09:15
@fulmicoton fulmicoton requested a review from PSeitz November 26, 2025 12:52
@fulmicoton fulmicoton merged commit f88b720 into main Nov 26, 2025
7 checks passed
@fulmicoton fulmicoton deleted the paul.masurel/optimization-all-empty-scorer branch November 26, 2025 14:50
fulmicoton-dd added a commit that referenced this pull request Dec 1, 2025
* Optimization when posting list are saturated.

If a posting list doc freq is the segment reader's
max_doc, and if scoring does not matter, we can replace it
by a AllScorer.

In turn, in a boolean query, we can dismiss  all scorers and
empty scorers, to accelerate the request.

* Added range query optimization

* CR comment

* CR comments

* CR comment

---------

Co-authored-by: Paul Masurel <[email protected]>