
Index level caching policy is thrashed by segment-specific query rewrites #14986

@GovindBalaji-S-Glean

Description

  1. Each IndexSearcher has its own UsageTrackingQueryCachingPolicy that is shared across all segments.
  2. This caching policy keeps track of recently used queries in a ring buffer of 256 entries.
  3. A TermInSetQuery with rewriteMethod = MultiTermQuery.CONSTANT_SCORE_BLENDED_REWRITE yields a RewritingWeight.
  4. Getting a scorer from this RewritingWeight for a segment may rewrite the query into a BooleanQuery of TermQuery clauses that contains only the terms present in that particular segment (see org.apache.lucene.search.AbstractMultiTermQueryConstantScoreWrapper.RewritingWeight#scorerSupplier).
  5. As a result, a single TermInSetQuery is recorded in the ring buffer as many distinct BooleanQuerys, one per segment, which thrashes the buffer.
  6. This leads to a poor cache hit rate for indexes with a large number of segments.
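The thrashing in steps 2 to 5 can be modeled with a small, self-contained simulation. This is a simplified sketch, not Lucene's `FrequencyTrackingRingBuffer`: integer keys stand in for query hash codes, `MIN_FREQ = 2` is an assumed caching threshold for a costly query, and the model assumes every segment rewrites the TermInSetQuery into its own BooleanQuery with no other queries interleaved. `FrequencyRing` and `SharedRingThrashDemo` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

// Simplified model, NOT Lucene's FrequencyTrackingRingBuffer: a fixed-size window of recent
// keys plus a count of how often each key currently appears in that window.
final class FrequencyRing {
  private final int capacity;
  private final ArrayDeque<Integer> window = new ArrayDeque<>();
  private final Map<Integer, Integer> counts = new HashMap<>();

  FrequencyRing(int capacity) {
    this.capacity = capacity;
  }

  void add(int key) {
    if (window.size() == capacity) {
      // Evict the oldest key; drop its count entirely once it reaches zero.
      int evicted = window.removeFirst();
      counts.computeIfPresent(evicted, (k, v) -> v == 1 ? null : v - 1);
    }
    window.addLast(key);
    counts.merge(key, 1, Integer::sum);
  }

  int frequency(int key) {
    return counts.getOrDefault(key, 0);
  }
}

final class SharedRingThrashDemo {
  static final int HISTORY = 256; // ring size described in the issue
  static final int MIN_FREQ = 2; // assumed caching threshold for a costly query
  static final int OUTER = -1; // stands in for the TermInSetQuery's hash code

  // Runs the same TermInSetQuery `searches` times against `segments` segments, assuming every
  // segment rewrites it into a distinct BooleanQuery (keys 0..segments-1), and counts the
  // searches in which the outer query reaches MIN_FREQ right after its own onUse().
  static int cacheableSearches(int segments, int searches) {
    FrequencyRing ring = new FrequencyRing(HISTORY);
    int cacheable = 0;
    for (int s = 0; s < searches; s++) {
      ring.add(OUTER); // onUse(TermInSetQuery)
      if (ring.frequency(OUTER) >= MIN_FREQ) {
        cacheable++;
      }
      for (int seg = 0; seg < segments; seg++) {
        ring.add(seg); // onUse(per-segment BooleanQuery rewrite)
      }
    }
    return cacheable;
  }

  public static void main(String[] args) {
    System.out.println("10 segments:  " + cacheableSearches(10, 10) + " of 10 searches cacheable");
    System.out.println("300 segments: " + cacheableSearches(300, 10) + " of 10 searches cacheable");
  }
}
```

Under these assumptions the outer query stays cacheable only while each search records at most 254 per-segment rewrites; at 255 or more, its previous entry is evicted before it is seen again. Interleaving other queries in the same buffer would push that limit lower.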

We were able to verify this behavior with a custom caching policy that logs each onUse() and shouldCache() call and then delegates to UsageTrackingQueryCachingPolicy.
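A delegating policy of the kind described above could look roughly like the following sketch. To keep it compilable without Lucene on the classpath, it declares a hypothetical `CachingPolicy<Q>` stand-in with the same two methods as Lucene's `QueryCachingPolicy` (`onUse` and `shouldCache`). `LoggingCachingPolicy`, `CountingPolicy`, and `LoggingPolicyDemo` are illustrative names, and the toy delegate's "used at least twice" rule is an assumption, not UsageTrackingQueryCachingPolicy's actual logic.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in with the same two methods as Lucene's QueryCachingPolicy, so the sketch
// compiles without Lucene; real code would implement the Lucene interface with Query for Q.
interface CachingPolicy<Q> {
  void onUse(Q query);

  boolean shouldCache(Q query) throws IOException;
}

// Decorator that records every call, then defers the actual decision to the wrapped policy.
final class LoggingCachingPolicy<Q> implements CachingPolicy<Q> {
  private final CachingPolicy<Q> delegate;
  private final List<String> log = new ArrayList<>();

  LoggingCachingPolicy(CachingPolicy<Q> delegate) {
    this.delegate = delegate;
  }

  @Override
  public void onUse(Q query) {
    log.add("onUse(" + query + ")");
    delegate.onUse(query);
  }

  @Override
  public boolean shouldCache(Q query) throws IOException {
    boolean decision = delegate.shouldCache(query);
    log.add("shouldCache(" + query + ") = " + decision);
    return decision;
  }

  List<String> log() {
    return List.copyOf(log);
  }
}

final class LoggingPolicyDemo {
  // Toy delegate (an assumption, not UsageTrackingQueryCachingPolicy's rule):
  // cache a query once it has been used at least twice.
  static final class CountingPolicy implements CachingPolicy<String> {
    private final Map<String, Integer> uses = new HashMap<>();

    @Override
    public void onUse(String query) {
      uses.merge(query, 1, Integer::sum);
    }

    @Override
    public boolean shouldCache(String query) {
      return uses.getOrDefault(query, 0) >= 2;
    }
  }

  static List<String> run() {
    LoggingCachingPolicy<String> policy = new LoggingCachingPolicy<>(new CountingPolicy());
    try {
      for (int i = 0; i < 2; i++) {
        policy.onUse("TermInSetQuery");
        policy.shouldCache("TermInSetQuery");
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return policy.log();
  }

  public static void main(String[] args) {
    run().forEach(System.out::println);
  }
}
```

Against Lucene itself, the same decorator would implement `org.apache.lucene.search.QueryCachingPolicy` and could be installed with `searcher.setQueryCachingPolicy(...)`, wrapping a `new UsageTrackingQueryCachingPolicy()`.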

Is there a good reason not to track usage in this ring buffer at a per-segment level? That would fix this issue.
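One way to picture per-segment tracking is a separate history per segment, so that a segment's own rewrites can only evict entries from that segment's history. This is a hypothetical model, not a proposed patch: Lucene's `QueryCachingPolicy.onUse(Query)` receives no segment argument, so a real implementation would need a different hook. `PerSegmentUsageTracker`, the 256-entry history, the use of integers as query hash codes, and the threshold of 2 are all assumptions chosen to mirror the scenario in the issue.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-segment usage tracker: one fixed-size history per segment key, so a
// segment's own rewrites can only evict entries from that segment's history.
final class PerSegmentUsageTracker {
  private final int historySize;
  private final Map<Object, Ring> rings = new HashMap<>();

  PerSegmentUsageTracker(int historySize) {
    this.historySize = historySize;
  }

  void onUse(Object segmentKey, int queryHash) {
    rings.computeIfAbsent(segmentKey, k -> new Ring(historySize)).add(queryHash);
  }

  int frequency(Object segmentKey, int queryHash) {
    Ring ring = rings.get(segmentKey);
    return ring == null ? 0 : ring.frequency(queryHash);
  }

  // Simplified frequency-counting ring (NOT Lucene's FrequencyTrackingRingBuffer).
  private static final class Ring {
    private final int capacity;
    private final ArrayDeque<Integer> window = new ArrayDeque<>();
    private final Map<Integer, Integer> counts = new HashMap<>();

    Ring(int capacity) {
      this.capacity = capacity;
    }

    void add(int key) {
      if (window.size() == capacity) {
        // Evict the oldest key; drop its count entirely once it reaches zero.
        int evicted = window.removeFirst();
        counts.computeIfPresent(evicted, (k, v) -> v == 1 ? null : v - 1);
      }
      window.addLast(key);
      counts.merge(key, 1, Integer::sum);
    }

    int frequency(int key) {
      return counts.getOrDefault(key, 0);
    }
  }

  // Repeats one TermInSetQuery (hash -1) `searches` times over `segments` segments, assuming each
  // segment also records its own BooleanQuery rewrite (hash = segment number), and counts the
  // searches in which the outer query reaches minFreq in every segment's history.
  static int cacheableSearches(int segments, int searches, int minFreq) {
    final int outer = -1;
    PerSegmentUsageTracker tracker = new PerSegmentUsageTracker(256);
    int cacheable = 0;
    for (int s = 0; s < searches; s++) {
      boolean everySegment = true;
      for (int seg = 0; seg < segments; seg++) {
        tracker.onUse(seg, outer); // outer query seen on this segment
        if (tracker.frequency(seg, outer) < minFreq) {
          everySegment = false;
        }
        tracker.onUse(seg, seg); // this segment's own rewrite
      }
      if (everySegment) {
        cacheable++;
      }
    }
    return cacheable;
  }

  public static void main(String[] args) {
    System.out.println("300 segments: " + cacheableSearches(300, 10, 2) + " of 10 searches cacheable");
  }
}
```

In this model the repeated query becomes cacheable from the second search onward regardless of segment count, because each segment's history only ever sees the outer query and that segment's single rewrite.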

Version and environment details

Lucene 9.12.1
