Description
- Each IndexSearcher has its own UsageTrackingQueryCachingPolicy that is shared across all segments.
- This caching policy uses a 256-length ring buffer to keep track of recently used queries.
- A TermInSetQuery with rewriteMethod = MultiTermQuery.CONSTANT_SCORE_BLENDED_REWRITE yields a RewritingWeight.
- Getting a scorer from this RewritingWeight for a segment could involve rewriting to a BooleanQuery of multiple TermQuery with only the terms present in that particular segment - ref org.apache.lucene.search.AbstractMultiTermQueryConstantScoreWrapper.RewritingWeight#scorerSupplier
- Thus a single TermInSetQuery will end up thrashing the ring buffer as multiple distinct BooleanQuerys from different segments.
- This leads to a poor caching rate for indexes with a large number of segments.
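To make the thrashing concrete, here is a minimal self-contained simulation. It is a sketch under stated assumptions, not Lucene code: History is a hypothetical stand-in for the policy's ring buffer, each segment index stands in for the hash of that segment's rewritten BooleanQuery, and the segment count (100), search count (10) and minimum frequency (4) are illustrative values chosen for the example.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

final class RingBufferThrashSketch {

    /** Fixed-size history of recently used keys (hypothetical stand-in, not Lucene's class). */
    static final class History {
        private final int capacity;
        private final ArrayDeque<Integer> window = new ArrayDeque<>();
        private final Map<Integer, Integer> counts = new HashMap<>();

        History(int capacity) {
            this.capacity = capacity;
        }

        /** Records one use of the key, evicting the oldest entry once the window is full. */
        void record(int key) {
            if (window.size() == capacity) {
                int evicted = window.removeFirst();
                counts.merge(evicted, -1, Integer::sum);
            }
            window.addLast(key);
            counts.merge(key, 1, Integer::sum);
        }

        /** Number of times the key currently appears in the window. */
        int frequency(int key) {
            return counts.getOrDefault(key, 0);
        }
    }

    /** One history shared by all segments: every segment's rewritten query competes for the same slots. */
    static int cacheableShared(int segments, int searches, int historySize, int minFrequency) {
        History shared = new History(historySize);
        int cacheable = 0;
        for (int s = 0; s < searches; s++) {
            for (int seg = 0; seg < segments; seg++) {
                shared.record(seg); // distinct key per segment-specific rewrite
                if (shared.frequency(seg) >= minFrequency) {
                    cacheable++;
                }
            }
        }
        return cacheable;
    }

    /** One history per segment: a segment only ever sees its own rewritten query. */
    static int cacheablePerSegment(int segments, int searches, int historySize, int minFrequency) {
        History[] perSegment = new History[segments];
        for (int seg = 0; seg < segments; seg++) {
            perSegment[seg] = new History(historySize);
        }
        int cacheable = 0;
        for (int s = 0; s < searches; s++) {
            for (int seg = 0; seg < segments; seg++) {
                perSegment[seg].record(seg);
                if (perSegment[seg].frequency(seg) >= minFrequency) {
                    cacheable++;
                }
            }
        }
        return cacheable;
    }

    public static void main(String[] args) {
        // Same logical query searched 10 times over 100 segments, 256-slot history, threshold 4.
        System.out.println("shared:      " + cacheableShared(100, 10, 256, 4));
        System.out.println("per-segment: " + cacheablePerSegment(100, 10, 256, 4));
    }
}
```

With these assumed inputs the shared history never reports a cacheable use: each per-segment key recurs only every 100 entries, so at most 3 copies fit in a 256-entry window. The per-segment variant reports 700 cacheable uses (every segment from the fourth search onward). Real workloads interleave other queries as well, which this model ignores.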
We could verify this behavior with a new caching policy that delegates to UsageTrackingQueryCachingPolicy after logging the onUse() and shouldCache() calls.
Is there a good reason to not have this ring buffer tracking at a per segment level? That would fix this issue.
Version and environment details
Lucene 9.12.1