[QueryCache] Use RamUsageEstimator instead of defaulting to 1024 bytes for non-accountable query size #15124
Conversation
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.
lucene/core/src/test/org/apache/lucene/search/TestLRUQueryCache.java
@msfroh Would you mind taking a look at this?
```java
long queryRamBytesUsed = RamUsageEstimator.sizeOf(query, 0);
return LINKED_HASHTABLE_RAM_BYTES_PER_ENTRY + queryRamBytesUsed;
```
I realize you did a small benchmark, but real-world performance is still a concern of mine here.
I am not sure how to benchmark this, but my concern is that non-accountable queries just got more expensive: we calculate the memory used for a query on both caching and eviction.
Yeah, I just timed one ~3 KB boolean query.

> I am not sure how to benchmark this, but my concern is that non-accountable queries just got more expensive

The only way I see is to micro-benchmark RamUsageEstimator with different query types and sizes; that might give more insight. I had tried the same, but only with term and boolean queries.

> but my concern is that non-accountable queries just got more expensive

I am not sure it would be that expensive (at least in absolute terms). For TermQuery and similarly cheap queries, RamUsageEstimator should be pretty fast, considering we already cache ramBytesUsed for the underlying terms etc. In my opinion, complex BooleanQueries with many nested clauses might be more expensive, as we need to visit all the underlying children for the size calculation. To handle that, we already have a limit of 16 clauses beyond which we do not cache, and it can be refined further if needed.

I don't see any other good way of calculating query size short of implementing Accountable for all queries, but that would be too intrusive for just the caching use case.
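To make the nested-clause concern concrete, here is a toy sketch of recursive size estimation. This is an illustrative model of my own, not Lucene's RamUsageEstimator: the ToyQuery types, the 32-byte shallow-size constant, and the 2-bytes-per-char string cost are all assumptions. The point it shows is that a flat TermQuery costs one step while a BooleanQuery's cost grows with its clause count and nesting depth:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (Java 17): why sizing a BooleanQuery must visit every child clause,
// while sizing a TermQuery is O(1). Not Lucene's actual API.
public class ToySizeEstimator {
  static final long SHALLOW_QUERY_BYTES = 32; // rough per-object overhead (assumption)

  sealed interface ToyQuery permits ToyTermQuery, ToyBooleanQuery {}

  record ToyTermQuery(String field, String term) implements ToyQuery {}

  record ToyBooleanQuery(List<ToyQuery> clauses) implements ToyQuery {}

  // Recursively sum an approximate size, visiting every child clause.
  static long sizeOf(ToyQuery q) {
    if (q instanceof ToyTermQuery t) {
      // shallow size + ~2 bytes per char of the field and term strings
      return SHALLOW_QUERY_BYTES + 2L * (t.field().length() + t.term().length());
    }
    ToyBooleanQuery b = (ToyBooleanQuery) q;
    long size = SHALLOW_QUERY_BYTES;
    for (ToyQuery clause : b.clauses()) {
      size += sizeOf(clause); // cost grows with clause count and nesting depth
    }
    return size;
  }

  public static void main(String[] args) {
    List<ToyQuery> clauses = new ArrayList<>();
    for (int i = 0; i < 16; i++) {
      clauses.add(new ToyTermQuery("foo", "bar" + i));
    }
    System.out.println(sizeOf(new ToyTermQuery("foo", "bar"))); // one step
    System.out.println(sizeOf(new ToyBooleanQuery(clauses)));   // 16 recursive visits
  }
}
```

Under this toy model, the 16-clause BooleanQuery costs 16 recursive visits where the TermQuery costs one, which is exactly why a clause-count cap bounds the estimation work.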
@benwtrent So I wrote a simple micro-benchmark for the putIfAbsent method, which is the one that calls getRamBytesUsed(Query query) during caching and eviction. I created a cache with MAX_SIZE = 10000 and MAX_SIZE_IN_BYTES = 1048576, and created N sample boolean queries; the setup looks something like this:
```java
for (int i = 0; i < MAX_SIZE; i++) {
  TermQuery must = new TermQuery(new Term("foo", "bar" + i));
  TermQuery filter = new TermQuery(new Term("foo", "quux" + i));
  TermQuery mustNot = new TermQuery(new Term("foo", "foo" + i));
  BooleanQuery.Builder bq = new BooleanQuery.Builder();
  bq.add(must, BooleanClause.Occur.MUST);
  bq.add(filter, BooleanClause.Occur.FILTER);
  bq.add(mustNot, BooleanClause.Occur.MUST_NOT);
  queries[i] = bq.build();
}
```
And a JMH method testing a 100% write workload across 10 threads:
```java
@Benchmark
@Group("concurrentPutOnly")
@GroupThreads(10)
public void testConcurrentPuts() {
  int random = ThreadLocalRandom.current().nextInt(MAX_SIZE);
  queryCache.putIfAbsent(
      queries[random], this.sampleCacheAndCount, cacheHelpers[random & (SEGMENTS - 1)]);
}
```
Baseline numbers:

```
Benchmark                              Mode  Cnt        Score       Error  Units
QueryCacheBenchmark.concurrentPutOnly  thrpt  15  4102080.220 ± 80816.546  ops/s
```
My changes:

```
Benchmark                              Mode  Cnt       Score       Error  Units
QueryCacheBenchmark.concurrentPutOnly  thrpt  15  901925.345 ± 38059.230  ops/s
```
So it became ~4.5x slower, with a lot of evictions as well. This is probably close to the worst case, though: a write-only workload with boolean queries. For a mixed read/write workload or simple filter queries, the impact might be much smaller.
And this is the method profile (taken from JFR). Kind of expected.

[JFR method profile screenshot omitted from this transcript]
I don't know if there is a way out, unless we can further improve the query visitor logic itself.
It seems the problem was this call:

```java
long queryRamBytesUsed = RamUsageEstimator.sizeOf(query, 0);
```

The signature is `long sizeOf(Query q, long defSize)`, where `defSize` represents the shallow size of the query. When 0 is passed, the shallow size is calculated at runtime, which was doing a lot of object allocations (adding to the higher CPU%) and therefore adding to latency.

I changed it to `RamUsageEstimator.sizeOf(query, 32)`, where 32 is a rough shallow size in bytes for a term/boolean query, which I calculated manually by calling that method.

After rerunning with the above change, performance improved, but it is still ~2.5x slower than the baseline:

```
Benchmark                              Mode  Cnt        Score        Error  Units
QueryCacheBenchmark.concurrentPutOnly  thrpt  15  1608680.127 ± 229218.026  ops/s
```
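Hardcoding 32 avoids the runtime shallow-size computation but bakes in a guess. A hypothetical alternative (not what this PR does) would be to memoize the shallow size per concrete query class, since it is constant for a given class; `java.lang.ClassValue` gives a thread-safe, compute-once-per-class cache. The `computeShallowSize` method below is a toy stand-in for the real reflection-based computation (in Lucene, something like `RamUsageEstimator.shallowSizeOfInstance`), and `ToyQuery` is a made-up class for illustration:

```java
// Sketch: cache the (class-constant) shallow size so it is computed once per
// class instead of on every cache insert/eviction. Hypothetical, not the PR's fix.
public class ShallowSizeCache {

  // Toy stand-in for a reflection-based shallow-size computation: assumes a
  // 16-byte object header plus 8 bytes per declared field (rough, illustrative).
  static long computeShallowSize(Class<?> clazz) {
    return 16 + 8L * clazz.getDeclaredFields().length;
  }

  // ClassValue computes the value once per class and caches it thread-safely.
  private static final ClassValue<Long> SHALLOW_SIZES =
      new ClassValue<>() {
        @Override
        protected Long computeValue(Class<?> clazz) {
          return computeShallowSize(clazz);
        }
      };

  static long shallowSizeOf(Object o) {
    return SHALLOW_SIZES.get(o.getClass());
  }

  // Stand-in for a query object with two instance fields.
  static final class ToyQuery {
    int boost;
    Object clause;
  }

  public static void main(String[] args) {
    System.out.println(shallowSizeOf(new ToyQuery())); // 16 + 8*2 under the toy model
    // Repeated lookups for the same class hit the cache and agree.
    System.out.println(shallowSizeOf(new ToyQuery()) == shallowSizeOf(new ToyQuery()));
  }
}
```

The reflection cost is then paid once per class rather than once per put/evict, without hardcoding a single default for all query types.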
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!
Description
Related issue: #15097

Instead of using the default 1024 bytes for query size, we try to use RamUsageEstimator to calculate an approximate size. As seen in the above issue, using 1024 as a default can cause problems like heap exhaustion due to underestimation, or overestimation leading to fewer queries getting cached.

RamUsageEstimator.sizeOf() is usually pretty fast: I made a sample BooleanQuery with 15 clauses and a size of around 3 KB, and it took around 14,959 ns to calculate its size via RamUsageEstimator.sizeOf().
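As a back-of-the-envelope illustration of the accounting problem the description mentions, here is a toy calculation using the 1 MiB cache budget from the benchmark above and the ~3 KB boolean query measured here (the numbers come from this thread; the arithmetic is mine):

```java
// Toy arithmetic: what a flat 1024-byte per-query estimate does to a 1 MiB budget
// when the real query is ~3 KB.
public class CacheAccounting {
  public static void main(String[] args) {
    long budgetBytes = 1_048_576; // MAX_SIZE_IN_BYTES from the benchmark setup
    long assumedBytes = 1024;     // old flat per-query default
    long actualBytes = 3 * 1024;  // ~3 KB boolean query from the discussion

    long entriesAdmitted = budgetBytes / assumedBytes; // entries the cache thinks fit
    long realHeapUsed = entriesAdmitted * actualBytes; // heap those entries occupy

    System.out.println(entriesAdmitted);            // entries admitted under the default
    System.out.println(realHeapUsed / budgetBytes); // factor by which the budget is exceeded
  }
}
```

With these numbers the cache admits 1024 entries that actually occupy about 3x its byte budget, which is the heap-exhaustion failure mode; symmetrically, queries much smaller than 1024 bytes would fill the byte budget long before the cache is really full.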