[5.x] Performance Optimisation for Queries - Optimise IteratorBuilder limit queries to avoid loading all items #13217
+723
−3
Description
This PR optimises the IteratorBuilder to avoid loading all items when a limit() is applied without orderBy() or inRandomOrder(). This significantly improves search query performance, particularly for sites with large datasets.
Fixes #13215
Problem
Previously, IteratorBuilder::getFilteredItems() loaded ALL items before applying limits. For a ->limit(10) query on 10,000 results, it would hydrate all 10,000 items before taking the first 10.
Solution
Added a new abstract method getBaseItemsLazy(): Generator that yields items lazily, enabling early termination when the limit is reached.
Three optimisation paths in getFilteredItems():
offset + limit items
Benchmark Results
I wrote a simple benchmark script separately to compare the current code with the new code across various scenarios (e.g. wheres, where + limit, limit). I skipped orderBy because that path is unchanged, though I did test it to confirm that its performance is the same.
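To illustrate the kind of difference such a benchmark measures, here is a minimal sketch of lazy iteration with early termination. It is not the code from this PR: lazyItems() and takeLazily() are hypothetical names, the arrays stand in for hydrated entries, and the counter only exists to show how many items were touched.

```php
<?php
// Hypothetical stand-in for a lazy item source: yields one "hydrated" item at a time.
function lazyItems(array $rawIds, int &$hydrated): Generator
{
    foreach ($rawIds as $id) {
        $hydrated++;          // count how many items were actually hydrated
        yield ['id' => $id];  // stand-in for an expensive hydration step
    }
}

// Apply the filter, skip `offset` matches, then collect `limit` matches and stop.
function takeLazily(iterable $items, callable $filter, int $offset, int $limit): array
{
    if ($limit <= 0) {
        return [];
    }

    $results = [];
    $skipped = 0;

    foreach ($items as $item) {
        if (! $filter($item)) {
            continue;
        }
        if ($skipped < $offset) {
            $skipped++;
            continue;
        }
        $results[] = $item;
        if (count($results) >= $limit) {
            break; // early termination: the rest of the source is never hydrated
        }
    }

    return $results;
}

$hydrated = 0;
$page = takeLazily(
    lazyItems(range(1, 10000), $hydrated),
    fn (array $item) => $item['id'] % 2 === 0, // stand-in for a where() condition
    5,   // offset
    10   // limit
);
// $page holds the even ids 12 to 30, and $hydrated is 30 rather than 10,000.
```

An eager version would hydrate all 10,000 items before slicing. The sketch deliberately leaves out sorting, since ordered queries need the full set.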
Safety checks
Queries with orderBy() or inRandomOrder() load all items (sorting/shuffling requires all)
Queries without limit() load all items (no early termination possible)
Files Changed
src/Query/IteratorBuilder.php - Core optimisation logic
src/Query/ItemQueryBuilder.php - Implements getBaseItemsLazy()
src/Search/QueryBuilder.php - Implements getBaseItemsLazy() with batch hydration
Tests
tests/Query/IteratorBuilderTest.php - 13 tests covering optimisation paths
tests/Search/QueryBuilderPerformanceTest.php - 14 tests for search-specific behaviour
tests/Fakes/Query/TestIteratorBuilder.php - Test helper
tests/Fakes/Query/HydrationTrackingQueryBuilder.php - Test helper
Note: These tests all pass. The only failures are the UTF-8 ones, which Duncan is fixing in another PR.
Potential Future Optimisation
This PR optimises hydration by stopping early once the limit is reached. However, the search drivers still fetch all raw results from the index before we apply the limit.
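The split between fetching raw results and hydrating them can be sketched as follows. This is an assumption-laden illustration, not the PR's implementation: getRawResults(), hydrateBatch(), lazyHydrated() and the batch size of 25 are all hypothetical.

```php
<?php
// Hypothetical driver call: returns ALL raw hits, because the limit is not pushed down.
function getRawResults(): array
{
    return array_map(fn (int $i) => ['reference' => "entry::$i"], range(1, 1000));
}

// Hypothetical batch hydrator: turns a chunk of raw hits into "full" items.
function hydrateBatch(array $rawBatch, int &$hydratedCount): array
{
    $hydratedCount += count($rawBatch);

    return array_map(
        fn (array $raw) => ['id' => $raw['reference'], 'hydrated' => true],
        $rawBatch
    );
}

// Hydrate one batch at a time and yield items, so the consumer can stop early.
function lazyHydrated(array $raw, int $batchSize, int &$hydratedCount): Generator
{
    foreach (array_chunk($raw, $batchSize) as $batch) {
        foreach (hydrateBatch($batch, $hydratedCount) as $item) {
            yield $item;
        }
    }
}

$hydratedCount = 0;
$raw = getRawResults(); // still 1,000 raw hits fetched from the "index"
$results = [];

foreach (lazyHydrated($raw, 25, $hydratedCount) as $item) {
    $results[] = $item;
    if (count($results) >= 10) {
        break; // limit reached: no further batches are hydrated
    }
}
// Only the first batch of 25 was hydrated, although all 1,000 raw hits were fetched.
```

Batching trades a small amount of over-hydration (25 items for a limit of 10 here) for fewer hydration calls. The raw fetch is the remaining cost.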
A possible future optimisation could pass the limit down to getSearchResults($query, $limit = null) so drivers can fetch fewer results at the source:
hitsPerPage API parameter to request fewer hits
This would be most beneficial for large indexes where the initial lookup is expensive.
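As a rough sketch of what passing the limit down might look like, the in-memory index below accepts an optional limit and stops collecting hits once it is reached. FakeIndex and its $fetched counter are hypothetical, and a real driver would translate the limit into its own backend's query options. The sketch assumes PHP 8 for constructor promotion and str_contains().

```php
<?php
// Hypothetical in-memory index; a real driver would query its own backend instead.
final class FakeIndex
{
    public int $fetched = 0; // number of hits collected, for illustration only

    public function __construct(private array $documents)
    {
    }

    // A null limit keeps the current behaviour: every matching document is returned.
    public function getSearchResults(string $query, ?int $limit = null): array
    {
        $hits = [];

        foreach ($this->documents as $doc) {
            if ($limit !== null && count($hits) >= $limit) {
                break; // enough hits: stop at the source
            }
            if (str_contains($doc, $query)) {
                $this->fetched++;
                $hits[] = $doc;
            }
        }

        return $hits;
    }
}

$docs = array_map(fn (int $i) => "post $i about search", range(1, 500));
$index = new FakeIndex($docs);

$all = $index->getSearchResults('search'); // no limit: all 500 hits
$fetchedWithoutLimit = $index->fetched;

$index->fetched = 0;
$top = $index->getSearchResults('search', 10); // limit pushed down: 10 hits
```

A caller that also applies an offset would presumably need to request offset + limit hits. A caller that filters results after the search could not safely pass the limit at all, so the parameter would have to stay optional.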
In my testing, this PR alone reduces my search from 3.5s to 600ms using Comb; with the future optimisation, it would drop further to about 400ms.