@jonathan-bird jonathan-bird commented Nov 30, 2025

Description

This PR optimises the IteratorBuilder to avoid loading all items when a limit() is applied without orderBy() or inRandomOrder(). This significantly improves search query performance, particularly for sites with large datasets.

Fixes #13215

Problem

Previously, IteratorBuilder::getFilteredItems() loaded ALL items before applying limits. For a ->limit(10) query on 10,000 results, it would hydrate all 10,000 items before taking the first 10.

Solution

Added a new abstract method getBaseItemsLazy(): Generator that yields items lazily, enabling early termination when the limit is reached.
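As a rough illustration of why a generator helps here, the sketch below uses hypothetical stand-ins (`lazyItems()` for `getBaseItemsLazy()`, `takeLazily()` for the limit handling). These names and the hydration counter are assumptions for demonstration only, not the PR's actual code:

```php
<?php

// Hypothetical stand-in for getBaseItemsLazy(): yields one item at a time
// and counts how many were "hydrated" so early termination is observable.
function lazyItems(array $ids, int &$hydrated): Generator
{
    foreach ($ids as $id) {
        $hydrated++;                 // pretend this is an expensive hydration
        yield ['id' => $id];
    }
}

// Take $limit items after skipping $offset, stopping the generator early.
function takeLazily(Generator $items, int $limit, int $offset = 0): array
{
    $taken = [];
    if ($limit <= 0) {
        return $taken;               // never start the generator at all
    }

    $seen = 0;
    foreach ($items as $item) {
        if ($seen++ < $offset) {
            continue;                // still inside the offset window
        }
        $taken[] = $item;
        if (count($taken) >= $limit) {
            break;                   // the generator is not advanced again
        }
    }

    return $taken;
}

$hydrated = 0;
$page = takeLazily(lazyItems(range(1, 10000), $hydrated), 10);
```

In this sketch only `offset + limit` items are ever produced by the generator, because the `break` stops iteration before the next item is requested.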

Three optimisation paths in getFilteredItems():

  • No limit, or orderBy()/inRandomOrder() present: Falls back to loading all items (unchanged behaviour)
  • Limit without wheres: Loads only offset + limit items
  • Limit with wheres: Batches items and stops early when enough matches are collected
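The "limit with wheres" path could look roughly like the following sketch. The function names, the callable-based where, and the batch size of 50 are all assumptions made for illustration; the real implementation in IteratorBuilder may differ:

```php
<?php

// Hypothetical lazy source: yields 1..$n and counts each "hydration".
function countedRange(int $n, int &$hydrated): Generator
{
    for ($i = 1; $i <= $n; $i++) {
        $hydrated++;
        yield $i;
    }
}

// Collect up to $limit items passing $where, filtering one batch at a time.
// $batchSize = 50 is an assumed value, not taken from the PR.
function filterInBatches(iterable $items, callable $where, int $limit, int $batchSize = 50): array
{
    if ($limit <= 0) {
        return [];
    }

    $matches = [];
    $batch = [];
    foreach ($items as $item) {
        $batch[] = $item;
        if (count($batch) < $batchSize) {
            continue;                       // keep filling the current batch
        }
        // Batch is full: apply the where and keep the matches.
        $matches = array_merge($matches, array_filter($batch, $where));
        $batch = [];
        if (count($matches) >= $limit) {
            break;                          // enough matches: stop pulling items
        }
    }
    // If the source ran out mid-batch, filter whatever is left over.
    $matches = array_merge($matches, array_filter($batch, $where));

    return array_slice($matches, 0, $limit);
}

$hydrated = 0;
$isEven = fn (int $n): bool => $n % 2 === 0;
$firstTen = filterInBatches(countedRange(10000, $hydrated), $isEven, 10);
```

Batching trades a little over-fetching for fewer filter passes: the lower the match rate, the more batches are needed before the limit is satisfied.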

Benchmark Results

I wrote a simple benchmark script separately to compare the current code against the new code across several scenarios (e.g. wheres only, wheres + limit, limit only). I left orderBy out of the table because that path is unchanged; I did run it to confirm, and the timings were the same.

| Scenario | Limit | Old Time | Old Hydrated | New Time | New Hydrated | Improvement |
| --- | --- | --- | --- | --- | --- | --- |
| No wheres | 10 | 0.70ms | 10,000 | 0.04ms | 10 | 99.9% fewer |
| No wheres | 50 | 0.68ms | 10,000 | 0.05ms | 50 | 99.5% fewer |
| No wheres | 100 | 0.68ms | 10,000 | 0.06ms | 100 | 99% fewer |
| 50% match rate | 10 | 9.13ms | 10,000 | 0.10ms | 50 | 99.5% fewer |
| 50% match rate | 50 | 9.14ms | 10,000 | 0.15ms | 100 | 99% fewer |
| 50% match rate | 100 | 9.12ms | 10,000 | 0.26ms | 200 | 98% fewer |
| 10% match rate | 10 | 9.01ms | 10,000 | 0.18ms | 100 | 99% fewer |
| 10% match rate | 50 | 8.99ms | 10,000 | 0.57ms | 500 | 95% fewer |
| 10% match rate | 100 | 8.98ms | 10,000 | 1.08ms | 1,000 | 90% fewer |
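The Improvement column refers to hydrated items rather than time. For anyone wanting to recompute it, a small sketch (the helper names are made up for this example):

```php
<?php

// Percentage reduction in hydrated items, rounded to one decimal place.
// Assumes $old > 0.
function percentFewer(int $old, int $new): float
{
    return round((1 - $new / $old) * 100, 1);
}

// Speed-up factor between two timings in the same unit, one decimal place.
// Assumes $newMs > 0.
function speedup(float $oldMs, float $newMs): float
{
    return round($oldMs / $newMs, 1);
}

// First and last rows of the benchmark results.
printf("%.1f%% fewer hydrations\n", percentFewer(10000, 10));
printf("%.1f%% fewer hydrations\n", percentFewer(10000, 1000));
printf("%.1fx faster\n", speedup(8.98, 1.08));
```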

Safety checks

  • Queries with orderBy() or inRandomOrder() load all items (sorting/shuffling requires all)
  • Queries without limit() load all items (no early termination possible)
  • Results are identical to previous behaviour, just faster
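A minimal sketch of these checks, using hypothetical helpers rather than the PR's code: `canStopEarly()` expresses when the lazy path is safe, and the two page functions show that an eager filter-then-slice and a lazy early-exit loop return the same items when no ordering is involved:

```php
<?php

// Early termination is only safe with a limit and no ordering or shuffling.
function canStopEarly(?int $limit, bool $hasOrderBy, bool $randomOrder): bool
{
    return $limit !== null && !$hasOrderBy && !$randomOrder;
}

// Previous behaviour (simplified): filter everything, then slice.
function eagerPage(array $items, callable $where, int $limit): array
{
    return array_slice(array_values(array_filter($items, $where)), 0, $limit);
}

// New behaviour (simplified): stop as soon as $limit matches are found.
function lazyPage(iterable $items, callable $where, int $limit): array
{
    $out = [];
    if ($limit <= 0) {
        return $out;
    }
    foreach ($items as $item) {
        if ($where($item)) {
            $out[] = $item;
            if (count($out) >= $limit) {
                break;
            }
        }
    }
    return $out;
}

$items = range(1, 1000);
$isMultipleOfSeven = fn (int $n): bool => $n % 7 === 0;
$same = eagerPage($items, $isMultipleOfSeven, 10) === lazyPage($items, $isMultipleOfSeven, 10);
```

Both functions preserve the source order, which is why the equivalence holds only when no sort or shuffle is requested.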

Files Changed

  • src/Query/IteratorBuilder.php - Core optimisation logic
  • src/Query/ItemQueryBuilder.php - Implements getBaseItemsLazy()
  • src/Search/QueryBuilder.php - Implements getBaseItemsLazy() with batch hydration

Tests

  • tests/Query/IteratorBuilderTest.php - 13 tests covering optimisation paths
  • tests/Search/QueryBuilderPerformanceTest.php - 14 tests for search-specific behaviour
  • tests/Fakes/Query/TestIteratorBuilder.php - Test helper
  • tests/Fakes/Query/HydrationTrackingQueryBuilder.php - Test helper

Note: All of these tests pass. The only failures in the suite are the UTF-8 tests, which Duncan is fixing in another PR.

Potential Future Optimisation

This PR optimises hydration by stopping early once the limit is reached. However, the search drivers still fetch all raw results from the index before we apply the limit.

A possible future optimisation could pass the limit down to getSearchResults($query, $limit = null) so drivers can fetch fewer results at the source:

  • Algolia: Could use the hitsPerPage API parameter to request fewer hits
  • Comb: Could limit raw results before mapping scores/snippets

This would be most beneficial for large indexes where the initial lookup is expensive.
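To make the idea concrete, here is a hedged sketch of a limit-aware driver. The `LimitAwareDriver` interface and `InMemoryDriver` class are invented for this example and are not existing Statamic classes; only the `getSearchResults($query, $limit = null)` shape comes from the proposal above. The sketch also ignores relevance scoring, which a real driver would need to handle before truncating:

```php
<?php

// Hypothetical contract: $limit = null keeps today's "fetch everything" behaviour.
interface LimitAwareDriver
{
    public function getSearchResults(string $query, ?int $limit = null): array;
}

// Toy in-memory index used only to show the effect of pushing the limit down.
final class InMemoryDriver implements LimitAwareDriver
{
    public int $scanned = 0;

    public function __construct(private array $documents)
    {
    }

    public function getSearchResults(string $query, ?int $limit = null): array
    {
        $results = [];
        foreach ($this->documents as $id => $text) {
            if ($limit !== null && count($results) >= $limit) {
                break;                       // stop at the source, not after hydration
            }
            $this->scanned++;
            if (stripos($text, $query) !== false) {
                $results[] = ['id' => $id];
            }
        }
        return $results;
    }
}

// Every second document contains the search term.
$documents = [];
for ($i = 1; $i <= 1000; $i++) {
    $documents[$i] = $i % 2 === 0 ? "foo {$i}" : "bar {$i}";
}

$driver = new InMemoryDriver($documents);
$hits = $driver->getSearchResults('foo', 5);
```

For a hosted index such as Algolia the equivalent would be requesting fewer hits per page, so the saving would show up in the response size rather than in a local scan.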

In my testing with Comb, this PR alone reduces my search time from 3.5s to 600ms. With the future optimisation described above, I estimate it would drop further to about 400ms.

@jonathan-bird jonathan-bird changed the title Optimise IteratorBuilder limit queries to avoid loading all items [5.x] Optimise IteratorBuilder limit queries to avoid loading all items Nov 30, 2025
@jonathan-bird jonathan-bird changed the title [5.x] Optimise IteratorBuilder limit queries to avoid loading all items [5.x] Performance Optimisation for Queries - Optimise IteratorBuilder limit queries to avoid loading all items Dec 4, 2025
Development

Successfully merging this pull request may close these issues.

Search hydrates all results before applying limit causing slow results on large sites + suggestions
