Commit 4a2dd28

Add guide on optimizing indexing performance by analyzing batch stats (#3383)
1 parent 8c06e23 commit 4a2dd28

2 files changed

+121
-1
lines changed

docs.json

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -293,7 +293,8 @@
293293
"learn/indexing/indexing_best_practices",
294294
"learn/indexing/ram_multithreading_performance",
295295
"learn/indexing/tokenization",
296-
"learn/indexing/multilingual-datasets"
296+
"learn/indexing/multilingual-datasets",
297+
"learn/indexing/optimize_indexing_performance"
297298
]
298299
},
299300
{
Lines changed: 119 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,119 @@
1+
---
2+
title: Optimize indexing performance with batch statistics
3+
description: Learn how to analyze the `progressTrace` to identify and resolve indexing bottlenecks in Meilisearch.
4+
---
5+
6+
# Optimize indexing performance by analyzing batch statistics
7+
8+
Indexing performance can vary significantly depending on your dataset, index settings, and hardware. The [batch object](/reference/api/batches) provides information about the progress of asynchronous indexing operations.
9+
10+
The `progressTrace` field within the batch object offers a detailed breakdown of where time is spent during the indexing process. Use this data to identify bottlenecks and improve indexing speed.
11+
12+
## Understanding the `progressTrace`
13+
14+
`progressTrace` is a hierarchical trace showing each phase of indexing and how long it took.
15+
Each entry follows the structure:
16+
17+
```json
18+
"processing tasks > indexing > extracting word proximity": "33.71s"
19+
```
20+
21+
This means:
22+
23+
- The step occurred during **indexing**.
24+
- The subtask was **extracting word proximity**.
25+
- It took **33.71 seconds**.
26+
27+
Focus on the **longest-running steps** and investigate which index settings or data characteristics influence them.
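
To find the longest-running steps, the trace entries can be sorted by duration. The following is a minimal sketch, not part of Meilisearch: the helper names `to_seconds` and `slowest_steps` are hypothetical, the unit suffixes other than `s` are an assumption (this guide only shows values in seconds), and the sketch assumes a parent entry's duration includes its children, so only the most specific entries are ranked.

```python
import re

# Assumed unit suffixes; this guide only shows seconds (e.g. "33.71s").
_UNIT_SECONDS = {"ns": 1e-9, "µs": 1e-6, "us": 1e-6, "ms": 1e-3, "s": 1.0}
_DURATION = re.compile(r"^([0-9]*\.?[0-9]+)\s*(ns|µs|us|ms|s)$")


def to_seconds(value: str) -> float:
    """Convert a duration string such as "33.71s" to seconds."""
    match = _DURATION.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {value!r}")
    number, unit = match.groups()
    return float(number) * _UNIT_SECONDS[unit]


def slowest_steps(progress_trace: dict, limit: int = 5) -> list:
    """Return up to `limit` (key, seconds) pairs, slowest first.

    Entries that have child entries ("parent > child") are skipped,
    on the assumption that their time is already counted in the children.
    """
    keys = list(progress_trace)
    leaves = [k for k in keys if not any(o.startswith(k + " > ") for o in keys)]
    timed = [(k, to_seconds(progress_trace[k])) for k in leaves]
    return sorted(timed, key=lambda item: item[1], reverse=True)[:limit]


# Hypothetical trace: only the word proximity entry comes from this guide.
trace = {
    "processing tasks": "40.00s",
    "processing tasks > indexing": "39.00s",
    "processing tasks > indexing > extracting word proximity": "33.71s",
    "processing tasks > indexing > extracting words": "850ms",
}
for key, seconds in slowest_steps(trace, limit=2):
    print(f"{seconds:10.2f}s  {key}")
```

Pass the `progressTrace` object from a batch response as `progress_trace`. Adjust the unit table if your instance reports durations in another format.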
28+
29+
## Key phases and how to optimize them
30+
31+
### `computing document changes` and `extracting documents`
32+
33+
| Description | Optimization |
34+
|--------------|--------------|
35+
| Meilisearch compares incoming documents to existing ones. | No direct optimization possible. Process duration scales with the number and size of incoming documents.|
36+
37+
### `extracting facets` and `merging facet caches`
38+
39+
| Description | Optimization |
40+
|--------------|--------------|
41+
| Extracts and merges filterable attributes. | Keep the number of [**filterable attributes**](/reference/api/settings#filterable-attributes) to a minimum. |
42+
43+
### `extracting words` and `merging word caches`
44+
45+
| Description | Optimization |
46+
|--------------|--------------|
47+
| Tokenizes text and builds the inverted index. | Ensure the [searchable attributes](/reference/api/settings#searchable-attributes) list only includes the fields you want to be checked for query word matches. |
48+
49+
### `extracting word proximity` and `merging word proximity`
50+
51+
| Description | Optimization |
52+
|--------------|--------------|
53+
| Builds data structures for phrase and attribute ranking. | Lower the precision of this operation by setting [proximity precision](/reference/api/settings#proximity-precision) to `byAttribute`. |
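
As an illustration, the proximity precision change could be prepared as an HTTP request. This is a minimal sketch under stated assumptions: the function name `proximity_precision_request` is hypothetical, the `/settings/proximity-precision` route and `PUT` method are assumed by analogy with the facet search request shown in this guide, and `http://localhost:7700` and `movies` are placeholder values. Check the settings API reference before relying on it.

```python
import json
import urllib.request


def proximity_precision_request(base_url, index_uid, precision="byAttribute", api_key=None):
    """Build (but do not send) a request that updates proximity precision.

    The route and method are assumptions modeled on the facet search
    example in this guide; confirm them against the settings reference.
    """
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only needed when the instance is protected by a key.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        url=f"{base_url}/indexes/{index_uid}/settings/proximity-precision",
        data=json.dumps(precision).encode("utf-8"),  # JSON string body
        headers=headers,
        method="PUT",
    )


# Placeholder URL and index name, for illustration only.
request = proximity_precision_request("http://localhost:7700", "movies")
print(request.get_method(), request.full_url)
```

Sending the request with `urllib.request.urlopen(request)` enqueues a settings update task. Changing this setting typically triggers a reindex, so measure the batch statistics again afterwards.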
54+
55+
### `waiting for database writes`
56+
57+
| Description | Optimization |
58+
|--------------|--------------|
59+
| Time spent writing data to disk. | No direct optimization possible. Either the disk is too slow or you are writing too much data in a single operation. Avoid HDDs (hard disk drives). |
60+
61+
### `waiting for extractors`
62+
63+
| Description | Optimization |
64+
|--------------|--------------|
65+
| Time spent waiting for CPU-bound extraction. | No direct optimization possible. Indicates a CPU bottleneck. Use more cores or scale horizontally with [sharding](/learn/advanced/sharding). |
66+
67+
### `post processing facets > strings bulk` / `numbers bulk`
68+
69+
| Description | Optimization |
70+
|--------------|--------------|
71+
| Processes equality or comparison filters. | - Disable unused [**filter features**](/reference/api/settings#features), such as comparison operators on string values. <br /> - Reduce the number of [**sortable attributes**](/reference/api/settings#sortable-attributes). |
72+
73+
### `post processing facets > facet search`
74+
75+
| Description | Optimization |
76+
|--------------|--------------|
77+
| Builds structures for the [facet search API](/reference/api/facet_search). | If you don’t use the facet search API, [disable it](/reference/api/settings#update-facet-search-settings).|
78+
79+
### Embeddings
80+
81+
| Trace key | Description | Optimization |
82+
|------------|--------------|--------------|
83+
| `writing embeddings to database` | Time spent saving vector embeddings. | - Use embedding vectors with fewer dimensions. <br/>- Consider [disabling embedding regeneration on document update](/reference/api/documents#vectors). <br/>- Consider enabling [binary quantization](/reference/api/settings#binaryquantized). |
84+
85+
### `post processing words > word prefix *`
86+
87+
| Description | Optimization |
88+
|--------------|--------------|
89+
| Builds prefix data for autocomplete. Allows matching documents that contain words beginning with a query term, instead of only exact matches. | Disable [**prefix search**](/reference/api/settings#prefix-search) (`prefixSearch: disabled`). _This can severely impact search result relevancy._ |
90+
91+
### `post processing words > word fst`
92+
93+
| Description | Optimization |
94+
|--------------|--------------|
95+
| Builds the word FST (finite state transducer). | No direct action possible, as FST size reflects the number of different words in the database. Using documents with fewer searchable words may improve operation speed. |
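
The tables above can be condensed into a small lookup that maps a trace key to a suggested action. This is only a sketch: `PHASE_HINTS` and `hint_for` are hypothetical names, the hint texts paraphrase this guide, and phases with no direct optimization are deliberately left out so they return `None`.

```python
# Hypothetical lookup distilled from the tables in this guide.
# Each pair is (substring of a progressTrace key, suggested action).
_FACETS = "Keep filterable attributes to a minimum."
_WORDS = "Trim the searchable attributes list."
_PROXIMITY = "Set proximity precision to byAttribute."
_BULK = "Disable unused filter features and reduce sortable attributes."

PHASE_HINTS = [
    ("extracting facets", _FACETS),
    ("merging facet caches", _FACETS),
    ("extracting word proximity", _PROXIMITY),
    ("merging word proximity", _PROXIMITY),
    ("extracting words", _WORDS),
    ("merging word caches", _WORDS),
    ("facet search", "Disable facet search if the API is unused."),
    ("strings bulk", _BULK),
    ("numbers bulk", _BULK),
    ("word prefix", "Disable prefix search (this can hurt relevancy)."),
    ("writing embeddings to database",
     "Use fewer dimensions or consider binary quantization."),
]


def hint_for(trace_key):
    """Return the first matching hint for a trace key, or None."""
    step = trace_key.lower()
    for needle, hint in PHASE_HINTS:
        if needle in step:
            return hint
    return None


print(hint_for("processing tasks > indexing > extracting word proximity"))
```

A `None` result corresponds to the phases this guide describes as having no direct optimization, such as `waiting for extractors`.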
96+
97+
## Example analysis
98+
99+
If you see:
100+
101+
```json
102+
"processing tasks > indexing > post processing facets > facet search": "1763.06s"
103+
```
104+
105+
[Facet searching](/learn/filtering_and_sorting/search_with_facet_filters#searching-facet-values) is taking significant indexing time. If your application doesn’t use facet search, disable the feature:
106+
107+
```bash
108+
curl \
109+
-X PUT 'MEILISEARCH_URL/indexes/INDEX_UID/settings/facet-search' \
110+
-H 'Content-Type: application/json' \
111+
--data-binary 'false'
112+
```
113+
114+
## Learn more
115+
116+
- [Indexing best practices](/learn/indexing/indexing_best_practices)
117+
- [Impact of RAM and multi-threading on indexing performance
118+
](/learn/indexing/ram_multithreading_performance)
119+
- [Configuring index settings](/learn/configuration/configuring_index_settings)
