Add guide on optimizing indexing performance by analyzing batch stats (#3383)

CaroFG · web-flow · commit 4a2dd2835936 · 2025-10-27T12:40:45.000+01:00
diff --git a/docs.json b/docs.json
@@ -293,7 +293,8 @@
                   "learn/indexing/indexing_best_practices",
                   "learn/indexing/ram_multithreading_performance",
                   "learn/indexing/tokenization",
-                  "learn/indexing/multilingual-datasets"
+                  "learn/indexing/multilingual-datasets",
+                  "learn/indexing/optimize_indexing_performance"
                 ]
               },
               {
diff --git a/learn/indexing/optimize_indexing_performance.mdx b/learn/indexing/optimize_indexing_performance.mdx
@@ -0,0 +1,119 @@
+---
+title: Optimize indexing performance with batch statistics
+description: Learn how to analyze the `progressTrace` to identify and resolve indexing bottlenecks in Meilisearch.
+---
+
+# Optimize indexing performance by analyzing batch statistics
+
+Indexing performance can vary significantly depending on your dataset, index settings, and hardware. The [batch object](/reference/api/batches) provides information about the progress of asynchronous indexing operations.
+
+The `progressTrace` field within the batch object offers a detailed breakdown of where time is spent during the indexing process. Use this data to identify bottlenecks and improve indexing speed.
+
+## Understanding the `progressTrace`
+
+`progressTrace` is a hierarchical trace showing each phase of indexing and how long it took.
+Each entry follows the structure:
+
+```json
+"processing tasks > indexing > extracting word proximity": "33.71s"
+```
+
+This means:
+
+- The step occurred during **indexing**.
+- The subtask was **extracting word proximity**.
+- It took **33.71 seconds**.
+
+Focus on the **longest-running steps** and investigate which index settings or data characteristics influence them.
+
+## Key phases and how to optimize them
+
+### `computing document changes` and `extracting documents`
+
+| Description | Optimization |
+|--------------|--------------|
+| Meilisearch compares incoming documents to existing ones. | No direct optimization possible. Process duration scales with the number and size of incoming documents.|
+
+### `extracting facets` and `merging facet caches`
+
+| Description | Optimization |
+|--------------|--------------|
+| Extracts and merges filterable attributes. | Keep the number of [**filterable attributes**](/reference/api/settings#filterable-attributes) to a minimum. |
+
+### `extracting words` and `merging word caches`
+
+| Description | Optimization |
+|--------------|--------------|
+| Tokenizes text and builds the inverted index. | Ensure the [searchable attributes](/reference/api/settings#searchable-attributes) list only includes the fields you want to be checked for query word matches. |
+
+### `extracting word proximity` and `merging word proximity`
+
+| Description | Optimization |
+|--------------|--------------|
+| Builds data structures for phrase and attribute ranking. | Lower the precision of this operation by setting [proximity precision](/reference/api/settings#proximity-precision) to `byAttribute`. |
+
+### `waiting for database writes`
+
+| Description | Optimization |
+|--------------|--------------|
+| Time spent writing data to disk. | No direct optimization possible. Either the disk is too slow or you are writing too much data in a single operation. Avoid HDDs (Hard Disk Drives). |
+
+### `waiting for extractors`
+
+| Description | Optimization |
+|--------------|--------------|
+| Time spent waiting for CPU-bound extraction. | No direct optimization possible. Indicates a CPU bottleneck. Use more cores or scale horizontally with [sharding](/learn/advanced/sharding). |
+
+### `post processing facets > strings bulk` / `numbers bulk`
+
+| Description | Optimization |
+|--------------|--------------|
+| Processes equality or comparison filters. | - Disable unused [**filter features**](/reference/api/settings#features), such as comparison operators on string values. <br /> - Reduce the number of [**sortable attributes**](/reference/api/settings#sortable-attributes). |
+
+### `post processing facets > facet search`
+
+| Description | Optimization |
+|--------------|--------------|
+| Builds structures for the [facet search API](/reference/api/facet_search). | If you don’t use the facet search API, [disable it](/reference/api/settings#update-facet-search-settings).|
+
+### Embeddings
+
+| Trace key | Description | Optimization |
+|------------|--------------|--------------|
+| `writing embeddings to database` | Time spent saving vector embeddings. | - Use embedding vectors with fewer dimensions. <br /> - [Disable embedding regeneration on document update](/reference/api/documents#vectors). <br /> - Consider enabling [binary quantization](/reference/api/settings#binaryquantized). |
+
+### `post processing words > word prefix *`
+
+| Description | Optimization |
+|--------------|--------------|
+| Builds prefix data for autocomplete. Allows matching documents that begin with a specific query term, instead of only exact matches. | Disable [**prefix search**](/reference/api/settings#prefix-search) (`prefixSearch: disabled`). _This can severely impact search result relevancy._ |
+
+### `post processing words > word fst`
+
+| Description | Optimization |
+|--------------|--------------|
+| Builds the word FST (finite state transducer). | No direct action possible, as FST size reflects the number of different words in the database. Using documents with fewer searchable words may improve operation speed. |
+
+## Example analysis
+
+If you see:
+
+```json
+"processing tasks > indexing > post processing facets > facet search": "1763.06s"
+```
+
+[Facet searching](/learn/filtering_and_sorting/search_with_facet_filters#searching-facet-values) is taking up significant indexing time. If your application doesn’t use facet search, disable the feature:
+
+```bash
+curl \
+  -X PUT 'MEILISEARCH_URL/indexes/INDEX_UID/settings/facet-search' \
+  -H 'Content-Type: application/json' \
+  --data-binary 'false'
+```
+
+## Learn more
+
+- [Indexing best practices](/learn/indexing/indexing_best_practices)
+- [Impact of RAM and multi-threading on indexing performance
+](/learn/indexing/ram_multithreading_performance)
+- [Configuring index settings](/learn/configuration/configuring_index_settings)
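
As a companion to the guide's advice to focus on the longest-running steps, the following Python sketch ranks `progressTrace` entries by duration. It is not part of this commit. It assumes the trace has already been copied from a batch response into a dict, and that durations use the suffixes `ns`, `µs`, `ms`, or `s`; the guide only shows values in seconds, so the other units are an assumption. `parse_duration`, `slowest_steps`, and `sample_trace` are hypothetical names, and the two parent durations in the sample are made up for illustration.

```python
import re

# Assumed unit suffixes; the guide itself only shows seconds (e.g. "33.71s").
_UNIT_SECONDS = {"ns": 1e-9, "µs": 1e-6, "us": 1e-6, "ms": 1e-3, "s": 1.0}
_DURATION_RE = re.compile(r"^\s*([0-9]+(?:\.[0-9]+)?)\s*(ns|µs|us|ms|s)\s*$")


def parse_duration(text):
    """Convert a duration string such as "33.71s" into seconds."""
    match = _DURATION_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNIT_SECONDS[unit]


def slowest_steps(progress_trace, limit=5):
    """Return up to `limit` leaf steps as (key, seconds), slowest first."""
    keys = list(progress_trace)
    # A key is a parent when another key extends it with " > ".
    leaves = [k for k in keys if not any(o.startswith(k + " > ") for o in keys)]
    ranked = sorted(
        leaves, key=lambda k: parse_duration(progress_trace[k]), reverse=True
    )
    return [(k, parse_duration(progress_trace[k])) for k in ranked[:limit]]


# Hypothetical sample: the two leaf durations come from the guide's examples,
# the two parent durations are invented for illustration.
sample_trace = {
    "processing tasks": "1800.00s",
    "processing tasks > indexing": "1799.00s",
    "processing tasks > indexing > extracting word proximity": "33.71s",
    "processing tasks > indexing > post processing facets > facet search": "1763.06s",
}

for step, seconds in slowest_steps(sample_trace):
    print(f"{seconds:10.2f}s  {step}")
```

Only leaf entries are ranked here, on the assumption that parent entries such as `processing tasks > indexing` aggregate the time of their children and would otherwise always sort first.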
