From c25b6445f5d53fe93b2363a3bcf28948ab63a68a Mon Sep 17 00:00:00 2001
From: gui machiavelli
Date: Tue, 1 Apr 2025 16:15:48 +0200
Subject: [PATCH 1/4] add pooling parameter to embedder settings

---
 reference/api/settings.mdx | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/reference/api/settings.mdx b/reference/api/settings.mdx
index ba54f2dc7..cdf0723e9 100644
--- a/reference/api/settings.mdx
+++ b/reference/api/settings.mdx
@@ -2394,6 +2394,7 @@ These embedder objects may contain the following fields:
 | **`request`** | Object | Empty | A JSON value representing the request Meilisearch makes to the remote embedder |
 | **`response`** | Object | Empty | A JSON value representing the request Meilisearch expects from the remote embedder |
 | **`binaryQuantized`** | Boolean | Empty | Once set to `true`, irreversibly converts all vector dimensions to 1-bit values |
+| **`pooling`** | String | `"useModel"` | Pooling method for Hugging Face embedders |
 
 ### Get embedder settings
 
@@ -2457,7 +2458,8 @@ Partially update the embedder settings for an index. When this setting is update
     "request": { … },
     "response": { … },
     "headers": { … },
-    "binaryQuantized": 
+    "binaryQuantized": ,
+    "pooling": 
   }
 }
 ```
@@ -2543,7 +2545,6 @@ This field is incompatible with `userProvided` embedders.
 
 This field is optional for all other embedders.
 
-
 ##### `dimensions`
 
 Number of dimensions in the chosen model. If not supplied, Meilisearch tries to infer this value.
@@ -2695,6 +2696,20 @@ This option can be useful when working with large Meilisearch projects. Consider
 **Activating `binaryQuantized` is irreversible.** Once enabled, Meilisearch converts all vectors and discards all vector data that does fit within 1-bit. The only way to recover the vectors' original values is to re-vectorize the whole index in a new embedder.
 
+##### `pooling`
+
+Pooling refers to one of the last steps in embedding generation. During this phase, the multiple embeddings for individual tokens are merged into a single embedding. Most models indicate which pooling method they expect.
+
+`pooling` must be one of the following strings:
+
+- `"useModel"`: Meilisearch will fetch the pooling method from the model configuration, default value for new embedders
+- `"forceMean"`: always use mean pooling, default value for embedders created in Meilisearch <=v1.13
+- `"forceCls"`: always use CLS pooling
+
+`pooling` is optional for embedders with the `huggingFace` source.
+
+`pooling` is invalid for all other embedder sources.
+
 #### Example
 

From b92001875dff1198a3f0fa672ff75bd1a4883e6b Mon Sep 17 00:00:00 2001
From: gui machiavelli
Date: Tue, 1 Apr 2025 17:20:54 +0200
Subject: [PATCH 2/4] escape `<` for better mdx syntax highlighting

---
 reference/api/settings.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/reference/api/settings.mdx b/reference/api/settings.mdx
index cdf0723e9..09241fe29 100644
--- a/reference/api/settings.mdx
+++ b/reference/api/settings.mdx
@@ -2703,7 +2703,7 @@ Pooling refers to one of the last steps in embedding generation. During this pha
 `pooling` must be one of the following strings:
 
 - `"useModel"`: Meilisearch will fetch the pooling method from the model configuration, default value for new embedders
-- `"forceMean"`: always use mean pooling, default value for embedders created in Meilisearch <=v1.13
+- `"forceMean"`: always use mean pooling, default value for embedders created in Meilisearch \<=v1.13
 - `"forceCls"`: always use CLS pooling
 
 `pooling` is optional for embedders with the `huggingFace` source.
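
For reviewers less familiar with the two forced modes, here is a minimal Python sketch of mean pooling and CLS pooling over a toy matrix of per-token embeddings. It is illustrative only and is not Meilisearch's implementation; the function names, the toy values, and the use of an attention mask to skip padding positions are assumptions chosen for the example.

```python
from typing import List, Sequence

Vector = List[float]


def mean_pooling(token_embeddings: Sequence[Vector], attention_mask: Sequence[int]) -> Vector:
    """Average the token embeddings, skipping positions where the mask is 0 (padding)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for embedding, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(embedding):
                summed[i] += value
            count += 1
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [value / count for value in summed]


def cls_pooling(token_embeddings: Sequence[Vector]) -> Vector:
    """Keep only the first token's embedding (the [CLS] token in BERT-style models)."""
    return list(token_embeddings[0])


# Toy model output: three real tokens followed by one padding position.
tokens = [
    [1.0, 2.0],  # [CLS]
    [3.0, 4.0],
    [5.0, 6.0],
    [9.0, 9.0],  # padding, masked out below
]
mask = [1, 1, 1, 0]

print(mean_pooling(tokens, mask))  # [3.0, 4.0]
print(cls_pooling(tokens))         # [1.0, 2.0]
```

The two modes can produce very different vectors from the same token embeddings, which is why a model trained with one pooling method may return poor results when queried with the other.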
From 7b465960655bb7bcaf680f6a0652feb46d8a1001 Mon Sep 17 00:00:00 2001
From: gui machiavelli
Date: Tue, 1 Apr 2025 17:23:02 +0200
Subject: [PATCH 3/4] shorten explanation text

---
 reference/api/settings.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/reference/api/settings.mdx b/reference/api/settings.mdx
index 09241fe29..cc16ab29c 100644
--- a/reference/api/settings.mdx
+++ b/reference/api/settings.mdx
@@ -2698,7 +2698,7 @@ This option can be useful when working with large Meilisearch projects. Consider
 
 ##### `pooling`
 
-Pooling refers to one of the last steps in embedding generation. During this phase, the multiple embeddings for individual tokens are merged into a single embedding. Most models indicate which pooling method they expect.
+Configure how Meilisearch should merge individual tokens into a single embedding.
 
 `pooling` must be one of the following strings:
 

From 2efe272795ef31d4a990678d1a0e886d1a6dfffd Mon Sep 17 00:00:00 2001
From: gui machiavelli
Date: Thu, 3 Apr 2025 18:49:06 +0200
Subject: [PATCH 4/4] address reviewer feedback

---
 reference/api/settings.mdx | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/reference/api/settings.mdx b/reference/api/settings.mdx
index cc16ab29c..dab235c3c 100644
--- a/reference/api/settings.mdx
+++ b/reference/api/settings.mdx
@@ -2702,10 +2702,12 @@ Configure how Meilisearch should merge individual tokens into a single embedding
 
 `pooling` must be one of the following strings:
 
-- `"useModel"`: Meilisearch will fetch the pooling method from the model configuration, default value for new embedders
-- `"forceMean"`: always use mean pooling, default value for embedders created in Meilisearch \<=v1.13
+- `"useModel"`: Meilisearch will fetch the pooling method from the model configuration. Default value for new embedders
+- `"forceMean"`: always use mean pooling. Default value for embedders created in Meilisearch \<=v1.13
 - `"forceCls"`: always use CLS pooling
 
+If in doubt, use `"useModel"`. `"forceMean"` and `"forceCls"` are compatibility options that might be necessary for certain embedders and models.
+
 `pooling` is optional for embedders with the `huggingFace` source.
 
 `pooling` is invalid for all other embedder sources.
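
The validity rules in the final version of the section (`pooling` is optional for `huggingFace`, invalid for every other source, and must be one of three strings) can be sketched as a client-side check that builds the embedder settings body shape shown in the second hunk of patch 1. `build_embedder_settings`, the embedder names, and the model `"BAAI/bge-base-en-v1.5"` are hypothetical choices for illustration; this helper is not part of any Meilisearch SDK, and Meilisearch itself remains the authority on what it accepts.

```python
import json
from typing import Any, Dict, Optional

# The three strings accepted by `pooling`, as listed in the patched docs.
VALID_POOLING = {"useModel", "forceMean", "forceCls"}


def build_embedder_settings(
    name: str,
    source: str,
    model: Optional[str] = None,
    pooling: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build an embedder settings object and enforce
    the documented `pooling` rules before the request is sent."""
    embedder: Dict[str, Any] = {"source": source}
    if model is not None:
        embedder["model"] = model
    if pooling is not None:
        # Per the docs, `pooling` is only accepted for the huggingFace source.
        if source != "huggingFace":
            raise ValueError(f"`pooling` is invalid for the {source!r} source")
        if pooling not in VALID_POOLING:
            raise ValueError(f"unknown pooling method: {pooling!r}")
        embedder["pooling"] = pooling
    # Omitting `pooling` leaves the choice to Meilisearch's default
    # ("useModel" for new embedders, per the docs).
    return {name: embedder}


body = build_embedder_settings(
    "default", "huggingFace", model="BAAI/bge-base-en-v1.5", pooling="useModel"
)
print(json.dumps(body))

try:
    build_embedder_settings("remote", "openAi", pooling="forceMean")
except ValueError as err:
    print(err)  # `pooling` is invalid for the 'openAi' source
```

The helper leaves `pooling` out entirely when it is not given, so the same function can build bodies for non-Hugging Face sources without breaking the "invalid for all other embedder sources" rule.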