````diff
diff --git a/learn/resources/experimental_features_overview.mdx b/learn/resources/experimental_features_overview.mdx
index 74dc280ef..472b3ae5c 100644
--- a/learn/resources/experimental_features_overview.mdx
+++ b/learn/resources/experimental_features_overview.mdx
@@ -54,3 +54,4 @@ Activating or deactivating experimental features this way does not require you t
 | [Edit documents with function](/reference/api/documents#update-documents-with-function) | Use a RHAI function to edit documents directly in the Meilisearch database | API route |
 | [`/network` route](/reference/api/network) | Enable `/network` route | API route |
 | [Dumpless upgrade](/learn/self_hosted/configure_meilisearch_at_launch#dumpless-upgrade) | Upgrade Meilisearch without generating a dump | API route |
+| [Composite embedders](/reference/api/settings#composite-embedders) | Enable composite embedders | API route |
diff --git a/reference/api/settings.mdx b/reference/api/settings.mdx
index ba54f2dc7..db9af8632 100644
--- a/reference/api/settings.mdx
+++ b/reference/api/settings.mdx
@@ -2380,20 +2380,22 @@ The embedders object may contain up to 256 embedder objects. Each embedder objec
 
 These embedder objects may contain the following fields:
 
-| Name | Type | Default Value | Description |
-| :---------------------| :---------------| :-----------------------------------------------------------------------| :-------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| **`source`** | String | Empty | The third-party tool that will generate embeddings from documents. Must be `openAi`, `huggingFace`, `ollama`, `rest`, or `userProvided` |
-| **`url`** | String | `http://localhost:11434/api/embeddings` | The URL Meilisearch contacts when querying the embedder |
-| **`apiKey`** | String | Empty | Authentication token Meilisearch should send with each request to the embedder. If not present, Meilisearch will attempt to read it from environment variables |
-| **`model`** | String | Empty | The model your embedder uses when generating vectors |
-| **`documentTemplate`** | String | `{% for field in fields %} {% if field.is_searchable and not field.value == nil %}{{ field.name }}: {{ field.value }} {% endif %} {% endfor %}` | Template defining the data Meilisearch sends to the embedder |
-| **`documentTemplateMaxBytes`** | Integer | `400` | Maximum allowed size of rendered document template |
-| **`dimensions`** | Integer | Empty | Number of dimensions in the chosen model. If not supplied, Meilisearch tries to infer this value |
-| **`revision`** | String | Empty | Model revision hash |
-| **`distribution`** | Object | Empty | Describes the natural distribution of search results. Must contain two fields, `mean` and `sigma`, each containing a numeric value between `0` and `1` |
-| **`request`** | Object | Empty | A JSON value representing the request Meilisearch makes to the remote embedder |
-| **`response`** | Object | Empty | A JSON value representing the request Meilisearch expects from the remote embedder |
-| **`binaryQuantized`** | Boolean | Empty | Once set to `true`, irreversibly converts all vector dimensions to 1-bit values |
+| Name | Type | Default Value | Description |
+| ------------------------------ | ------- | ----------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| **`source`** | String | Empty | The third-party tool that will generate embeddings from documents. Must be `openAi`, `huggingFace`, `ollama`, `rest`, `userProvided`, or `composite` |
+| **`url`** | String | `http://localhost:11434/api/embeddings` | The URL Meilisearch contacts when querying the embedder |
+| **`apiKey`** | String | Empty | Authentication token Meilisearch should send with each request to the embedder. If not present, Meilisearch will attempt to read it from environment variables |
+| **`model`** | String | Empty | The model your embedder uses when generating vectors |
+| **`documentTemplate`** | String | `{% for field in fields %} {% if field.is_searchable and not field.value == nil %}{{ field.name }}: {{ field.value }} {% endif %} {% endfor %}` | Template defining the data Meilisearch sends to the embedder |
+| **`documentTemplateMaxBytes`** | Integer | `400` | Maximum allowed size of the rendered document template |
+| **`dimensions`** | Integer | Empty | Number of dimensions in the chosen model. If not supplied, Meilisearch tries to infer this value |
+| **`revision`** | String | Empty | Model revision hash |
+| **`distribution`** | Object | Empty | Describes the natural distribution of search results. Must contain two fields, `mean` and `sigma`, each containing a numeric value between `0` and `1` |
+| **`request`** | Object | Empty | A JSON value representing the request Meilisearch makes to the remote embedder |
+| **`response`** | Object | Empty | A JSON value representing the response Meilisearch expects from the remote embedder |
+| **`binaryQuantized`** | Boolean | Empty | Once set to `true`, irreversibly converts all vector dimensions to 1-bit values |
+| **`indexingEmbedder`** | Object | Empty | Configures the embedder Meilisearch uses to vectorize documents during indexing. Only valid with the `composite` source |
+| **`searchEmbedder`** | Object | Empty | Configures the embedder Meilisearch uses to vectorize search queries. Only valid with the `composite` source |
 
 ### Get embedder settings
 
@@ -2457,7 +2459,9 @@ Partially update the embedder settings for an index. When this setting is update
     "request": { … },
     "response": { … },
     "headers": { … },
-    "binaryQuantized":
+    "binaryQuantized": ,
+    "indexingEmbedder": { … },
+    "searchEmbedder": { … }
   }
 }
 ```
@@ -2466,18 +2470,39 @@ Set an embedder to `null` to remove it from the embedders list.
 
 ##### `source`
 
-Use `source` to configure an embedder's source. The following embedders can auto-generate vectors for documents and queries:
+Use `source` to configure an embedder's source. The source corresponds to a service that generates embeddings from your documents.
+Meilisearch supports the following sources:
 
 - `openAi`
 - `huggingFace`
 - `ollama`
+- `rest`
+- `userProvided`
+- `composite`
 
-Additionally, use `rest` to auto-generate embeddings with any embedder offering a REST API.
+`rest` is a generic source compatible with any embeddings provider offering a REST API.
 
-You may also configure a `userProvided` embedder. In this case, you must manually include vector data in your documents' `_vectors` field. You must also manually generate vectors for search queries.
+Use `userProvided` when you want to generate embeddings manually. In this case, you must include vector data in your documents' `_vectors` field. You must also generate vectors for search queries.
 
 This field is mandatory.
 
+###### Composite embedders
+
+Choose `composite` to use one embedder at indexing time and another embedder at search time. `composite` must be used together with [`indexingEmbedder` and `searchEmbedder`](#indexingembedder-and-searchembedder).
+
+
+This is an experimental feature. Use the experimental features endpoint to activate it:
+
+```sh
+curl \
+  -X PATCH 'MEILISEARCH_URL/experimental-features/' \
+  -H 'Content-Type: application/json' \
+  --data-binary '{
+    "compositeEmbedders": true
+  }'
+```
+
+
 ##### `url`
 
 Meilisearch queries `url` to generate vector embeddings for queries and documents. `url` must point to a REST-compatible embedder. You may also use `url` to work with proxies, such as when targeting `openAi` from behind a proxy.
@@ -2543,7 +2568,6 @@ This field is incompatible with `userProvided` embedders.
 
 This field is optional for all other embedders.
 
-
 ##### `dimensions`
 
 Number of dimensions in the chosen model. If not supplied, Meilisearch tries to infer this value.
@@ -2695,6 +2719,26 @@ This option can be useful when working with large Meilisearch projects. Consider
 **Activating `binaryQuantized` is irreversible.** Once enabled, Meilisearch converts all vectors and discards all vector data that does fit within 1-bit. The only way to recover the vectors' original values is to re-vectorize the whole index in a new embedder.
 
 
+##### `indexingEmbedder` and `searchEmbedder`
+
+When using a [composite embedder](#composite-embedders), use these fields to configure the separate embedders Meilisearch uses to vectorize documents and search queries.
+
+`indexingEmbedder` often benefits from the higher bandwidth and speed of remote providers, which lets it vectorize large batches of documents quickly. `searchEmbedder` often benefits from the lower latency of processing queries locally.
+
+Both fields must be objects and accept the same fields as a regular embedder, with the following exceptions:
+
+- `indexingEmbedder` and `searchEmbedder` must use the same model for generating embeddings
+- `indexingEmbedder` and `searchEmbedder` must have identical `dimensions` and `pooling` settings
+- `source` is mandatory for both `indexingEmbedder` and `searchEmbedder`
+- Neither sub-embedder can set `source` to `composite` or `userProvided`
+- Neither `binaryQuantized` nor `distribution` is a valid sub-embedder field; both must always be declared in the main embedder
+- `documentTemplate` and `documentTemplateMaxBytes` are invalid fields for `searchEmbedder`
+- `documentTemplate` and `documentTemplateMaxBytes` are mandatory for `indexingEmbedder` if applicable to its source
+
+`indexingEmbedder` and `searchEmbedder` are mandatory when using the `composite` source.
+
+`indexingEmbedder` and `searchEmbedder` are incompatible with all other embedder sources.
+
 #### Example
 
 
diff --git a/reference/errors/error_codes.mdx b/reference/errors/error_codes.mdx
index e21a48b6e..0bf485c7d 100644
--- a/reference/errors/error_codes.mdx
+++ b/reference/errors/error_codes.mdx
@@ -340,6 +340,10 @@ The [`limit`](/reference/api/search#limit) parameter is invalid. It should be an
 
 The [`locales`](/reference/api/search#query-locales) parameter is invalid.
 
+## `invalid_settings_embedder`
+
+The [`embedders`](/reference/api/settings#embedders) index setting value is invalid.
+
 ## `invalid_settings_facet_search`
 
 The [`facetSearch`](/reference/api/settings#facet-search) index setting value is invalid.
````
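The `composite` source and its sub-embedder rules in the settings change are easier to follow with a concrete shape. The JSON below is a minimal, hypothetical sketch of an `embedders` object (for example, sent as part of an index settings update) that tries to respect those rules. A `rest` sub-embedder vectorizes documents through a remote provider at indexing time, and a `huggingFace` sub-embedder vectorizes queries locally at search time. Everything specific here is an assumption, not taken from the PR: the embedder name `default`, the `title` document field, `REMOTE_PROVIDER_URL`, `REMOTE_PROVIDER_API_KEY`, `MODEL_ID`, and the OpenAI-style `request`/`response` shapes, which you would adapt to whatever provider you actually call. The sketch also assumes the `compositeEmbedders` experimental feature is already enabled. JSON has no comment syntax, so the placeholders are marked only by their upper-case names.

```json
{
  "embedders": {
    "default": {
      "source": "composite",
      "indexingEmbedder": {
        "source": "rest",
        "url": "REMOTE_PROVIDER_URL",
        "apiKey": "REMOTE_PROVIDER_API_KEY",
        "documentTemplate": "A document titled {{doc.title}}",
        "request": {
          "model": "MODEL_ID",
          "input": "{{text}}"
        },
        "response": {
          "data": [
            { "embedding": "{{embedding}}" }
          ]
        }
      },
      "searchEmbedder": {
        "source": "huggingFace",
        "model": "MODEL_ID"
      }
    }
  }
}
```

`MODEL_ID` must name the same model in both sub-embedders, since the two have to produce vectors with identical dimensions and pooling. `searchEmbedder` omits `documentTemplate` because that field is invalid there. If you use `binaryQuantized` or `distribution`, declare them directly under `default`, next to `source`, rather than inside either sub-embedder.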