@lsh1215 lsh1215 commented Aug 22, 2025

OpenSearch: Document ID management for AWS OpenSearch Serverless (manageDocumentIds)

Summary

  • AWS OpenSearch Serverless vector collections do not allow indexing with custom document IDs (see issue #3818, "Document ID is not supported when adding embeddings to AWS OpenSearch").
  • OpenSearchVectorStore#doAdd(List<Document>) was updated to make document ID handling configurable. When manageDocumentIds=false, the index request omits the ID so that OpenSearch auto-generates it.
  • The change is verified with unit and integration tests.

Background

  • Error observed: "Document ID is not supported in create/index operation request".
  • Root cause: AWS OpenSearch Serverless (time series/vector collections) disallows custom document IDs and upserts.
  • Goal: Allow clients to opt out of explicit IDs so OpenSearch can auto-generate them during indexing.

Changes

  • File: org.springframework.ai.vectorstore.opensearch.OpenSearchVectorStore
    • Method: doAdd(List<Document> documents)
    • Behavior:
      • manageDocumentIds=true (default): index with explicit IDs (backward-compatible)
      • manageDocumentIds=false: omit ID so that OpenSearch auto-generates it
// doAdd excerpt
if (this.manageDocumentIds) {
    bulkRequestBuilder.operations(op -> op
        .index(idx -> idx.index(this.index).id(openSearchDocument.id()).document(openSearchDocument)));
}
else {
    bulkRequestBuilder.operations(op -> op
        .index(idx -> idx.index(this.index).document(openSearchDocument)));
}
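
The branch above can be modeled without the OpenSearch client. The following is a minimal, dependency-free sketch, not the actual implementation: `Doc`, `IndexOp`, `BulkIdSketch`, and the index name `spring-ai-index` are hypothetical stand-ins, and a `null` ID is assumed to represent an index operation that omits the ID.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a document to be indexed (not the Spring AI Document type).
record Doc(String id, String content) {}

// Hypothetical stand-in for one bulk "index" operation (not an opensearch-java type).
// A null id models a request that omits the ID so the server generates one.
record IndexOp(String index, String id, String body) {}

class BulkIdSketch {

    // Mirrors the branch in the doAdd excerpt: attach the ID only when
    // the store is configured to manage document IDs.
    static List<IndexOp> buildOps(String index, List<Doc> docs, boolean manageDocumentIds) {
        List<IndexOp> ops = new ArrayList<>();
        for (Doc doc : docs) {
            String id = manageDocumentIds ? doc.id() : null;
            ops.add(new IndexOp(index, id, doc.content()));
        }
        return ops;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(new Doc("doc-1", "hello"), new Doc("doc-2", "world"));
        // Explicit IDs are carried into each operation.
        System.out.println(buildOps("spring-ai-index", docs, true));
        // IDs are left out (null), leaving ID generation to the server.
        System.out.println(buildOps("spring-ai-index", docs, false));
    }
}
```

In both modes the document body is unchanged; only the presence of the ID on the operation differs.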

Usage

OpenSearchVectorStore store = OpenSearchVectorStore
    .builder(openSearchClient, embeddingModel)
    .initializeSchema(true)
    .manageDocumentIds(false) // AWS OpenSearch Serverless compatible
    .build();

Testing

Unit tests

  • File: OpenSearchVectorStoreTest
  • Verifies:
    • manageDocumentIds=true: BulkRequest contains explicit IDs
    • manageDocumentIds=false: BulkRequest omits IDs (auto-generated)
    • Single and multiple document cases
    • Embedding model error propagation

Run:

./mvnw -pl vector-stores/spring-ai-opensearch-store -Dtest=OpenSearchVectorStoreTest test

Integration tests

  • File: OpenSearchVectorStoreIT
  • Environment: Testcontainers OpenSearch + OpenAiEmbeddingModel
  • Verifies:
    • manageDocumentIds=false: indexing/search without explicit IDs (AWS Serverless compatible)
    • manageDocumentIds=true: explicit IDs and delete-by-ID
    • Indexing, similarity search, and content/metadata preservation

Run:

./mvnw -pl vector-stores/spring-ai-opensearch-store -am -Dtest=OpenSearchVectorStoreIT test

Caveats and compatibility

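  • With manageDocumentIds=false, OpenSearch auto-generates IDs, so the Document IDs supplied by the caller are not used as the index IDs. Deleting by Document ID may therefore not find the indexed documents; prefer filter-based deletion in this mode.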
  • Existing behavior (explicit IDs) is preserved when manageDocumentIds=true.
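
To illustrate the deletion caveat, here is a small in-memory sketch. It assumes that ID-based deletion targets the server-side ID only; `FakeIndex`, `DeletionSketch`, and the `source` metadata key are hypothetical and do not correspond to any Spring AI or OpenSearch API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical in-memory stand-in for an index: server-side ID -> document metadata.
class FakeIndex {

    private final Map<String, Map<String, String>> docs = new HashMap<>();

    // A null id models an index request that omits the ID, so the "server" generates one.
    String index(String id, Map<String, String> metadata) {
        String assigned = (id != null) ? id : UUID.randomUUID().toString();
        docs.put(assigned, metadata);
        return assigned;
    }

    // ID-based deletion: matches only the server-side ID.
    boolean deleteById(String id) {
        return docs.remove(id) != null;
    }

    // Filter-based deletion: matches a stored metadata field instead of the ID.
    int deleteByMetadata(String key, String value) {
        int before = docs.size();
        docs.values().removeIf(metadata -> value.equals(metadata.get(key)));
        return before - docs.size();
    }

    int size() {
        return docs.size();
    }
}

class DeletionSketch {

    public static void main(String[] args) {
        FakeIndex index = new FakeIndex();
        // The caller thinks of this document as "doc-1", but no ID is sent.
        index.index(null, Map.of("source", "faq"));
        System.out.println(index.deleteById("doc-1"));               // false: the stored ID was generated
        System.out.println(index.deleteByMetadata("source", "faq")); // 1
    }
}
```

The sketch only shows why a caller-side ID stops being a reliable handle once the server assigns IDs; the actual behavior of the store's delete methods in this mode should be checked against the integration tests.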

Related PR and Next Steps

I believe the failing integration tests in this PR are related to the schema and search path improvements being introduced in [PR #1121].

To move forward, I see two potential paths:

  1. If PR #1121 ("Enhanced OpenSearchVectorStore - Squashed") is merged first, I will rebase my changes on top of it and resolve any conflicts or test failures.
  2. Alternatively, I can pull the necessary changes from PR #1121 into this PR to fix the integration tests directly.

Please let me know which approach you prefer. I'm happy to proceed with either option to get this resolved.


Related issue

…ITs; AWS Serverless compat.

- Update OpenSearchVectorStore#doAdd to omit explicit document IDs when manageDocumentIds=false, enabling AWS OpenSearch Serverless compatibility
- Add unit tests for document ID management logic in doAdd
- Add integration tests covering explicit/non-explicit ID modes and delete-by-ID behavior

Closes spring-projectsgh-3818

Signed-off-by: sanghun <[email protected]>
@ilayaperumalg
Member

@lsh1215 Thanks for the PR and the detailed description explaining the changes and testing. We'll try to get this in for 1.1.0-M3

@ilayaperumalg ilayaperumalg added this to the 1.1.0.M3 milestone Sep 25, 2025
@ilayaperumalg ilayaperumalg self-assigned this Sep 25, 2025
@ilayaperumalg ilayaperumalg modified the milestones: 1.1.0.M3, 1.1.0.M4 Oct 9, 2025

private String similarityFunction;

private final boolean manageDocumentIds;
Member

Shouldn't this be true by default, so that Spring AI manages the document IDs and backward compatibility is preserved?


private String similarityFunction = COSINE_SIMILARITY_FUNCTION;

private boolean manageDocumentIds = false;
Member

Here as well: shouldn't the default value be true?

@ilayaperumalg
Member

@lsh1215 Please check the review comments. Thanks!

@lsh1215
Author

lsh1215 commented Oct 18, 2025

@ilayaperumalg Thank you for the review! I'll check the review comments and address them shortly.

The manageDocumentIds flag was initially set to false, which would
break existing users who rely on explicit document ID management.
This change sets the default to true to preserve the current behavior
for all existing OpenSearch users.

AWS OpenSearch Serverless users can explicitly opt in by setting
manageDocumentIds(false) when they need auto-generated IDs due to
the platform's restrictions on custom document IDs.

This ensures backward compatibility while still providing the
flexibility needed for AWS Serverless environments.

Related: spring-projectsgh-3818
Signed-off-by: sanghun <[email protected]>
@lsh1215
Author

lsh1215 commented Oct 19, 2025

c9cd38c

@ilayaperumalg Thank you for the review! You're absolutely right about the default value.

I've updated manageDocumentIds to default to true for backward compatibility. This ensures existing OpenSearch users won't be affected, while AWS Serverless users can explicitly opt in by setting manageDocumentIds(false).

Changes made:

  • Set manageDocumentIds = true in the Builder class (line 446)

This preserves the current behavior as the default and treats AWS Serverless compatibility as an opt-in feature.
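
As a rough illustration of the default described above, here is a trimmed-down builder sketch. `StoreSketch` is a hypothetical class written for this example; the real OpenSearchVectorStore builder takes a client and an embedding model and has more options.

```java
// Hypothetical, trimmed-down model of a store whose builder defaults
// manageDocumentIds to true (not the real OpenSearchVectorStore).
class StoreSketch {

    final boolean manageDocumentIds;

    private StoreSketch(Builder builder) {
        this.manageDocumentIds = builder.manageDocumentIds;
    }

    static Builder builder() {
        return new Builder();
    }

    static class Builder {

        // Default keeps explicit IDs, so existing callers see no behavior change.
        private boolean manageDocumentIds = true;

        Builder manageDocumentIds(boolean manageDocumentIds) {
            this.manageDocumentIds = manageDocumentIds;
            return this;
        }

        StoreSketch build() {
            return new StoreSketch(this);
        }
    }

    public static void main(String[] args) {
        // Callers that never touch the flag keep explicit IDs.
        System.out.println(StoreSketch.builder().build().manageDocumentIds); // true
        // Callers that need server-generated IDs must ask for them.
        System.out.println(StoreSketch.builder().manageDocumentIds(false).build().manageDocumentIds); // false
    }
}
```

Initializing the field in the builder, rather than relying on the boolean default of false, is what keeps code that never calls manageDocumentIds(...) on the explicit-ID path.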
