-
Notifications
You must be signed in to change notification settings - Fork 2k
OpenSearch: Omit explicit document IDs when manageDocumentIds=false for AWS Serverless (issue - #3818) #4220
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…ITs; AWS Serverless compat. - Update OpenSearchVectorStore#doAdd to omit explicit document IDs when manageDocumentIds=false, enabling AWS OpenSearch Serverless compatibility - Add unit tests for document ID management logic in doAdd - Add integration tests covering explicit/non-explicit ID modes and delete-by-ID behavior Closes spring-projectsgh-3818 Signed-off-by: sanghun <[email protected]>
@lsh1215 Thanks for the PR and the detailed description explaining the changes and testing. We'll try to get this in for 1.1.0-M3 |
|
||
private String similarityFunction; | ||
|
||
private final boolean manageDocumentIds; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be true
by default to let Spring AI managing the documentIDs and for backwards compatibility?.
|
||
private String similarityFunction = COSINE_SIMILARITY_FUNCTION; | ||
|
||
private boolean manageDocumentIds = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here as well for the default value to be true?
@lsh1215 Please check the review comments. Thanks! |
@ilayaperumalg Thank you for the review! I'll check the review comments and address them shortly. |
The manageDocumentIds flag was initially set to false, which would break existing users who rely on explicit document ID management. This change sets the default to true to preserve the current behavior for all existing OpenSearch users. AWS OpenSearch Serverless users can explicitly opt-in by setting manageDocumentIds(false) when they need auto-generated IDs due to the platform's restrictions on custom document IDs. This ensures backward compatibility while still providing the flexibility needed for AWS Serverless environments. Related: spring-projectsgh-3818 Signed-off-by: sanghun <[email protected]>
@ilayaperumalg Thank you for the review! You're absolutely right about the default value. I've updated Changes made:
This preserves the current behavior as the default and treats AWS Serverless compatibility as an opt-in feature. |
OpenSearch: Document ID management for AWS OpenSearch Serverless (manageDocumentIds)
Summary
OpenSearchVectorStore#doAdd(List<Document>)
was updated to make document ID handling configurable. WhenmanageDocumentIds=false
, the index request omits the ID so that OpenSearch auto-generates it.Background
Changes
org.springframework.ai.vectorstore.opensearch.OpenSearchVectorStore
doAdd(List<Document> documents)
manageDocumentIds=true
(default): index with explicit IDs (backward-compatible)manageDocumentIds=false
: omit ID so that OpenSearch auto-generates itUsage
Testing
Unit tests
OpenSearchVectorStoreTest
manageDocumentIds=true
: BulkRequest contains explicit IDsmanageDocumentIds=false
: BulkRequest omits IDs (auto-generated)Run:
./mvnw -pl vector-stores/spring-ai-opensearch-store -Dtest=OpenSearchVectorStoreTest test
Integration tests
OpenSearchVectorStoreIT
OpenAiEmbeddingModel
manageDocumentIds=false
: indexing/search without explicit IDs (AWS Serverless compatible)manageDocumentIds=true
: explicit IDs and delete-by-IDRun:
./mvnw -pl vector-stores/spring-ai-opensearch-store -am -Dtest=OpenSearchVectorStoreIT test
Caveats and compatibility
manageDocumentIds=false
, OpenSearch auto-generates IDs. ID-based deletion may therefore be limited; prefer filter-based deletion in this mode.manageDocumentIds=true
.Related PR and Next Steps
I believe the failing integration tests in this PR are related to the schema and search path improvements being introduced in [PR #1121].
To move forward, I see two potential paths:
Please let me know which approach you prefer. I'm happy to proceed with either option to get this resolved.
Related issue