Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add "copy" method to AstraDBVectorStore #111

Merged
merged 3 commits into from
Feb 4, 2025
Merged

Conversation

hemidactylus
Copy link
Collaborator

@hemidactylus hemidactylus commented Feb 3, 2025

This method returns a copy of the vector store with all properties unchanged except the caller information a few properties if passed, including the caller information (used for usage stats by Astra DB's Data API), replaced by the new supplied values.

This is needed by the implementation of Graph Vector Store as a Retriever delegating to the vector store class:

new_vstore = vstore.copy(component_name="another_component_name")

The method does instantiate a new AstraDBVectorStore with dummy values and SetupMode=OFF to avoid triggering any actual work, then proceeds to replace all "private" members of the new instance to make it an exact copy, save for the "component name" changed fields. This has required a similar logic to be coded into the AstraDBEnvironment/CollectionEnvironment as well.
(The alternative of making room in the constructor for such a path was deemed too consequence-bearing to be pursued.)

Edit: method renamed from with_component_name to copy and range of accepted parameters expanded, see discussion in the PR

@hemidactylus hemidactylus requested a review from cbornet February 3, 2025 10:26
@@ -395,6 +398,24 @@ def __init__(
except ValueError as validation_error:
raise validation_error from data_api_exception

def with_component_name(self, component_name: str) -> _AstraDBCollectionEnvironment:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about having a copy(**kwargs) method ?

def copy(**kwargs):
    return _AstraDBCollectionEnvironment(
            collection_name=kwargs.get("collection_name", self.collection_name),
            token=kwargs.get("token", self.token)
            api_endpoint=kwargs.get("api_endpoint", self.api_endpoint),
            keyspace=kwargs.get("keyspace", self.keyspace),
            environment=kwargs.get("environment", self.environment),
            ext_callers=kwargs.get("ext_callers", self.ext_callers),
            component_name=kwargs.get("component_name", self.component_name),
            setup_mode=kwargs.get("setup_mode", SetupMode.OFF),
            collection_embedding_api_key=kwargs.get("collection_embedding_api_key", self.collection_embedding_api_key),
        )

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I did not generalize so much is that not many arguments (to the constructor) would lean correctly to this kind of "duplication with just some tweaks". For instance:

  • we hardcode setupMode to be OFF since we want this "copy" to not do any side-effect and be instantaneous (this is the main obstacle)
  • it would be strange, then, to expose the possibility of changing collection name since we don't offer a way to go through the usual create-collection flow. (For such operations, I suppose, we should only leave the "regular legitimate constructor" route)
  • ditto for api endpoint, environment

In my opinion, perhaps the case could be made for this "copy" to support on-the-fly change of just token, ext_callers, collection_embedding_api_key and component_name, probably nothing else would make it "safe" for usage by the coder from the outside.

But then, calling it "copy" seems a stretch and it could remain as is (possibly generalized to these 4 params and aliased if ever the need arose).

At least that was my reasoning, I'm happy to discuss it further!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I would still lean on calling it copy() and limiting the args that can be changed.
If you call copy() without args, that's effectively a shallow copy.
The question is then how to signal to the user which args are allowed.
Maybe it can be in the signature with meaning None is take value from original and not set it to None ?
I originally proposed kwargs because it could distinguish between "not set" and "set to None" but maybe "set to None" is not needed/allowed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a valid point. I am renaming to copy right now, with a modicum of docstring to highlight the details of behaviour.
Also the eternal "unset VS None" issue in this case does not bite since I agree that passing explicit None has no purpose whatsoever (esp. considering there are idiomatic ways to suppress token and/or API Key that do not involve passing None. I mention them in the new docstring).

@@ -723,6 +724,45 @@ def _select_relevance_score_fn(self) -> Callable[[float], float]:
# so here the final score transformation is not reversing the interval.
return lambda score: score

def with_component_name(self, component_name: str) -> AstraDBVectorStore:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could also be copy(**kwargs)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(same reasoning as above)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm adapting this in line with above. ==> copy

Comment on lines +734 to +741
collection_name="moot",
api_endpoint="http://moot",
environment=Environment.OTHER,
namespace="moot",
setup_mode=SetupMode.OFF,
collection_vector_service_options=CollectionVectorServiceOptions(
provider="moot",
model_name="moot",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not passing the final values instead of moot directly ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My intent here was to ensure future readers understand the nature of this "trick", that the instantiation is misused so to speak - exploiting a path in the constructor where nothing happens.

The concern was that passing the real values (which however need to be supplied with more member-setting right thereafter) might mislead readers.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exploiting a path in the constructor where nothing happens.
I am not very proud of this solution, to be honest. But it probably seems overkill to refactor everything else to make this little flow more straightforward.

@hemidactylus hemidactylus changed the title Add "with_component_name" method to AstraDBVectorStore Add "copy" method to AstraDBVectorStore Feb 4, 2025
Copy link
Collaborator

@cbornet cbornet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@hemidactylus hemidactylus merged commit d1e30f6 into main Feb 4, 2025
13 checks passed
@hemidactylus hemidactylus deleted the SL-with-component-name branch February 4, 2025 15:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants