Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

.Net M.E.VectorData: consider removing {Get,Delete,Upsert}RecordOptions (for now) #10172

Open
roji opened this issue Jan 13, 2025 · 2 comments · May be fixed by #10192
Open

.Net M.E.VectorData: consider removing {Get,Delete,Upsert}RecordOptions (for now) #10172

roji opened this issue Jan 13, 2025 · 2 comments · May be fixed by #10192
Assignees
Labels
.NET Issue or Pull requests regarding .NET code

Comments

@roji
Copy link
Member

roji commented Jan 13, 2025

Microsoft.Extensions.VectorData currently has the GetRecordOptions, DeleteRecordOptions and UpsertRecordOptions types, which are accepted by e.g. GetAsync, DeleteAsync, UpsertAsync (and their batching counterparts). DeleteRecordOptions and UpsertRecordOptions are empty, and GetRecordOptions contains only IncludeVectors.

AFAICT, these can serve two possible purposes:

  1. To allow providers to subclass these (e.g. MyDbDeleteRecordOptions), and add their own provider-specific options.
  2. To allow adding new universal options directly on these objects, which apply to all/most vector databases (like GetRecordOptions.IncludeVectors today).

Re the 1st function (provider-specific options), consuming code that wishes to use a provider-specific option will have to interact with provider-specific code in any case (i.e. with the subclass of *RecordOptions); therefore, it seems better to accept provider-specific options on a provider-specific overload exposed by the implementation, rather than having it on the abstraction. In addition, in many cases a provider-specific options object will already be exposed via the low-level SDK of the vector database; a provider-specific overload can simply accept that low-level options type directly, but the current abstraction method requires the options type to extend the *RecordOptions based types, which the low-level options type will never do. So to summarize, it seems better off to leave these provider-specific extension points out of the abstraction, and simply have implementations add whatever overloads make sense for them.

Re the 2nd function (universal options), while it's theoretically possible that we'd want to add new ones, we already have a very rich sample of vector databases, and it seems quite unlikely that we're missing such an option or that a new one would appear. So I'd propose we simplify the API by removing these extension points (am noting that Delete and Upsert are currently empty) - at least for now. Note that we can always re-introduce these in the future via a new, separate overload that accepts the options (this would throw by default; this isn't any worse from today's story, where if we introduce a new property into e.g. DeleteRecordOptions, it wouldn't be respected by all existing providers anyway). The only potential negative point there would be IncludeVectors, which would be an additional parameter on GetAsync (instead of on GetRecordOptions); but even for that case, adding a GetAsync overload alongside that with GetRecordOptions doesn't seem like it would be problematic.

/cc @westey-m

@markwallace-microsoft markwallace-microsoft added .NET Issue or Pull requests regarding .NET code triage labels Jan 13, 2025
@westey-m
Copy link
Contributor

The reason the three options classes were added originally was to easily be able to extend GetAsync, DeleteAsync and UpsertAsync with additional optional options in future if needed, but as you point out, nothing has really shown up that would be required for Delete and Upsert.

Also note, that we are following a pattern in the SDK where required parameters are on the signature but optional are delegated to an options class, with an exception only for CancellationToken.

I think for GetAsync, related to our recent discussed, adding IncludeData may also make sense, if we do something similar on vector & non-vector search, so I would not be that much in favour of removing GetRecordOptions.

@roji
Copy link
Member Author

roji commented Jan 14, 2025

I think for GetAsync, related to our recent discussed, adding IncludeData may also make sense, if we do something similar on vector & non-vector search, so I would not be that much in favour of removing GetRecordOptions.

Makes sense. So I'll submit a PR removing the options for Delete/Upsert (at least for now) and leave the Get options in place.

Also note, that we are following a pattern in the SDK where required parameters are on the signature but optional are delegated to an options class, with an exception only for CancellationToken.

OK. I'm not aware of a strict pattern like that e.g. in .NET API design - where we're reasonably sure there's a small number of option parameters, it seems OK to simply include them directly on the method rather than systematically adding option classes (which add to the concept count). But I agree makes sense for the vector search API, and so at the very least for consistency also for Get... We can always discuss case-by-case as needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
.NET Issue or Pull requests regarding .NET code
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants