Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjusts Seeded knn searches to clean up user and internal interfaces #14170

Merged
merged 7 commits into from
Feb 3, 2025

Conversation

benwtrent
Copy link
Member

@benwtrent benwtrent commented Jan 24, 2025

This is a bugfix and refactor for seeded knn searches.

First, Since we are using collectors, we don't actually need unique queries for every input type. Consequently, I have collapsed the two individual seeded queries into a single query that delegates to a provided kNN query. Then the collector manager is simply wrapped, so that the entry points can be provided.

Second, the interactions in the hnsw graph were not clear. Consequently, I did a minor refactor of HNSW searcher to have a "SeededSearcher", where instead of searching the graph for the entry points, it provides them directly.

Third, instead of continually overloading collectors, I opted to add a new "searchstrategy" value to KnnCollector. This way various strategies can be executed with different options. I think Seeded could eventually be replaced with something.

@benwtrent benwtrent added this to the 10.2.0 milestone Jan 27, 2025
Copy link

@john-wagster john-wagster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines 75 to 76
this.delegate = knnQuery;
this.seedWeight = seedWeight;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor/subjective: init order to match member declaration order

Suggested change
this.delegate = knnQuery;
this.seedWeight = seedWeight;
this.seedWeight = seedWeight;
this.delegate = knnQuery;

/**
* Iterator of valid entry points for the kNN search
*
* @return DocIdSetIterator of entry points, default is empty iterator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @return DocIdSetIterator of entry points, default is empty iterator
* @return DocIdSetIterator of entry points

/**
* Number of valid entry points for the kNN search
*
* @return number of entry points, default is 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @return number of entry points, default is 0
* @return number of entry points

private final int[] seedOrds;

static SeededHnswGraphSearcher fromEntryPoints(
AbstractHnswGraphSearcher delegate, int numEps, DocIdSetIterator eps, HnswGraph graph)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe

Suggested change
AbstractHnswGraphSearcher delegate, int numEps, DocIdSetIterator eps, HnswGraph graph)
AbstractHnswGraphSearcher delegate, int numEps, DocIdSetIterator eps, int graphSize)

Comment on lines 54 to 55
* @param parentBitSet The leaf parent bitset
* @param searchStrategy The search strategy to use
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @param parentBitSet The leaf parent bitset
* @param searchStrategy The search strategy to use
* @param searchStrategy The search strategy to use
* @param parentBitSet The leaf parent bitset

@benwtrent benwtrent merged commit e432161 into apache:main Feb 3, 2025
5 checks passed
@benwtrent benwtrent deleted the seed-refactor branch February 3, 2025 16:06
benwtrent added a commit that referenced this pull request Feb 3, 2025
…14170)

This is a bugfix and refactor for seeded knn searches.

First, Since we are using collectors, we don't actually need unique queries for every input type. Consequently, I have collapsed the two individual seeded queries into a single query that delegates to a provided kNN query. Then the collector manager is simply wrapped, so that the entry points can be provided.

Second, the interactions in the hnsw graph were not clear. Consequently, I did a minor refactor of HNSW searcher to have a "SeededSearcher", where instead of searching the graph for the entry points, it provides them directly.

Third, instead of continually overloading collectors, I opted to add a new "searchstrategy" value to KnnCollector. This way various strategies can be executed with different options. I think Seeded could eventually be replaced with something.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants