feat(vector/hnsw): add per‑query ef and distance_threshold to similar_to, fix early termination #9514
Hey, could you also send a copy of the PR to https://github.com/predictable-labs/dgraph/pulls? I am the author of the bugs that you have fixed. I have made a fork of the repo to be able to make changes myself, and I have fixed some bugs in the vector index that make it much faster.
Cheers for the quick look. I’ve opened a mirror PR against your fork here: predictable-labs#17 Your branch is ahead of upstream so it shows conflicts in Tests for both parts are in |
|
@joelamming Thanks a lot for sending your changes here as well. I can take a look and convert your changes. Basically, we have introduced a new Partitioned HNSW, which is supposed to be much faster. We also found a race condition in the HNSW tree that led to lower recall. We were able to improve accuracy significantly in the new HNSW even without your changes.
|
@joelamming Thanks for this contribution, truly amazing. Could I ask that you create an integration test in /query/vector/vector_test.go that illustrates the new arguments/functionality? This will help with documentation and general understanding of this new functionality. Also, a few nits:
Thanks! |
|
Thanks for the nudge -- I’ve pushed a few follow-up commits:
While tracking down the integration failure I also found that Euclidean |
|
@matthewmcneely I see the CI failures from the GitHub Actions runs. I've identified and fixed the parser regression introduced by the The lexer changes to support Yesterday:
Today I'm running the full CI-equivalent suite on an ubuntu-noble-24.04-amd64 EC2 runner (same config as the Actions hosts). For each harness run I'm doing the standard
Please hold off on reviewing until I've confirmed everything is clean on the EC2 instance and finished pushing the fixes. I should have results later today. I'll document the architectural tradeoff in detail in the next push -- happy to discuss the approach once you have a chance to review. Thanks for your patience!
|
@matthewmcneely CI tests all pass on my end after bumping to the 32GB instance
Ran tests:
Given the runner matches Actions (Ubuntu 24.04, 16 vCPU/32 GiB), I don’t anticipate surprises on CI. Many thanks!
|
Just to follow up on this -- the new commits are pushed and it looks like the CI workflows are just waiting for approval to run. Thanks for your time!
Hugely appreciative of the Dgraph team’s work. Native vector search integrated directly into a graph database is kind of a no brainer today. Deployed Dgraph (both vanilla and customised) in systems with 1M+ vectors guiding deep traversal queries across 10M+ nodes -- tight coupling of vector search with graph traversal at massive scale gets us closer to something that could represent the fuzzy nuances of everything in an enterprise. Certainly not the biggest deployment your team will have seen, but this PR fixes an under‑recall edge case in HNSW and introduces opt‑in, per‑query controls that let users dial recall vs latency safely and predictably. I’ve had this running in production for a while and thought it worth proposing to main.
Summary
- ef and distance_threshold (string or JSON‑like fourth argument).

Motivation
- ef meant recall vs latency trade‑offs required global tuning or inflating k (and downstream work).

Changes (key files)
- tok/hnsw/persistent_hnsw.go: fix early termination, add SearchWithOptions/SearchWithUidAndOptions, apply ef override at upper layers and max(k, ef) at bottom layer, apply distance_threshold in the metric domain (Euclidean squared internally, cosine as 1 − sim).
- tok/index/index.go: add VectorIndexOptions and OptionalSearchOptions (non‑breaking).
- worker/task.go: parse optional fourth argument to similar_to (ef, distance_threshold), thread options, route to optional methods when provided, guard zero/negative k.
- tok/index/search_path.go: add SearchPathResult helper.
- tok/hnsw/ef_recall_test.go adds TestHNSWSearchEfOverrideImprovesRecall, TestHNSWDistanceThreshold_Euclidean, TestHNSWDistanceThreshold_Cosine.
- CHANGELOG.md: Unreleased entry for HNSW fix and per‑query options.

Backwards compatibility
- similar_to(attr, k, vector_or_uid) is unchanged.
- ef and distance_threshold are optional, unsupported metrics safely ignore the threshold.

Performance
- ef, bottom‑layer candidate size becomes max(k, ef) (as in HNSW), cost scales accordingly.

Rationale and alignment
- ef_search controls exploration/recall, k controls output size.
- ef and distance_threshold semantics for familiarity.

Checklist
- CHANGELOG.md describing this PR
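
To make the two per‑query controls described above concrete, here is a minimal Go sketch of how the max(k, ef) candidate size and a distance_threshold cut‑off could interact. It is not the code from this PR: the names Metric, SearchOptions, Candidate, bottomLayerCandidates, internalThreshold and filterTopK are all hypothetical, and it assumes the threshold is supplied as a plain Euclidean distance (squared before comparison) or as a cosine distance of 1 − sim. The real graph traversal is replaced by a sort over an already collected candidate list.

```go
package main

import (
	"fmt"
	"sort"
)

// Metric is a hypothetical label for the distance function in use.
type Metric int

const (
	Euclidean Metric = iota
	Cosine
)

// SearchOptions is a hypothetical stand-in for the per-query options.
// Ef == 0 and DistanceThreshold == nil both mean "not set".
type SearchOptions struct {
	Ef                int
	DistanceThreshold *float64
}

// Candidate holds a node and its internal distance to the query:
// squared Euclidean distance, or 1 - cosine similarity.
type Candidate struct {
	UID  uint64
	Dist float64
}

// bottomLayerCandidates returns the bottom-layer candidate list size,
// max(k, ef), so a small or unset ef never shrinks the list below k.
func bottomLayerCandidates(k, ef int) int {
	if ef > k {
		return ef
	}
	return k
}

// internalThreshold maps a user-facing threshold into the domain the
// comparison runs in (assumption: Euclidean is compared squared).
func internalThreshold(m Metric, t float64) float64 {
	if m == Euclidean {
		return t * t
	}
	return t // cosine distance is already 1 - sim
}

// filterTopK sorts candidates by distance and returns at most k of them,
// stopping early once a candidate exceeds the threshold (if one is set).
func filterTopK(cands []Candidate, k int, m Metric, opts SearchOptions) []Candidate {
	if k <= 0 {
		return nil // guard zero/negative k
	}
	sorted := append([]Candidate(nil), cands...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Dist < sorted[j].Dist })

	out := make([]Candidate, 0, k)
	for _, c := range sorted {
		if len(out) == k {
			break
		}
		if opts.DistanceThreshold != nil && c.Dist > internalThreshold(m, *opts.DistanceThreshold) {
			break // sorted ascending, so every later candidate is also too far
		}
		out = append(out, c)
	}
	return out
}

func main() {
	threshold := 0.6
	opts := SearchOptions{Ef: 64, DistanceThreshold: &threshold}
	fmt.Println("candidate list size:", bottomLayerCandidates(10, opts.Ef))

	cands := []Candidate{{UID: 3, Dist: 1.0}, {UID: 1, Dist: 0.04}, {UID: 2, Dist: 0.25}}
	for _, c := range filterTopK(cands, 3, Euclidean, opts) {
		fmt.Println("kept uid", c.UID, "squared distance", c.Dist)
	}
}
```

In this sketch ef only widens exploration, k still caps the output size, and the threshold can return fewer than k results, which matches the split of responsibilities listed under Rationale and alignment.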