Merge pull request #433 from dyashuni/filter_warning
Add code comments noting that the Python filter is slow in multi-threaded mode
yurymalkov authored Jan 15, 2023
2 parents d86f8f9 + 32f4b02 commit 2175362
Showing 6 changed files with 11 additions and 5 deletions.
4 changes: 3 additions & 1 deletion .github/workflows/build.yml
@@ -20,7 +20,9 @@ jobs:
 
       - name: Test
         timeout-minutes: 15
-        run: python -m unittest discover -v --start-directory tests/python --pattern "bindings_test*.py"
+        run: |
+          python -m unittest discover -v --start-directory examples --pattern "example*.py"
+          python -m unittest discover -v --start-directory tests/python --pattern "bindings_test*.py"
   test_cpp:
     runs-on: ${{matrix.os}}
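The new CI step reuses `unittest` discovery to run the example scripts. Discovery imports every module whose file name matches the pattern. As a result, a plain script such as `examples/example_filter.py` runs its top-level code during import, and any exception it raises is reported as an error, which fails the job. The minimal sketch below shows that behaviour using two hypothetical throwaway scripts (`example_ok.py`, `example_broken.py`) in a temporary directory, not the real examples:

```python
import os
import tempfile
import unittest

# Hypothetical example scripts, written to a temporary directory for this demo only.
SCRIPTS = {
    "example_ok.py": "import os\nos.environ['EXAMPLE_OK_RAN'] = '1'\n",
    "example_broken.py": "raise RuntimeError('example failed')\n",
}

with tempfile.TemporaryDirectory() as start_dir:
    for name, source in SCRIPTS.items():
        with open(os.path.join(start_dir, name), "w") as handle:
            handle.write(source)

    # Programmatic counterpart of:
    #   python -m unittest discover --start-directory <dir> --pattern "example*.py"
    suite = unittest.TestLoader().discover(start_dir=start_dir, pattern="example*.py")
    result = unittest.TestResult()
    suite.run(result)

# The OK script executed on import; the broken one became a single errored "test".
ok_script_ran = os.environ.get("EXAMPLE_OK_RAN") == "1"
print("script ran on import:", ok_script_ran)
print("errors reported:", len(result.errors))
```

A script that defines no `TestCase` contributes no tests of its own. It is still executed, which is enough to catch examples that crash.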
2 changes: 1 addition & 1 deletion README.md
@@ -80,7 +80,7 @@ For other spaces use the nmslib library https://github.com/nmslib/nmslib.
 * `knn_query(data, k = 1, num_threads = -1, filter = None)` make a batch query for `k` closest elements for each element of the
 * `data` (shape:`N*dim`). Returns a numpy array of (shape:`N*k`).
 * `num_threads` sets the number of cpu threads to use (-1 means use default).
-* `filter` filters elements by its labels, returns elements with allowed ids
+* `filter` filters elements by their labels and returns only elements with allowed ids. Note that search with a Python filter is slow in multithreaded mode; setting `num_threads=1` is recommended.
 * Thread-safe with other `knn_query` calls, but not with `add_items`.
 
 * `load_index(path_to_index, max_elements = 0, allow_replace_deleted = False)` loads the index from persistence to the uninitialized index.
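The following sketch makes the `filter` semantics concrete with an exact, brute-force reference for a filtered k-NN query in NumPy. It is not hnswlib code. `brute_force_filtered_knn` is a hypothetical helper that assumes labels equal row indices (as when items are added without explicit ids) and uses squared L2 distance. A helper like this can serve as ground truth when checking the recall of a filtered `knn_query`.

```python
import numpy as np


def brute_force_filtered_knn(data, queries, k, filter_fn):
    """Exact k-NN restricted to labels for which filter_fn(label) is True.

    Hypothetical reference helper, not part of hnswlib. Assumes label == row index
    and uses squared L2 distance.
    """
    # Evaluate the filter once per stored element to collect the allowed labels.
    allowed = np.array([i for i in range(len(data)) if filter_fn(i)], dtype=np.int64)
    if len(allowed) < k:
        raise ValueError("fewer allowed elements than k")
    # Squared L2 distances, shape (num_queries, num_allowed).
    diffs = queries[:, None, :] - data[allowed][None, :, :]
    dists = np.einsum("qad,qad->qa", diffs, diffs)
    # Positions (into `allowed`) of the k smallest distances for each query.
    order = np.argsort(dists, axis=1, kind="stable")[:, :k]
    labels = allowed[order]
    distances = np.take_along_axis(dists, order, axis=1)
    return labels, distances


rng = np.random.default_rng(0)
data = rng.random((100, 8), dtype=np.float32)
labels, distances = brute_force_filtered_knn(data, data, k=1, filter_fn=lambda idx: idx % 2 == 0)
print(labels.shape)                   # (100, 1)
print(bool(np.all(labels % 2 == 0)))  # True
```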
3 changes: 2 additions & 1 deletion examples/EXAMPLES.md
@@ -147,7 +147,8 @@ print("Querying only even elements")
 # Define filter function that allows only even ids
 filter_function = lambda idx: idx%2 == 0
 # Query the elements for themselves and search only for even elements:
-labels, distances = hnsw_index.knn_query(data, k=1, filter=filter_function)
+# Warning: search with a Python filter is slow in multithreaded mode, so we set num_threads=1
+labels, distances = hnsw_index.knn_query(data, k=1, num_threads=1, filter=filter_function)
 # labels contain only elements with even id
 ```
 
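The even-id lambda is only one kind of predicate: the filter receives a label and returns whether that label may appear in the results. For an arbitrary allow-list, a hypothetical helper such as `make_allow_list_filter` can precompute a `frozenset`, so each call reduces to a constant-time membership test. This keeps the per-call work small. Every call still crosses into Python, though, so the `num_threads=1` advice still applies.

```python
from typing import Callable, Iterable


def make_allow_list_filter(allowed_labels: Iterable[int]) -> Callable[[int], bool]:
    """Hypothetical helper: build a filter that accepts only the given labels."""
    allowed = frozenset(int(label) for label in allowed_labels)

    def filter_function(label: int) -> bool:
        # Constant-time membership test on the precomputed set.
        return label in allowed

    return filter_function


even_filter = make_allow_list_filter(range(0, 10, 2))
print([label for label in range(10) if even_filter(label)])  # [0, 2, 4, 6, 8]
# It could then be passed as, e.g., knn_query(data, k=1, num_threads=1, filter=even_filter)
```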
3 changes: 2 additions & 1 deletion examples/example_filter.py
@@ -41,5 +41,6 @@
 # Define filter function that allows only even ids
 filter_function = lambda idx: idx%2 == 0
 # Query the elements for themselves and search only for even elements:
-labels, distances = hnsw_index.knn_query(data, k=1, filter=filter_function)
+# Warning: search with a Python filter is slow in multithreaded mode, so we set num_threads=1
+labels, distances = hnsw_index.knn_query(data, k=1, num_threads=1, filter=filter_function)
 # labels contain only elements with even id
1 change: 1 addition & 0 deletions python_bindings/bindings.cpp
@@ -623,6 +623,7 @@ class Index {
 data_numpy_l = new hnswlib::labeltype[rows * k];
 data_numpy_d = new dist_t[rows * k];
 
+// Warning: search with a Python filter is slow in multithreaded mode. For best performance, set num_threads=1
 CustomFilterFunctor idFilter(filter);
 CustomFilterFunctor* p_idFilter = filter ? &idFilter : nullptr;
 
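The comment added in `bindings.cpp` sits where the Python callable is wrapped in `CustomFilterFunctor`. Each time the C++ search consults that functor, it has to call back into the interpreter. In CPython that requires holding the global interpreter lock (GIL), so filter calls coming from several search threads take turns instead of running in parallel. It is easy to underestimate how many such calls one query makes, so the sketch below counts them in an exhaustive scan. `CountingFilter` and `filtered_nearest` are hypothetical and only approximate the idea. An HNSW search visits far fewer elements than a full scan, but it still consults the filter for the candidates it examines.

```python
import numpy as np


class CountingFilter:
    """Hypothetical wrapper that counts how often a search calls the filter."""

    def __init__(self, predicate):
        self.predicate = predicate
        self.calls = 0

    def __call__(self, label: int) -> bool:
        self.calls += 1
        return self.predicate(label)


def filtered_nearest(data: np.ndarray, query: np.ndarray, filter_fn) -> int:
    """Hypothetical exhaustive 1-NN that calls filter_fn once per stored label."""
    best_label, best_dist = -1, float("inf")
    for label, row in enumerate(data):
        if not filter_fn(label):
            continue
        dist = float(np.sum((row - query) ** 2))  # squared L2
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


rng = np.random.default_rng(1)
data = rng.random((200, 4))
counting = CountingFilter(lambda idx: idx % 2 == 0)
results = [filtered_nearest(data, q, counting) for q in data[:10]]
print(counting.calls)  # 2000: 10 queries x 200 stored elements
```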
3 changes: 2 additions & 1 deletion tests/python/bindings_test_filter.py
@@ -47,7 +47,8 @@ def testRandomSelf(self):
 print("Querying only even elements")
 # Query the even elements for themselves and measure recall:
 filter_function = lambda id: id%2 == 0
-labels, distances = hnsw_index.knn_query(data, k=1, filter=filter_function)
+# Warning: search with a Python filter is slow in multithreaded mode, so we set num_threads=1
+labels, distances = hnsw_index.knn_query(data, k=1, num_threads=1, filter=filter_function)
 self.assertAlmostEqual(np.mean(labels.reshape(-1) == np.arange(len(data))), .5, 3)
 # Verify that there are only even elements:
 self.assertTrue(np.max(np.mod(labels, 2)) == 0)
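The recall assertion in this test encodes a small calculation. With `k=1`, each element queries for itself, but only even labels pass the filter. An even element can therefore find itself, while an odd element must return some other (even) label. When the number of elements is even, an exact search therefore scores a self-recall of exactly 0.5. `assertAlmostEqual(..., .5, 3)` then checks that `round(abs(recall - 0.5), 3) == 0`, which tolerates a deviation of less than about 0.0005. The worked sketch below uses a hand-made, hypothetical `labels` array in place of the index output:

```python
import unittest

import numpy as np


def self_recall(labels: np.ndarray) -> float:
    """Fraction of queries whose top-1 label is the query's own index (as in the test)."""
    return float(np.mean(labels.reshape(-1) == np.arange(len(labels))))


n = 10
# Hypothetical exact filtered result: even ids find themselves,
# odd ids fall back to the preceding even id.
labels = np.array([[i if i % 2 == 0 else i - 1] for i in range(n)])
recall = self_recall(labels)
print(recall)  # 0.5

case = unittest.TestCase()
case.assertAlmostEqual(recall, 0.5, 3)    # passes
print(round(abs(0.4996 - 0.5), 3) == 0)  # True: within tolerance
print(round(abs(0.4994 - 0.5), 3) == 0)  # False: would fail the assertion
```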
