Add neighbors algorithm based on NSW graphs #143

Open
wants to merge 1 commit into
base: main
9 changes: 9 additions & 0 deletions doc/api.rst
@@ -44,3 +44,12 @@ Robust
   robust.RobustWeightedClassifier
   robust.RobustWeightedRegressor
   robust.RobustWeightedKMeans

Neighbors
====================

.. autosummary::
   :toctree: generated/
   :template: class.rst

   neighbors.NSWGraph
35 changes: 35 additions & 0 deletions doc/modules/nswgraph.rst
@@ -0,0 +1,35 @@
.. _neighbors:

============================================================
Neighbors search with NSW graphs
============================================================
.. _nswgraph:
.. currentmodule:: sklearn_extra.neighbors


A navigable small-world graph is a type of mathematical graph in which most nodes are not neighbors of one another,
but the neighbors of any given node are likely to be neighbors of each other and most nodes can be reached
from every other node by some small number of hops or steps [1]_.
The number of steps is governed by a property that every navigable small-world graph must satisfy:

* The minimum number of edges that must be traversed to travel between two randomly chosen nodes grows proportionally to the logarithm of the number of nodes in the network [2]_.
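
As a rough illustration of the property above (not part of this pull request), the sketch below measures average hop counts by breadth-first search on a ring lattice, with and without random shortcut edges. The graph model, the helper names `ring_with_shortcuts` and `mean_hops_from`, and the graph sizes are all assumptions chosen for illustration. It only lets a reader observe how shortcuts shrink path lengths; it does not establish logarithmic growth.

```python
import random
from collections import deque


def ring_with_shortcuts(n, shortcuts, rng):
    """Ring lattice on n nodes plus `shortcuts` random extra edges (hypothetical model)."""
    adj = [{(i - 1) % n, (i + 1) % n} for i in range(n)]
    for _ in range(shortcuts):
        a, b = rng.randrange(n), rng.randrange(n)
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def mean_hops_from(adj, source=0):
    """Average shortest-path length (in edges) from `source` to every other node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return sum(dist.values()) / (len(dist) - 1)


rng = random.Random(0)
for n in (100, 400, 1600):
    plain = mean_hops_from(ring_with_shortcuts(n, 0, rng))
    small_world = mean_hops_from(ring_with_shortcuts(n, n, rng))
    print(n, round(plain, 1), round(small_world, 1))
```

Without shortcuts the average hop count grows linearly with the number of nodes; adding shortcut edges can only shorten the paths, and the printed values show by how much for this particular random model.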

:class:`NSWGraph` is an approximate nearest neighbor algorithm based on navigable small-world graphs.
On high-dimensional data it tends to be more efficient [3]_ than the
existing scikit-learn nearest neighbor algorithms based on :class:`KDTree <sklearn.neighbors.KDTree>`
and :class:`BallTree <sklearn.neighbors.BallTree>`.
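
To make the idea concrete, here is a minimal pure-Python sketch of greedy search on a small-world graph in the spirit of [3]_. It is not the implementation in this pull request: the function names, the exact-search construction step, and the meaning given to `regularity` (number of links per inserted point) are assumptions for illustration only.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_graph(points, regularity):
    """Insert points one by one, linking each new point to its `regularity`
    closest already-inserted points (exact search here, for simplicity)."""
    neighbors = [set() for _ in points]
    for i in range(1, len(points)):
        closest = sorted(range(i), key=lambda j: euclidean(points[i], points[j]))
        for j in closest[:regularity]:
            neighbors[i].add(j)
            neighbors[j].add(i)
    return neighbors


def greedy_search(points, neighbors, query, entry):
    """Walk from `entry`, always moving to the neighbor closest to `query`.
    Stops at a local minimum and returns (node index, number of hops)."""
    current, hops = entry, 0
    current_dist = euclidean(points[current], query)
    while True:
        best, best_dist = current, current_dist
        for n in neighbors[current]:
            d = euclidean(points[n], query)
            if d < best_dist:
                best, best_dist = n, d
        if best == current:
            return current, hops
        current, current_dist = best, best_dist
        hops += 1


rng = random.Random(0)
points = [[rng.random() for _ in range(8)] for _ in range(200)]
neighbors = build_graph(points, regularity=8)
query = [rng.random() for _ in range(8)]
found, hops = greedy_search(points, neighbors, query, entry=0)
print(found, hops)
```

The walk only guarantees a local minimum: no neighbor of the returned node is closer to the query. That is why the result is approximate, and why practical variants restart from several entry points.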

See the `scikit-learn User Guide <https://scikit-learn.org/stable/modules/neighbors.html>`_
for more general information on nearest neighbors search.


.. topic:: References:

   .. [1] Porter, Mason A. "Small-world network." Scholarpedia.
      Available at: http://www.scholarpedia.org/article/Small-world_network.

   .. [2] Kleinberg, Jon. "The small-world phenomenon and decentralized search." SIAM News 37.3 (2004): 1-2.

   .. [3] Malkov, Y., Ponomarenko, A., Logvinov, A., & Krylov, V. (2014).
      Approximate nearest neighbor algorithm based on navigable small world graphs.
      Information Systems, 45, 61-68.
2 changes: 2 additions & 0 deletions doc/user_guide.rst
@@ -14,3 +14,5 @@ User guide
   modules/cluster.rst
   modules/robust.rst
   modules/kernel_approximation.rst
   modules/nswgraph.rst

6 changes: 6 additions & 0 deletions setup.py
@@ -79,6 +79,12 @@
                include_dirs=[np.get_include()],
                language="c++",
            ),
            Extension(
                "sklearn_extra.neighbors._navigable_small_world_graph",
                ["sklearn_extra/neighbors/_navigable_small_world_graph.pyx"],
                include_dirs=[np.get_include()],
                language="c++",
            ),
        ]
    ),
    "cmdclass": dict(build_ext=build_ext),
3 changes: 3 additions & 0 deletions sklearn_extra/neighbors/__init__.py
@@ -0,0 +1,3 @@
from ._nswgraph import NSWGraph

__all__ = ["NSWGraph"]
59 changes: 59 additions & 0 deletions sklearn_extra/neighbors/_navigable_small_world_graph.pxd
@@ -0,0 +1,59 @@
# distutils: language=c++
# NSWG-based ANN classification
# Authors: Lev Svalov <[email protected]>
# Stanislav Protasov <[email protected]>
# License: BSD 3 clause
import numpy as np
cimport numpy as np
np.import_array()
from libcpp.vector cimport vector
from libcpp.set cimport set as set_c
from libcpp.pair cimport pair as pair
from libcpp.queue cimport priority_queue
from libcpp cimport bool
ctypedef np.int_t ITYPE_t
ctypedef np.float64_t DTYPE_t
ctypedef bool BTYPE_t

cdef class BaseNSWGraph:
    """
    Declarations for the Cython helper class behind the NSWGraph implementation
    """

    # typed attribute declarations
    cdef ITYPE_t dimension
    cdef ITYPE_t regularity
    cdef ITYPE_t guard_hops
    cdef ITYPE_t attempts
    cdef BTYPE_t quantize
    cdef ITYPE_t quantization_levels
    cdef ITYPE_t number_nodes
    cdef DTYPE_t norm_factor
    cdef vector[vector[DTYPE_t]] nodes
    cdef vector[set_c[ITYPE_t]] neighbors
    cdef vector[vector[DTYPE_t]] lookup_table
    cdef vector[DTYPE_t] quantization_values


    # typed method declarations; methods marked `nogil` can run without the Global Interpreter Lock (GIL)

    cdef DTYPE_t eucl_dist(self, vector[DTYPE_t] v1, vector[DTYPE_t] v2) nogil

    cdef priority_queue[pair[DTYPE_t, ITYPE_t]] delete_duplicate(self, priority_queue[pair[DTYPE_t, ITYPE_t]] queue) nogil

    cdef void search_nsw_basic(self, vector[DTYPE_t] query,
                               set_c[ITYPE_t]* visitedSet,
                               priority_queue[pair[DTYPE_t, ITYPE_t]]* candidates,
                               priority_queue[pair[DTYPE_t, ITYPE_t]]* result,
                               ITYPE_t* res_hops,
                               ITYPE_t k) nogil

    cdef void _build_navigable_graph(self, vector[vector[DTYPE_t]] values) nogil

    cdef pair[vector[ITYPE_t], vector[DTYPE_t]] _multi_search(self, vector[DTYPE_t] query, ITYPE_t k) nogil

    cdef vector[vector[DTYPE_t]] ndarray_to_vector_2(self, np.ndarray array)

    cdef np.ndarray _get_quantized(self, np.ndarray vector)

    cdef np.ndarray _quantization(self, np.ndarray data)
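
The declarations above only give signatures, so the following pure-Python sketch shows one plausible reading of how a visited set, a candidate queue, and a bounded result queue could cooperate in a k-nearest search with several attempts. The bodies are assumptions, not the code in the accompanying `.pyx` file: `search_basic` and `multi_search` are hypothetical stand-ins loosely named after `search_nsw_basic` and `_multi_search`, the dictionary merge plays the role that `delete_duplicate` is assumed to have, and `guard_hops` and quantization are not modelled.

```python
import heapq
import math
import random


def eucl_dist(v1, v2):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))


def search_basic(nodes, neighbors, query, entry, k, visited):
    """Best-first search from `entry`; returns up to k (distance, index)
    pairs, closest first. `visited` is shared across attempts so that no
    node is scored twice."""
    if entry in visited:
        return []
    visited.add(entry)
    d0 = eucl_dist(nodes[entry], query)
    candidates = [(d0, entry)]   # min-heap: closest open candidate first
    result = [(-d0, entry)]      # max-heap via negated distance, size <= k
    while candidates:
        dist, node = heapq.heappop(candidates)
        # Stop once the closest open candidate is worse than the k-th best.
        if len(result) >= k and dist > -result[0][0]:
            break
        for n in neighbors[node]:
            if n in visited:
                continue
            visited.add(n)
            d = eucl_dist(nodes[n], query)
            heapq.heappush(candidates, (d, n))
            heapq.heappush(result, (-d, n))
            if len(result) > k:
                heapq.heappop(result)  # drop the current farthest result
    return sorted((-neg, idx) for neg, idx in result)


def multi_search(nodes, neighbors, query, k, attempts, rng):
    """Run `attempts` searches from random entry points and merge them."""
    visited, merged = set(), {}
    for _ in range(attempts):
        entry = rng.randrange(len(nodes))
        for d, idx in search_basic(nodes, neighbors, query, entry, k, visited):
            merged[idx] = d  # keyed by index, so duplicates collapse
    best = sorted(merged.items(), key=lambda item: item[1])[:k]
    return [idx for idx, _ in best], [d for _, d in best]


# Toy graph: points on a line, each linked to the two nearest on either side.
nodes = [[float(i)] for i in range(50)]
neighbors = [{j for j in (i - 2, i - 1, i + 1, i + 2) if 0 <= j < 50}
             for i in range(50)]
indices, distances = multi_search(nodes, neighbors, [20.3], k=3,
                                  attempts=2, rng=random.Random(0))
print(indices, distances)
```

Returning indices and distances as two parallel lists mirrors the `pair[vector[ITYPE_t], vector[DTYPE_t]]` return type declared for `_multi_search`; everything else about the control flow is a guess made for illustration.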