-
Notifications
You must be signed in to change notification settings - Fork 3.7k
MetricType and distances
There are two primary methods supported by Faiss indices, L2 and inner product. Others are supported by IndexFlat
.
For the full list of metrics, see here.
Faiss reports squared Euclidean (L2) distance, avoiding the square root. This is still monotonic as the Euclidean distance, but if exact distances are needed, an additional square root of the result is needed.
This metric is invariant to rotations of the data (orthonormal matrix transformations).
This is typically used for maximum inner product search in recommendation systems. The norm of the query vectors does not affect the ranking of results (of course the norm of the database vectors does matter). This is not by itself cosine similarity, unless the vectors are normalized (lie on the surface of a unit hypersphere; see cosine similarity below).
The cosine similarity between vectors
It is a similarity, not a distance, so one would typically search vectors with a larger similarity.
By normalizing query and database vectors beforehand, the problem can be mapped back to a maximum inner product search. To do this:
- build an index with
METRIC_INNER_PRODUCT
- normalize the vectors prior to adding them to the index (with
faiss.normalize_L2
in Python) - normalize the vectors prior to searching them
Note that this is equivalent to using an index with METRIC_L2
, except that the distances are related by
Additional metrics are supported by IndexFlat
, IndexHNSW
and GpuIndexFlat
.
METRIC_L1
, METRIC_Linf
and METRIC_Lp
metrics are supported. METRIC_Lp
includes use of Index::metric_arg
(C++) / index.metric_arg
(Python) to set the power.
METRIC_Canberra
, METRIC_BrayCurtis
and METRIC_JensenShannon
are available as well. For Mahalanobis see below.
We also have HammingComputer
that supports hamming distance
computation. Here is an example usage
The Mahalanobis distance is equivalent to L2 distance in a transformed space. To transform to that space:
- compute the covariance matrix of the data
- multiply all vectors (query and database) by the inverse of the Cholesky decomposition of the covariance matrix.
- index in a METRIC_L2 index.
Example here: mahalnobis_to_L2.ipynb
How to whiten data with Faiss and compute Mahalnobis distance: demo_whitening.ipynb
Vectors can be transformed by adding one dimension so that max IP search becomes equivalent to L2 search. See Speeding up the xbox recommender system using a euclidean transformation for inner-product spaces, Bachrach et al, ACM conf on recommender systems, 2014 section 3 that transforms inner product computations into L2 distance computations. See an implementation here demo_IP_to_L2.ipynb.
Note however that while mathematically equivalent, this may not interact well with quantization (and possibly other index structures, see Non-metric similarity graphs for maximum inner product search, Morozov & Babenko, Neurips'18)
The reverse transformation also uses an additional dimension to the vector, see Asymmetric Mapping Quantization for Nearest Neighbor Search, Hong et al, PAMI'20.
For cosine similarity and inner product, just query the opposite vector.
For L2, there is a trick that involves one additional dimension: demo_farthest_L2.ipynb
Faiss building blocks: clustering, PCA, quantization
Index IO, cloning and hyper parameter tuning
Threads and asynchronous calls
Inverted list objects and scanners
Indexes that do not fit in RAM
Brute force search without an index
Fast accumulation of PQ and AQ codes (FastScan)
Setting search parameters for one query
Binary hashing index benchmark