MetricType and distances
There are two primary metrics supported by Faiss indices, L2 and inner product. Others are supported by IndexFlat.
For the full list of metrics, see the MetricType enum in faiss/MetricType.h.
Faiss reports squared Euclidean (L2) distance, avoiding the square root. This is still monotonic with the Euclidean distance, but if exact distances are needed, an additional square root of the result must be taken.
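For example, a minimal Python sketch (the random toy data is purely illustrative): an IndexFlatL2 search returns squared L2 distances in D, and a square root recovers the actual Euclidean distances.

```python
import numpy as np
import faiss

d = 8
rng = np.random.default_rng(0)
xb = rng.standard_normal((100, d)).astype('float32')   # database vectors
xq = rng.standard_normal((3, d)).astype('float32')     # query vectors

index = faiss.IndexFlatL2(d)        # uses METRIC_L2
index.add(xb)
D, I = index.search(xq, 4)          # D holds *squared* L2 distances

euclidean = np.sqrt(D)              # exact Euclidean distances, if needed
```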
METRIC_INNER_PRODUCT is typically used for maximum inner product search. This is not by itself cosine similarity, unless the vectors are normalized (lie on the surface of a unit hypersphere; see cosine similarity below).
To index vectors for cosine similarity:
- build an index with METRIC_INNER_PRODUCT
- normalize the vectors prior to adding them to the index (with faiss.normalize_L2 in Python)
- normalize the vectors prior to searching them
Note that this is equivalent to using an index with METRIC_L2, except that the distances are related by d_L2^2 = 2 - 2 * d_IP.
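A minimal Python sketch of the recipe above, with illustrative random data (normalize_L2 works in place on float32 arrays):

```python
import numpy as np
import faiss

d = 16
rng = np.random.default_rng(0)
xb = rng.standard_normal((1000, d)).astype('float32')  # database vectors
xq = rng.standard_normal((5, d)).astype('float32')     # query vectors

index = faiss.IndexFlatIP(d)        # METRIC_INNER_PRODUCT
faiss.normalize_L2(xb)              # normalize before adding
index.add(xb)

faiss.normalize_L2(xq)              # normalize before searching
D, I = index.search(xq, 10)         # D now holds cosine similarities
```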
Additional metrics are only supported by IndexFlat, and soon by GpuIndexFlat.

METRIC_L1, METRIC_Linf and METRIC_Lp metrics are supported. METRIC_Lp includes use of Index::metric_arg (C++) / index.metric_arg (Python) to set the power.

METRIC_Canberra, METRIC_BrayCurtis and METRIC_JensenShannon are available as well. For Mahalanobis see below.
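A minimal Python sketch, with illustrative toy data, of selecting one of these metrics on IndexFlat and setting metric_arg for METRIC_Lp:

```python
import numpy as np
import faiss

d = 20
rng = np.random.default_rng(0)
xb = rng.standard_normal((1000, d)).astype('float32')
xq = rng.standard_normal((5, d)).astype('float32')

# L1 (Manhattan) distance
index_l1 = faiss.IndexFlat(d, faiss.METRIC_L1)
index_l1.add(xb)

# general Lp distance: metric_arg sets the power p
index_lp = faiss.IndexFlat(d, faiss.METRIC_Lp)
index_lp.metric_arg = 3.0
index_lp.add(xb)

D, I = index_lp.search(xq, 5)
```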
The Mahalanobis distance is equivalent to L2 distance in a transformed space. To transform to that space:
- compute the covariance matrix of the data
- multiply all vectors (query and database) by the inverse of the Cholesky decomposition of the covariance matrix.
- index in a METRIC_L2 index.
Example here: mahalnobis_to_L2.ipynb
How to whiten data with Faiss and compute the Mahalanobis distance: demo_whitening.ipynb
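A minimal Python sketch of the transformation described above, using illustrative random data (the notebooks linked above cover the complete workflow):

```python
import numpy as np
import faiss

d = 64
rng = np.random.default_rng(0)
xb = rng.standard_normal((10000, d)).astype('float32')  # database vectors
xq = rng.standard_normal((5, d)).astype('float32')      # query vectors

# covariance of the data and its Cholesky factor (cov = L @ L.T)
cov = np.cov(xb, rowvar=False)
L = np.linalg.cholesky(cov)
L_inv = np.linalg.inv(L).astype('float32')

# multiply database and query vectors by the inverse Cholesky factor
xb_t = xb @ L_inv.T
xq_t = xq @ L_inv.T

# L2 search in the transformed space = Mahalanobis search in the original space
index = faiss.IndexFlatL2(d)
index.add(xb_t)
D, I = index.search(xq_t, 5)        # D holds squared Mahalanobis distances
```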
Vectors can be transformed by adding one dimension so that maximum inner product search becomes equivalent to L2 search. See section 3 of "Speeding up the xbox recommender system using a euclidean transformation for inner-product spaces", Bachrach et al., ACM conference on recommender systems, 2014, which transforms inner product computations into L2 distance computations. See an implementation here: demo_IP_to_L2.ipynb.
Note however that, while mathematically equivalent, this may not interact well with quantization (and possibly other index structures; see "Non-metric similarity graphs for maximum inner product search", Morozov & Babenko, NeurIPS'18).
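A minimal Python sketch of the one-extra-dimension transformation from the paper, with illustrative toy data (phi2 is the maximum squared norm over the database):

```python
import numpy as np
import faiss

d = 32
rng = np.random.default_rng(0)
xb = rng.standard_normal((10000, d)).astype('float32')  # database vectors
xq = rng.standard_normal((5, d)).astype('float32')      # query vectors

# augment each database vector with sqrt(phi^2 - ||x||^2)
norms2 = (xb ** 2).sum(axis=1)
phi2 = norms2.max()
aux = np.sqrt(phi2 - norms2).astype('float32')
xb_aug = np.hstack([xb, aux[:, None]])

# queries get a 0 in the extra dimension
xq_aug = np.hstack([xq, np.zeros((len(xq), 1), dtype='float32')])

# L2 neighbors of the augmented queries are max inner product neighbors
index = faiss.IndexFlatL2(d + 1)
index.add(xb_aug)
D, I = index.search(xq_aug, 5)
```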
The reverse transformation is described in "Asymmetric Mapping Quantization for Nearest Neighbor Search", Hong et al., PAMI'20.