Is there an existing issue for this?
Bug summary
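`FAISSCache._load_index` calls `faiss.read_index` on every invocation, so the whole index is re-read from disk for each query. For a 4.8 GB HNSW index this costs ~2 s of disk I/O per query, making FAISS search indistinguishable from a flat scan in practice.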
Related issues
- `ef_search` from `HNSWIndexConfig` is never applied at search time. The FAISS default (16) happens to match the config default, so this is not an issue for now, but it should be addressed alongside this fix.
- `metadata.npz` is similarly reloaded on each call via `_load_meta`.
Proposed fix
Add an instance-level cache keyed on `(metric_type, index_config)`. A single-slot cache (last-used tuple) is sufficient for sequential access patterns; a dict is safer for callers that alternate between index configs on the same instance.
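
    def __init__(self, ...):
        ...
        # Loaded indexes, keyed on (metric_type, index_config).
        # index_config must be hashable to be usable in the key.
        self._index_cache: dict[tuple, faiss.Index] = {}

    def _load_index(self, metric_type, index_config):
        key = (metric_type, index_config)
        if key not in self._index_cache:
            path = self._faiss_path(metric_type, index_config)
            idx = faiss.read_index(str(path))
            if isinstance(index_config, HNSWIndexConfig):
                # Unwrap wrappers such as IndexIDMap until the HNSW index is reached.
                inner = idx
                while hasattr(inner, 'index') and not hasattr(inner, 'hnsw'):
                    # Assumption: the wrapper's .index attribute is typed as a generic
                    # faiss.Index in Python, so downcast it to expose .hnsw.
                    inner = faiss.downcast_index(inner.index)
                if hasattr(inner, 'hnsw'):
                    inner.hnsw.efSearch = index_config.ef_search
            self._index_cache[key] = idx
        return self._index_cache[key]
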
Note: the index object is wrapped in `IndexIDMap` on disk, so `ef_search` must be applied to the inner index after unwrapping.
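
To illustrate the single-slot versus dict trade-off described in the proposed fix, here is a minimal, FAISS-free sketch. `SingleSlotCache`, `DictCache`, `count_loads`, and `fake_read_index` are hypothetical names used only for this illustration; the stub loader stands in for `faiss.read_index` and simply counts how often it is called.

```python
from typing import Callable, Hashable, List


class SingleSlotCache:
    """Remembers only the most recently loaded key."""

    def __init__(self, loader: Callable[[Hashable], object]):
        self._loader = loader
        self._filled = False
        self._key = None
        self._value = None

    def get(self, key: Hashable) -> object:
        # Reload whenever the requested key differs from the last one.
        if not self._filled or key != self._key:
            self._value = self._loader(key)
            self._key = key
            self._filled = True
        return self._value


class DictCache:
    """Keeps every key that has been loaded so far."""

    def __init__(self, loader: Callable[[Hashable], object]):
        self._loader = loader
        self._items = {}

    def get(self, key: Hashable) -> object:
        if key not in self._items:
            self._items[key] = self._loader(key)
        return self._items[key]


def count_loads(cache_cls, keys: List[Hashable]) -> int:
    """Return how many times the (stub) loader runs for a sequence of keys."""
    loads = 0

    def fake_read_index(key):  # hypothetical stand-in for faiss.read_index
        nonlocal loads
        loads += 1
        return object()

    cache = cache_cls(fake_read_index)
    for key in keys:
        cache.get(key)
    return loads


sequential = [("l2", "hnsw")] * 4
alternating = [("l2", "hnsw"), ("ip", "flat")] * 2

print(count_loads(SingleSlotCache, sequential))   # 1
print(count_loads(DictCache, sequential))         # 1
print(count_loads(SingleSlotCache, alternating))  # 4
print(count_loads(DictCache, alternating))        # 2
```

With alternating keys the single-slot variant reloads on every call, which is the case the dict is meant to cover, at the cost of holding every loaded index in memory.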
At larger-scale coverage (~3.5M vectors, 1024-dim float32) the HNSW index would be ~15 GB. Holding that in process RAM via an instance cache may not be acceptable in all deployments; see the FAISS on-disk indexes wiki regarding `IO_FLAG_MMAP_IFC` as a lower-RSS alternative for flat/HNSW indexes.
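
The ~15 GB figure can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes an HNSW `M` of 32 (not stated in this issue) and counts only the raw float32 vectors plus level-0 neighbor links (2*M 4-byte ids per vector). Upper-level links and bookkeeping are ignored, so treat the result as a rough lower bound. `estimate_hnsw_bytes` is a hypothetical helper, not part of FAISS.

```python
def estimate_hnsw_bytes(n_vectors, dim, m=32, bytes_per_float=4, bytes_per_link=4):
    """Rough size estimate: raw vectors plus level-0 neighbor links only."""
    vector_bytes = n_vectors * dim * bytes_per_float
    # Assumption: level 0 stores 2*M neighbor ids per vector.
    link_bytes = n_vectors * 2 * m * bytes_per_link
    return vector_bytes, link_bytes


vector_bytes, link_bytes = estimate_hnsw_bytes(3_500_000, 1024)
total_bytes = vector_bytes + link_bytes

print(f"vectors: {vector_bytes / 1e9:.1f} GB")  # vectors: 14.3 GB
print(f"links:   {link_bytes / 1e9:.1f} GB")    # links:   0.9 GB
print(f"total:   {total_bytes / 1e9:.1f} GB")   # total:   15.2 GB
```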
Code for reproduction
## Root cause
`FAISSCache._load_index` (`storage/faiss/faiss_cache.py`):

    def _load_index(self, metric_type, index_config):
        path = self._faiss_path(metric_type, index_config)
        index = faiss.read_index(str(path))  # called fresh every time
        return index

Error messages