FAISSCache._load_index calls read_index() on every invocation - no in-memory caching of the loaded index object #24

### Is there an existing issue for this?

- [x] I have searched the existing issues

### Bug summary

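`FAISSCache._load_index` re-reads the index from disk with `faiss.read_index()` on every call. For a 4.8 GB HNSW index this costs ~2 s of disk I/O per query, which in practice makes FAISS search indistinguishable from a flat scan.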

## Related issues

- `ef_search` from `HNSWIndexConfig` is never applied at search time. The FAISS default (16) happens to match the config default, so this has no effect for now, but it should be addressed alongside this fix.
- `metadata.npz` is similarly reloaded on each call via `_load_meta`.
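The metadata reload can use the same load-once pattern. This is a minimal sketch only: `MetaCacheSketch` and its attributes are hypothetical (the real `_load_meta` signature isn't shown here), and it assumes `metadata.npz` is read with `numpy.load`. Since `np.load` on an `.npz` returns a lazily-reading `NpzFile`, the arrays are copied into a plain dict once and the file handle is closed.

```python
from pathlib import Path

import numpy as np


class MetaCacheSketch:
    """Hypothetical stand-in showing how `_load_meta` could cache metadata.npz."""

    def __init__(self, meta_path):
        self._meta_path = Path(meta_path)
        self._meta = None     # populated on first access
        self.load_count = 0   # illustration only: number of disk reads

    def _load_meta(self):
        if self._meta is None:
            # NpzFile reads members lazily; materialise them, then close the file.
            with np.load(self._meta_path) as npz:
                self._meta = {name: npz[name] for name in npz.files}
            self.load_count += 1
        return self._meta
```

Repeated calls return the same dict object, so the file is read once per instance.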

## Proposed fix

Add an instance-level cache keyed on `(metric_type, index_config)`. This requires `index_config` to be hashable (e.g. a frozen dataclass). A single-slot cache (holding only the last-used key) is sufficient for sequential access patterns; a dict is safer for callers that alternate between index configs on the same instance.

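```python
import faiss

# HNSWIndexConfig is the project's HNSW config class (import path omitted).

def __init__(self, ...):
    ...
    # (metric_type, index_config) -> loaded faiss.Index
    self._index_cache: dict[tuple, faiss.Index] = {}

def _load_index(self, metric_type, index_config):
    key = (metric_type, index_config)  # index_config must be hashable
    if key not in self._index_cache:
        path = self._faiss_path(metric_type, index_config)
        idx = faiss.read_index(str(path))  # disk read happens once per key
        if isinstance(index_config, HNSWIndexConfig):
            # The stored index is wrapped (IndexIDMap). The wrapper's `.index`
            # comes back as a base faiss.Index, so downcast at each step to
            # expose the concrete type and its `hnsw` attribute.
            inner = faiss.downcast_index(idx)
            while hasattr(inner, "index") and not hasattr(inner, "hnsw"):
                inner = faiss.downcast_index(inner.index)
            if hasattr(inner, "hnsw"):
                inner.hnsw.efSearch = index_config.ef_search
        self._index_cache[key] = idx
    return self._index_cache[key]
```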

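Note: the index is stored on disk wrapped in `IndexIDMap`, so `ef_search` must be applied to the inner index after unwrapping. In the Python bindings the wrapper's `.index` attribute is returned as a base `faiss.Index`, so it has to go through `faiss.downcast_index()` before `.hnsw` is accessible.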
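If keeping every config resident is a memory concern, a bounded LRU sits between the single-slot and unbounded-dict options. A minimal sketch using `collections.OrderedDict`; `BoundedIndexCache`, `max_entries` and `loader` are hypothetical names, with `loader` standing in for the `faiss.read_index` + `efSearch` logic above.

```python
from collections import OrderedDict
from collections.abc import Callable, Hashable
from typing import Any


class BoundedIndexCache:
    """Hypothetical LRU cache holding at most `max_entries` loaded objects."""

    def __init__(self, loader: Callable[[Hashable], Any], max_entries: int = 1):
        if max_entries < 1:
            raise ValueError("max_entries must be >= 1")
        self._loader = loader
        self._max_entries = max_entries
        self._entries: OrderedDict = OrderedDict()

    def get(self, key: Hashable) -> Any:
        if key in self._entries:
            # Cache hit: mark as most recently used.
            self._entries.move_to_end(key)
            return self._entries[key]
        value = self._loader(key)  # cache miss: load from disk
        self._entries[key] = value
        if len(self._entries) > self._max_entries:
            # Drop the cache's reference to the least recently used entry.
            self._entries.popitem(last=False)
        return value
```

With `max_entries=1` this behaves like the single-slot cache; larger values cap how many indexes the cache keeps alive at once.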


At larger scale (~3.5M vectors, 1024-dim float32) the HNSW index would be ~15 GB, and holding that in process RAM via an instance-level cache may not be acceptable in every deployment.

See the [FAISS on-disk indexes wiki](https://github.com/facebookresearch/faiss/wiki/Indexes-that-do-not-fit-in-RAM) regarding `IO_FLAG_MMAP_IFC` as a lower-RSS alternative for flat/HNSW indexes.
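A back-of-envelope check of the ~15 GB figure. Raw float32 storage follows directly from the numbers above; the graph overhead assumes HNSW `M = 32` (a hypothetical value, not stated in this issue) and counts only level-0 links (`2*M` int32 neighbour ids per vector), ignoring the much smaller upper layers.

```python
N_VECTORS = 3_500_000
DIM = 1024
BYTES_PER_FLOAT32 = 4
M = 32              # hypothetical HNSW M; substitute the real config value
BYTES_PER_LINK = 4  # FAISS stores HNSW neighbour ids as int32

# Flat storage of the raw vectors.
vector_bytes = N_VECTORS * DIM * BYTES_PER_FLOAT32
# Level 0 keeps up to 2*M neighbours per vector; upper levels are ignored.
link_bytes = N_VECTORS * 2 * M * BYTES_PER_LINK
total_bytes = vector_bytes + link_bytes

print(f"vectors: {vector_bytes / 1e9:.1f} GB")
print(f"links:   {link_bytes / 1e9:.1f} GB")
print(f"total:   {total_bytes / 1e9:.1f} GB")
```

Under these assumptions this gives 14.3 GB + 0.9 GB, about 15.2 GB, consistent with the ~15 GB estimate; the vectors themselves dominate.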


### Code for reproduction

## Root cause

`FAISSCache._load_index` (`storage/faiss/faiss_cache.py`):

```python
def _load_index(self, metric_type, index_config):
    path = self._faiss_path(metric_type, index_config)
    index = faiss.read_index(str(path))  # re-read from disk on every call
    return index
```

### Error messages

No error is raised; the problem is per-query latency.
