fix(SemanticMemory): pre-load sqlite-vec before probe to stop vec0-table quarantine loop #227

Open
brianjones-v4n wants to merge 1 commit into JKHeadley:main from brianjones-v4n:fix/semantic-memory-vec0-probe-load-order

@brianjones-v4n
Contributor

Problem

SemanticMemory.open() runs a synchronous probe-read on every existing user table after the integrity_check pragma. When the on-disk schema contains the entity_embeddings vec0 virtual table — which is the steady-state shape for any deployment that's ever attached an EmbeddingProvider and called initializeVectorSearch() — the probe throws SqliteError: no such module: vec0. The catch arm dispatches that as corruption, quarantines the DB, creates a marker file, and reopens a fresh empty DB. On the next open() the schema is regenerated, the vec0 table comes back asynchronously via initializeVectorSearch, and the cycle repeats.

Observed in production (long-running WSL/Linux deployment, instar 0.28.x):

  • Quarantine cadence: every ~60s. ~1,400 marker files/day.
  • Peak accumulation: 4,902 quarantined DB copies + matching marker files ≈ 610 MB on disk.
  • Symptom from the agent's view: "vec0 missing" health-check pings every 30 minutes. Health remained "degraded" continuously for 18+ hours before someone deleted the marker files. Deleting them did not help: the loop keeps running, so the markers regenerate within minutes.

The misleading framing is that sqlite-vec is missing. It isn't — the package is installed and vec.load(db) succeeds when called. The bug is load-order: the probe runs before any path inside SemanticMemory has loaded the extension into the connection.

Approach

Two changes inside SemanticMemory.open():

  1. Pre-load sqlite-vec directly after constructing the DB, before the integrity / probe block. Uses await import('sqlite-vec') and vec.load(this.db) inline rather than going through EmbeddingProvider.loadVecModule/loadVecExtension because:

    • EmbeddingProvider may not be attached at open() time (it's set via setEmbeddingProvider(), which can happen before or after open per the existing dual-path).
    • The probe needs vec0 queryable regardless of whether vector search is wired up for this session — the on-disk schema doesn't know about that runtime choice.
    • Failure is non-fatal (wrapped in a quiet try/catch with @silent-fallback-ok); if sqlite-vec is genuinely not installed, the per-table guard below handles it.
  2. Per-table missing-module guard inside the probe loop. The probe now also selects sql from sqlite_master and, when a row is a CREATE VIRTUAL TABLE, wraps its read in a try/catch that skips on /no such module/i. Other errors from a virtual table still propagate to the outer quarantine path — the carve-out is specifically scoped to "the module isn't loaded into this connection," which is not a corruption signal.
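For reviewers who want the shape of change 2 without opening the diff, here is a minimal sketch of the guard. It is not the code in SemanticMemory.ts: `ProbeDb`, `listTables`, `probeRead`, and `probeTables` are hypothetical names that stand in for the real better-sqlite3 calls, so the dispatch logic can be read (and exercised) in isolation.

```typescript
// Hypothetical minimal shapes; the real probe runs against a SQLite connection.
interface MasterRow {
  name: string;
  sql: string | null; // the `sql` column of sqlite_master
}

interface ProbeDb {
  // Stands in for: SELECT name, sql FROM sqlite_master WHERE type = 'table'
  listTables(): MasterRow[];
  // Stands in for a one-row read of the table; throws on failure.
  probeRead(table: string): void;
}

const VIRTUAL_TABLE = /^\s*CREATE\s+VIRTUAL\s+TABLE/i;
const MISSING_MODULE = /no such module/i;

// Probes every table and returns the names of virtual tables that were
// skipped because their module is not loaded into this connection.
// Any other error is rethrown so the caller's quarantine path still runs.
function probeTables(db: ProbeDb): string[] {
  const skipped: string[] = [];
  for (const row of db.listTables()) {
    const isVirtual = row.sql !== null && VIRTUAL_TABLE.test(row.sql);
    try {
      db.probeRead(row.name);
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      if (isVirtual && MISSING_MODULE.test(message)) {
        skipped.push(row.name); // module not loaded: not a corruption signal
        continue;
      }
      throw err; // everything else still reaches the quarantine path
    }
  }
  return skipped;
}
```

The carve-out requires both conditions: the row must be a virtual table and the message must match /no such module/i. A "no such module" error from an ordinary table, or any other error from a virtual table, still propagates.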

Both changes are independently sufficient for the vec0 case. Together they're defense-in-depth: if a future deployment has sqlite-vec uninstalled but an old entity_embeddings table still on disk, the guard prevents quarantine; if the extension is available, the pre-load makes the probe actually exercise the virtual table.
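The pre-load half can be sketched the same way. This is an illustration under assumptions, not the PR's code: `preloadVec` and `VecModule` are hypothetical names, and the importer is passed in as a parameter so the sketch runs without sqlite-vec installed. In open() the importer would be `() => import('sqlite-vec')`, and `load(db)` is the call the description refers to as vec.load(this.db).

```typescript
// Hypothetical shape: mirrors the single call the PR relies on, vec.load(db).
interface VecModule<Db> {
  load(db: Db): void;
}

// Tries to load the vec0 module into the connection before the probe runs.
// Returns true when the extension was loaded, false on any failure.
// Failure is deliberately non-fatal: the per-table guard covers that case.
async function preloadVec<Db>(
  db: Db,
  importVec: () => Promise<VecModule<Db>>, // assumed: () => import('sqlite-vec')
): Promise<boolean> {
  try {
    const vec = await importVec();
    vec.load(db);
    return true;
  } catch {
    // @silent-fallback-ok: a missing or broken sqlite-vec must not block open()
    return false;
  }
}
```

Returning a boolean rather than throwing keeps open() on a single code path. The sketch does not report why the load failed, which is the asymmetry raised under the concerns below.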

Files changed

  • src/memory/SemanticMemory.ts — pre-load + per-table guard, ~30 lines added inside open()
  • tests/unit/semantic-memory-corruption-recovery.test.ts — new `describe` block with four tests for the load-order contract

Tests

Four new tests in the existing corruption-recovery file:

  • does not quarantine a healthy DB that contains a vec0 virtual table — seeds a DB via the normal setEmbeddingProvider + initializeVectorSearch path (so the on-disk schema has a real CREATE VIRTUAL TABLE entity_embeddings USING vec0(...) row), closes, reopens, asserts no marker / corrupt files.
  • opening repeatedly does not accumulate quarantine artifacts (no probe-loop regression) — opens the seeded DB three times in succession, asserts zero accumulation. This is the regression test for the production symptom.
  • opens cleanly with no embedding provider attached — exercises the deferred-attachment shape: a session opens the DB without ever calling setEmbeddingProvider, but the on-disk schema still has the vec0 table from a previous run. The pre-load + per-table guard must cover this.
  • genuine probe-detected corruption still quarantines — regression guard for the carve-out. A partially corrupt DB (valid header, bad interior page on a normal table) must still be quarantined. Verifies we didn't accidentally widen the probe's tolerance.

Red-green verified: with the source change stashed but the new tests applied, the three load-order tests fail ("promise rejected SqliteError" / probe throws "no such module: vec0"). With the source change applied, all 16 tests in the file pass (13 existing + 3 new affirmative + the regression guard which passes in both states).

Broader semantic-memory unit suite (114 tests across semantic-memory.test.ts, semantic-memory-privacy.test.ts, semantic-memory-evidence.test.ts, SemanticMemory-invokeFromRemediator.test.ts): all pass with the change applied.

Concerns I'd raise on my own PR

  • Async import on every open. await import('sqlite-vec') happens on every open(), even though Node caches the module after the first resolution. Cost is dominated by the first call. Acceptable, but a private _vecModulePreloaded cache parallel to EmbeddingProvider._sqliteVecModule would shave a microsecond if it ever matters.
  • Pre-load swallows non-missing-module errors silently. A genuine load failure (corrupted sqlite-vec install, ABI mismatch) is swallowed by the outer try/catch. The per-table guard inside the probe will then take the hit and skip the vec0 table, which is the right behavior for the probe but means the operator gets no signal that vector search is broken. The existing initVectorSearch / initializeVectorSearch paths already surface this via _vectorAvailable; the open-time pre-load doesn't need to duplicate that signaling, but documenting the asymmetry inline might help the next reader.
  • The per-table missing-module guard catches the symptom, not the cause. If a future change adds another async-loaded virtual-table module (e.g. someone wires up an SQLite FTS plugin that isn't built into the linked SQLite), they'll either need to add another pre-load here or rely on the guard to silently skip the table. The guard is correct but quietly grows the surface of "things that look like missing modules." Not a blocker; flagged because it's the kind of thing Echo's reviews tend to surface in concerns #2.
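If the module cache from the first concern is ever wanted, one possible shape is a memoized loader. This is a sketch only: `memoizeLoader` is a hypothetical helper, not something in this PR or in EmbeddingProvider, and it assumes the loader returns a promise (as `() => import('sqlite-vec')` would).

```typescript
// Wraps an async loader so it runs at most once; later calls reuse the promise.
// A failed load is cached as null, so a broken install is not retried on every
// open(). That is a design choice for this sketch, not existing behavior.
function memoizeLoader<T>(load: () => Promise<T>): () => Promise<T | null> {
  let cached: Promise<T | null> | undefined;
  return () => {
    if (cached === undefined) {
      cached = load().catch(() => null);
    }
    return cached;
  };
}
```

Caching the promise rather than the resolved module means concurrent open() calls share one in-flight import. A caller can check the resolved value for null to decide whether to surface a warning, which would also cover the second concern.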

Notes

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ables with missing modules

The secondary probe-read in `SemanticMemory.open()` would treat
"no such module: vec0" as corruption, quarantining the DB and
producing an infinite ~60s recovery loop whenever the on-disk
schema contained the `entity_embeddings` vec0 virtual table
(observed ~1,400 quarantine files/day in a long-running
deployment, ~610 MB peak).

The bug is load-order, not missing-package: sqlite-vec is
installed, but the probe runs synchronously in `open()` before
any deferred extension load can happen. The next open finds the
DB it just created, the vec0 table is still there, the probe
fails again, and the cycle repeats.

Two changes:

1. Pre-load `sqlite-vec` directly (not via `EmbeddingProvider`)
   immediately after `constructor(this.config.dbPath)`. The
   provider may not be attached at open() time, but the on-disk
   schema doesn't know about that runtime choice — the probe
   needs vec0 queryable regardless.

2. Per-table missing-module guard in the probe: if a row in
   `sqlite_master` is a `CREATE VIRTUAL TABLE` and the probe
   throws "no such module", skip it. Treating a missing
   extension as corruption is the wrong dispatch — the table
   isn't broken, its module just isn't here.

Both changes are independently sufficient for the vec0 case.
Together they cover sqlite-vec being unavailable at all (the
guard handles it) and future virtual-table modules that follow
the same pattern.

Tests cover the load-order contract:

- Healthy DB with vec0 virtual table opens without quarantine
- Repeated opens accumulate zero quarantine artifacts
- DB opens cleanly even with no embedding provider attached
- Genuine probe-detected corruption (torn interior page) still
  quarantines — regression guard for the carve-out

Red-green verified: with the source change reverted, the three
new probe tests fail; with it applied, all 16 tests in the file
pass (13 existing + 3 new). The full SemanticMemory unit suite
(114 tests across 4 files) is green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vercel

vercel Bot commented May 15, 2026

@brianjones-v4n is attempting to deploy a commit to the sagemind Team on Vercel.

A member of the Team first needs to authorize it.
