fix(SemanticMemory): pre-load sqlite-vec before probe to stop vec0-table quarantine loop #227
Open
brianjones-v4n wants to merge 1 commit into
…ables with missing modules

The secondary probe-read in `SemanticMemory.open()` would treat "no such module: vec0" as corruption, quarantining the DB and producing an infinite ~60s recovery loop whenever the on-disk schema contained the `entity_embeddings` vec0 virtual table (observed ~1,400 quarantine files/day in a long-running deployment, ~610 MB peak).

The bug is load-order, not missing-package: sqlite-vec is installed, but the probe runs synchronously in `open()` before any deferred extension load can happen. The next open finds the DB it just created, the vec0 table is still there, the probe fails again, and the cycle repeats.

Two changes:

1. Pre-load `sqlite-vec` directly (not via `EmbeddingProvider`) immediately after `constructor(this.config.dbPath)`. The provider may not be attached at `open()` time, but the on-disk schema doesn't know about that runtime choice; the probe needs vec0 queryable regardless.
2. Per-table missing-module guard in the probe: if a row in `sqlite_master` is a `CREATE VIRTUAL TABLE` and the probe throws "no such module", skip it. Treating a missing extension as corruption is the wrong dispatch: the table isn't broken, its module just isn't here.

Both changes are independently sufficient for the vec0 case. Together they cover sqlite-vec being unavailable at all (the guard handles it) and future virtual-table modules that follow the same pattern.

Tests cover the load-order contract:

- Healthy DB with vec0 virtual table opens without quarantine
- Repeated opens accumulate zero quarantine artifacts
- DB opens cleanly even with no embedding provider attached
- Genuine probe-detected corruption (torn interior page) still quarantines (regression guard for the carve-out)

Red-green verified: with the source change reverted, the three new probe tests fail; with it applied, all 16 tests in the file pass (13 existing + 3 new). The full SemanticMemory unit suite (114 tests across 4 files) is green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@brianjones-v4n is attempting to deploy a commit to the sagemind Team on Vercel. A member of the Team first needs to authorize it.
Problem
`SemanticMemory.open()` runs a synchronous probe-read on every existing user table after the `integrity_check` pragma. When the on-disk schema contains the `entity_embeddings` vec0 virtual table (the steady-state shape for any deployment that has ever attached an `EmbeddingProvider` and called `initializeVectorSearch()`), the probe throws `SqliteError: no such module: vec0`. The catch arm dispatches that as corruption, quarantines the DB, creates a marker file, and reopens a fresh empty DB. On the next `open()` the schema is regenerated, the vec0 table comes back asynchronously via `initializeVectorSearch`, and the cycle repeats.

Observed in production (long-running WSL/Linux deployment, instar 0.28.x): ~1,400 quarantine files/day, ~610 MB peak.

The misleading framing is that sqlite-vec is missing. It isn't: the package is installed and `vec.load(db)` succeeds when called. The bug is load-order: the probe runs before any path inside `SemanticMemory` has loaded the extension into the connection.

Approach
Two changes inside `SemanticMemory.open()`:

1. Pre-load `sqlite-vec` directly after constructing the DB, before the integrity / probe block. Uses `await import('sqlite-vec')` and `vec.load(this.db)` inline rather than going through `EmbeddingProvider.loadVecModule` / `loadVecExtension` because:
   - `EmbeddingProvider` may not be attached at `open()` time (it's set via `setEmbeddingProvider()`, which can happen before or after open per the existing dual-path).
   - … `@silent-fallback-ok`); if sqlite-vec is genuinely not installed, the per-table guard below handles it.
2. Per-table missing-module guard inside the probe loop. The probe now also selects `sql` from `sqlite_master` and, when a row is a `CREATE VIRTUAL TABLE`, wraps its read in a try/catch that skips on `/no such module/i`. Other errors from a virtual table still propagate to the outer quarantine path; the carve-out is specifically scoped to "the module isn't loaded into this connection," which is not a corruption signal.

Both changes are independently sufficient for the vec0 case. Together they're defense-in-depth: if a future deployment has sqlite-vec uninstalled but an old `entity_embeddings` table still on disk, the guard prevents quarantine; if the extension is available, the pre-load makes the probe actually exercise the virtual table.

Files changed
- `src/memory/SemanticMemory.ts`: pre-load + per-table guard, ~30 lines added inside `open()`
- `tests/unit/semantic-memory-corruption-recovery.test.ts`: new `describe` block with four tests for the load-order contract

Tests
Four new tests in the existing corruption-recovery file:

- … `setEmbeddingProvider` + `initializeVectorSearch` path (so the on-disk schema has a real `CREATE VIRTUAL TABLE entity_embeddings USING vec0(...)` row), closes, reopens, asserts no marker / corrupt files.
- … `setEmbeddingProvider`, but the on-disk schema still has the vec0 table from a previous run. The pre-load + per-table guard must cover this.

Red-green verified: with the source change stashed but the new tests applied, the three load-order tests fail ("promise rejected SqliteError" / probe throws "no such module: vec0"). With the source change applied, all 16 tests in the file pass (13 existing + 3 new affirmative + the regression guard which passes in both states).
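The contract these tests pin down can be sketched as a small pure function. This is an illustration only, not the code in `SemanticMemory.ts`: `SchemaRow`, `probeTable`, `shouldQuarantine`, and the injected `preload` / `read` callbacks are hypothetical names, and the real probe reads through the SQLite connection rather than through a callback.

```typescript
// Hypothetical shapes for illustration; none of these names come from SemanticMemory.ts.
interface SchemaRow {
  name: string;
  sql: string | null; // the CREATE statement as stored in sqlite_master
}

type ProbeOutcome = "ok" | "skipped-missing-module" | "corrupt";

const VIRTUAL_TABLE = /^\s*CREATE\s+VIRTUAL\s+TABLE/i;
const MISSING_MODULE = /no such module/i;

// Classify one table. Only a virtual table whose module is not loaded is skipped;
// every other read failure is still treated as corruption.
function probeTable(row: SchemaRow, read: (table: string) => void): ProbeOutcome {
  try {
    read(row.name);
    return "ok";
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    const isVirtual = row.sql !== null && VIRTUAL_TABLE.test(row.sql);
    if (isVirtual && MISSING_MODULE.test(message)) {
      return "skipped-missing-module";
    }
    return "corrupt";
  }
}

// Best-effort pre-load followed by the probe loop.
// Returns true when at least one table looks genuinely corrupt.
function shouldQuarantine(
  rows: SchemaRow[],
  preload: () => void,
  read: (table: string) => void,
): boolean {
  try {
    preload(); // stands in for loading the extension into the connection
  } catch {
    // Swallowed on purpose: the per-table guard covers a missing extension.
  }
  return rows.some((row) => probeTable(row, read) === "corrupt");
}
```

With a vec0 row in the schema and a `read` that throws "no such module: vec0", `shouldQuarantine` returns `false`; a malformed-page error on any table still returns `true`, which is the regression the fourth test guards.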
Broader semantic-memory unit suite (114 tests across `semantic-memory.test.ts`, `semantic-memory-privacy.test.ts`, `semantic-memory-evidence.test.ts`, `SemanticMemory-invokeFromRemediator.test.ts`): all pass with the change applied.

Concerns I'd raise on my own PR
- `await import('sqlite-vec')` happens on every `open()`, even though Node caches the module after the first resolution. Cost is dominated by the first call. Acceptable, but a `private _vecModulePreloaded` cache parallel to `EmbeddingProvider._sqliteVecModule` would shave a microsecond if it ever matters.
- … `initVectorSearch` / `initializeVectorSearch` paths already surface this via `_vectorAvailable`; the open-time pre-load doesn't need to duplicate that signaling, but documenting the asymmetry inline might help the next reader.

Notes
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
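On the first concern above, a minimal sketch of what a module cache could look like if it ever becomes worthwhile. `memoizeAsync` is a hypothetical helper, not something in `SemanticMemory.ts` or `EmbeddingProvider`; in real code the loader passed in would wrap `import('sqlite-vec')`, which this sketch leaves out to stay dependency-free.

```typescript
// Hypothetical helper: memoizes an async loader so repeated open() calls
// share a single in-flight or resolved promise.
function memoizeAsync<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      cached = load().catch((err: unknown) => {
        cached = undefined; // drop the cache so a later call can retry
        throw err;
      });
    }
    return cached;
  };
}
```

Clearing the cache on failure keeps the pre-load best-effort: a failed load does not poison later `open()` calls, and the per-table guard still covers the case where the module never loads.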