fix(search): make FTS sync idempotent to stop duplicate rows on backfill #61

Merged
tompscanlan merged 2 commits into flo-bit:main from tompscanlan:fix/fts-no-duplicate-rows-on-backfill
Jun 17, 2026
@tompscanlan (Collaborator) commented Jun 17, 2026

Makes contrail's D1 full-text-search sync idempotent so repeated backfills stop accumulating duplicate FTS rows.

Root cause

buildFtsStatements only deleted the existing FTS row before inserting when the record was already in existingMap. Backfill runs with skipReplayDetection, which leaves existingMap empty, so every re-applied record looks brand-new and appends another row to the fts_<coll> virtual table (FTS5 has no uniqueness constraint). Re-running a backfill therefore accumulates duplicate FTS rows, and the search JOIN (JOIN fts ON fts.uri = r.uri) then fans each matching record out into one result row per duplicate.

Downstream this surfaced as a hard failure in an appview consumer (atmo): a keyed list ({#each results as r (r.uri)}) throws on the duplicate keys and blanks the page. The records table is unaffected because it uses INSERT ... ON CONFLICT(uri) DO UPDATE.

Fix

Make the delete-then-insert unconditional so FTS sync is idempotent regardless of replay detection, and drop the now-unused existingMap parameter from buildFtsStatements.
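The difference between the old and new behavior can be sketched as follows. This is a minimal illustration, not the code from this PR: the `FtsStatement` and `FtsRecord` shapes, the `uri`/`content` columns, the SQL strings, and the `applyToTable` in-memory stand-in are all assumptions made for the example. Only the names `buildFtsStatements`, `existingMap`, and `fts_event` come from the description above.

```typescript
// Hypothetical shapes; the real contrail types and SQL are not shown in this PR.
interface FtsStatement {
  sql: string;
  params: string[];
}

interface FtsRecord {
  uri: string;
  content: string;
}

// Old behavior as described above: delete only when the record is already known.
// With skipReplayDetection, existingMap is empty, so the delete is skipped.
function buildFtsStatementsOld(
  table: string,
  record: FtsRecord,
  existingMap: Map<string, unknown>,
): FtsStatement[] {
  const statements: FtsStatement[] = [];
  if (existingMap.has(record.uri)) {
    statements.push({ sql: `DELETE FROM ${table} WHERE uri = ?`, params: [record.uri] });
  }
  statements.push({
    sql: `INSERT INTO ${table} (uri, content) VALUES (?, ?)`,
    params: [record.uri, record.content],
  });
  return statements;
}

// New behavior: always delete first, so re-applying a record is idempotent.
function buildFtsStatementsNew(table: string, record: FtsRecord): FtsStatement[] {
  return [
    { sql: `DELETE FROM ${table} WHERE uri = ?`, params: [record.uri] },
    {
      sql: `INSERT INTO ${table} (uri, content) VALUES (?, ?)`,
      params: [record.uri, record.content],
    },
  ];
}

// In-memory stand-in for an FTS table with no uniqueness constraint on uri.
function applyToTable(rows: FtsRecord[], statements: FtsStatement[]): FtsRecord[] {
  let next = rows;
  for (const s of statements) {
    const [uri = "", content = ""] = s.params;
    if (s.sql.startsWith("DELETE")) {
      next = next.filter((r) => r.uri !== uri);
    } else {
      next = [...next, { uri, content }];
    }
  }
  return next;
}
```

Applying the old builder twice with an empty `existingMap` leaves two rows for the same `uri` in the stand-in table, while applying the new builder twice leaves one.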

Testing

  • New failing test added first (TDD): re-applying the same record twice with skipReplayDetection returned 2 rows from queryRecords (RED), then 1 after the fix (GREEN).
  • Full @atmo-dev/contrail suite: 438 passed, 3 skipped (Postgres-only); typecheck clean.
  • Verified live against a D1 that had accumulated dupes: cleaned 653 duplicate fts_event rows, and the search JOIN for previously-fanning terms now returns rows == distinct_uris.

Note for existing indexes

Deployments that already accumulated duplicates need a one-time cleanup; this fix prevents new duplicates but does not remove existing ones. Run the cleanup after deploying the fix; if it runs before, later backfills on the old code re-accumulate duplicates.

There is one FTS5 virtual table per searchable collection, named fts_<short> (plus spaces_fts_<short> when spaces mode is used). List the ones that exist:

SELECT name FROM sqlite_master WHERE type='table' AND sql LIKE '%USING fts5%';

Then dedupe each, keeping the lowest rowid per uri. For the default events config that is a single table:

DELETE FROM fts_event WHERE rowid NOT IN (SELECT MIN(rowid) FROM fts_event GROUP BY uri);

(The fts5 shadow tables *_data / *_idx / *_docsize / *_config are maintained automatically; only delete from the virtual table itself.)
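For deployments with several searchable collections, the per-table cleanup statement can be generated from the list returned by the sqlite_master query. The sketch below is illustrative only: `dedupeStatementFor`, `dedupeStatements`, and the name pattern are hypothetical helpers, and the pattern assumes table names contain only letters, digits, and underscores after the `fts_` or `spaces_fts_` prefix.

```typescript
// Assumed naming pattern, based on the fts_<short> / spaces_fts_<short> convention above.
// It only guards against stray identifiers being interpolated into SQL; it does not
// distinguish virtual tables from shadow tables, so feed it the sqlite_master result.
const FTS_TABLE_NAME = /^(spaces_)?fts_[A-Za-z0-9_]+$/;

// Builds the dedupe statement for one FTS table, keeping the lowest rowid per uri.
function dedupeStatementFor(table: string): string {
  if (!FTS_TABLE_NAME.test(table)) {
    throw new Error(`unexpected FTS table name: ${table}`);
  }
  return `DELETE FROM ${table} WHERE rowid NOT IN (SELECT MIN(rowid) FROM ${table} GROUP BY uri);`;
}

// Builds one statement per table, in the order given.
function dedupeStatements(tables: string[]): string[] {
  return tables.map(dedupeStatementFor);
}
```

For example, `dedupeStatements(["fts_event"])` yields the single statement shown above for the default events config.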

buildFtsStatements only deleted the existing FTS row before inserting when the record was already in existingMap. Backfill runs with skipReplayDetection, leaving existingMap empty, so re-applied records looked new and appended duplicate FTS rows. The FTS5 virtual table has no uniqueness constraint, so duplicates accumulated and the search JOIN fanned each event out into one row per duplicate, breaking keyed lists in the appview. Delete-then-insert is now unconditional.
@tompscanlan tompscanlan force-pushed the fix/fts-no-duplicate-rows-on-backfill branch from 25e29ce to 32ace91 Compare June 17, 2026 10:49
@tompscanlan tompscanlan changed the title from "fix(search): idempotent FTS sync + populate FTS for NSID-keyed collections" to "fix(search): make FTS sync idempotent to stop duplicate rows on backfill" Jun 17, 2026
… fields

The idempotent FTS sync returned early when buildFtsContent produced no
content, skipping the delete. An update that cleared every searchable field
therefore left the prior FTS row in place, so old terms kept matching through
the search JOIN. Run the delete unconditionally and gate only the re-insert on
content.
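The ordering described in this commit message, an unconditional delete with only the re-insert gated on content, might look roughly like this. It is a sketch under assumptions: `buildContent` is a hypothetical stand-in for buildFtsContent (whose real signature is not shown here), and `SyncStatement`, `buildFtsSync`, and the SQL strings are invented for illustration.

```typescript
// Hypothetical statement shape for this example.
interface SyncStatement {
  sql: string;
  params: string[];
}

// Hypothetical stand-in for buildFtsContent: joins the non-empty searchable fields.
function buildContent(fields: Array<string | undefined>): string {
  return fields
    .filter((f): f is string => typeof f === "string" && f.trim() !== "")
    .join(" ");
}

// The delete always runs; the insert runs only when there is content to index.
function buildFtsSync(
  table: string,
  uri: string,
  fields: Array<string | undefined>,
): SyncStatement[] {
  const statements: SyncStatement[] = [
    { sql: `DELETE FROM ${table} WHERE uri = ?`, params: [uri] },
  ];
  const content = buildContent(fields);
  if (content !== "") {
    statements.push({
      sql: `INSERT INTO ${table} (uri, content) VALUES (?, ?)`,
      params: [uri, content],
    });
  }
  return statements;
}
```

With this shape, an update that clears every searchable field still emits the delete, so the stale row and its old terms are removed even though nothing is re-inserted.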
@tompscanlan tompscanlan marked this pull request as ready for review June 17, 2026 11:43
@tompscanlan tompscanlan merged commit 2b9baa7 into flo-bit:main Jun 17, 2026
1 check passed