fix(workspace): scope projectPrefixStore.ListDocuments to this project#263

Open
kryptt wants to merge 1 commit into
yoanbernabeu:mainfrom
kryptt:pr/fix-workspace-list-documents

@kryptt kryptt commented May 29, 2026

Problem

projectPrefixStore.ListDocuments in cli/watch.go is a pure passthrough to the underlying shared vector store. In workspace mode the shared store holds every project's documents, so the indexer's scan-vs-stored diff (in indexer.IndexAllWithBatchProgress) compares this project's freshly scanned files against the whole workspace's document set. It then treats every other project's document as "no longer present on disk" and runs RemoveFile against each one.

The removes are silent no-ops — projectPrefixStore.DeleteByFile re-adds the project prefix to the already-prefixed key, producing a doubly-prefixed path that never matches a stored row — so the database stays consistent. But two visible problems leak through:

  1. Misleading stats. Every per-project scan reports the workspace's total document count under "files removed":

    ```
    Initial scan complete: 0 files indexed, 0 chunks created,
    8173 files removed, 24 skipped (took 1.866s)
    ```

    when this project contains 24 of those 8173 documents and nothing is actually being removed.

  2. Wasted work. Thousands of Postgres roundtrips per scan as the indexer fires no-op DeleteByFile + DeleteDocument calls for every other project's file. With multiple workspace restarts (e.g. service restarts, container redeploys) the noise compounds.

I noticed this debugging a separate issue in a 7-project workspace; every scan was emitting "8173 files removed" with no underlying change.

Fix

ListDocuments now filters to entries whose path begins with this project's workspaceName/projectName/ prefix and strips the prefix before returning, so the indexer's relative-path bookkeeping aligns with what its scanner enumerates. Legitimate deletions still target the right rows because the surrounding DeleteByFile / DeleteDocument wrappers re-add the prefix.

Test plan

TestProjectPrefixStore_PassThroughAndGetChunks was updated to feed a mixed-workspace listing:

```go
listDocumentsResult: []string{
	"ws/proj/main.go",
	"ws/proj/sub/file.go",
	"ws/other/main.go",          // different project, filtered
	"different-ws/proj/main.go", // different workspace, filtered
	"unprefixed.go",             // no prefix, filtered
},
```

The assertion now requires exactly ["main.go", "sub/file.go"] back: both filtered to this project and stripped of the prefix. The original test asserted the buggy passthrough behaviour (len(docs) == 2 against ["a", "b"]) and would fail under the fix; the test and the implementation now agree on the new contract.

go test ./... -count=1 is clean.

projectPrefixStore.ListDocuments was a pure passthrough to the
underlying shared vector store, returning every document across every
project in the workspace. The indexer then computed its scan-vs-stored
diff against that workspace-wide set, treated every other project's
document as "no longer on disk", and ran RemoveFile against each one.

The removes silently no-op'd — projectPrefixStore.DeleteByFile re-adds
the project prefix to the already-prefixed path, producing a
doubly-prefixed key that never matches a stored row — so DB integrity
was preserved, but two side effects leaked through:

  * Misleading stats: every per-project scan reported the entire
    workspace's document count under "files removed", e.g.
      Initial scan complete: 0 files indexed, 0 chunks created,
        8173 files removed, 24 skipped (took 1.866s)
    when only 24 of the workspace's 8173 docs are in this project and
    nothing was actually removed.

  * Wasted work: thousands of Postgres roundtrips per scan as the
    indexer fires no-op deletes for every other project's file.

Fix: ListDocuments now filters to entries whose path begins with this
project's prefix and strips the prefix before returning, so the
indexer's relative-path bookkeeping lines up with what its scanner
enumerates. The delete-loop's RemoveFile call re-adds the prefix via
the existing wrapper, so legitimate removes still target the right
rows.

Test plan: TestProjectPrefixStore_PassThroughAndGetChunks updated to
feed a mixed-workspace listing and assert (a) only this project's
entries come back, and (b) the prefix is stripped. The previous
assertion (len(docs) == 2 against ["a", "b"]) was checking the buggy
passthrough behaviour and would fail under the fix; both pieces now
align on the new contract.