fix: use nonsuspicious index in lexicalFallbackSkills #711

Open
sethconvex wants to merge 5 commits into openclaw:main from sethconvex:fix/lexical-fallback-nonsuspicious-index

Conversation


@sethconvex sethconvex commented Mar 11, 2026

Summary

  • lexicalFallbackSkills was scanning 500 full skill docs via by_active_updated and filtering isSuspicious in JS — same pattern we fixed in listPublicPageV2
  • When nonSuspiciousOnly is set, now uses by_nonsuspicious_updated index to skip suspicious rows at the DB level
  • Replaced eager take(500) with async iterator + early exit — stops scanning as soon as we have enough token matches, avoiding reading hundreds of unused full docs
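The two changes above — pushing the suspicious-row filter down to an index and replacing the eager `take(500)` with an incremental scan — can be sketched outside Convex with an async generator standing in for the query cursor. All names below (`SkillDoc`, `scanIndex`, `lexicalFallback`) are illustrative assumptions, not the project's actual code:

```typescript
// Sketch of index-level filtering plus incremental scan with early exit.
type SkillDoc = { _id: string; slug: string; isSuspicious: boolean };

// Stands in for reading rows via an index: on the "nonsuspicious" index
// path, suspicious rows are never yielded, so no JS-side filter is needed.
async function* scanIndex(docs: SkillDoc[], nonSuspiciousOnly: boolean) {
  for (const doc of docs) {
    if (nonSuspiciousOnly && doc.isSuspicious) continue; // done by the DB index
    yield doc;
  }
}

// Collect token matches incrementally instead of take(500) + filter:
// stop pulling rows as soon as we have `limit` matches.
async function lexicalFallback(
  docs: SkillDoc[],
  token: string,
  limit: number,
  nonSuspiciousOnly: boolean,
): Promise<SkillDoc[]> {
  const matched: SkillDoc[] = [];
  for await (const doc of scanIndex(docs, nonSuspiciousOnly)) {
    if (!doc.slug.includes(token)) continue;
    matched.push(doc);
    if (matched.length >= limit) break; // early exit: later docs are never read
  }
  return matched;
}
```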

Test plan

  • bunx convex typecheck passes
  • bun run test — all 13 search tests pass (including new index assertion test)
  • Deploy to prod and monitor lexicalFallbackSkills bandwidth in Convex insights

🤖 Generated with Claude Code

…idth

lexicalFallbackSkills was scanning 500 full skill docs via by_active_updated
and filtering isSuspicious in JS. When nonSuspiciousOnly is set, use
by_nonsuspicious_updated to skip suspicious rows at the index level.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

vercel bot commented Mar 11, 2026

Someone is attempting to deploy a commit to the Amantus Machina Team on Vercel.

A member of the Team first needs to authorize it.


greptile-apps bot commented Mar 11, 2026

Greptile Summary

This PR correctly threads the by_nonsuspicious_updated index into lexicalFallbackSkills to mirror the optimization already applied in listPublicPageV2, and the index selection branch (lines 277–286) is sound. However, there is a critical logic regression on line 290: the JS fallback filter condition was inverted from args.nonSuspiciousOnly to !args.nonSuspiciousOnly, which silently drops suspicious skills from results whenever nonSuspiciousOnly is not set — the exact opposite of the intended and historically consistent behaviour.

  • Bug (search.ts:290) — `!args.nonSuspiciousOnly` should be `args.nonSuspiciousOnly` (or the line removed entirely). Every other `nonSuspiciousOnly` guard in the codebase (`filterPublicSkillPage`, `hydrateResults`) follows `if (args.nonSuspiciousOnly && isSkillSuspicious(…))`, meaning suspicious skills are only excluded when explicitly requested. The inverted condition now filters suspicious skills in the default path and skips filtering when the flag is set (relying on the index), making the default-path behavior a regression.
  • Test gap (search.test.ts:114–130) — The new "uses by_active_updated" test only populates recentSkills with a clean skill, so the inverted condition passes undetected. Adding a suspicious skill and asserting it is returned (not filtered) when nonSuspiciousOnly is unset would catch this class of bug.

Confidence Score: 2/5

  • Not safe to merge — the inverted JS filter silently changes the observable results of lexicalFallbackSkills for all callers that do not pass nonSuspiciousOnly.
  • The index-selection logic is correct, but the inverted condition on line 290 is a clear behavioural regression: suspicious skills are now excluded from default (non-flagged) searches, reversing the semantics that are consistently upheld everywhere else in the codebase. The test suite does not catch it because the added tests never place a suspicious skill in the default path's fixture data.
  • convex/search.ts line 290 — the inverted !args.nonSuspiciousOnly condition needs to be corrected.

Last reviewed commit: 836c83f

convex/search.ts Outdated

```diff
 for (const skill of recentSkills) {
   if (seenSkillIds.has(skill._id)) continue
-  if (args.nonSuspiciousOnly && isSkillSuspicious(skill)) continue
+  if (!args.nonSuspiciousOnly && isSkillSuspicious(skill)) continue
```

Inverted condition silently drops suspicious skills in the default path

The negation `!args.nonSuspiciousOnly` flips the original semantics. In every other `nonSuspiciousOnly` guard in this codebase — `filterPublicSkillPage` (skills.ts:1894), the `hydrateResults` check (search.ts:230), and the original line here — the pattern is always:

```ts
if (args.nonSuspiciousOnly && isSkillSuspicious(skill)) continue // filter only when explicitly requested
```

With the current change, when a caller omits nonSuspiciousOnly (the common/default case):

  • `!args.nonSuspiciousOnly` → `!undefined` → `true`
  • so isSkillSuspicious(skill) now causes a continue, silently excluding suspicious skills even though nobody asked for that

When nonSuspiciousOnly: true the condition is false, so the check is a no-op — that part is correct (the index already filtered). But the false/undefined case is now broken.

The correct fix is to remove the line entirely (both the index and the by_active_updated path no longer need a JS guard) or keep the original guard:

Suggested change:

```diff
-if (!args.nonSuspiciousOnly && isSkillSuspicious(skill)) continue
+if (args.nonSuspiciousOnly && isSkillSuspicious(skill)) continue
```

The existing test at line 114 doesn't catch this because recentSkills only contains a clean skill; a suspicious skill would also pass through (incorrectly) without the test failing.
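The truthiness pitfall described above is easy to reproduce in isolation. The sketch below uses hypothetical `keeps`/`keepsInverted` helpers (not the project's code) to contrast the two guard shapes when the flag is omitted:

```typescript
// Minimal repro of the guard semantics; names are illustrative.
const isSkillSuspicious = (skill: { suspicious: boolean }) => skill.suspicious;

// Original / correct guard: filter only when explicitly requested.
function keeps(nonSuspiciousOnly: boolean | undefined, suspicious: boolean): boolean {
  const skill = { suspicious };
  if (nonSuspiciousOnly && isSkillSuspicious(skill)) return false;
  return true;
}

// Inverted guard from the diff: !undefined === true, so the DEFAULT path
// (flag omitted) now filters, while the flagged path becomes a no-op.
function keepsInverted(nonSuspiciousOnly: boolean | undefined, suspicious: boolean): boolean {
  const skill = { suspicious };
  if (!nonSuspiciousOnly && isSkillSuspicious(skill)) return false;
  return true;
}
```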


Contributor Author
Fixed in d378123 — corrected the inverted condition and added a test with a suspicious skill to cover the default path.

Comment on lines 114 to 130
```ts
it('uses by_active_updated index when nonSuspiciousOnly is not set', async () => {
  const clean = makeSkillDoc({ id: 'skills:clean', slug: 'orf-clean', displayName: 'ORF Clean' })

  const ctx = makeLexicalCtx({
    exactSlugSkill: null,
    recentSkills: [clean],
  })

  const result = await lexicalFallbackSkillsHandler(ctx, {
    query: 'orf',
    queryTokens: ['orf'],
    limit: 10,
  })

  expect(result).toHaveLength(1)
  expect(ctx._withIndexFn).toHaveBeenCalledWith('by_active_updated', expect.any(Function))
})
```

Test doesn't cover the suspicious-skill passthrough case

The test for the default (nonSuspiciousOnly unset) path only puts a clean skill in recentSkills, so it cannot detect whether suspicious skills are incorrectly filtered. Adding a suspicious skill and asserting it appears in the result would have caught the inverted condition on line 290:

```ts
it('uses by_active_updated index when nonSuspiciousOnly is not set', async () => {
  const clean = makeSkillDoc({ id: 'skills:clean', slug: 'orf-clean', displayName: 'ORF Clean' })
  const suspicious = makeSkillDoc({
    id: 'skills:sus',
    slug: 'orf-sus',
    displayName: 'ORF Sus',
    moderationFlags: ['flagged.suspicious'],
  })

  const ctx = makeLexicalCtx({ exactSlugSkill: null, recentSkills: [clean, suspicious] })
  const result = await lexicalFallbackSkillsHandler(ctx, {
    query: 'orf',
    queryTokens: ['orf'],
    limit: 10,
  })

  // suspicious skills are NOT filtered when nonSuspiciousOnly is unset
  expect(result).toHaveLength(2)
  expect(ctx._withIndexFn).toHaveBeenCalledWith('by_active_updated', expect.any(Function))
})
```

Contributor Author
Fixed in d378123 — added a suspicious skill to the test fixture and asserted it passes through when nonSuspiciousOnly is unset.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 836c83f419

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

convex/search.ts Outdated

```diff
 for (const skill of recentSkills) {
   if (seenSkillIds.has(skill._id)) continue
-  if (args.nonSuspiciousOnly && isSkillSuspicious(skill)) continue
+  if (!args.nonSuspiciousOnly && isSkillSuspicious(skill)) continue
```


P1: Keep suspicious skills when nonSuspiciousOnly is false

This condition is inverted relative to the nonSuspiciousOnly flag semantics used elsewhere in search (hydrateResults only filters when the flag is true). With the current code, the default path (nonSuspiciousOnly unset/false) now drops suspicious skills from lexical fallback, so queries that rely on fallback can silently miss matches even though the caller did not request filtering.


Contributor Author
Fixed in d378123.

Instead of eagerly reading 500 full skill docs and filtering after,
use an async iterator to scan and match incrementally, stopping as
soon as we have enough results. This avoids reading docs we'll never use.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 09d093e90e


convex/search.ts Outdated

```ts
if (!matchesExactTokens(args.queryTokens, [skill.displayName, skill.slug, skill.summary]))
  continue
matched.push(skill)
if (matched.length >= limit) break
```


P2: Do not stop fallback scan before highlighted filtering

Breaking the scan as soon as matched.length >= limit can drop valid results when highlightedOnly is enabled, because the highlighted filter is applied later (filtered = args.highlightedOnly ? ...). If the first limit token matches are not highlighted, this loop exits early and returns too few (or zero) results even when highlighted matches exist later in the 500-row scan window; previously the function collected all scanned matches before applying post-filters.
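The ordering problem Codex describes — truncating at `limit` before a later post-filter runs — can be reproduced with a small sketch. The names below are illustrative, not the project's code:

```typescript
// Why breaking at `limit` before post-filtering starves the filter.
type Match = { id: number; highlighted: boolean };

// Buggy ordering: cap the scan first, filter after. If the first `limit`
// matches happen not to be highlighted, the result is empty even though
// highlighted matches exist later in the scan window.
function limitThenFilter(docs: Match[], limit: number): Match[] {
  const matched: Match[] = [];
  for (const d of docs) {
    matched.push(d);
    if (matched.length >= limit) break; // exits before reaching highlighted rows
  }
  return matched.filter((d) => d.highlighted);
}

// Correct ordering: collect all scanned matches, apply the post-filter,
// then cap the result.
function filterAfterFullScan(docs: Match[], limit: number): Match[] {
  return docs.filter((d) => d.highlighted).slice(0, limit);
}
```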


Contributor Author
Fixed in d378123 — removed the early break so the scan collects all matches before post-filtering.

Fetch all search results once (limit: 200) and paginate client-side
instead of re-running the full search pipeline on every "load more".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d41b253fcb


```diff
   ? !isSearching && displayCount < searchResults.length
   : canLoadMoreList
-const isLoadingMore = hasQuery ? isSearching && searchResults.length > 0 : isLoadingMoreList
+const isLoadingMore = hasQuery ? false : isLoadingMoreList
```


P1: Allow repeated load-more calls for search results

Setting isLoadingMore to a constant false in search mode prevents the in-flight guard from ever being cleared after loadMore sets loadMoreInFlightRef.current = true. Because the reset effect only runs when isLoadingMore changes, query results can only load one extra page (25→50) and subsequent "Load more"/auto-load attempts are ignored whenever a search returns more than 50 items.


Contributor Author
Fixed in 17c308d — reset loadMoreInFlightRef immediately after setDisplayCount since search pagination is synchronous.

DangerouslyShip and others added 2 commits March 11, 2026 00:47
- Fix inverted condition that silently dropped suspicious skills in the
  default (non-filtered) path
- Remove early break that could exit before collecting enough results
  when highlightedOnly is enabled
- Add test with suspicious skill to cover the default path

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Since search mode pagination is synchronous (just bumping displayCount),
reset the in-flight guard right away so subsequent load-more calls aren't
blocked.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 17c308de90


```diff
   highlightedOnly,
   nonSuspiciousOnly,
-  limit: searchLimit,
+  limit: 200,
```


P2: Remove hard cap that truncates search results at 200

This change makes every query fetch a fixed limit: 200, and combined with the new client-side pagination (displayCount < searchResults.length) users can never load results beyond the first 200 items. For broad queries that legitimately have more than 200 matches, the "Load more" flow now silently stops early, whereas the previous implementation increased the server limit and could continue fetching additional pages.


Contributor Author
Intentional — the total result set is <200 in practice (vector search returns up to 256 candidates, lexical scan up to 500 docs, but the intersection after scoring is small). The previous approach re-ran the entire search pipeline (embedding + vector search + lexical scan + hydration + scoring) on every load-more click, which was the perf problem this PR solves.
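The fetch-once, paginate-locally approach being defended here can be sketched as follows. `makePaginator` and its shape are assumptions for illustration, not the app's actual hook:

```typescript
// Sketch: fetch the full result set once, then page through it by
// slicing locally. loadMore is synchronous (just bumps a counter),
// so no async in-flight guard is needed on this path.
type Paginator<T> = {
  visible: () => T[];
  canLoadMore: () => boolean;
  loadMore: () => void;
};

function makePaginator<T>(results: T[], pageSize: number): Paginator<T> {
  let displayCount = Math.min(pageSize, results.length);
  return {
    visible: () => results.slice(0, displayCount),
    canLoadMore: () => displayCount < results.length,
    loadMore: () => {
      displayCount = Math.min(displayCount + pageSize, results.length);
    },
  };
}
```

The trade-off noted by Codex still holds: whatever limit the single fetch uses (200 here) is a hard ceiling on what local slicing can ever reveal.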

1 participant