feat(walkthrough): dedicated Step 5 for semantic search

George-iam · commit f3b96ebeee3a · 2026-05-13T12:33:08.000Z
The user wanted the full-vs-search trade-off as its own onboarding step, not a
subsection inside Step 4. New Step 5 'Semantic search — opt-in for large
knowledge bases' spells out:
  - Full mode = default, zero setup
  - Search mode = catalog-only + axme_search_kb + ~770 MB one-time download
  - Recommendation table by KB size
  - Enable now button OR decide later (just skip)

Completion: onCommand:axme.enableSemanticSearch. firstChat.md's old
semantic-search subsection collapsed to a one-liner pointing to Step 5.
diff --git a/extension/package.json b/extension/package.json
@@ -128,6 +128,15 @@
             "completionEvents": [
               "onView:axme.monitor"
             ]
+          },
+          {
+            "id": "axme.step.semanticSearch",
+            "title": "Semantic search — opt-in for large knowledge bases",
+            "description": "AXME runs in **full mode** by default — every memory + decision body is loaded into the agent at session start. Zero setup, simple.\n\nFor larger KBs (>50 entries, or decisions with long rationale): **semantic search mode** loads only the catalog at startup, agent fetches bodies on demand via smart similarity search. Saves significant tokens.\n\nThe trade-off: a one-time ~770 MB download (`@huggingface/transformers` runtime). It's reversible any time.\n\n[Enable semantic search now](command:axme.enableSemanticSearch)\n\nLeave it for later? Just skip this step — full mode keeps working. You can enable from the sidebar (Knowledge base → Search mode) or `AXME: Enable semantic search` whenever.",
+            "media": { "markdown": "walkthroughs/semanticSearch.md" },
+            "completionEvents": [
+              "onCommand:axme.enableSemanticSearch"
+            ]
           }
         ]
       }
diff --git a/extension/walkthroughs/firstChat.md b/extension/walkthroughs/firstChat.md
@@ -55,24 +55,11 @@ project-specific rules or remove ones you don't need.
   background audits, last audit failed, or a recent handoff). Otherwise
   hidden.
 
-## Semantic search (opt-in)
+## Semantic search
 
-By default, AXME loads every memory + decision body into the agent's context
-at session start (**full mode**). Works great until your knowledge base grows
-past ~50 entries — then context bloat becomes a problem.
-
-**Semantic search mode** loads only the catalog (slug + title + 1-line
-description) at startup and exposes `axme_search_kb` so the agent fetches
-relevant bodies on demand. Saves significant tokens on large KBs.
-
-Enable from the sidebar's **Knowledge base** section (`Search mode: full →
-[Enable]` button) or via `AXME: Enable semantic search` command. The first
-enable downloads `@huggingface/transformers` (~770 MB) into
-`~/.local/share/axme-code/runtime/` and indexes every existing memory +
-decision. Subsequent re-enables are instant.
-
-Disable any time with the sidebar toggle or `AXME: Disable semantic search`.
-The runtime and the embeddings index stay on disk — re-enabling is fast.
+If your knowledge base grows past ~50 entries, switching from full mode to
+semantic search mode saves significant tokens. See **Step 5: Semantic
+search** in this walkthrough for the trade-offs and one-click enable.
 
 ## Power-user palette commands
 
diff --git a/extension/walkthroughs/semanticSearch.md b/extension/walkthroughs/semanticSearch.md
@@ -0,0 +1,59 @@
+# Semantic search — opt-in for large knowledge bases
+
+AXME has two modes for loading the knowledge base at session start.
+
+## Full mode (default — works out of the box)
+
+Every memory + decision body is loaded into the agent's context.
+
+- ✅ **Zero setup** — works immediately after `axme-code setup`
+- ✅ **Simple** — agent sees everything at startup, no extra tool call
+- ✅ **Best for small / medium KBs** (under ~50 entries)
+- ⚠️ **Context bloat on large KBs** — long decision rationales eat tokens
+  fast on Cursor's per-turn budget
+
+## Semantic search mode (opt-in)
+
+Loads only the **catalog** (titles + 1-line descriptions) at startup. The
+agent fetches full bodies on demand via `axme_search_kb` — semantic
+similarity search across memories and decisions.
+
+- ✅ **Major token savings** on large KBs (>50 entries, especially decisions
+  with long reasoning blocks)
+- ✅ **Smart fuzzy search** — "how did we handle auth?" finds relevant
+  entries by meaning, not by keyword match. The model
+  (`@huggingface/transformers` MiniLM) embeds each entry once and
+  compares vector distance to your query.
+- ⚠️ **One-time install**: `@huggingface/transformers@^4.0.1` lands in
+  `~/.local/share/axme-code/runtime/` — about **770 MB on Linux**
+  (smaller on macOS / Windows; the bulk is `onnxruntime-node` platform
+  prebuilts).
+- ⚠️ **Initial indexing** takes a few seconds (typical KB) to a couple
+  minutes (very large KB).
+- ✅ **Live re-embedding** — once enabled, every new save via
+  `axme_save_memory` / `axme_save_decision` auto-updates the index.
+
+## When to enable
+
+| KB size | Recommendation |
+|---|---|
+| Under 30 entries | Stick with full mode. The extra ~770 MB and indexing aren't worth it. |
+| 30–50 entries | Either works. Semantic search starts saving tokens; full still convenient. |
+| Over 50 entries | **Enable.** Token savings become significant. |
+| Decisions with long rationale bodies | Enable — full mode bloats context fastest here. |
+
+## Enable now or later (reversible any time)
+
+You can switch any time:
+
+- **Sidebar**: Knowledge base section → `Search mode: full` row → click `Enable`
+- **Command Palette**: `AXME: Enable semantic search`
+- **CLI** (if you prefer terminal): `axme-code config set context.mode search`
+
+To switch back: same surfaces, `Disable` button or `AXME: Disable semantic
+search`. The runtime and the embeddings index stay on disk — re-enabling
+is instant after the first install.
+
+**This walkthrough step auto-completes when you click Enable.** If you
+choose to stay in full mode for now, just skip the step — everything still
+works.
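
For reviewers unfamiliar with the "embeds each entry once and compares vector distance to your query" wording in `semanticSearch.md`, the idea can be sketched as below. This is a minimal illustration, not AXME's implementation: the `IndexedEntry` shape, the `searchKb` helper, the slugs, and the 3-dimensional toy vectors are all hypothetical, and cosine similarity is assumed as the distance measure (the commit does not say which one `axme_search_kb` uses). Real embeddings would come from the MiniLM model rather than being written by hand.

```typescript
// Hypothetical shape of one indexed KB entry; not AXME's actual type.
interface IndexedEntry {
  slug: string;
  embedding: number[]; // computed once, when the entry is saved
}

// Cosine similarity: 1 means same direction, 0 means unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank entries by similarity to the query embedding and return the top slugs.
function searchKb(query: number[], index: IndexedEntry[], topK = 3): string[] {
  return index
    .map((entry) => ({
      slug: entry.slug,
      score: cosineSimilarity(query, entry.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((hit) => hit.slug);
}

// Toy index with hand-written vectors standing in for model output.
const toyIndex: IndexedEntry[] = [
  { slug: "auth-token-rotation", embedding: [0.9, 0.1, 0.0] },
  { slug: "db-migration-policy", embedding: [0.0, 0.2, 0.9] },
  { slug: "login-rate-limits", embedding: [0.7, 0.6, 0.1] },
];

// A query vector pointing along the first axis, i.e. "auth-like" in this toy space.
const topHits = searchKb([1, 0, 0], toyIndex, 2);
console.log(topHits);
```

Because only the query is embedded at search time, the per-entry cost is paid once at save time, which matches the "live re-embedding" bullet in the new walkthrough page.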
