feat: introduce /recall skill for reliable memory retrieval #4

bigph00t · 2026-01-25T08:56:07Z

Summary

- Adds /recall skill for token-efficient memory retrieval using a 2-step workflow
  - Step 1: Search returns index with IDs/titles (~100 tokens/result)
  - Step 2: Fetch full observations by ID (~500 tokens/result)
- Adds /api/recall endpoint to worker service
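
As a rough illustration of the 2-step workflow, the sketch below builds the two request URLs. The `/api/search` and `/api/recall` paths and the `query`, `limit`, and `ids` parameters follow the sequence diagram in this PR; the base URL and the helper names are assumptions made for illustration, not the worker's actual code.

```typescript
// Hypothetical base URL for the worker service; the real host and port may differ.
const WORKER_BASE_URL = "http://127.0.0.1:37777";

// Step 1: build the search request that returns a compact index (IDs and titles).
function buildSearchUrl(query: string, limit: number = 15): string {
  const params = new URLSearchParams({ query, limit: String(limit) });
  return `${WORKER_BASE_URL}/api/search?${params.toString()}`;
}

// Step 2: build the recall request for only the IDs judged relevant.
function buildRecallUrl(ids: number[]): string {
  if (ids.length === 0) {
    throw new Error("at least one observation ID is required");
  }
  return `${WORKER_BASE_URL}/api/recall?ids=${ids.join(",")}`;
}
```

Keeping the two calls separate is what lets the caller inspect the index before paying for full observations.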

Skill Invocation

- As a plugin skill: /claude-mem:recall
- For the shorter /recall: copy the skill to ~/.claude/skills/recall/SKILL.md

Test plan

Tested /recall skill discovery
Tested search API endpoint
Tested recall API endpoint

🤖 Generated with Claude Code

Replace MCP subprocess approach with persistent Chroma HTTP server for improved performance and reliability. This re-enables Chroma on Windows by eliminating the subprocess spawning that caused console popups.

Changes:
- NEW: ChromaServerManager.ts - Manages local Chroma server lifecycle via `npx chroma run`
- REFACTOR: ChromaSync.ts - Uses chromadb npm package's ChromaClient instead of MCP subprocess (removes Windows disabling)
- UPDATE: worker-service.ts - Starts Chroma server on initialization
- UPDATE: GracefulShutdown.ts - Stops Chroma server on shutdown
- UPDATE: SettingsDefaultsManager.ts - New Chroma configuration options
- UPDATE: build-hooks.js - Mark optional chromadb deps as external

Benefits:
- Eliminates subprocess spawn latency on first query
- Single server process instead of per-operation subprocesses
- No Python/uvx dependency for local mode
- Re-enables Chroma vector search on Windows
- Future-ready for cloud-hosted Chroma (claude-mem pro)
- Cross-platform: Linux, macOS, Windows

Configuration:
CLAUDE_MEM_CHROMA_MODE=local|remote
CLAUDE_MEM_CHROMA_HOST=127.0.0.1
CLAUDE_MEM_CHROMA_PORT=8000
CLAUDE_MEM_CHROMA_SSL=false

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
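
A minimal sketch of how the four `CLAUDE_MEM_CHROMA_*` variables might be read into a typed config object. The variable names come from the commit message; the `ChromaConfig` type, the `readChromaConfig` helper, and the choice to treat the example values as fallbacks are assumptions, not the behavior of SettingsDefaultsManager.ts.

```typescript
type ChromaMode = "local" | "remote";

interface ChromaConfig {
  mode: ChromaMode;
  host: string;
  port: number;
  ssl: boolean;
}

// Reads the CLAUDE_MEM_CHROMA_* variables named in the commit message.
// Using the example values as fallbacks is an assumption.
function readChromaConfig(env: Record<string, string | undefined>): ChromaConfig {
  // Anything other than an explicit "remote" is treated as local mode.
  const mode = env.CLAUDE_MEM_CHROMA_MODE === "remote" ? "remote" : "local";
  const port = Number.parseInt(env.CLAUDE_MEM_CHROMA_PORT ?? "8000", 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`invalid CLAUDE_MEM_CHROMA_PORT: ${env.CLAUDE_MEM_CHROMA_PORT}`);
  }
  return {
    mode,
    host: env.CLAUDE_MEM_CHROMA_HOST ?? "127.0.0.1",
    port,
    ssl: env.CLAUDE_MEM_CHROMA_SSL === "true",
  };
}
```

In a Node process this could be called as `readChromaConfig(process.env)`; taking the environment as a parameter keeps the function easy to test.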

- Updated chromadb from ^1.9.2 to ^3.2.2 (includes CLI binary)
- Changed heartbeat endpoint from /api/v1 to /api/v2

The 1.9.x version did not include the CLI, causing `npx chroma run` to fail. Version 3.2.2 includes the chroma CLI and uses the v2 API.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
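
For illustration, a heartbeat probe against the v2 API could be sketched as follows. The `/api/v2/heartbeat` route is Chroma's v2 heartbeat path; the `pingChroma` and `heartbeatUrl` names and the injected `FetchLike` parameter are hypothetical and are not taken from ChromaServerManager.ts.

```typescript
// Minimal fetch-like signature so the probe can be exercised without a live server.
type FetchLike = (url: string) => Promise<{ ok: boolean }>;

// Builds the v2 heartbeat URL for a given host and port.
function heartbeatUrl(host: string, port: number, ssl: boolean = false): string {
  const scheme = ssl ? "https" : "http";
  return `${scheme}://${host}:${port}/api/v2/heartbeat`;
}

// Resolves to true when the heartbeat answers with a 2xx status, and to
// false on a network error or a non-2xx response.
async function pingChroma(host: string, port: number, fetchImpl: FetchLike): Promise<boolean> {
  try {
    const response = await fetchImpl(heartbeatUrl(host, port));
    return response.ok;
  } catch {
    return false;
  }
}
```

In a runtime with a global `fetch`, it can be passed directly, for example `pingChroma("127.0.0.1", 8000, fetch)`.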

- Added @chroma-core/default-embed dependency for local embeddings
- Updated ChromaSync to use DefaultEmbeddingFunction with collections
- Added isServerReachable() async method for reliable server detection
- Fixed start() to detect and reuse existing Chroma servers
- Updated build script to externalize native ONNX binaries
- Added runtime dependency to plugin/package.json

The embedding function uses all-MiniLM-L6-v2 model locally via ONNX, eliminating need for external embedding API calls.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>


Wire tenant, database, and API key settings into ChromaSync for remote/pro mode.

In remote mode:
- Passes tenant and database to ChromaClient for data isolation
- Adds Authorization header when API key is configured
- Logs tenant isolation connection details

Local mode unchanged - uses default_tenant without explicit params.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
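
The local/remote split described above could be sketched roughly as below. This builds a plain options object only; the field names, the `buildClientOptions` helper, and the `Bearer` scheme for the Authorization header are all assumptions, and the exact constructor arguments should be checked against the chromadb package documentation.

```typescript
interface ChromaSettings {
  host: string;
  port: number;
  ssl: boolean;
  tenant?: string;
  database?: string;
  apiKey?: string;
}

// Hypothetical shape of the options handed to a Chroma client.
interface ClientOptions {
  host: string;
  port: number;
  ssl: boolean;
  tenant?: string;
  database?: string;
  headers?: Record<string, string>;
}

function buildClientOptions(mode: "local" | "remote", settings: ChromaSettings): ClientOptions {
  const options: ClientOptions = {
    host: settings.host,
    port: settings.port,
    ssl: settings.ssl,
  };
  // Local mode passes no tenant or database, leaving the client on its defaults.
  if (mode === "remote") {
    if (settings.tenant) options.tenant = settings.tenant;
    if (settings.database) options.database = settings.database;
    // The "Bearer" scheme is an assumption; the PR only says an Authorization header is added.
    if (settings.apiKey) options.headers = { Authorization: `Bearer ${settings.apiKey}` };
  }
  return options;
}
```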

bigph00t · 2026-01-25T09:00:49Z

What the `/recall` skill does

The skill bundles the MCP 3-layer search workflow into a simpler prompted 2-step process:

1. Search → Returns compact index with IDs (~100 tokens/result)
2. Fetch → Claude decides which IDs are relevant, then retrieves full details (~500 tokens/result)

The key innovation is that it's prompted - the SKILL.md file teaches Claude:

- WHEN to use memory (questions about past work, missing context, repeated tasks)
- HOW to execute token-efficiently (filter before fetching)

So when a user asks "How did we fix the rate limiting issue?", Claude automatically:

1. Searches memory for "rate limiting"
2. Reviews the index results
3. Uses judgment to pick the 2-3 most relevant observations
4. Fetches only those, saving ~10x tokens vs loading everything

The skill is bundled with claude-mem as /claude-mem:recall, or users can copy it to ~/.claude/skills/recall/ to get the shorter /recall command.

greptile-apps · 2026-01-25T09:02:50Z

Greptile Overview

Greptile Summary

This PR introduces the /recall skill for token-efficient memory retrieval using a 2-step workflow that significantly improves how Claude accesses past work.

Key Changes:

- New /recall Skill: Added skill documentation (plugin/skills/recall/SKILL.md) that teaches Claude when and how to retrieve memories efficiently through a prompted 2-step process
- New /api/recall Endpoint: Added SearchManager.recall() method and corresponding HTTP route to fetch full observations by IDs
- Chroma Architecture Migration: Migrated from MCP client to direct HTTP client using chromadb npm package and ChromaClient
- Local Server Management: New ChromaServerManager singleton manages Chroma HTTP server lifecycle via npx chroma run
- Multi-tenancy Support: Added configuration settings for tenant isolation to support future cloud/pro features
- Context Prompt Updates: Modified formatters to guide Claude to use /recall instead of MCP tools

2-Step Workflow:

1. Search returns index with IDs/titles (~100 tokens/result)
2. Claude judges relevance and fetches full details for selected IDs (~500 tokens/result)

This achieves ~10x token savings compared to bulk loading while maintaining full context retrieval capabilities.
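
To make the token arithmetic concrete, here is a rough estimator using the per-result figures quoted in this PR (~100 tokens per index entry, ~500 per full observation). The function name and the counts in the usage note are illustrative assumptions, and the resulting ratio depends heavily on how many observations the bulk baseline would load.

```typescript
// Approximate per-result token costs quoted in this PR.
const INDEX_TOKENS_PER_RESULT = 100;
const FULL_TOKENS_PER_RESULT = 500;

interface TokenEstimate {
  twoStep: number; // index cost plus cost of the fetched observations
  bulk: number;    // cost of loading every baseline observation in full
  ratio: number;   // bulk / twoStep
}

// indexed: results listed in step 1; fetched: IDs retrieved in step 2;
// bulkLoaded: observations a bulk approach would load in full.
function estimateTokens(indexed: number, fetched: number, bulkLoaded: number): TokenEstimate {
  if (indexed < 1 || fetched < 0 || bulkLoaded < 0) {
    throw new Error("counts must be non-negative, with at least one indexed result");
  }
  const twoStep = indexed * INDEX_TOKENS_PER_RESULT + fetched * FULL_TOKENS_PER_RESULT;
  const bulk = bulkLoaded * FULL_TOKENS_PER_RESULT;
  return { twoStep, bulk, ratio: bulk / twoStep };
}
```

For example, `estimateTokens(15, 3, 15)` compares a 15-entry index plus 3 fetched observations (3,000 tokens) with loading those same 15 observations in full (7,500 tokens); a larger bulk baseline raises the ratio accordingly.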

Architecture Improvements:

- Removes Windows console popup issues from MCP subprocess spawning
- Enables future cloud vector search for pro users
- Maintains backward compatibility with existing search infrastructure

Confidence Score: 4/5

- This PR is safe to merge with good architectural improvements, though the large Chroma migration should be monitored in production.
- Score reflects solid implementation of the /recall skill and API endpoint (5/5 quality), but the substantial Chroma migration from MCP to HTTP introduces architectural changes that warrant production monitoring. The code quality is excellent with proper error handling, cross-platform support, and graceful degradation. The 2-step workflow is well-designed and the skill documentation is clear. A minor concern is that the large refactor of ChromaSync could affect vector search reliability.
- Monitor src/services/sync/ChromaSync.ts and src/services/sync/ChromaServerManager.ts in production to ensure the HTTP-based Chroma connection is stable across platforms. The rest of the changes are straightforward additions.

Important Files Changed

| Filename | Overview |
| --- | --- |
| plugin/skills/recall/SKILL.md | Added `/recall` skill documentation with clear 2-step workflow instructions for token-efficient memory retrieval |
| src/services/worker/SearchManager.ts | Added `recall()` method to fetch observations by IDs with proper validation and formatted markdown output |
| src/services/worker/http/routes/SearchRoutes.ts | Added `/api/recall` endpoint handler that delegates to `SearchManager.recall()` |
| src/services/sync/ChromaServerManager.ts | New singleton manager for Chroma HTTP server lifecycle with cross-platform process management |
| src/services/sync/ChromaSync.ts | Migrated from MCP client to direct HTTP ChromaClient with tenant isolation support for future pro features |
| src/services/worker-service.ts | Integrated ChromaServerManager to start local Chroma server during worker initialization |
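
The ID validation mentioned for `recall()` might look something like the sketch below, which parses the comma-separated `ids` query value shown in the sequence diagram (for example `234,567,891`). The `parseObservationIds` name and the specific rules (digits only, positive, de-duplicated) are assumptions, not the checks SearchManager.ts actually performs.

```typescript
// Parses a comma-separated `ids` value such as "234,567,891" into unique
// positive integers, rejecting anything malformed.
function parseObservationIds(raw: string): number[] {
  const parts = raw
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
  if (parts.length === 0) {
    throw new Error("ids parameter is empty");
  }
  const ids: number[] = [];
  for (const part of parts) {
    // Digits only: rejects signs, decimals, and stray text.
    if (!/^\d+$/.test(part)) {
      throw new Error(`invalid observation id: ${part}`);
    }
    const id = Number(part);
    if (!Number.isSafeInteger(id) || id <= 0) {
      throw new Error(`invalid observation id: ${part}`);
    }
    // Skip duplicates so each observation is fetched once.
    if (!ids.includes(id)) {
      ids.push(id);
    }
  }
  return ids;
}
```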

Sequence Diagram

sequenceDiagram
    participant User
    participant Claude
    participant RecallSkill as /recall Skill
    participant SearchAPI as /api/search
    participant RecallAPI as /api/recall
    participant SearchMgr as SearchManager
    participant SessionStore as SessionStore
    participant ChromaDB as ChromaDB

    User->>Claude: "How did we fix the auth bug?"
    Claude->>RecallSkill: Invokes /recall skill
    
    Note over Claude,RecallAPI: Step 1: Search for relevant memories
    Claude->>SearchAPI: GET /api/search?query=auth+bug&limit=15
    SearchAPI->>SearchMgr: search(query, limit)
    SearchMgr->>ChromaDB: queryChroma(query, 100)
    ChromaDB-->>SearchMgr: vector search results (IDs only)
    SearchMgr->>SessionStore: Get metadata for IDs
    SessionStore-->>SearchMgr: titles, types, dates
    SearchMgr-->>SearchAPI: Index (~100 tokens/result)
    SearchAPI-->>Claude: [ID, Title, Type, Date] list
    
    Note over Claude: Claude reviews index and<br/>decides which memories<br/>are relevant
    
    Note over Claude,RecallAPI: Step 2: Fetch full details for selected IDs
    Claude->>RecallAPI: GET /api/recall?ids=234,567,891
    RecallAPI->>SearchMgr: recall([234,567,891])
    SearchMgr->>SessionStore: getObservationsByIds([234,567,891])
    SessionStore-->>SearchMgr: Full observation records
    SearchMgr-->>RecallAPI: Formatted markdown (~500 tokens/result)
    RecallAPI-->>Claude: Full memory narratives
    
    Claude->>User: Answers question using<br/>retrieved context

The 3-layer memory system (search → timeline → get_observations) is powerful but wasn't being used due to friction and poor prompting. This PR wraps the workflow into a single /recall skill that:

1. Prompts Claude with clear trigger conditions for when to use it
2. Bundles the 2-step workflow (search index → fetch selected IDs)
3. Preserves token efficiency by letting Claude decide what to fetch
4. Updates context injection with assertive prompting

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Move from plugin/commands/recall.md to plugin/skills/recall/SKILL.md for proper Claude Code skill discovery.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add instructions for getting the shorter /recall command by copying to personal skills directory.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Added comprehensive documentation for the /recall skill:

- Explains the 2-step prompted workflow (search → fetch)
- Documents token efficiency (~10x savings)
- Shows invocation options (/recall vs /claude-mem:recall)
- Includes installation instructions for personal skills

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

bigphoot and others added 6 commits January 23, 2026 23:07

Update src/services/sync/ChromaServerManager.ts

f2c1000

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

fix: Remove duplicate else block from merge

e6c99e2

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

bigphoot and others added 4 commits January 25, 2026 01:18

refactor: move recall skill to proper skills directory

7425d71


docs: add installation note for /recall skill invocation

aa9e781


bigph00t force-pushed the feature/recall-skill branch from bb75919 to 4108e57 Compare January 25, 2026 09:19
