@bigph00t
Owner

Summary

  • Adds /recall skill for token-efficient memory retrieval using a 2-step workflow
  • Step 1: Search returns index with IDs/titles (~100 tokens/result)
  • Step 2: Fetch full observations by ID (~500 tokens/result)
  • Adds /api/recall endpoint to worker service

Skill Invocation

  • As plugin skill: /claude-mem:recall
  • For the shorter /recall command: copy the skill to ~/.claude/skills/recall/SKILL.md

Test plan

  • Tested /recall skill discovery
  • Tested search API endpoint
  • Tested recall API endpoint

🤖 Generated with Claude Code

bigphoot and others added 6 commits January 23, 2026 23:07
Replace MCP subprocess approach with persistent Chroma HTTP server for
improved performance and reliability. This re-enables Chroma on Windows
by eliminating the subprocess spawning that caused console popups.

Changes:
- NEW: ChromaServerManager.ts - Manages local Chroma server lifecycle
  via `npx chroma run`
- REFACTOR: ChromaSync.ts - Uses chromadb npm package's ChromaClient
  instead of MCP subprocess (removes Windows disabling)
- UPDATE: worker-service.ts - Starts Chroma server on initialization
- UPDATE: GracefulShutdown.ts - Stops Chroma server on shutdown
- UPDATE: SettingsDefaultsManager.ts - New Chroma configuration options
- UPDATE: build-hooks.js - Mark optional chromadb deps as external

Benefits:
- Eliminates subprocess spawn latency on first query
- Single server process instead of per-operation subprocesses
- No Python/uvx dependency for local mode
- Re-enables Chroma vector search on Windows
- Future-ready for cloud-hosted Chroma (claude-mem pro)
- Cross-platform: Linux, macOS, Windows

Configuration:
  CLAUDE_MEM_CHROMA_MODE=local|remote
  CLAUDE_MEM_CHROMA_HOST=127.0.0.1
  CLAUDE_MEM_CHROMA_PORT=8000
  CLAUDE_MEM_CHROMA_SSL=false
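
A minimal sketch of how these four variables might be read into a typed config. The variable names and the values shown above come from the commit message; the `ChromaConfig` type, the `loadChromaConfig` and `chromaBaseUrl` helpers, and the fallback behaviour for invalid values are assumptions for illustration, not the code in SettingsDefaultsManager.ts.

```typescript
// Hypothetical helper: names and fallback rules are illustrative only.
type ChromaMode = "local" | "remote";

interface ChromaConfig {
  mode: ChromaMode;
  host: string;
  port: number;
  ssl: boolean;
}

function loadChromaConfig(env: Record<string, string | undefined>): ChromaConfig {
  // Assumption: anything other than "remote" falls back to local mode.
  const mode: ChromaMode = env.CLAUDE_MEM_CHROMA_MODE === "remote" ? "remote" : "local";
  const port = Number.parseInt(env.CLAUDE_MEM_CHROMA_PORT ?? "8000", 10);
  return {
    mode,
    host: env.CLAUDE_MEM_CHROMA_HOST ?? "127.0.0.1",
    // Assumption: an unparsable or out-of-range port falls back to 8000.
    port: Number.isInteger(port) && port > 0 && port < 65536 ? port : 8000,
    // Only the literal string "true" enables SSL.
    ssl: env.CLAUDE_MEM_CHROMA_SSL === "true",
  };
}

// Derive the base URL a client would connect to.
function chromaBaseUrl(config: ChromaConfig): string {
  return `${config.ssl ? "https" : "http"}://${config.host}:${config.port}`;
}
```

In a Node process this could be called as `loadChromaConfig(process.env)`; with no variables set it yields `http://127.0.0.1:8000`.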

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Updated chromadb from ^1.9.2 to ^3.2.2 (includes CLI binary)
- Changed heartbeat endpoint from /api/v1 to /api/v2

The 1.9.x version did not include the CLI, causing `npx chroma run` to fail.
Version 3.2.2 includes the chroma CLI and uses the v2 API.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Added @chroma-core/default-embed dependency for local embeddings
- Updated ChromaSync to use DefaultEmbeddingFunction with collections
- Added isServerReachable() async method for reliable server detection
- Fixed start() to detect and reuse existing Chroma servers
- Updated build script to externalize native ONNX binaries
- Added runtime dependency to plugin/package.json

The embedding function uses all-MiniLM-L6-v2 model locally via ONNX,
eliminating need for external embedding API calls.
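
The "detect and reuse" behaviour described above can be sketched roughly as follows. This is not the code in ChromaServerManager.ts: `heartbeatUrl`, `startOrReuse`, the `FetchLike` type, the injected `spawnServer` callback, and the 2-second timeout are all hypothetical, and the full `/api/v2/heartbeat` path is an assumption based on the commit's mention of the v2 heartbeat endpoint.

```typescript
// Hypothetical sketch of reachability detection plus server reuse.
type FetchLike = (url: string, init?: { signal?: AbortSignal }) => Promise<{ ok: boolean }>;

// Assumption: the v2 heartbeat lives at /api/v2/heartbeat.
function heartbeatUrl(baseUrl: string): string {
  return `${baseUrl}/api/v2/heartbeat`;
}

async function isServerReachable(
  baseUrl: string,
  fetchImpl: FetchLike,
  timeoutMs = 2000,
): Promise<boolean> {
  const controller = new AbortController();
  // Abort the request if the server does not answer in time.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(heartbeatUrl(baseUrl), { signal: controller.signal });
    return res.ok;
  } catch {
    // Connection refused, timeout, DNS failure: treat all as "not reachable".
    return false;
  } finally {
    clearTimeout(timer);
  }
}

// Reuse a server that already answers the heartbeat; otherwise start one.
async function startOrReuse(
  baseUrl: string,
  fetchImpl: FetchLike,
  spawnServer: () => Promise<void>,
): Promise<"reused" | "started"> {
  if (await isServerReachable(baseUrl, fetchImpl)) return "reused";
  await spawnServer();
  return "started";
}
```

Passing the fetch implementation in as a parameter keeps the sketch testable without a running server; in a real process the global `fetch` could be supplied instead.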

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Wire tenant, database, and API key settings into ChromaSync for
remote/pro mode. In remote mode:
- Passes tenant and database to ChromaClient for data isolation
- Adds Authorization header when API key is configured
- Logs tenant isolation connection details

Local mode unchanged - uses default_tenant without explicit params.
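
As a rough illustration of the local/remote split described in this commit, the sketch below builds a plain options object of the kind that could be handed to a Chroma client. The `ChromaSettings` and `ClientOptions` shapes and the `buildClientOptions` helper are hypothetical, and the `Bearer` scheme is an assumption: the commit only says that an Authorization header is added when an API key is configured.

```typescript
// Hypothetical shapes; the real settings keys are not shown in this PR text.
interface ChromaSettings {
  mode: "local" | "remote";
  host: string;
  port: number;
  ssl: boolean;
  tenant?: string;
  database?: string;
  apiKey?: string;
}

interface ClientOptions {
  host: string;
  port: number;
  ssl: boolean;
  tenant?: string;
  database?: string;
  headers?: Record<string, string>;
}

function buildClientOptions(s: ChromaSettings): ClientOptions {
  const base: ClientOptions = { host: s.host, port: s.port, ssl: s.ssl };
  // Local mode: no explicit tenant/database, so the server default applies.
  if (s.mode === "local") return base;
  return {
    ...base,
    ...(s.tenant ? { tenant: s.tenant } : {}),
    ...(s.database ? { database: s.database } : {}),
    // Assumption: Bearer scheme. Only added when an API key is configured.
    ...(s.apiKey ? { headers: { Authorization: `Bearer ${s.apiKey}` } } : {}),
  };
}
```

Keeping the local branch free of tenant parameters mirrors the "local mode unchanged" note: any tenant or API key left in the settings is ignored unless the mode is remote.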

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@bigph00t
Owner Author

What the /recall skill does

The skill bundles the MCP 3-layer search workflow (search → timeline → get_observations) into a simpler, prompted 2-step process:

  1. Search → Returns compact index with IDs (~100 tokens/result)
  2. Fetch → Claude decides which IDs are relevant, then retrieves full details (~500 tokens/result)

The key innovation is that the workflow is prompted. The SKILL.md file teaches Claude:

  • WHEN to use memory (questions about past work, missing context, repeated tasks)
  • HOW to execute token-efficiently (filter before fetching)

So when a user asks "How did we fix the rate limiting issue?", Claude automatically:

  1. Searches memory for "rate limiting"
  2. Reviews the index results
  3. Uses judgment to pick the 2-3 most relevant observations
  4. Fetches only those, using roughly 10x fewer tokens than loading every observation in full
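
To make the token arithmetic concrete, here is a small sketch using the per-result figures quoted in this PR (~100 tokens per index row, ~500 per full observation). The function names and the example counts are hypothetical, and the per-result costs are rough estimates rather than measurements.

```typescript
// Rough per-result costs quoted in the PR description (estimates, not measurements).
const INDEX_TOKENS_PER_RESULT = 100;
const FULL_TOKENS_PER_RESULT = 500;

// Cost of the 2-step flow: a compact index for every hit, full text for the chosen few.
function twoStepTokens(indexed: number, fetched: number): number {
  return indexed * INDEX_TOKENS_PER_RESULT + fetched * FULL_TOKENS_PER_RESULT;
}

// Cost of loading every candidate observation in full.
function bulkTokens(loaded: number): number {
  return loaded * FULL_TOKENS_PER_RESULT;
}

// Illustrative counts: 15 index rows, 3 observations fetched.
const twoStep = twoStepTokens(15, 3); // 1500 + 1500 = 3000
const bulkSmall = bulkTokens(15);     // 7500
const bulkLarge = bulkTokens(100);    // 50000
```

The saving factor depends on the bulk baseline: against the same 15 results it is 2.5x, against 100 candidates about 17x, so the ~10x figure is best read as an estimate between those cases.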

The skill is bundled with claude-mem as /claude-mem:recall, or users can copy it to ~/.claude/skills/recall/ to get the shorter /recall command.

@greptile-apps

greptile-apps bot commented Jan 25, 2026

Greptile Overview

Greptile Summary

This PR introduces the /recall skill for token-efficient memory retrieval using a 2-step workflow that significantly improves how Claude accesses past work.

Key Changes:

  • New /recall Skill: Added skill documentation (plugin/skills/recall/SKILL.md) that teaches Claude when and how to retrieve memories efficiently through a prompted 2-step process
  • New /api/recall Endpoint: Added SearchManager.recall() method and corresponding HTTP route to fetch full observations by IDs
  • Chroma Architecture Migration: Migrated from MCP client to direct HTTP client using chromadb npm package and ChromaClient
  • Local Server Management: New ChromaServerManager singleton manages Chroma HTTP server lifecycle via npx chroma run
  • Multi-tenancy Support: Added configuration settings for tenant isolation to support future cloud/pro features
  • Context Prompt Updates: Modified formatters to guide Claude to use /recall instead of MCP tools

2-Step Workflow:

  1. Search returns index with IDs/titles (~100 tokens/result)
  2. Claude judges relevance and fetches full details for selected IDs (~500 tokens/result)

This achieves ~10x token savings compared to bulk loading while maintaining full context retrieval capabilities.

Architecture Improvements:

  • Removes Windows console popup issues from MCP subprocess spawning
  • Enables future cloud vector search for pro users
  • Maintains backward compatibility with existing search infrastructure

Confidence Score: 4/5

  • This PR is safe to merge with good architectural improvements, though the large Chroma migration should be monitored in production
  • The score reflects a solid implementation of the /recall skill and API endpoint (5/5 quality), but the substantial Chroma migration from MCP to HTTP introduces architectural changes that warrant production monitoring. The code quality is excellent, with proper error handling, cross-platform support, and graceful degradation. The 2-step workflow is well designed and the skill documentation is clear. A minor concern is that the large ChromaSync refactor could affect vector search reliability.
  • Monitor src/services/sync/ChromaSync.ts and src/services/sync/ChromaServerManager.ts in production to ensure the HTTP-based Chroma connection is stable across platforms. The rest of the changes are straightforward additions.

Important Files Changed

| Filename | Overview |
| --- | --- |
| plugin/skills/recall/SKILL.md | Added /recall skill documentation with clear 2-step workflow instructions for token-efficient memory retrieval |
| src/services/worker/SearchManager.ts | Added recall() method to fetch observations by IDs with proper validation and formatted markdown output |
| src/services/worker/http/routes/SearchRoutes.ts | Added /api/recall endpoint handler that delegates to SearchManager.recall() |
| src/services/sync/ChromaServerManager.ts | New singleton manager for Chroma HTTP server lifecycle with cross-platform process management |
| src/services/sync/ChromaSync.ts | Migrated from MCP client to direct HTTP ChromaClient with tenant isolation support for future pro features |
| src/services/worker-service.ts | Integrated ChromaServerManager to start local Chroma server during worker initialization |

Sequence Diagram

sequenceDiagram
    participant User
    participant Claude
    participant RecallSkill as /recall Skill
    participant SearchAPI as /api/search
    participant RecallAPI as /api/recall
    participant SearchMgr as SearchManager
    participant SessionStore as SessionStore
    participant ChromaDB as ChromaDB

    User->>Claude: "How did we fix the auth bug?"
    Claude->>RecallSkill: Invokes /recall skill
    
    Note over Claude,RecallAPI: Step 1: Search for relevant memories
    Claude->>SearchAPI: GET /api/search?query=auth+bug&limit=15
    SearchAPI->>SearchMgr: search(query, limit)
    SearchMgr->>ChromaDB: queryChroma(query, 100)
    ChromaDB-->>SearchMgr: vector search results (IDs only)
    SearchMgr->>SessionStore: Get metadata for IDs
    SessionStore-->>SearchMgr: titles, types, dates
    SearchMgr-->>SearchAPI: Index (~100 tokens/result)
    SearchAPI-->>Claude: [ID, Title, Type, Date] list
    
    Note over Claude: Claude reviews index and<br/>decides which memories<br/>are relevant
    
    Note over Claude,RecallAPI: Step 2: Fetch full details for selected IDs
    Claude->>RecallAPI: GET /api/recall?ids=234,567,891
    RecallAPI->>SearchMgr: recall([234,567,891])
    SearchMgr->>SessionStore: getObservationsByIds([234,567,891])
    SessionStore-->>SearchMgr: Full observation records
    SearchMgr-->>RecallAPI: Formatted markdown (~500 tokens/result)
    RecallAPI-->>Claude: Full memory narratives
    
    Claude->>User: Answers question using<br/>retrieved context
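
The two requests in the diagram can be sketched as URL builders. Only the paths (`/api/search`, `/api/recall`) and the query parameters (`query`, `limit`, `ids`) come from the diagram; the helper names, the worker base URL, and the rule that IDs must be positive integers are assumptions for illustration.

```typescript
// Hypothetical helpers; only the paths and parameter names come from the diagram.

// Step 1: search returns a compact index of matching observations.
function searchUrl(workerBase: string, query: string, limit = 15): string {
  // URLSearchParams encodes spaces as "+", matching the diagram's "auth+bug".
  const params = new URLSearchParams({ query, limit: String(limit) });
  return `${workerBase}/api/search?${params.toString()}`;
}

// Step 2: recall fetches full observations for the IDs Claude selected.
function recallUrl(workerBase: string, ids: number[]): string {
  // Assumption: observation IDs are positive integers; anything else is dropped.
  const valid = ids.filter((id) => Number.isInteger(id) && id > 0);
  if (valid.length === 0) {
    throw new Error("recall needs at least one observation ID");
  }
  return `${workerBase}/api/recall?ids=${valid.join(",")}`;
}
```

With a placeholder base such as `http://localhost:3000` (the real worker address is not given here), `searchUrl(base, "auth bug")` ends in `/api/search?query=auth+bug&limit=15` and `recallUrl(base, [234, 567, 891])` ends in `/api/recall?ids=234,567,891`.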

bigphoot and others added 4 commits January 25, 2026 01:18
The 3-layer memory system (search → timeline → get_observations) is
powerful but wasn't being used due to friction and poor prompting.
This PR wraps the workflow into a single /recall skill that:

1. Prompts Claude with clear trigger conditions for when to use it
2. Bundles the 2-step workflow (search index → fetch selected IDs)
3. Preserves token efficiency by letting Claude decide what to fetch
4. Updates context injection with assertive prompting

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Move from plugin/commands/recall.md to plugin/skills/recall/SKILL.md
for proper Claude Code skill discovery.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add instructions for getting the shorter /recall command by copying
to personal skills directory.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Added comprehensive documentation for the /recall skill:
- Explains the 2-step prompted workflow (search → fetch)
- Documents token efficiency (~10x savings)
- Shows invocation options (/recall vs /claude-mem:recall)
- Includes installation instructions for personal skills

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@bigph00t bigph00t force-pushed the feature/recall-skill branch from bb75919 to 4108e57 Compare January 25, 2026 09:19