| last_validated | 2026-03-16 |
|---|---|
This guide covers how to use Agent Brain for document indexing and semantic search using the Claude Code plugin.
- Overview
- Plugin Commands
- Plugin Agents
- Search Modes
- Two-Stage Retrieval with Reranking
- Indexing
- Folder Management
- File Type Presets
- Content Injection
- Chunk Eviction
- File Watcher
- Embedding Cache
- Job Queue
- Provider Configuration
- Multi-Project Support
- Runtime Autodiscovery
- Runtime Installation
- CLI Reference
- Local Integration Check
- Troubleshooting
Agent Brain is a RAG (Retrieval-Augmented Generation) system that indexes and searches documentation and source code. The primary interface is the Claude Code plugin, which provides:
| Component | Count | Description |
|---|---|---|
| Commands | 30 | Slash commands for all operations |
| Agents | 3 | Intelligent assistants for complex tasks |
| Skills | 2 | Context for optimal search and configuration |
- Indexing: Reads documents/code, splits into semantic chunks, generates embeddings
- Storage: Stores chunks in ChromaDB with metadata for filtering
- Retrieval: Finds similar chunks using hybrid search (semantic + keyword)
- GraphRAG: Extracts entities and relationships for dependency queries
| Command | Description | Best For |
|---|---|---|
| `/agent-brain-search` | Smart hybrid search | General questions |
| `/agent-brain-semantic` | Pure vector search | Conceptual queries |
| `/agent-brain-keyword` | BM25 keyword search | Exact terms, function names |
| `/agent-brain-bm25` | Alias for keyword search | Error messages, symbols |
| `/agent-brain-vector` | Alias for semantic search | "How does X work?" |
| `/agent-brain-hybrid` | Hybrid with alpha control | Fine-tuned searches |
| `/agent-brain-graph` | Knowledge graph search | Dependencies, relationships |
| `/agent-brain-multi` | All modes with RRF fusion | Maximum recall |
| Command | Description |
|---|---|
| `/agent-brain-start` | Start server (auto-port allocation) |
| `/agent-brain-stop` | Stop the running server |
| `/agent-brain-status` | Check health and document count |
| `/agent-brain-list` | List all running instances |
| `/agent-brain-index` | Index documents or code |
| `/agent-brain-reset` | Clear the index |
| `/agent-brain-jobs` | Manage indexing job queue |
| Command | Description |
|---|---|
| `/agent-brain-folders` | Manage indexed folders (list, add, remove) |
| `/agent-brain-inject` | Inject custom metadata into chunks during indexing |
| `/agent-brain-types` | List available file type presets for indexing |
| `/agent-brain-cache` | View embedding cache metrics or clear the cache |
| Command | Description |
|---|---|
| `/agent-brain-setup` | Complete guided setup wizard |
| `/agent-brain-install` | Install pip packages |
| `/agent-brain-install-agent` | Install for different AI runtimes (Claude, OpenCode, Gemini, Codex) |
| `/agent-brain-init` | Initialize project directory |
| `/agent-brain-config` | View/edit configuration |
| `/agent-brain-verify` | Verify configuration |
| `/agent-brain-help` | Show help information |
| `/agent-brain-version` | Show version information |
| Command | Description |
|---|---|
| `/agent-brain-providers` | List and configure providers |
| `/agent-brain-embeddings` | Configure embedding provider |
| `/agent-brain-summarizer` | Configure summarization provider |
Agent Brain includes three intelligent agents that handle complex, multi-step tasks:
**Search Assistant** performs multi-step searches across different modes and synthesizes answers.

Triggers: "Find all references to...", "Search for...", "What files contain..."

Example:

```
You: "Find all references to the authentication module"
Search Assistant:
1. Searches documentation for auth concepts
2. Searches code for auth imports and usage
3. Uses graph mode to find dependencies
4. Returns comprehensive list with file locations
```
**Research Assistant** performs deep exploration with follow-up queries and cross-referencing.

Triggers: "Research how...", "Investigate...", "Analyze the architecture of..."

Example:

```
You: "Research how error handling is implemented"
Research Assistant:
1. Identifies error handling patterns in docs
2. Finds exception classes and try/catch blocks
3. Traces error propagation through call graph
4. Synthesizes findings with code references
```
**Setup Assistant** provides guided installation, configuration, and troubleshooting.

Triggers: "Help me set up Agent Brain", "Configure...", "Why isn't... working"

Example:

```
You: "Help me set up Agent Brain with Ollama"
Setup Assistant:
1. Checks if Ollama is installed
2. Verifies embedding model is pulled
3. Configures provider settings
4. Tests the configuration
5. Reports success or guides through fixes
```
**Hybrid search** combines semantic similarity with keyword matching. Best for general questions.

```
/agent-brain-search "how does the caching system work"
```

Adjust the balance with `--alpha`:

- `--alpha 0.7`: more semantic (conceptual queries)
- `--alpha 0.3`: more keyword (specific terms)

```
/agent-brain-hybrid "authentication flow" --alpha 0.7
```
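Conceptually, the alpha blend can be sketched as a weighted sum of the two normalized scores. This is a simplified model for intuition; Agent Brain's internal normalization may differ:

```python
def hybrid_score(semantic: float, keyword: float, alpha: float = 0.5) -> float:
    """Blend a semantic (vector) score and a keyword (BM25) score.

    alpha=1.0 is purely semantic; alpha=0.0 is purely keyword.
    Both inputs are assumed to be normalized to [0, 1].
    """
    return alpha * semantic + (1 - alpha) * keyword

# At alpha=0.7, a document that matches conceptually but not lexically
# outranks one that only matches on exact terms:
conceptual = hybrid_score(semantic=0.9, keyword=0.1, alpha=0.7)  # 0.66
lexical = hybrid_score(semantic=0.1, keyword=0.9, alpha=0.7)     # 0.34
```

Flip alpha to 0.3 and the ranking reverses, which is why low alpha suits exact-term queries.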
**Semantic search** is pure embedding-based search. Best for conceptual understanding.

```
/agent-brain-semantic "explain the overall architecture"
```

**Keyword search** is BM25-based search (a TF-IDF-family ranking function). Best for exact terms, function names, and error codes.

```
/agent-brain-keyword "NullPointerException"
/agent-brain-bm25 "getUserById"
```

**Graph search** traverses entity relationships. Best for dependency and relationship queries.

```
/agent-brain-graph "what classes use AuthService"
/agent-brain-graph "what calls the validate function"
```

**Multi-mode search** combines all modes using Reciprocal Rank Fusion (RRF). Best for maximum recall.

```
/agent-brain-multi "everything about data validation"
```
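Reciprocal Rank Fusion itself is simple to sketch. The document IDs below are hypothetical, and `k=60` is the conventional constant from the original RRF formulation:

```python
from collections import defaultdict

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked result lists with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists that
    contain it, so consistent mid-rank hits beat a single top hit.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)

# "b" ranks well in all three modes and wins over "a", which tops only one:
fused = rrf_fuse([
    ["a", "b", "c"],   # semantic ranking
    ["b", "a", "d"],   # keyword ranking
    ["b", "c", "a"],   # graph ranking
])
```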
Agent Brain can optionally use two-stage retrieval to improve search precision by 15-20%.
Without Reranking (Default):

1. Query is embedded using the embedding model
2. Vector similarity search finds the `top_k` most similar documents
3. Results are returned

With Reranking Enabled:

1. Query is embedded using the embedding model
2. Vector + BM25 hybrid search retrieves 10x more candidates
3. Cross-encoder model scores each candidate for relevance to the query
4. Results are reordered by cross-encoder score
5. The `top_k` results are returned
Embedding models (bi-encoders) are fast but approximate. They encode the query and documents separately, then compare vectors. This can miss nuanced relevance.
Cross-encoders process the query AND document together, allowing the model to attend across both texts. This is slower but more accurate.
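The second stage can be sketched in plain Python. Here `score` is a stand-in for the cross-encoder call, and `overlap` is a toy scorer used only for illustration; the output fields mirror the `rerank_score` and `original_rank` metadata described later in this section:

```python
def rerank(query: str, candidates: list[str], score, top_k: int = 5) -> list[dict]:
    """Second-stage rerank: score each candidate jointly with the query,
    then keep the top_k by descending score (stable sort keeps ties in
    their original retrieval order)."""
    scored = sorted(
        ((score(query, doc), rank, doc)
         for rank, doc in enumerate(candidates, start=1)),
        key=lambda t: -t[0],
    )
    return [{"content": doc, "rerank_score": s, "original_rank": rank}
            for s, rank, doc in scored[:top_k]]

def overlap(query: str, doc: str) -> float:
    """Toy relevance scorer: fraction of query words present in the doc."""
    q = set(query.lower().split())
    return len(q & set(doc.lower().split())) / max(len(q), 1)

results = rerank("caching system",
                 ["job queue basics", "the caching system", "cache eviction"],
                 overlap, top_k=2)
```

In a real setup, `score` would wrap something like `CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2").predict`.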
Enable reranking when:
- Precision matters more than latency
- Queries are complex or nuanced
- Initial results seem "close but not quite right"
Keep reranking disabled when:
- Latency is critical (real-time search)
- Running on resource-constrained hardware
- Search quality is already acceptable
Enable with an environment variable:

```
export ENABLE_RERANKING=true
```

Or in `config.yaml`:

```yaml
reranker:
  provider: sentence-transformers
  model: cross-encoder/ms-marco-MiniLM-L-6-v2
```

`sentence-transformers` (Recommended):
- Uses HuggingFace CrossEncoder models
- Downloads model on first use (~50MB)
- Fast inference (~50ms for 100 candidates)
`ollama` (Fully Local):
- Uses Ollama chat completions for scoring
- No external downloads
- Slower (~500ms for 100 candidates)
- Requires Ollama running locally
When reranking is enabled, results include additional metadata:
- `rerank_score`: Cross-encoder relevance score
- `original_rank`: Position before reranking (1-indexed)
```
# Index documentation
/agent-brain-index ./docs

# Index documents and code together
/agent-brain-index . --include-code

# Limit code indexing to specific languages
/agent-brain-index ./src --include-code --languages python,typescript

# Use file type presets
/agent-brain-index ./src --include-type python
/agent-brain-index ./project --include-type python,docs
```

The `--generate-summaries` flag improves semantic search for code by generating LLM descriptions:

```
/agent-brain-index ./src --include-code --generate-summaries
```
Agent Brain supports AST-aware chunking for:
- Python (.py)
- TypeScript (.ts, .tsx)
- JavaScript (.js, .jsx)
- Java (.java)
- Go (.go)
- Rust (.rs)
- C (.c, .h)
- C++ (.cpp, .hpp, .cc)
- C# (.cs, .csx)
- Swift (.swift)
Other languages use intelligent text-based chunking.
```
# Check index status
/agent-brain-status

# Clear and rebuild the index
/agent-brain-reset
/agent-brain-index . --include-code
```
Agent Brain tracks indexed folders and provides commands to list, add, and remove them. Folders are persisted in a JSONL manifest that enables incremental re-indexing -- only changed files are processed on subsequent runs.
Show all indexed folders with chunk counts and last-indexed timestamps:
```
agent-brain folders list
```

Example output:

```
Folder Path         Chunks  Last Indexed
/home/user/docs     312     2026-02-24T12:00:00
/home/user/src      1024    2026-02-24T13:30:00
```
Queue an indexing job for a folder. Supports all indexing options:
```
agent-brain folders add ./docs
agent-brain folders add ./src --include-code
agent-brain folders add ./src --include-type python,docs
agent-brain folders add ./docs --force
```

Adding an already-indexed folder triggers incremental re-indexing (only changed files are processed). Use `--force` to bypass the manifest and re-index everything.
Remove all indexed chunks associated with a folder:
```
agent-brain folders remove ./old-docs
agent-brain folders remove ./old-docs --yes   # skip confirmation
```
The folder does not need to exist on disk to be removed from the index.
When adding a folder, you can enable automatic re-indexing via the file watcher (see File Watcher section). Folders with watch_mode=auto are monitored for changes and re-indexed automatically.
Use the plugin command for the same operations:
```
/agent-brain-folders list
/agent-brain-folders add ./src --include-code
/agent-brain-folders remove ./old-docs --yes
```
File type presets are named groups of glob patterns that simplify indexing. Instead of specifying individual file extensions, use a preset name with the `--include-type` flag.
| Preset | Extensions |
|---|---|
| `python` | `*.py`, `*.pyi`, `*.pyw` |
| `javascript` | `*.js`, `*.jsx`, `*.mjs`, `*.cjs` |
| `typescript` | `*.ts`, `*.tsx` |
| `go` | `*.go` |
| `rust` | `*.rs` |
| `java` | `*.java` |
| `csharp` | `*.cs` |
| `c` | `*.c`, `*.h` |
| `cpp` | `*.cpp`, `*.hpp`, `*.cc`, `*.hh` |
| `web` | `*.html`, `*.css`, `*.scss`, `*.jsx`, `*.tsx` |
| `docs` | `*.md`, `*.txt`, `*.rst`, `*.pdf` |
| `text` | `*.md`, `*.txt`, `*.rst` |
| `pdf` | `*.pdf` |
| `code` | All programming language extensions combined |
```
# Index only Python files
agent-brain index ./src --include-type python

# Index Python and documentation files
agent-brain index ./project --include-type python,docs

# Index all code files
agent-brain index ./repo --include-type code

# Combine presets with custom patterns
agent-brain index ./project --include-type typescript --include-patterns "*.json"
```

Use the types command to see all presets:

```
/agent-brain-types
```
Presets can be combined with commas: `--include-type python,docs`. The `code` preset is a union of all individual language presets.
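The preset expansion can be pictured as a dictionary lookup. This is an abbreviated sketch using a subset of the table above (only two language presets shown), not Agent Brain's actual implementation:

```python
# Illustrative subset of the preset table; names match the docs above.
PRESETS: dict[str, list[str]] = {
    "python": ["*.py", "*.pyi", "*.pyw"],
    "typescript": ["*.ts", "*.tsx"],
    "docs": ["*.md", "*.txt", "*.rst", "*.pdf"],
}
# "code" is the union of the language presets (only two shown here).
PRESETS["code"] = sorted({p for name in ("python", "typescript")
                          for p in PRESETS[name]})

def expand_types(spec: str) -> list[str]:
    """Expand a comma-separated --include-type value into glob patterns."""
    patterns: list[str] = []
    for name in spec.split(","):
        patterns.extend(PRESETS[name.strip()])
    return patterns

patterns = expand_types("python,docs")
```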
Content injection enriches chunk metadata during indexing using custom Python scripts or static JSON metadata files. Injectors run after chunking but before embedding generation (step 2.5 in the pipeline), so enriched metadata is stored alongside vectors in the index.
Provide a Python script that exports a `process_chunk` function:

```
agent-brain inject ./docs --script enrich.py
```

The script must define:

```python
def process_chunk(chunk: dict) -> dict:
    """Enrich a single chunk with custom metadata."""
    chunk["project"] = "my-project"
    chunk["team"] = "backend"
    return chunk
```

Input keys available: `chunk_id`, `content`, `source`, `language`, `start_line`, `end_line`, `summary`
Constraints:

- Values must be scalars (str, int, float, bool) -- lists and dicts are stripped for ChromaDB compatibility
- Core schema keys (`chunk_id`, `source`, etc.) cannot be overwritten
- Exceptions are caught per-chunk and logged as warnings (the pipeline continues)
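These constraints might be enforced by a sanitizer along these lines. This is a sketch, not Agent Brain's actual code; `sanitize_metadata` and `CORE_KEYS` are illustrative names:

```python
# Hypothetical core schema keys, taken from the input keys listed above.
CORE_KEYS = {"chunk_id", "content", "source", "language",
             "start_line", "end_line", "summary"}

def sanitize_metadata(chunk: dict, enriched: dict) -> dict:
    """Apply the injector constraints: scalars only, core keys protected."""
    result = dict(chunk)
    for key, value in enriched.items():
        if key in CORE_KEYS:
            continue  # core schema keys cannot be overwritten
        if isinstance(value, (str, int, float, bool)):
            result[key] = value  # scalars pass through; lists/dicts are stripped
    return result

chunk = {"chunk_id": "c1", "source": "auth.py"}
out = sanitize_metadata(chunk, {"team": "backend",
                                "tags": ["auth", "core"],   # non-scalar: stripped
                                "chunk_id": "hijacked"})    # core key: ignored
```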
Merge a static JSON file into every chunk from a folder:

```
agent-brain inject ./src --folder-metadata project-meta.json --include-code
```

JSON format:

```json
{
  "project": "my-project",
  "team": "backend",
  "version": "2.0"
}
```

Validate an injector against sample chunks without actually indexing:

```
agent-brain inject ./docs --script enrich.py --dry-run
```

Plugin commands:

```
/agent-brain-inject ./docs --script enrich.py
/agent-brain-inject ./src --folder-metadata project-meta.json --include-code
/agent-brain-inject ./docs --script enrich.py --dry-run
```
At least one of `--script` or `--folder-metadata` must be provided.
When files change or are removed, Agent Brain automatically evicts stale chunks from the index during the next indexing run. This is powered by the manifest tracker, which records per-file checksums, modification times, and chunk IDs.
1. Manifest comparison: On each indexing run, the current filesystem state is compared against the prior folder manifest.
2. Diff computation: Files are categorized as added, changed, deleted, or unchanged.
3. mtime check first: If the file modification time is unchanged, the file is skipped (fast path).
4. Checksum verification: If the mtime changed, a SHA-256 content checksum confirms whether the content actually changed (handles `touch`, `git checkout`, etc.).
5. Bulk eviction: Chunk IDs for deleted and changed files are removed from the storage backend in bulk.
6. Re-indexing: Only added and changed files are processed, saving time on large codebases.
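The mtime-then-checksum check above can be sketched as follows. This is a simplified model; `file_changed` is an illustrative name, not Agent Brain's API:

```python
import hashlib
import os
import tempfile
from pathlib import Path

def file_changed(path: Path, prior: dict) -> bool:
    """mtime check first (fast path), then SHA-256 confirmation.

    `prior` is the manifest entry: {"mtime": float, "sha256": str}.
    """
    if path.stat().st_mtime == prior["mtime"]:
        return False  # mtime unchanged: skip without reading the file
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    return digest != prior["sha256"]  # mtime moved, but did the content?

with tempfile.TemporaryDirectory() as tmp:
    f = Path(tmp) / "demo.py"
    f.write_text("print('hi')\n")
    entry = {"mtime": f.stat().st_mtime,
             "sha256": hashlib.sha256(f.read_bytes()).hexdigest()}

    untouched = file_changed(f, entry)      # nothing happened

    os.utime(f, (12345, 12345))             # simulate `touch`: mtime moves
    touched_only = file_changed(f, entry)   # checksum shows no real change

    f.write_text("print('bye')\n")
    os.utime(f, (54321, 54321))
    edited = file_changed(f, entry)         # genuine content change
```

The `touch` case is why the checksum step matters: mtime alone would force a needless re-index.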
Use `--force` to bypass the manifest and re-index all files:

```
agent-brain index ./src --force
```

Force mode evicts all prior chunks for the folder and processes every file fresh.
Manifests are stored as JSON files in the state directory:
```
.agent-brain/manifests/<sha256(folder_path)>.json
```
Each manifest records per-file checksums, mtimes, and chunk IDs for targeted deletion.
The file watcher service monitors indexed folders for changes and triggers automatic incremental re-indexing. It uses `watchfiles` (built on the Rust `notify` crate) for efficient filesystem event detection.
- One asyncio task is created per watched folder
- When file changes are detected, an incremental indexing job is enqueued
- Jobs are deduplicated -- if a job for the same folder is already pending, no duplicate is created
- Changes are debounced to avoid rapid re-indexing (default: 30 seconds)
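The debounce behavior can be modeled with explicit timestamps. This is a sketch for intuition only; the real watcher drives this from asyncio tasks rather than a `Debouncer` class:

```python
class Debouncer:
    """Collapse a burst of file events into one indexing job (sketch).

    Timestamps are passed in explicitly to keep the logic testable.
    """

    def __init__(self, interval: float = 30.0):
        self.interval = interval
        self.pending = False   # is there an un-indexed change?
        self.last_event = 0.0  # time of the most recent change

    def on_event(self, now: float) -> None:
        """Record a filesystem change."""
        self.pending = True
        self.last_event = now

    def should_enqueue(self, now: float) -> bool:
        """True once the folder has been quiet for `interval` seconds."""
        if self.pending and now - self.last_event >= self.interval:
            self.pending = False  # one job per burst
            return True
        return False

d = Debouncer(interval=30.0)
d.on_event(0.0)
d.on_event(10.0)
d.on_event(20.0)                         # a burst of rapid saves
quiet_too_soon = d.should_enqueue(35.0)  # only 15s of quiet: no job yet
ready = d.should_enqueue(50.0)           # 30s after the last event: enqueue
again = d.should_enqueue(80.0)           # no new events: nothing to do
```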
| Mode | Behavior |
|---|---|
| `off` | No automatic re-indexing (default) |
| `auto` | Watch for changes and re-index automatically |
Configure the file watcher via `config.yaml`:

```yaml
file_watcher:
  default_debounce_seconds: 30  # Global debounce interval
```

Per-folder debounce can be set when adding a folder with watch mode enabled.
The watcher automatically ignores common non-source directories: `.git/`, `__pycache__/`, `node_modules/`, `.venv/`, `dist/`, `build/`, `.next/`, `.nuxt/`, `coverage/`, `htmlcov/`.
Watcher-triggered jobs are tagged with `source="auto"` to distinguish them from manual indexing jobs. They always use `force=False` (incremental mode via the manifest tracker).
Agent Brain automatically caches embeddings in a two-layer architecture to avoid redundant API calls. The cache is transparent -- it requires no setup and works with any embedding provider.
- Layer 1 (Memory): In-memory LRU cache with fixed capacity (default: 1,000 entries). Sub-millisecond lookups with zero I/O.
- Layer 2 (Disk): aiosqlite SQLite database in WAL mode. Single-digit millisecond lookups. Persists across server restarts. Default limit: 500 MB (~42,000 entries at 3,072 dimensions).
Keys are computed as `SHA-256(content_text):provider:model:dimensions`. This ensures cached embeddings are invalidated when the embedding provider or model changes.
On startup, the cache compares the current provider fingerprint against the stored fingerprint. If they differ, all cached embeddings are automatically cleared to prevent dimension mismatches.
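The key derivation can be sketched as follows; the exact serialization inside Agent Brain may differ, but the idea is that any change to provider, model, or dimensions yields a different key:

```python
import hashlib

def cache_key(text: str, provider: str, model: str, dimensions: int) -> str:
    """Derive a cache key of the documented shape:
    SHA-256(content_text):provider:model:dimensions (sketch)."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"{digest}:{provider}:{model}:{dimensions}"

# Same text, different model: different key, so no stale hit is possible.
large = cache_key("hello world", "openai", "text-embedding-3-large", 3072)
small = cache_key("hello world", "openai", "text-embedding-3-small", 1536)
```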
Use the CLI or plugin command to view cache status and clear the cache:
```
# View cache metrics
agent-brain cache status

# View metrics as JSON
agent-brain cache status --json

# Clear the cache (prompts for confirmation)
agent-brain cache clear

# Clear without confirmation
agent-brain cache clear --yes
```

Plugin commands:

```
/agent-brain-cache status
/agent-brain-cache clear --yes
```
| Metric | Description |
|---|---|
| Entries (disk) | Total embeddings persisted in the SQLite database |
| Entries (memory) | Embeddings in the in-memory LRU (fastest tier) |
| Hit Rate | Percentage of lookups served from cache (higher is better) |
| Hits | Total successful cache lookups this session |
| Misses | Cache misses (embedding computed via API) |
| Size | Disk space used by the cache database |
A healthy cache has a hit rate above 80% after the first full indexing cycle.
Clear the cache:

- After changing embedding provider or model (prevents dimension mismatches)
- If embeddings seem incorrect or queries return poor results
- To force fresh embeddings after significant content changes
As of v3.0.0, indexing operations are queued and processed asynchronously.
1. Submit: `POST /index` returns immediately with a job ID
2. Queue: Jobs are stored in `.agent-brain/jobs/index_queue.jsonl`
3. Process: Background worker processes jobs sequentially
4. Track: Poll job status or use the CLI `--watch` option
```
# List all jobs
agent-brain jobs

# Watch queue with live updates
agent-brain jobs --watch

# Get job details
agent-brain jobs job_abc123def456

# Cancel a job
agent-brain jobs job_abc123def456 --cancel
```

| Status | Description |
|---|---|
| `pending` | Queued, waiting to run |
| `running` | Currently processing |
| `done` | Completed successfully |
| `failed` | Failed with error |
| `cancelled` | Cancelled by user |
The queue automatically deduplicates identical requests. If you submit the same folder with the same options while a job is pending or running, you get back the existing job ID.
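The deduplication rule can be sketched as a key lookup over pending and running jobs. `JobQueue` here is illustrative; the real queue persists jobs to the JSONL file described above:

```python
class JobQueue:
    """Sketch of submit-time deduplication: identical (folder, options)
    requests return the existing pending/running job's ID."""

    def __init__(self):
        self.jobs: dict[str, dict] = {}
        self._counter = 0

    def submit(self, folder: str, **options) -> str:
        key = (folder, tuple(sorted(options.items())))
        for job_id, job in self.jobs.items():
            if job["key"] == key and job["status"] in ("pending", "running"):
                return job_id  # duplicate: hand back the existing job ID
        self._counter += 1
        job_id = f"job_{self._counter:03d}"
        self.jobs[job_id] = {"key": key, "status": "pending"}
        return job_id

q = JobQueue()
first = q.submit("./docs", include_code=True)
duplicate = q.submit("./docs", include_code=True)   # same folder + options
different = q.submit("./docs", include_code=False)  # different options: new job
```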
```
# Check if indexing is done
agent-brain status --json | jq '.indexing.indexing_in_progress'

# Or poll a specific job
agent-brain jobs job_abc123 | grep status
```

Agent Brain supports pluggable providers for embeddings and summarization.
```
/agent-brain-providers
```
| Provider | Models | Local |
|---|---|---|
| OpenAI | text-embedding-3-large, text-embedding-3-small | No |
| Ollama | nomic-embed-text, mxbai-embed-large | Yes |
| Cohere | embed-english-v3.0, embed-multilingual-v3.0 | No |
| Provider | Models | Local |
|---|---|---|
| Anthropic | claude-haiku-4-5-20251001, claude-sonnet-4-5-20250514 | No |
| OpenAI | gpt-5, gpt-5-mini | No |
| Gemini | gemini-3-flash, gemini-3-pro | No |
| Grok | grok-4, grok-4-fast | No |
| Ollama | llama4:scout, mistral-small3.2, qwen3-coder | Yes |
Run completely offline with Ollama:
```
/agent-brain-providers
# Select Ollama for embeddings
# Select Ollama for summarization
```
Agent Brain supports multiple isolated instances for different projects.
```
/agent-brain-init
```

Creates `.agent-brain/` with project-specific configuration.

```
/agent-brain-start
```

Automatically allocates a unique port (no conflicts).

```
/agent-brain-list
```

Shows all running Agent Brain servers across projects.

Commands automatically resolve the project root:

```
cd src/deep/nested/directory
/agent-brain-status   # Finds the parent project's server
```
The CLI automatically discovers the server URL without manual configuration.
When you run `agent-brain start`, the server writes a runtime file:

```
.agent-brain/runtime.json
```

Contents:

```json
{
  "base_url": "http://127.0.0.1:49321",
  "port": 49321,
  "bind_host": "127.0.0.1",
  "pid": 12345,
  "started_at": "2026-02-03T10:00:00Z",
  "foreground": false
}
```

The CLI resolves the server URL in this priority:

1. Environment variable: `AGENT_BRAIN_URL`
2. Runtime file: `.agent-brain/runtime.json` (searches from cwd upward)
3. Config file: `config.yaml` (if it contains a URL)
4. Default: `http://127.0.0.1:8000`
Config files are searched in this order:

1. `.agent-brain/config.yaml` (cwd, then walk upward)
2. `~/.config/agent-brain/config.yaml` (XDG config)
3. `~/.agent-brain/config.yaml` (legacy, deprecated)
4. Environment variable: `AGENT_BRAIN_CONFIG`
```
# Start server (writes runtime.json automatically)
agent-brain start

# CLI auto-discovers server URL - no --url flag needed
agent-brain status
agent-brain index ./docs
agent-brain query "search term"
```

Agent Brain can be installed for multiple AI runtimes. The `install-agent` command converts the canonical Claude plugin format into the target runtime's native format.
| Runtime | Command | Default Directory |
|---|---|---|
| Claude Code | `--agent claude` | `.claude/plugins/agent-brain/` |
| OpenCode | `--agent opencode` | `.opencode/plugins/agent-brain/` |
| Gemini CLI | `--agent gemini` | `.gemini/plugins/agent-brain/` |
| Codex | `--agent codex` | `.codex/skills/agent-brain/` |
| Any skill-based | `--agent skill-runtime --dir <path>` | (required) |
```
# Install for Claude Code (default)
agent-brain install-agent --agent claude

# Install for Codex (generates AGENTS.md at project root)
agent-brain install-agent --agent codex

# Install for any skill-based runtime (e.g., Qwen, Cursor)
agent-brain install-agent --agent skill-runtime --dir ./my-skills

# Preview what would be installed
agent-brain install-agent --agent codex --dry-run

# Install globally (user-level)
agent-brain install-agent --agent claude --global

# JSON output for automation
agent-brain install-agent --agent codex --json
```

The skill-runtime converter flattens all plugin artifacts into skill directories:

- Commands become individual skill directories with `SKILL.md`
- Agents become orchestration skill directories referencing dependent skills
- Skills are copied with references intact
- Templates are placed in `agent-brain-setup/assets/`
- Scripts are placed in `agent-brain-verify/scripts/`
The codex adapter is a preset built on skill-runtime that also:

- Installs to `.codex/skills/agent-brain/` by default
- Generates/updates `AGENTS.md` at the project root
- Adds invocation guidance headers to each skill
- Uses HTML comment markers for idempotent AGENTS.md updates
To add support for a new runtime, implement the `RuntimeConverter` protocol:

```python
from agent_brain_cli.runtime.converter_base import RuntimeConverter

class MyConverter:
    @property
    def runtime_type(self) -> RuntimeType: ...

    def convert_command(self, command: PluginCommand) -> str: ...

    def convert_agent(self, agent: PluginAgent) -> str: ...

    def convert_skill(self, skill: PluginSkill) -> str: ...

    def install(self, bundle: PluginBundle, target: Path, scope: Scope) -> list[Path]: ...
```

Then register it in `install_agent.py`'s `CONVERTERS` dict.
For advanced users or automation, the CLI provides direct access:

```
pip install agent-brain-rag agent-brain-cli
```

```
# Initialize project
agent-brain init

# Start/stop server
agent-brain start              # Backgrounds by default
agent-brain start --foreground # Run in foreground
agent-brain stop

# Index documents
agent-brain index ./docs --include-code

# Index with file type presets
agent-brain index ./src --include-type python

# Folder management
agent-brain folders list
agent-brain folders add ./src --include-code
agent-brain folders remove ./old-docs --yes

# Content injection
agent-brain inject ./docs --script enrich.py
agent-brain inject ./src --folder-metadata project-meta.json

# Query
agent-brain query "your question" --mode hybrid

# Job management (v3.0+)
agent-brain jobs                 # List all jobs
agent-brain jobs --watch         # Watch with live updates
agent-brain jobs JOB_ID          # Job details
agent-brain jobs JOB_ID --cancel # Cancel job

# Cache management
agent-brain cache status
agent-brain cache clear --yes

# File type presets
agent-brain types list

# Runtime installation
agent-brain install-agent --agent claude
agent-brain install-agent --agent codex
agent-brain install-agent --agent skill-runtime --dir ./skills

# Status
agent-brain status
agent-brain list
```

```
# Search modes
agent-brain query "term" --mode vector
agent-brain query "term" --mode bm25
agent-brain query "term" --mode hybrid --alpha 0.7
agent-brain query "term" --mode graph
agent-brain query "term" --mode multi

# Result tuning
agent-brain query "term" --top-k 10 --threshold 0.3

# Filtering
agent-brain query "term" --source-types code
agent-brain query "term" --languages python,typescript

# Output formats
agent-brain query "term" --json
agent-brain query "term" --scores
```

Before releasing or after major changes, run the local integration check to validate E2E functionality.
```
./scripts/local_integration_check.sh
```

Or using Task:

```
task local-integration
```

The check validates:

- Server startup: Verifies the server starts and writes `runtime.json`
- Runtime autodiscovery: CLI finds the server URL from `runtime.json`
- Job queue: Indexing job completes without 409/500 errors
- Query: Returns a valid HTTP 200 response
- CLI commands: `agent-brain jobs` works correctly
Expected output:

```
=== Agent Brain Local Integration Check ===
Step 1: Cleaning up stray processes...
Step 2: Cleaning up old state...
Step 3: Starting server in foreground...
Step 4: Checking runtime.json...
  Found runtime.json
  Server URL: http://127.0.0.1:49321
Step 5: Waiting for health endpoint...
  Server is healthy!
...
=== Integration Check PASSED ===
```
If the check fails:

- `runtime.json` not found: the server failed to start; check for port conflicts
- Job failed: check server logs in `.agent-brain/logs/`
- Query failed: the index may be empty; verify test data was created
Check that the server is running:

```
/agent-brain-status
```

If not running:

```
/agent-brain-start
```

If searches return no results:

- Check document count: `/agent-brain-status`
- If 0 documents, re-index: `/agent-brain-index ./docs`
- Try lowering the threshold: `/agent-brain-search "term" --threshold 0.3`
- Try a different search mode: `/agent-brain-keyword "exact term"`
```
/agent-brain-verify
```
This checks:
- Package installation
- API key configuration
- Server connectivity
- Provider setup
```
/agent-brain-providers
```

Verify your API keys are set correctly for the selected provider.
To rebuild from scratch:

```
/agent-brain-reset
/agent-brain-init
/agent-brain-start
/agent-brain-index . --include-code
```
- Quick Start - Get running in minutes
- Plugin Guide - All 30 commands in detail
- API Reference - REST API documentation
- GraphRAG Guide - Knowledge graph features
- Provider Configuration - Provider setup