What
Transform Smriti from flat text ingestion to a structured, queryable memory pipeline — where every tool call, file edit, git operation, error, and thinking block is parsed, typed, stored in sidecar tables, and available for analytics, search, and team sharing.
Why
Currently Smriti drops 80%+ of the structured data in AI coding sessions. A Claude Code transcript contains tool calls with typed inputs, file diffs, command outputs, git operations, token costs, and thinking blocks — but the flat text parser reduces all of this to a single string. This means:
- No file tracking: Can't answer "what files did I edit this week?"
- No error analysis: Can't find sessions where builds failed or tests broke
- No cost visibility: No token/cost tracking across sessions or projects
- No git correlation: Can't link sessions to commits, branches, or PRs
- No cross-agent view: Different agents (Claude, Cline, Aider) can't share a unified memory
- No security layer: Secrets in sessions get shared without redaction
This roadmap addresses all of these gaps across 5 phases.
Sub-Issues
Phase Overview
| Phase | Deliverable | Status |
| --- | --- | --- |
| Phase 1 | Enriched Claude Code Parser (#5) | Done: 13 block types, 6 sidecar tables, 142 tests |
| Phase 2 | Cline + Aider Parsers (#6) | Planned |
| Phase 3 | Watch Daemon (#7) + Search & Analytics (#8) | Planned |
| Phase 4 | Secret Redaction & Policy (#9) | Planned |
| Phase 5 | Telemetry (#10) + Testing & Perf (#11) | Planned |
Storage Inventory
Complete map of every data type, where it lives, and whether it's indexed:
| Data | Source | Table | Key Columns | Indexed? |
| --- | --- | --- | --- | --- |
| Session text (FTS) | All agents | `memory_fts` (QMD) | content | FTS5 full-text |
| Session metadata | Ingestion | `smriti_session_meta` | session_id, agent_id, project_id | Yes (agent, project) |
| Project registry | Path derivation | `smriti_projects` | id, path, description | PK |
| Agent registry | Seed data | `smriti_agents` | id, parser, log_pattern | PK |
| Tool usage | Block extraction | `smriti_tool_usage` | message_id, tool_name, success, duration_ms | Yes (session, tool_name) |
| File operations | Block extraction | `smriti_file_operations` | message_id, operation, file_path, project_id | Yes (session, path) |
| Commands | Block extraction | `smriti_commands` | message_id, command, exit_code, is_git | Yes (session, is_git) |
| Git operations | Block extraction | `smriti_git_operations` | message_id, operation, branch, pr_url | Yes (session, operation) |
| Errors | Block extraction | `smriti_errors` | message_id, error_type, message | Yes (session, type) |
| Token costs | Metadata accumulation | `smriti_session_costs` | session_id, model, input/output/cache tokens, cost | PK |
| Category tags (session) | Categorization | `smriti_session_tags` | session_id, category_id, confidence, source | Yes (category) |
| Category tags (message) | Categorization | `smriti_message_tags` | message_id, category_id, confidence, source | Yes (category) |
| Category taxonomy | Seed data | `smriti_categories` | id, name, parent_id | PK |
| Share tracking | Team sharing | `smriti_shares` | session_id, content_hash, author | Yes (hash) |
| Vector embeddings | `smriti embed` | `content_vectors` + `vectors_vec` (QMD) | content_hash, embedding | Virtual table |
| Telemetry events | Opt-in collection | `~/.smriti/telemetry.json` | timestamp, event, data | N/A (JSONL file) |
| Structured blocks | Block extraction | `memory_messages.metadata.blocks` (JSON) | MessageBlock[] | No (JSON blob) |
| Message metadata | Parsing | `memory_messages.metadata` (JSON) | cwd, gitBranch, model, tokenUsage | No (JSON blob) |
Block Type Reference
The 13 MessageBlock types extracted during ingestion:
| Block Type | Fields | Stored In |
| --- | --- | --- |
| `text` | text | FTS (via plainText) |
| `thinking` | thinking, budgetTokens | JSON blob only |
| `tool_call` | toolId, toolName, input | `smriti_tool_usage` |
| `tool_result` | toolId, success, output, error, durationMs | Updates tool_usage success |
| `file_op` | operation, path, diff, pattern | `smriti_file_operations` |
| `command` | command, cwd, exitCode, stdout, stderr, isGit | `smriti_commands` |
| `search` | searchType, pattern, path, url, resultCount | JSON blob only |
| `git` | operation, branch, message, files, prUrl, prNumber | `smriti_git_operations` |
| `error` | errorType, message, retryable | `smriti_errors` |
| `image` | mediaType, path, dataHash | JSON blob only |
| `code` | language, code, filePath, lineStart | JSON blob only |
| `system_event` | eventType, data | Cost accumulation |
| `control` | controlType, command | JSON blob only |
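
To make the block-to-table mapping concrete, here is a minimal sketch of how a subset of the block types could be modeled as a discriminated union and routed to their sidecar tables. It is not the actual contents of `src/ingest/types.ts`: the `type` discriminant, the optional markers, and the `sidecarTableFor` helper are assumptions made for illustration. Only the field and table names come from the reference table.

```typescript
// Hypothetical subset of the MessageBlock union. Field names follow the
// Block Type Reference; the `type` discriminant and which fields are
// optional are assumptions, not the project's real definitions.
type MessageBlock =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string; budgetTokens?: number }
  | { type: "tool_call"; toolId: string; toolName: string; input: unknown }
  | { type: "tool_result"; toolId: string; success: boolean; output?: string; error?: string; durationMs?: number }
  | { type: "file_op"; operation: string; path: string; diff?: string; pattern?: string }
  | { type: "command"; command: string; cwd?: string; exitCode?: number; isGit?: boolean }
  | { type: "git"; operation: string; branch?: string; prUrl?: string; prNumber?: number }
  | { type: "error"; errorType: string; message: string; retryable?: boolean };

// Hypothetical helper: returns the sidecar table a block is written to,
// or null when the block is kept only in the JSON blob (or FTS).
function sidecarTableFor(block: MessageBlock): string | null {
  switch (block.type) {
    case "tool_call":
    case "tool_result": // a result updates the row created by its tool_call
      return "smriti_tool_usage";
    case "file_op":
      return "smriti_file_operations";
    case "command":
      return "smriti_commands";
    case "git":
      return "smriti_git_operations";
    case "error":
      return "smriti_errors";
    default:
      return null;
  }
}

const edit: MessageBlock = { type: "file_op", operation: "edit", path: "src/db.ts" };
console.log(sidecarTableFor(edit));
```

Switching on a literal `type` field lets the compiler narrow each branch to the fields that block carries, which keeps per-table insert code free of casts.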
Real User Testing Plan
| Scenario | What to Measure | Risk if Untested |
| --- | --- | --- |
| Fresh install + first ingest | Time-to-first-search, error quality | Bad first impression, confusing errors |
| 500+ sessions accumulated | Search latency, DB file size, `smriti status` accuracy | Performance cliff after months of use |
| Multi-project workspace | Project ID derivation accuracy, cross-project search | Wrong project attribution for sessions |
| Team sharing (2+ devs) | Sync conflicts, dedup accuracy, content hash stability | Duplicate or lost knowledge articles |
| Long-running session (4+ hrs) | Memory during ingest, block count accuracy, cost tracking | OOM or missed data at end of session |
| Rapid session creation | Watch daemon debouncing, no duplicate ingestion | Double-counting sessions |
| Agent switch mid-task | Cross-agent file tracking, unified timeline | Gaps in activity log |
| Secret in session | Detection rate, redaction completeness, share blocking | Leaked credentials in `.smriti/` |
| Large JSONL file (50MB+) | Parse time, memory usage, incremental ingest | Crash or multi-minute ingest |
| Corrupt/truncated files | Error messages, graceful skip, no data loss | Silent data corruption |
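
The "rapid session creation" scenario depends on debouncing that can be tested deterministically. One possible shape, offered only as a sketch, is a debouncer that takes timestamps as arguments instead of using timers. The `IngestDebouncer` class and its methods are hypothetical and do not describe the planned watch daemon; the 2000 ms default mirrors `SMRITI_WATCH_DEBOUNCE_MS`.

```typescript
// Hypothetical debouncer: collapses bursts of change events per file and
// releases a path only once it has been quiet for `debounceMs`.
// Timestamps are passed in so tests do not depend on wall-clock time.
class IngestDebouncer {
  private lastSeen = new Map<string, number>();
  private debounceMs: number;

  constructor(debounceMs: number = 2000) {
    this.debounceMs = debounceMs;
  }

  // Record a change event for `path` observed at `nowMs`.
  touch(path: string, nowMs: number): void {
    this.lastSeen.set(path, nowMs);
  }

  // Return paths quiet for at least `debounceMs`, removing them so each
  // burst of events triggers exactly one ingestion.
  drainDue(nowMs: number): string[] {
    const due: string[] = [];
    this.lastSeen.forEach((seenAt, path) => {
      if (nowMs - seenAt >= this.debounceMs) {
        due.push(path);
        this.lastSeen.delete(path);
      }
    });
    return due;
  }
}

const debouncer = new IngestDebouncer(2000);
debouncer.touch("session.jsonl", 0);
debouncer.touch("session.jsonl", 1500); // second write restarts the quiet period
console.log(debouncer.drainDue(2000));  // still within the quiet period
console.log(debouncer.drainDue(3500));  // quiet for 2000 ms, now due
```

Because a drained path is removed from the map, a second `drainDue` call cannot return it again unless a new event arrives. That is the property the "no duplicate ingestion" measurement checks.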
Configuration Reference
| Env Var | Default | Phase | Description |
| --- | --- | --- | --- |
| `QMD_DB_PATH` | `~/.cache/qmd/index.sqlite` | - | Database path |
| `CLAUDE_LOGS_DIR` | `~/.claude/projects` | 1 | Claude Code logs |
| `CODEX_LOGS_DIR` | `~/.codex` | - | Codex CLI logs |
| `SMRITI_PROJECTS_ROOT` | `~/zero8.dev` | 1 | Projects root for ID derivation |
| `OLLAMA_HOST` | `http://127.0.0.1:11434` | - | Ollama endpoint |
| `QMD_MEMORY_MODEL` | `qwen3:8b-tuned` | - | Ollama model for synthesis |
| `SMRITI_CLASSIFY_THRESHOLD` | `0.5` | - | LLM classification trigger |
| `SMRITI_AUTHOR` | `$USER` | - | Git author for team sharing |
| `SMRITI_WATCH_DEBOUNCE_MS` | `2000` | 3 | Watch daemon debounce interval |
| `SMRITI_TELEMETRY` | `0` | 5 | Enable telemetry collection |
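
As an illustration of how these variables and defaults could be resolved, the sketch below reads a few of them from an environment map. The `SmritiConfig` shape, `loadConfig`, and `numberOr` are hypothetical names, and treating `SMRITI_TELEMETRY=1` as "enabled" is an assumption; only the variable names and default values come from the table above. Paths are returned as written, so `~` is not expanded here.

```typescript
// Hypothetical config shape covering a subset of the variables above.
interface SmritiConfig {
  dbPath: string;
  claudeLogsDir: string;
  classifyThreshold: number;
  watchDebounceMs: number;
  telemetryEnabled: boolean;
}

type Env = Record<string, string | undefined>;

// Parse a numeric variable, falling back to the default when it is
// unset, empty, or not a finite number.
function numberOr(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  return Number.isFinite(parsed) ? parsed : fallback;
}

// Resolve configuration from an environment map (for example process.env).
function loadConfig(env: Env): SmritiConfig {
  return {
    dbPath: env["QMD_DB_PATH"] ?? "~/.cache/qmd/index.sqlite",
    claudeLogsDir: env["CLAUDE_LOGS_DIR"] ?? "~/.claude/projects",
    classifyThreshold: numberOr(env["SMRITI_CLASSIFY_THRESHOLD"], 0.5),
    watchDebounceMs: numberOr(env["SMRITI_WATCH_DEBOUNCE_MS"], 2000),
    // Assumption: "1" enables telemetry; anything else leaves it off.
    telemetryEnabled: env["SMRITI_TELEMETRY"] === "1",
  };
}

const config = loadConfig({ SMRITI_WATCH_DEBOUNCE_MS: "500" });
console.log(config.watchDebounceMs, config.telemetryEnabled);
```

Passing the environment in as an argument, rather than reading `process.env` inside the function, makes the defaults easy to check in tests.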
Current State
Phase 1 is complete:
- 13 structured block types defined in `src/ingest/types.ts`
- Block extraction engine in `src/ingest/blocks.ts`
- Enriched Claude parser in `src/ingest/claude.ts`
- 6 sidecar tables in `src/db.ts` with indexes and insert helpers
- 142 tests passing, 415 expect() calls across 9 test files