What
A comprehensive testing and benchmarking plan that validates Smriti against real-world usage scenarios: large databases, concurrent access, cross-agent queries, and performance under load.
Why
Unit tests verify correctness in isolation, but real usage involves hundreds of sessions, thousands of messages, multiple agents writing simultaneously, and databases that grow over months. We need to validate that performance does not degrade and that structured data stays consistent at scale.
Tasks
Correctness Testing
Performance Benchmarks
Stress Testing
Security Testing
Files
- `test/benchmark.test.ts` (new): performance benchmarks
- `test/stress.test.ts` (new): stress and edge case tests
- `test/security.test.ts` (new): security validation tests
- `test/e2e.test.ts` (new): end-to-end round-trip tests
- `test/fixtures/large/` (new): large synthetic test data
- `scripts/generate-fixtures.ts` (new): test data generator
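As a rough illustration of what the test data generator could look like, here is a minimal sketch that produces deterministic synthetic sessions from a seed. The `SyntheticSession` and `SyntheticMessage` shapes, the agent labels, and the `generateSessions` name are all assumptions made for this example; the real Smriti schema and script may differ.

```typescript
// Hypothetical fixture shapes; not taken from the actual Smriti schema.
interface SyntheticMessage {
  role: "user" | "assistant";
  content: string;
}

interface SyntheticSession {
  id: string;
  agent: string;
  messages: SyntheticMessage[];
}

// Small seeded PRNG (mulberry32) so the same seed always yields the same data.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Builds `count` sessions with `messagesPerSession` alternating user/assistant messages.
function generateSessions(
  count: number,
  messagesPerSession: number,
  seed = 42,
): SyntheticSession[] {
  const rand = mulberry32(seed);
  const sessions: SyntheticSession[] = [];
  for (let s = 0; s < count; s++) {
    const messages: SyntheticMessage[] = [];
    for (let m = 0; m < messagesPerSession; m++) {
      messages.push({
        role: m % 2 === 0 ? "user" : "assistant",
        content: `message ${m} of session ${s}, token ${Math.floor(rand() * 1e6)}`,
      });
    }
    sessions.push({
      id: `session-${String(s).padStart(5, "0")}`,
      agent: `agent-${Math.floor(rand() * 3)}`, // placeholder agent labels
      messages,
    });
  }
  return sessions;
}
```

Seeding the generator keeps benchmark runs comparable over time: `generateSessions(500, 40)` returns the same 500 sessions on every machine, so a latency change reflects the code rather than the data.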
Acceptance Criteria
Real User Testing Plan
| Scenario | What to Measure | Risk if Untested |
|---|---|---|
| Fresh install + first ingest | Time-to-first-search, error messages | Bad first impression |
| 500+ sessions accumulated | Search latency, DB size, `smriti status` accuracy | Performance cliff |
| Multi-project workspace | Project ID derivation accuracy, cross-project search | Wrong project attribution |
| Team sharing (2+ developers) | Sync conflicts, dedup accuracy, content hash stability | Duplicate/lost knowledge |
| Long-running session (4+ hours) | Memory during ingest, block count accuracy, cost tracking | OOM or missed data |
| Rapid session creation | Watch daemon debouncing, no duplicate ingestion | Double-counting |
| Agent switch mid-task | Cross-agent file operation tracking, timeline accuracy | Gaps in activity log |
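To make the "content hash stability" and "dedup accuracy" checks from the team sharing scenario concrete, the sketch below hashes normalized content with SHA-256 and drops duplicates by hash. The normalization rules (unifying line endings, trimming trailing whitespace) and the function names are assumptions for illustration only; Smriti's actual hashing scheme is not described here.

```typescript
import { createHash } from "node:crypto";

// Assumed normalization: unify line endings and strip trailing whitespace,
// so the same knowledge hashes identically across operating systems and editors.
function normalizeContent(text: string): string {
  return text
    .replace(/\r\n/g, "\n")
    .split("\n")
    .map((line) => line.trimEnd())
    .join("\n")
    .trim();
}

// SHA-256 hex digest of the normalized text.
function contentHash(text: string): string {
  return createHash("sha256").update(normalizeContent(text), "utf8").digest("hex");
}

// Keeps the first occurrence of each distinct hash, preserving input order.
function dedupeByHash(items: string[]): string[] {
  const seen = new Set<string>();
  const unique: string[] = [];
  for (const item of items) {
    const hash = contentHash(item);
    if (!seen.has(hash)) {
      seen.add(hash);
      unique.push(item);
    }
  }
  return unique;
}
```

A stability test would then assert that two developers' copies of the same note, differing only in line endings or trailing spaces, produce one entry after `dedupeByHash`.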
Testing
bun test test/benchmark.test.ts # Performance benchmarks
bun test test/stress.test.ts # Stress tests
bun test test/security.test.ts # Security validation
bun test test/e2e.test.ts # End-to-end round-trips
bun run scripts/generate-fixtures.ts # Generate large test data
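
For the benchmark run, one possible way to collect search latency is to time repeated calls and report nearest-rank percentiles. This is only a sketch: `measureLatency` and `percentile` are hypothetical helpers, not part of Smriti, and the function passed in stands in for whatever search call the benchmark exercises.

```typescript
// Nearest-rank percentile over an ascending-sorted sample array.
function percentile(sorted: number[], p: number): number {
  if (sorted.length === 0) throw new Error("no samples");
  const rank = Math.ceil((p / 100) * sorted.length);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index] as number;
}

// Runs `fn` repeatedly and summarizes wall-clock latency in milliseconds.
async function measureLatency(
  fn: () => unknown | Promise<unknown>,
  iterations = 100,
): Promise<{ p50: number; p95: number; max: number }> {
  const samples: number[] = [];
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    await fn();
    samples.push(performance.now() - start);
  }
  samples.sort((a, b) => a - b);
  return {
    p50: percentile(samples, 50),
    p95: percentile(samples, 95),
    max: percentile(samples, 100),
  };
}
```

Reporting p95 alongside the median helps surface the "performance cliff" risk noted above, since tail latency tends to degrade before the median does as the database grows.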