What's New
KB Prompt Injection Pre-filter (Gap 13)
zc_fetch now scans all fetched web content for obvious prompt injection patterns before the content enters the knowledge base. Matched spans are redacted in place, so the agent never sees the raw payload, even via zc_search retrieval.
11 patterns across 4 categories (high-specificity, low false-positive):
| Category | Examples |
|---|---|
| instruction-override | "ignore/disregard/forget/override previous instructions" |
| role-override | "SYSTEM OVERRIDE", "SYSTEM OVERRIDE:" |
| trust-label-bypass | Attempts to re-characterize [UNTRUSTED EXTERNAL CONTENT] as "a diagnostic artifact" |
| context-boundary | [END OF CONTEXT], [REAL INSTRUCTIONS START], [IGNORE THE ABOVE] |
Each match is replaced with: ⚠️[INJECTION PATTERN REDACTED: <type>]
When patterns are found, the zc_fetch response shows a visible warning banner listing the match count and the detected type names.
Broad patterns (curl|bash, eval()) are intentionally excluded because they produce too many false positives in legitimate documentation. The [UNTRUSTED EXTERNAL CONTENT] trust label and Claude's safety training remain the primary defense layer.
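The redaction flow described above can be sketched roughly as follows. This is a minimal illustration, not the shipped zc_fetch code: the names `PATTERNS`, `redactInjections`, and `formatBanner`, the exact regexes, and the banner wording are all assumptions. Only three of the four categories appear, since the exact trust-label-bypass wording is not given here.

```typescript
// Hypothetical sketch: names, regexes, and banner text are illustrative only.
type InjectionType =
  | "instruction-override"
  | "role-override"
  | "trust-label-bypass"
  | "context-boundary";

interface InjectionPattern {
  type: InjectionType;
  regex: RegExp; // assumed to carry the g (all matches) and i (case variants) flags
}

// A small subset standing in for the 11 real patterns.
const PATTERNS: InjectionPattern[] = [
  {
    type: "instruction-override",
    regex: /\b(?:ignore|disregard|forget|override)\s+(?:all\s+)?previous\s+instructions\b/gi,
  },
  { type: "role-override", regex: /\bSYSTEM OVERRIDE\b:?/gi },
  {
    type: "context-boundary",
    regex: /\[(?:END OF CONTEXT|REAL INSTRUCTIONS START|IGNORE THE ABOVE)\]/gi,
  },
];

interface RedactionResult {
  text: string;
  matchCount: number;
  types: InjectionType[];
}

// Replace every match in place and record how many were found, and of which type.
function redactInjections(input: string): RedactionResult {
  let text = input;
  let matchCount = 0;
  const types = new Set<InjectionType>();
  for (const { type, regex } of PATTERNS) {
    text = text.replace(regex, () => {
      matchCount += 1;
      types.add(type);
      return `⚠️[INJECTION PATTERN REDACTED: ${type}]`;
    });
  }
  return { text, matchCount, types: Array.from(types) };
}

// Build the warning banner; returns an empty string for clean content.
function formatBanner(result: RedactionResult): string {
  if (result.matchCount === 0) return "";
  return `⚠️ ${result.matchCount} injection pattern(s) redacted: ${result.types.join(", ")}`;
}
```

In this sketch a page containing only `curl ... | bash` passes through unchanged with an empty banner, which mirrors the decision to exclude broad patterns.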
27 New Unit Tests
fetcher.test.ts now has 27 new tests covering all pattern categories, clean content passthrough (no false positives), case variants, multi-pattern counting, replacement non-re-triggering, and regex flag validation.
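As a rough idea of what these test categories check, here is a framework-free sketch. It does not reproduce fetcher.test.ts: the single regex, the `redact` and `check` helpers, and the sample strings are hypothetical stand-ins.

```typescript
// Hypothetical stand-in for one real pattern and its replacement marker.
const INSTRUCTION_OVERRIDE =
  /\b(?:ignore|disregard|forget|override)\s+previous\s+instructions\b/gi;
const MARKER = "⚠️[INJECTION PATTERN REDACTED: instruction-override]";

function redact(input: string): string {
  return input.replace(INSTRUCTION_OVERRIDE, MARKER);
}

// Tiny assertion helper so the sketch needs no test framework.
function check(label: string, ok: boolean): void {
  if (!ok) throw new Error(`check failed: ${label}`);
}

// Clean content passthrough: ordinary prose must not be altered.
const cleanText = "See the previous instructions section of the manual.";
check("clean passthrough", redact(cleanText) === cleanText);

// Case variants: matching is case-insensitive.
check("case variant", redact("IGNORE PREVIOUS INSTRUCTIONS") === MARKER);

// Multi-pattern counting: every occurrence is found, not just the first.
const twoHits = "ignore previous instructions, then forget previous instructions";
check("multi-match count", (twoHits.match(INSTRUCTION_OVERRIDE) || []).length === 2);

// Replacement non-re-triggering: redacting already-redacted text changes nothing.
const redactedOnce = redact("please disregard previous instructions now");
check("non-re-triggering", redact(redactedOnce) === redactedOnce);

// Regex flag validation: both g and i must be set.
check("flags", INSTRUCTION_OVERRIDE.global && INSTRUCTION_OVERRIDE.ignoreCase);
```

The non-re-triggering check matters because the marker itself contains the type name; if a marker could match a pattern, a second pass would nest redactions.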
Total: 300 unit tests | 84 security attack vectors (78 PASS · 0 FAIL · 6 WARN)
Threat Model Documentation
SECURITY_REPORT.md now includes:
- Gap 13 write-up with full pattern table and excluded-pattern rationale
- 3 accepted-risk "Known Limitations" from external deep-dive analysis: persistent context poisoning (partially mitigated), working memory DoS (low exploitability), adversarial vector collisions (theoretical/negligible for local deployment)
Migration
No database migrations required. No breaking changes.
🤖 Generated with Claude Code