v0.7.2 — KB Prompt Injection Pre-filter

@iampantherr iampantherr released this 30 Mar 11:54

What's New

KB Prompt Injection Pre-filter (Gap 13)

zc_fetch now scans all fetched web content for obvious prompt injection patterns before the content enters the knowledge base. Matched spans are redacted in place, so the agent never sees the raw payload, even when the content is later retrieved via zc_search.

11 patterns across 4 categories (high-specificity, low false-positive):

| Category | Examples |
| --- | --- |
| instruction-override | "ignore/disregard/forget/override previous instructions" |
| role-override | "SYSTEM OVERRIDE", "SYSTEM OVERRIDE:" |
| trust-label-bypass | Attempts to re-characterize [UNTRUSTED EXTERNAL CONTENT] as "a diagnostic artifact" |
| context-boundary | [END OF CONTEXT], [REAL INSTRUCTIONS START], [IGNORE THE ABOVE] |

Each match is replaced with: ⚠️[INJECTION PATTERN REDACTED: <type>]

When patterns are found, the zc_fetch response shows a visible warning banner listing the match count and the detected type names.
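The redaction step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the regexes are simplified stand-ins for three of the four categories, and the names `redactInjections`, `formatBanner`, `ScanResult`, and the banner wording are all assumptions made for the example. Only the category names and the replacement marker format come from these notes.

```typescript
// Illustrative sketch only. The real pre-filter ships 11 patterns;
// the three regexes below are simplified assumptions, not the shipped ones.
type InjectionType =
  | "instruction-override"
  | "role-override"
  | "trust-label-bypass"
  | "context-boundary";

interface InjectionPattern {
  type: InjectionType;
  regex: RegExp; // needs the "g" flag so every occurrence is replaced
}

const PATTERNS: InjectionPattern[] = [
  {
    type: "instruction-override",
    regex: /\b(?:ignore|disregard|forget|override)\s+(?:all\s+)?previous\s+instructions\b/gi,
  },
  { type: "role-override", regex: /\bSYSTEM\s+OVERRIDE\b:?/gi },
  {
    type: "context-boundary",
    regex: /\[(?:END OF CONTEXT|REAL INSTRUCTIONS START|IGNORE THE ABOVE)\]/gi,
  },
];

interface ScanResult {
  text: string; // content with matched spans redacted
  matchCount: number; // total number of redacted spans
  types: InjectionType[]; // distinct categories that matched
}

// Replace every matched span with the marker and count the matches.
function redactInjections(input: string): ScanResult {
  let text = input;
  let matchCount = 0;
  const types = new Set<InjectionType>();
  for (const { type, regex } of PATTERNS) {
    text = text.replace(regex, () => {
      matchCount += 1;
      types.add(type);
      return `⚠️[INJECTION PATTERN REDACTED: ${type}]`;
    });
  }
  return { text, matchCount, types: Array.from(types) };
}

// Hypothetical banner text; the real wording is not given in these notes.
function formatBanner(result: ScanResult): string | null {
  if (result.matchCount === 0) return null;
  return `WARNING: ${result.matchCount} injection pattern(s) redacted (${result.types.join(", ")})`;
}
```

Using a replacer callback with `String.prototype.replace` lets the same pass both rewrite the text and collect the count and type names that the banner needs.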

Broad patterns (curl|bash, eval()) are intentionally excluded because they produce too many false positives in legitimate documentation. The [UNTRUSTED EXTERNAL CONTENT] trust label and Claude's safety training remain the primary defense layer.

27 New Unit Tests

fetcher.test.ts now has 27 new tests covering all pattern categories, clean content passthrough (no false positives), case variants, multi-pattern counting, replacement non-re-triggering, and regex flag validation.
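To give a sense of what three of these checks might look like (regex flag validation, replacement non-re-triggering, and case variants), here is a runner-agnostic sketch. It does not reproduce fetcher.test.ts: the two sample patterns, the `check` helper, and `runChecks` are assumptions for illustration, and the real suite presumably uses a test framework rather than a hand-rolled helper.

```typescript
// Hypothetical stand-ins for the real pattern list; only the marker
// format and category names are taken from the release notes.
interface SamplePattern {
  type: string;
  regex: RegExp;
}

const instructionOverride: SamplePattern = {
  type: "instruction-override",
  regex: /\b(?:ignore|disregard|forget|override)\s+previous\s+instructions\b/gi,
};

const contextBoundary: SamplePattern = {
  type: "context-boundary",
  regex: /\[(?:END OF CONTEXT|REAL INSTRUCTIONS START|IGNORE THE ABOVE)\]/gi,
};

const SAMPLE_PATTERNS: SamplePattern[] = [instructionOverride, contextBoundary];

const marker = (type: string): string => `⚠️[INJECTION PATTERN REDACTED: ${type}]`;

// Minimal assertion helper so the sketch needs no test framework.
function check(condition: boolean, message: string): void {
  if (!condition) throw new Error(`check failed: ${message}`);
}

// Runs all checks and returns how many passed (throws on the first failure).
function runChecks(): number {
  let passed = 0;
  for (const { type, regex } of SAMPLE_PATTERNS) {
    // Regex flag validation: "g" to redact every occurrence, "i" for case variants.
    check(regex.global, `${type} should have the g flag`);
    check(regex.ignoreCase, `${type} should have the i flag`);
    passed += 2;

    // Non-re-triggering: no pattern may match the marker that replaces a hit.
    for (const other of SAMPLE_PATTERNS) {
      other.regex.lastIndex = 0; // global regexes keep state between test() calls
      check(!other.regex.test(marker(type)), `${other.type} matched the ${type} marker`);
      passed += 1;
    }
  }

  // Case variants of the same payload should all be detected.
  const variants = [
    "IGNORE PREVIOUS INSTRUCTIONS",
    "Ignore Previous Instructions",
    "ignore previous instructions",
  ];
  for (const variant of variants) {
    instructionOverride.regex.lastIndex = 0;
    check(instructionOverride.regex.test(variant), `missed variant: ${variant}`);
    passed += 1;
  }
  return passed;
}
```

Resetting `lastIndex` before each `test()` call matters here: a global regex remembers where its last match ended, which can otherwise make a later check fail for reasons unrelated to the pattern itself.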

Total: 300 unit tests | 84 security attack vectors (78 PASS · 0 FAIL · 6 WARN)

Threat Model Documentation

SECURITY_REPORT.md now includes:

  • Gap 13 write-up with full pattern table and excluded-pattern rationale
  • 3 accepted-risk "Known Limitations" from external deep-dive analysis: persistent context poisoning (partially mitigated), working memory DoS (low exploitability), adversarial vector collisions (theoretical/negligible for local deployment)

Migration

No database migrations required. No breaking changes.


🤖 Generated with Claude Code