Production-readiness audit: security/resilience fixes, coverage tools, template-inheritance guidance#1

Merged
souf92i merged 1 commit into main from audit-production-hardening
Jun 18, 2026

@souf92i souf92i commented Jun 18, 2026

Member

Summary

Production-readiness hardening from a full audit of the server, plus a knowledge-base fix for a real LLM failure mode. 56 files, all gates green. Rebased onto the latest main (includes the recent publishConfig commit; package.json reconciled).

Security & data-leak fixes

  • Closed an SSRF bypass: the guard missed IPv4-mapped IPv6 ([::ffff:127.0.0.1], which the URL parser compresses to hex) and NAT64-wrapped internal addresses. Now blocked, with regression tests.
  • Symmetric redaction on read paths (get_* / list_*), matching the write path — defense-in-depth against secret-bearing fields reaching the model.
  • Added secure (the encrypted half of Stream's {clear, secure}) to the redaction set.
  • test_trigger now scrubs reflected credential headers (Authorization, …) and request/response bodies from the returned REST result.
  • redactValue no longer mangles hex identifiers (thumbprints/serials) as "blobs".
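
For reviewers unfamiliar with the SSRF bypass above, here is a minimal sketch of the idea, not the code in this PR. It assumes the guard works on the hostname produced by the WHATWG `URL` parser, which rewrites `[::ffff:127.0.0.1]` to `[::ffff:7f00:1]`, and that "NAT64-wrapped" means the well-known `64:ff9b::/96` prefix. All function names are hypothetical, and the internal-range list is deliberately incomplete (no ULA or link-local IPv6, no DNS resolution).

```typescript
// Illustrative SSRF host check; every name here is hypothetical.

/** True for "this network", loopback, RFC 1918 private, and link-local IPv4. */
function isInternalIPv4(octets: number[]): boolean {
  const [a = 0, b = 0] = octets;
  return (
    a === 0 || a === 10 || a === 127 ||
    (a === 169 && b === 254) ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}

/** Expand a bracket-less IPv6 literal into eight 16-bit groups, or null if malformed. */
function expandIPv6(host: string): number[] | null {
  const parts = host.split("::");
  if (parts.length > 2) return null;
  const parse = (s: string): number[] =>
    s === ""
      ? []
      : s.split(":").map((g) => (/^[0-9a-f]{1,4}$/i.test(g) ? parseInt(g, 16) : NaN));
  const head = parse(parts[0] ?? "");
  const tail = parts.length === 2 ? parse(parts[1] ?? "") : [];
  const fill = parts.length === 2 ? 8 - head.length - tail.length : 0;
  if (fill < 0) return null;
  const groups = [...head, ...new Array<number>(fill).fill(0), ...tail];
  return groups.length === 8 && groups.every((g) => !Number.isNaN(g)) ? groups : null;
}

/** Return the IPv4 address embedded in ::ffff:0:0/96 or 64:ff9b::/96, if any. */
function embeddedIPv4(g: number[]): number[] | null {
  const mapped = g.slice(0, 5).every((x) => x === 0) && g[5] === 0xffff;
  const nat64 = g[0] === 0x64 && g[1] === 0xff9b && g.slice(2, 6).every((x) => x === 0);
  if (!mapped && !nat64) return null;
  const hi = g[6] ?? 0;
  const lo = g[7] ?? 0;
  return [hi >> 8, hi & 0xff, lo >> 8, lo & 0xff];
}

/** Decide whether a URL points at an internal literal address. */
function isBlockedUrl(raw: string): boolean {
  const host = new URL(raw).hostname;
  if (host.startsWith("[")) {
    const groups = expandIPv6(host.slice(1, -1));
    if (groups === null) return true; // fail closed on anything unparseable
    const v4 = embeddedIPv4(groups);
    if (v4 !== null) return isInternalIPv4(v4);
    // Unspecified (::) and loopback (::1).
    return groups.slice(0, 7).every((x) => x === 0) && (groups[7] ?? 0) <= 1;
  }
  const m = /^(\d+)\.(\d+)\.(\d+)\.(\d+)$/.exec(host);
  if (m) return isInternalIPv4(m.slice(1).map(Number));
  return false; // DNS names would need a resolve-then-check step, omitted here
}
```

The key point is that the check unwraps the embedded IPv4 address before applying the usual private-range test, so the hex-compressed form cannot slip past a string match on `127.0.0.1`.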

HTTP client resilience

  • Retry now drains discarded 429/5xx bodies (undici connection-leak fix) and adds jitter.
  • Richer transport-error classification (request timeout, TLS handshake, more connection codes) with actionable remediation.
  • Lazy-init reads bounded by the size cap; init promise resets on failure (no permanent wedge).
  • Bounded error-body reads; postMultipart honors exportTimeout; noRetry for the state-changing generate_crl / generate_krl GETs.
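
The retry change in the first bullet can be sketched as follows. This is an illustration under assumptions, not the client code in this PR: it uses the standard `fetch`/`Response` API, drains by reading the body to completion with `arrayBuffer()`, and applies "full jitter" backoff. `fetchWithRetry`, `backoffDelayMs`, and the default delays are hypothetical.

```typescript
// Illustrative retry loop; names and defaults are hypothetical.

type FetchLike = (url: string) => Promise<Response>;
type Sleep = (ms: number) => Promise<void>;

/** Full-jitter backoff: a random delay in [0, min(cap, base * 2^attempt)). */
function backoffDelayMs(
  attempt: number,
  baseMs = 200,
  capMs = 5000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

const defaultSleep: Sleep = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms));

/** Retry 429/5xx responses, draining each discarded body before the next attempt. */
async function fetchWithRetry(
  url: string,
  fetchImpl: FetchLike = (u) => fetch(u),
  maxAttempts = 3,
  sleep: Sleep = defaultSleep,
): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetchImpl(url);
    const retryable = res.status === 429 || res.status >= 500;
    if (!retryable || attempt + 1 >= maxAttempts) return res;
    // Consume the body we are about to throw away; an unread body can keep
    // the underlying connection from being released back to the pool.
    await res.arrayBuffer().catch(() => undefined);
    await sleep(backoffDelayMs(attempt));
  }
}
```

A state-changing GET such as `generate_crl` would bypass this loop entirely (the `noRetry` behaviour in the last bullet), since repeating it is not safe.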

Auth

  • mTLS now requires an https:// URL (a plain http:// URL silently dropped the client cert).
  • Startup warnings for STREAM_VERIFY_SSL=false and for both auth modes configured at once; mTLS files read once; removed a dead base-class hook.
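
A minimal sketch of the startup checks described above, assuming a config object shaped roughly like the one below. The `AuthConfig` fields and `checkAuthConfig` are hypothetical names for illustration; only the rules themselves (mTLS needs `https://`, warn on disabled TLS verification, warn when both auth modes are set) come from this PR.

```typescript
// Illustrative startup validation; the config shape is an assumption.

interface AuthConfig {
  baseUrl: string;
  mtls?: { certPath: string; keyPath: string };
  hasOtherAuthMode: boolean; // hypothetical flag for the non-mTLS auth mode
  verifySsl: boolean; // assumed to mirror STREAM_VERIFY_SSL
}

interface AuthCheck {
  errors: string[];
  warnings: string[];
}

function checkAuthConfig(cfg: AuthConfig): AuthCheck {
  const errors: string[] = [];
  const warnings: string[] = [];
  const protocol = new URL(cfg.baseUrl).protocol;

  // Over plain http:// there is no TLS handshake, so the client certificate
  // would never be presented. Refuse to start rather than drop it silently.
  if (cfg.mtls && protocol !== "https:") {
    errors.push("mTLS requires an https:// URL");
  }
  if (!cfg.verifySsl) {
    warnings.push("STREAM_VERIFY_SSL=false: server certificates are not verified");
  }
  if (cfg.mtls && cfg.hasOtherAuthMode) {
    warnings.push("Both auth modes are configured at once");
  }
  return { errors, warnings };
}
```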

MCP ergonomics & correctness

  • Populated the (previously empty) tool-disambiguation map for the confusable clusters.
  • Acronym-aware tool titles (List CAs, Get OCSP signer); auto-appended "Safety tier" line for tools that lacked it.
  • Unexpected (non-Stream) errors are redacted before reaching the model.
  • Server instructions point to search_docs / the tool-selection playbook.
  • CA full-replace strip-set consolidated to one source of truth; removed dead code (getStripMergePut, src/models/payloads.ts, assertConfigBody).
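
For the acronym-aware titles, one possible approach is a small lookup applied per word of the snake_case tool name. This is a sketch only: the acronym table and `toolTitle` are hypothetical, and the tool names `list_cas` and `get_ocsp_signer` are assumed from the titles quoted above.

```typescript
// Illustrative title generation; the acronym table is an assumption.

const ACRONYMS = new Map<string, string>([
  ["ca", "CA"],
  ["cas", "CAs"],
  ["ocsp", "OCSP"],
  ["crl", "CRL"],
  ["krl", "KRL"],
  ["aia", "AIA"],
]);

/** Turn a snake_case tool name into a sentence-case title, preserving acronyms. */
function toolTitle(name: string): string {
  return name
    .split("_")
    .map((word, index) => {
      const acronym = ACRONYMS.get(word);
      if (acronym !== undefined) return acronym;
      // Only the first word is capitalized; later words stay lowercase.
      return index === 0 ? word.charAt(0).toUpperCase() + word.slice(1) : word;
    })
    .join(" ");
}
```

A naive title-caser would produce "List Cas" and "Get Ocsp Signer", which is the confusion this change avoids.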

New read-only coverage tools

  • get_published_crl, get_published_aia, get_published_krl — fetch the actual published artifacts (closes the "generate then verify the CRL" loop; wires the previously-dead getBytes).
  • list_enabled_identity_providers.
  • verify-truth now prints an advisory reverse-coverage report (97/98 /api/v1 routes referenced).
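
The reverse-coverage report in the last bullet amounts to a set difference between the routes the server exposes and the routes the tools reference. A minimal sketch, assuming both sides are available as lists of route strings; `reverseCoverage`, `formatCoverage`, and the report shape are hypothetical, not the verify-truth implementation.

```typescript
// Illustrative reverse-coverage computation; names are hypothetical.

interface CoverageReport {
  referenced: number;
  total: number;
  unreferenced: string[];
}

/** Count how many server routes are referenced by at least one tool. */
function reverseCoverage(serverRoutes: string[], toolRoutes: Iterable<string>): CoverageReport {
  const used = new Set(toolRoutes);
  const unreferenced = serverRoutes.filter((route) => !used.has(route)).sort();
  return {
    referenced: serverRoutes.length - unreferenced.length,
    total: serverRoutes.length,
    unreferenced,
  };
}

/** Advisory summary line: the report informs, it does not fail the build. */
function formatCoverage(report: CoverageReport): string {
  return `${report.referenced}/${report.total} /api/v1 routes referenced`;
}
```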

Knowledge: template inheritance (the reported failure)

An LLM building a trust chain created multiple OCSP/CRL-bearing templates instead of one template that inherits from the CA. The inheritance behavior was verified against the Stream source: X509LifecycleApiV1Controller resolves ca.crldps / ca.aia when the template's *FromCA flag is set. The create_template description and param docs, plus the templates / ca-management / revocation knowledge resources, now say to prefer crldpsFromCA / aiaFromCA (AIA carries the OCSP responder URL), configure that wiring once on the CA, and keep templates general: one template per genuine policy difference, not per CA.
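
To make the inheritance rule concrete, here is a rough model of the resolution step. It is a sketch, not Stream's controller code: the field names `crldps`, `aia`, `crldpsFromCA`, and `aiaFromCA` come from this PR, while the types (plain string arrays), the fallback to template-level values, and `resolveExtensions` are assumptions.

```typescript
// Illustrative model of *FromCA resolution; types and fallback are assumed.

interface CaConfig {
  name: string;
  crldps: string[]; // CRL distribution point URLs
  aia: string[]; // AIA entries, including the OCSP responder URL
}

interface TemplateConfig {
  name: string;
  crldpsFromCA?: boolean;
  aiaFromCA?: boolean;
  crldps?: string[];
  aia?: string[];
}

/** Resolve the revocation-related values a certificate would carry. */
function resolveExtensions(
  template: TemplateConfig,
  ca: CaConfig,
): { crldps: string[]; aia: string[] } {
  return {
    crldps: template.crldpsFromCA ? ca.crldps : template.crldps ?? [],
    aia: template.aiaFromCA ? ca.aia : template.aia ?? [],
  };
}

// One general template serves every CA because the URLs live on the CA.
const tlsServer: TemplateConfig = { name: "tls-server", crldpsFromCA: true, aiaFromCA: true };
```

With this wiring, adding a new CA means configuring its CRL DP and AIA once on the CA, with no new template.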

Ops / release

  • publishConfig (reconciled with main: kept registry + access: public, added provenance).
  • engines.node lowered to >=22.19 (undici's real floor; the code uses no Node-24-only APIs) — see decision below.
  • prepublishOnly build guard, coverage provider + CI gate, advisory bun audit, Dependabot, SECURITY.md + CONTRIBUTING.md, doc tool-counts updated to 157 with an exact-count drift test.

Decision for reviewers

  • engines.node >=22.19 vs main's prior >=24.10. Lowered per the audit (broader compatibility, no Node-24 API dependency). Flip back if a Node-24 floor is intentional policy.

Test plan

  • format:check, lint, typecheck, build
  • verify:truth (route-truth + advisory coverage)
  • 344 unit tests
  • 10 deterministic scenario tests
  • e2e foundation + opt-in mutation smoke (STREAM_E2E_MUTATE=1) against QA
  • paid test:llm:live smoke — 8/8
  • Reviewer: confirm engines >=22.19 is acceptable
  • Reviewer: re-run a real trust-chain flow to confirm the model now inherits CRL DP/AIA/OCSP from the CA

Not in scope (deferred)

SBOM generation and a hard bun audit gate; retrofitting outputSchema/structuredContent across all mutate tools. The full audit report is kept out of VCS (gitignored).

…eritance guidance

Audit-driven fixes: SSRF guard, read-path redaction, retry/timeout resilience, mTLS https.

Adds coverage gate, Dependabot, SECURITY/CONTRIBUTING; engines >=22.19.

New tools: get_published_crl/aia/krl, list_enabled_identity_providers.

Knowledge: prefer CA inheritance (crldpsFromCA/aiaFromCA), minimal general templates.
@souf92i souf92i merged commit a9c3926 into main Jun 18, 2026
4 checks passed
@souf92i souf92i deleted the audit-production-hardening branch June 18, 2026 15:30