Release v3.13.0 — 5-Tier Scoring Pyramid + SaaS Middleware · anulum/director-ai

What's New

| Tier | Backend | Accuracy | Latency | Install |
|------|---------|----------|---------|---------|
| 5 | NLI (FactCG) | 75.6% BA | 14.6 ms | `pip install director-ai[nli]` |
| 4 | Distilled NLI (preview) | ~70% BA | 5 ms | `pip install director-ai[nli-lite]` |
| 3 | Embedding (bge-small) | ~65% BA | 3 ms | `pip install director-ai[embed]` |
| 2 | Rules engine (8 rules) | rule-based | <1 ms | `pip install director-ai` |
| 1 | Heuristic (lite) | ~55% BA | <1 ms | `pip install director-ai` |

Select a tier via config by setting scorer_backend to "rules", "embed", "deberta", or "lite".
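As a rough illustration of how the table might drive that choice, the sketch below picks the highest tier whose latency fits a budget and returns the matching backend string. It does not call director-ai. `Tier`, `TIERS`, and `pick_backend` are hypothetical names, the mapping of Tier 5 to "deberta" is an assumption, and "<1 ms" is treated as an upper bound of 1.0 ms.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    tier: int
    backend: str                # value for scorer_backend
    approx_ba: Optional[float]  # balanced accuracy in %, None where the table gives none
    latency_ms: float           # latency from the table ("<1 ms" taken as 1.0)


# Figures copied from the release table, highest tier first.
# Assumption: Tier 5 (FactCG NLI) corresponds to the "deberta" backend string.
TIERS = [
    Tier(5, "deberta", 75.6, 14.6),
    Tier(4, "nli-lite", 70.0, 5.0),  # preview tier
    Tier(3, "embed", 65.0, 3.0),
    Tier(2, "rules", None, 1.0),
    Tier(1, "lite", 55.0, 1.0),
]


def pick_backend(latency_budget_ms: float, allow_preview: bool = False) -> str:
    """Return the backend of the highest tier that fits the latency budget."""
    for t in TIERS:
        if t.backend == "nli-lite" and not allow_preview:
            continue  # skip the preview tier unless explicitly allowed
        if t.latency_ms <= latency_budget_ms:
            return t.backend
    raise ValueError(f"no tier fits a budget of {latency_budget_ms} ms")
```

With a 4 ms budget this returns "embed"; the result would then be passed as the scorer_backend value.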

APIKeyMiddleware: Bearer/X-API-Key auth, constant-time hmac validation, audit-safe key hashing
RateLimitMiddleware: per-key token-bucket with configurable RPM and burst
Cloud Run Dockerfile (deploy/cloud-run/Dockerfile.saas): FactCG ONNX pre-baked, non-root, healthcheck
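The two middleware mechanisms named above can be sketched with the standard library alone. This is not the director-ai implementation: `key_fingerprint`, `is_valid_key`, and `TokenBucket` are hypothetical names, and the 12-character fingerprint length is an arbitrary choice for illustration.

```python
import hashlib
import hmac
import time
from typing import Optional, Sequence


def key_fingerprint(api_key: str) -> str:
    """Short SHA-256 fingerprint for audit logs, so the raw key is never written."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()[:12]


def is_valid_key(presented: str, valid_keys: Sequence[str]) -> bool:
    """Check a presented key against every configured key in constant time."""
    presented_b = presented.encode("utf-8")
    ok = False
    for key in valid_keys:
        # compare_digest does not short-circuit on the first differing byte;
        # the loop also visits every key instead of returning early.
        ok |= hmac.compare_digest(presented_b, key.encode("utf-8"))
    return ok


class TokenBucket:
    """Token bucket: refills at rpm/60 tokens per second, capped at `burst`."""

    def __init__(self, rpm: float, burst: int, now: Optional[float] = None):
        self.rate = rpm / 60.0
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.updated = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        """Consume one token if available; `now` is injectable for testing."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

For per-key limiting, one option is a dictionary mapping `key_fingerprint(key)` to a `TokenBucket`, so the raw key is not held in the limiter's state either.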

4 of the rules scorer's rules are now wired to the Rust backfire_kernel: EntityGrounding, NumericConsistency, NegationFlip, WordOverlap

FactCG accuracy corrected: 75.8% → 75.6% per-dataset mean BA (AggreFact leaderboard convention)
Leaderboard position: #6 (verified). With per-dataset tuning: 77.76% (potential #1)
FaithLens 86.4% retraction (fabricated figure removed)
HuggingFace supply-chain hardening: MODEL_REGISTRY with pinned revision SHAs
Fixed a punctuation bug in the circular-reasoning Python fallback
169 new CLI integration tests for 11 benchmark scripts
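The per-dataset mean BA convention mentioned in the accuracy correction differs from pooling all examples into a single balanced-accuracy score, which is one way two figures for the same model can disagree. The sketch below shows the difference on made-up toy data. The function names and the data are illustrative only and are not taken from the director-ai benchmark scripts.

```python
from collections import defaultdict


def balanced_accuracy(labels, preds):
    """Mean of true-positive rate and true-negative rate for binary labels."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("balanced accuracy needs both classes present")
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)


def per_dataset_mean_ba(rows):
    """rows: iterable of (dataset, label, pred). Returns (mean BA, BA per dataset)."""
    grouped = defaultdict(lambda: ([], []))
    for dataset, y, p in rows:
        grouped[dataset][0].append(y)
        grouped[dataset][1].append(p)
    scores = {ds: balanced_accuracy(ys, ps) for ds, (ys, ps) in grouped.items()}
    return sum(scores.values()) / len(scores), scores


# Toy data, invented for illustration: (dataset, gold label, prediction).
ROWS = [
    ("A", 1, 1), ("A", 1, 0), ("A", 0, 0), ("A", 0, 0),
    ("B", 1, 1), ("B", 0, 1),
]

mean_ba, by_dataset = per_dataset_mean_ba(ROWS)
pooled_ba = balanced_accuracy([y for _, y, _ in ROWS], [p for _, _, p in ROWS])
```

On this toy data the per-dataset mean is 0.625 (0.75 for A, 0.5 for B), while the pooled score is about 0.667, because pooling weights each dataset by its size.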

9 scorer backends: deberta, onnx, minicheck, lite, rules, embed, nli-lite, rust, backfire
5000+ tests (5157 collected)
25 files updated in version bump

Note: Distilled NLI (Tier 4, nli-lite) is a preview: the code and backend are functional, but model training is still pending.

Full changelog: https://github.com/anulum/director-ai/blob/main/CHANGELOG.md#3130--2026-04-13