v3.13.0 — 5-Tier Scoring Pyramid + SaaS Middleware

@anulum anulum released this 13 Apr 01:46

What's New

5-Tier Scoring Pyramid

| Tier | Backend | Accuracy | Latency | Install |
|------|---------|----------|---------|---------|
| 5 | NLI (FactCG) | 75.6% BA | 14.6 ms | pip install director-ai[nli] |
| 4 | Distilled NLI (preview) | ~70% BA | 5 ms | pip install director-ai[nli-lite] |
| 3 | Embedding (bge-small) | ~65% BA | 3 ms | pip install director-ai[embed] |
| 2 | Rules engine (8 rules) | rule-based | <1 ms | pip install director-ai |
| 1 | Heuristic (lite) | ~55% BA | <1 ms | pip install director-ai |

Select the backend via config by setting scorer_backend to "rules", "embed", "deberta", or "lite".
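As a rough illustration of how the pyramid trades accuracy for latency, the sketch below picks the highest tier that fits a latency budget. It is not part of director-ai: `Tier`, `TIERS`, `pick_tier`, and `install_command` are hypothetical helpers. It assumes tier 5 maps to scorer_backend="deberta" and that "<1 ms" is an upper bound of 1.0 ms. Tier 4 is skipped by default because it is marked preview.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    level: int
    backend: str       # scorer_backend value; "deberta" for tier 5 is an assumption
    latency_ms: float  # from the table; "<1 ms" is treated as 1.0
    extra: str         # pip extra from the table, "" for the base package


# Ordered from tier 5 (most accurate) down to tier 1 (cheapest).
TIERS = [
    Tier(5, "deberta", 14.6, "nli"),
    Tier(4, "nli-lite", 5.0, "nli-lite"),
    Tier(3, "embed", 3.0, "embed"),
    Tier(2, "rules", 1.0, ""),
    Tier(1, "lite", 1.0, ""),
]


def pick_tier(latency_budget_ms: float, allow_preview: bool = False) -> Optional[Tier]:
    """Return the highest tier whose latency fits the budget, or None."""
    for tier in TIERS:
        if tier.level == 4 and not allow_preview:
            continue  # tier 4 is marked preview in the release notes
        if tier.latency_ms <= latency_budget_ms:
            return tier
    return None


def install_command(tier: Tier) -> str:
    """Build the pip command listed in the table for this tier."""
    suffix = f"[{tier.extra}]" if tier.extra else ""
    return f"pip install director-ai{suffix}"
```

With a 4 ms budget this selects the embedding tier. With a 20 ms budget it selects tier 5 and reports the [nli] extra as the install target.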

SaaS Middleware

  • APIKeyMiddleware: Bearer/X-API-Key auth, constant-time hmac validation, audit-safe key hashing
  • RateLimitMiddleware: per-key token-bucket with configurable RPM and burst
  • Cloud Run Dockerfile (deploy/cloud-run/Dockerfile.saas): FactCG ONNX pre-baked, non-root, healthcheck
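The two middleware techniques named above are constant-time key validation with audit-safe hashing, and a per-key token bucket. A minimal standard-library sketch of both follows. It is not the APIKeyMiddleware or RateLimitMiddleware source: every function and class name is hypothetical, and it assumes keys are stored as SHA-256 hex digests.

```python
import hashlib
import hmac
import time
from typing import Callable, Dict, Optional, Set


def key_fingerprint(api_key: str) -> str:
    """Short SHA-256 prefix that can be logged instead of the raw key."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()[:16]


def extract_key(headers: Dict[str, str]) -> Optional[str]:
    """Read the key from 'Authorization: Bearer <key>' or 'X-API-Key'."""
    lowered = {name.lower(): value for name, value in headers.items()}
    auth = lowered.get("authorization", "")
    if auth.lower().startswith("bearer "):
        return auth[7:].strip() or None
    return lowered.get("x-api-key") or None


def is_valid_key(candidate: str, stored_hashes: Set[str]) -> bool:
    """Compare digests with hmac.compare_digest, which runs in constant time."""
    digest = hashlib.sha256(candidate.encode("utf-8")).hexdigest()
    matched = False
    for stored in stored_hashes:  # check all entries, no early exit
        matched |= hmac.compare_digest(digest, stored)
    return matched


class TokenBucket:
    """Refills at rpm/60 tokens per second, capped at `burst` tokens."""

    def __init__(self, rpm: int, burst: int, clock: Callable[[], float]):
        self.rate = rpm / 60.0
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.clock = clock
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerKeyLimiter:
    """One bucket per API key, indexed by fingerprint rather than raw key."""

    def __init__(self, rpm: int, burst: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rpm, self.burst, self.clock = rpm, burst, clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, api_key: str) -> bool:
        fp = key_fingerprint(api_key)
        if fp not in self.buckets:
            self.buckets[fp] = TokenBucket(self.rpm, self.burst, self.clock)
        return self.buckets[fp].allow()
```

Injecting the clock keeps the limiter testable without sleeping. Indexing buckets by fingerprint means raw keys never have to be held in the limiter's state.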

Rust Fast-Path

Four rules from the rules scorer are now wired to the Rust backfire_kernel: EntityGrounding, NumericConsistency, NegationFlip, WordOverlap

Fixes

  • FactCG accuracy corrected: 75.8% → 75.6% per-dataset mean BA (AggreFact leaderboard convention)
  • Leaderboard position: #6 (verified). With per-dataset tuning: 77.76% (potential #1)
  • FaithLens 86.4% retraction (fabricated figure removed)
  • HuggingFace supply-chain hardening: MODEL_REGISTRY with pinned revision SHAs
  • Punctuation bug in the circular-reasoning Python fallback
  • 169 new CLI integration tests for 11 benchmark scripts
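The supply-chain item above pins each model to a revision SHA. A minimal sketch of such a registry follows, assuming it maps a short name to a repo id plus a full commit SHA. The `PinnedModel` class, the `load_kwargs` helper, the repo id, and the SHA are all placeholders, not the project's actual entries. The resulting `repo_id` and `revision` values are the kind of arguments that `huggingface_hub.snapshot_download` accepts.

```python
import re
from dataclasses import dataclass

# A full git commit SHA is 40 lowercase hex characters.
_SHA_RE = re.compile(r"[0-9a-f]{40}")


@dataclass(frozen=True)
class PinnedModel:
    repo_id: str
    revision: str  # must be a commit SHA, never a branch or tag name

    def __post_init__(self) -> None:
        if not _SHA_RE.fullmatch(self.revision):
            raise ValueError(
                f"{self.repo_id}: revision must be a 40-char commit SHA, "
                f"got {self.revision!r}"
            )


# Placeholder entry: the repo id and SHA below are illustrative only.
MODEL_REGISTRY = {
    "factcg": PinnedModel(
        "example-org/factcg-model",
        "0123456789abcdef0123456789abcdef01234567",
    ),
}


def load_kwargs(name: str) -> dict:
    """Return the pinned repo_id/revision pair for a registered model."""
    model = MODEL_REGISTRY[name]
    return {"repo_id": model.repo_id, "revision": model.revision}
```

Rejecting mutable references such as "main" at construction time means a later push to the upstream repo cannot silently change which weights get downloaded.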

Stats

  • 9 scorer backends: deberta, onnx, minicheck, lite, rules, embed, nli-lite, rust, backfire
  • 5000+ tests (5157 collected)
  • 25 files updated in version bump

Note: Distilled NLI (Tier 4, nli-lite) is in preview. The code and backend are functional, but model training is pending.

Full changelog: https://github.com/anulum/director-ai/blob/main/CHANGELOG.md#3130--2026-04-13