feat(tone-gate): B17_FALSE_BLOCKER — Never a False Blocker#378

Merged
JKHeadley merged 2 commits into main from echo/never-false-blocker-spec on May 25, 2026

@JKHeadley
Owner

Summary

Adds two sibling guard rules to the outbound-message authority (MessagingToneGate), closing a pair of "surrender" gravity wells. Both were approved in Telegram: topic 12896 (B17 design + spec) and topic 12143 (B16). The operator confirmed bundling them in one PR ("Yes").

  • B17_FALSE_BLOCKER (new — topic 12896): holds an outbound message that defers a doable task to a person ("needs a human / I can't / blocked pending you / second opinion / reverse-engineering") when it names no genuinely-human-only item and shows no inventory of the agent's own means (computer use, terminal, send-keys, MCP). The deference-shaped sibling of B16 — where B16 surrenders on feasibility, B17 surrenders on agency.
  • B16_UNVERIFIED_WALL ("A Wall Is a Hypothesis"): built + approved earlier (topic 12143) but stranded uncommitted with no branch/PR. Landed here as the required base for B17 (B17's de-confliction references B16's prompt rule).
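
The B17 hold condition described above can be sketched as a small predicate. This is an illustration only: the gate itself is a prompt rule judged by an LLM, not deterministic code, and every name here (`DraftSignals`, `evaluateB17`, the field names, the `HOLD_B17` label) is hypothetical rather than taken from the repository.

```typescript
// Hypothetical shape: what a reviewer (human or LLM) would read off a draft message.
interface DraftSignals {
  defersToPerson: boolean;      // e.g. "needs a human", "blocked pending you"
  namesHumanOnlyItem: boolean;  // e.g. a password only the user holds, a CAPTCHA
  showsMeansInventory: boolean; // computer use, terminal, send-keys, MCP were inventoried
}

type Verdict = "PASS" | "HOLD_B17";

// Hold only when the message defers to a person AND offers neither a
// genuinely-human-only item nor an inventory of the agent's own means.
function evaluateB17(s: DraftSignals): Verdict {
  if (!s.defersToPerson) return "PASS";
  if (s.namesHumanOnlyItem || s.showsMeansInventory) return "PASS";
  return "HOLD_B17";
}
```

Under this sketch a bare "I can't, you'll need to do it" is held, while the same deferral passes once it names a password only the user holds or lists what the agent already tried.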

B17 design highlights

  • De-confliction + straddle: missing mechanism → B16; person required → B17; the fused straddle ("no API, so a human must") is evaluated under B17 so it can't slip between the rules. Citation precedence B15 > B16 > B17.
  • Allowlist (favors false-negatives): password/secret only the user holds, CAPTCHA, legal/billing/payment authorization, required approvals, account/access grants, external rate-limit waits, genuine value judgments, deferrals after a named-outcome inventory, and self-fetched cross-model review all pass.
  • Signal vs authority: the deferral-detector PreToolUse hook (signal only — never blocks) primes the checklist for the new excuse-shapes; the MessagingToneGate is the single block authority. Self-fetched cross-model review is not flagged.
  • Honest limit: the gate sees only message text, so a fabricated inventory can still pass (same as B16) — stated in the rule; named-outcome requirement raises the cost.
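
As a rough sketch of the de-confliction and precedence bullets above: routing decides which rule evaluates a message, and precedence decides which rule is cited when more than one fires. The types and function names (`Claims`, `routeRule`, `citedRule`) are assumptions made for illustration. The real logic lives in the gate's prompt, and how routing and citation precedence interact there is not spelled out in this PR.

```typescript
type RuleId = "B15" | "B16" | "B17";

// Hypothetical summary of what a draft message claims.
interface Claims {
  missingMechanism: boolean; // "there is no API / no way to do this"
  personRequired: boolean;   // "a human must do this"
}

// De-confliction: person required -> B17; missing mechanism alone -> B16.
// The fused straddle ("no API, so a human must") sets both flags and is
// routed to B17 so it cannot slip between the two rules.
function routeRule(c: Claims): "B16" | "B17" | null {
  if (c.personRequired) return "B17";
  if (c.missingMechanism) return "B16";
  return null;
}

// Citation precedence B15 > B16 > B17: cite the highest-precedence rule that fired.
const PRECEDENCE: readonly RuleId[] = ["B15", "B16", "B17"];

function citedRule(fired: readonly RuleId[]): RuleId | null {
  for (const rule of PRECEDENCE) {
    if (fired.indexOf(rule) !== -1) return rule;
  }
  return null;
}
```

Checking `personRequired` first is what makes the straddle land on B17 in this sketch; swapping the two checks would send it to B16 instead.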

Scope notes

  • B17 ships with the server (no migrator entry); the deferral-detector reaches existing agents via the always-overwrite built-in-hook path (orphan-TODO patterns preserved in the regenerated copy).
  • Convergence: internal adversarial review (2 reviewers) + lessons-grep. The branded multi-model /crossreview round is not wired on this checkout — noted, offered as an optional pre-merge add.

Test plan

  • Unit: messaging-tone-gate-b17.test.ts (13), messaging-tone-gate-b16.test.ts (9)
  • Integration: telegram-reply-b17-false-blocker.test.ts (2), telegram-reply-b16-wall.test.ts (2)
  • Smoke suite (62 files / 2371 tests) green; tsc clean; pre-push full suite green
  • Real-LLM test-as-self (real ClaudeCliIntelligenceProvider → Haiku): founding codex-trust case + fused straddle BLOCK with B17; password escalation, value judgment, required approval, self-fetched second opinion, post-inventory deferral all PASS. Founding case initially passed; prompt tightened (UI-interaction clarification + worked example) until it reliably blocks — a gap only the live pass surfaced.

🤖 Generated with Claude Code

@vercel

vercel Bot commented May 25, 2026

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| instar | Ready | Preview, Comment | May 25, 2026 8:52am |

Instar Agent (echo) and others added 2 commits May 25, 2026 01:49
Sibling of B16 (A Wall Is a Hypothesis): a MessagingToneGate authority rule
that holds outbound messages deferring a task to a human / second opinion /
reverse-engineering when no capability inventory was shown, with a tiny
genuine-human-only allowlist. Signal tier = deferral-detector extension.

Converged via internal adversarial review + lessons-grep. Awaiting Justin's
approval before any code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Sibling of B16: holds an outbound message that defers a doable task to a
person ("needs a human / I can't / second opinion / reverse-engineering")
when no genuinely-human-only item is named and no inventory of the agent's
own means (computer use, terminal, send-keys, MCP) is shown. Where B16
surrenders on feasibility, B17 surrenders on agency; the fused straddle
("no API so a human must") is evaluated under B17.

- MessagingToneGate: B17 rule (VALID_RULES + prompt + closing list + doc
  comments), de-confliction + straddle + precedence B15>B16>B17, allowlist
  (password/CAPTCHA/legal/approval/account-grant/rate-limit/value-judgment),
  named-outcome inventory requirement, self-fetched cross-model carve-out.
- deferral-detector (signal only): needs_human_to / reverse-engineering
  patterns + guarded second-opinion (self-fetched review not flagged);
  template + regenerated deployed copy (orphan-TODO preserved).
- Constitution: STANDARDS-REGISTRY "Never a False Blocker"; principles P12.
- Spec marked approved (topic 12896); side-effects review; NEXT.md.

Tests: unit (13) + integration (2) green; smoke suite green; tsc clean.
Real-LLM test-as-self: founding codex-trust case + fused straddle BLOCK;
password/value-judgment/required-approval/self-fetched-review/post-inventory
all PASS (prompt tightened after the live pass surfaced the founding miss).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
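
To illustrate the signal-only role of the deferral-detector described in the commit above, here is a minimal sketch. The regular expressions, the `HookResult` shape, and the `detectDeferral` name are assumptions made for this example and are not the patterns shipped in the hook. The sketch shows only two properties taken from the text: the detector never blocks, and a self-fetched second opinion is not flagged.

```typescript
// Hypothetical result shape: `block` is typed as the literal `false`
// to reflect that this tier only signals and never blocks.
interface HookResult {
  block: false;
  signals: string[];
}

// Illustrative patterns only; the real hook's patterns are not shown in this PR.
const NEEDS_HUMAN_TO = /\bneeds? (?:a )?human to\b/i;
const REVERSE_ENGINEERING = /\breverse[- ]engineer(?:ing)?\b/i;
const SECOND_OPINION = /\bsecond opinion\b/i;
// Guard: the agent reports having fetched the review itself.
const SELF_FETCHED = /\bI (?:asked|ran|fetched|requested|got)\b/i;

function detectDeferral(text: string): HookResult {
  const signals: string[] = [];
  if (NEEDS_HUMAN_TO.test(text)) signals.push("needs_human_to");
  if (REVERSE_ENGINEERING.test(text)) signals.push("reverse-engineering");
  if (SECOND_OPINION.test(text) && !SELF_FETCHED.test(text)) {
    signals.push("second-opinion");
  }
  return { block: false, signals };
}
```

Whatever signals come back, the decision to hold a message stays with the MessagingToneGate; the detector's output would only prime the checklist.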
@JKHeadley
Owner Author

Rebased onto main (v1.2.72). B16 A Wall Is a Hypothesis landed on main separately while this was in flight, so the redundant B16 base commit was dropped — this PR now adds only B17, cleanly on top of main's B16. B17's de-confliction/straddle logic references main's B16 rule. Re-verified after rebase: 26 unit+integration tests green, tsc clean, and a real-LLM spot-check confirms the founding codex-trust case still blocks with B17.

@JKHeadley changed the title from "feat(tone-gate): B17_FALSE_BLOCKER (Never a False Blocker) + land B16 (A Wall Is a Hypothesis)" to "feat(tone-gate): B17_FALSE_BLOCKER — Never a False Blocker" on May 25, 2026
@JKHeadley merged commit d491fbc into main on May 25, 2026
18 checks passed