feat(tone-gate): B17_FALSE_BLOCKER — Never a False Blocker #378
Merged
Sibling of B16 (A Wall Is a Hypothesis): a MessagingToneGate authority rule that holds outbound messages deferring a task to a human / second opinion / reverse-engineering when no capability inventory was shown, with a tiny genuine-human-only allowlist. Signal tier = deferral-detector extension. Converged via internal adversarial review + lessons-grep. Awaiting Justin's approval before any code.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sibling of B16: holds an outbound message that defers a doable task to a
person ("needs a human / I can't / second opinion / reverse-engineering")
when no genuinely-human-only item is named and no inventory of the agent's
own means (computer use, terminal, send-keys, MCP) is shown. Where B16
surrenders on feasibility, B17 surrenders on agency; the fused straddle
("no API so a human must") is evaluated under B17.
- MessagingToneGate: B17 rule (VALID_RULES + prompt + closing list + doc
comments), de-confliction + straddle + precedence B15>B16>B17, allowlist
(password/CAPTCHA/legal/approval/account-grant/rate-limit/value-judgment),
named-outcome inventory requirement, self-fetched cross-model carve-out.
- deferral-detector (signal only): needs_human_to / reverse-engineering
patterns + guarded second-opinion (self-fetched review not flagged);
template + regenerated deployed copy (orphan-TODO preserved).
- Constitution: STANDARDS-REGISTRY "Never a False Blocker"; principles P12.
- Spec marked approved (topic 12896); side-effects review; NEXT.md.
Tests: unit (13) + integration (2) green; smoke suite green; tsc clean.
Real-LLM test-as-self: founding codex-trust case + fused straddle BLOCK;
password/value-judgment/required-approval/self-fetched-review/post-inventory
all PASS (prompt tightened after the live pass surfaced the founding miss).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
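The hold condition described in the commit message above can be sketched as a plain predicate. This is an illustration only: the commit describes the real rule as part of the MessagingToneGate prompt, so the `DraftSignals` shape, the `b17Holds` function, and the allowlist spellings below are hypothetical names, not the project's code. Precedence (B15>B16>B17) and the fused-straddle routing are not modeled here.

```typescript
// Hypothetical allowlist spellings, taken from the items named in the commit message.
const HUMAN_ONLY_ITEMS = [
  "password", "captcha", "legal", "approval",
  "account-grant", "rate-limit", "value-judgment",
] as const;
type HumanOnlyItem = typeof HUMAN_ONLY_ITEMS[number];

// Hypothetical features extracted from an outbound draft.
interface DraftSignals {
  defersToHuman: boolean;              // "needs a human / I can't / second opinion / reverse-engineering"
  humanOnlyItem: HumanOnlyItem | null; // a genuinely-human-only item named in the draft, if any
  inventoryShown: boolean;             // draft lists the agent's own means and their outcomes
  selfFetchedReview: boolean;          // the agent fetched the cross-model review itself
}

// Returns true when the draft should be held under this sketch of B17.
function b17Holds(s: DraftSignals): boolean {
  if (!s.defersToHuman) return false;          // nothing is being deferred
  if (s.selfFetchedReview) return false;       // self-fetched cross-model carve-out
  if (s.humanOnlyItem !== null) return false;  // allowlisted, genuinely human-only
  return !s.inventoryShown;                    // hold only when no inventory was shown
}
```

Under these assumptions, a bare "a human must do this" is held, while the same deferral passes once it names a password or follows a shown inventory, matching the PASS cases listed in the test notes.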
ad5faf5 to 966e3ec
Rebased onto main (v1.2.72). B16 |
Summary
Two sibling guard rules in the outbound-message authority (MessagingToneGate), closing a pair of "surrender" gravity wells. Approved in Telegram topic 12896 (B17 design + spec) and topic 12143 (B16); bundling confirmed by the operator ("Yes").

B17 design highlights
- deferral-detector PreToolUse hook (signal only, never blocks) primes the checklist for the new excuse-shapes; the MessagingToneGate is the single block authority. Self-fetched cross-model review is not flagged.

Scope notes
- deferral-detector reaches existing agents via the always-overwrite built-in-hook path (orphan-TODO patterns preserved in the regenerated copy).
- /crossreview round is not wired on this checkout; noted, and offered as an optional pre-merge add.

Test plan
- Unit: messaging-tone-gate-b17.test.ts (13), messaging-tone-gate-b16.test.ts (9)
- Integration: telegram-reply-b17-false-blocker.test.ts (2), telegram-reply-b16-wall.test.ts (2)
- Real-LLM test-as-self (ClaudeCliIntelligenceProvider → Haiku): founding codex-trust case + fused straddle BLOCK with B17; password escalation, value judgment, required approval, self-fetched second opinion, post-inventory deferral all PASS. The founding case initially passed; the prompt was tightened (UI-interaction clarification + worked example) until it reliably blocks, a gap only the live pass surfaced.

🤖 Generated with Claude Code
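To make the signal-only role of the deferral-detector concrete, here is a minimal sketch of a pattern matcher that reports excuse-shapes without blocking anything. The regular expressions, the `Signal` names, and `detectDeferralSignals` are assumptions made for illustration; the PR does not show the actual patterns, and the real guard for self-fetched reviews may work differently.

```typescript
type Signal = "needs_human_to" | "reverse_engineering" | "second_opinion";

// Hypothetical patterns; the project's real ones are not shown in this PR.
const NEEDS_HUMAN = /\b(?:needs?|requires?)\s+a\s+(?:human|person)\b/i;
const REVERSE_ENG = /\breverse[-\s]engineer(?:ing|ed)?\b/i;
const SECOND_OPINION = /\bsecond\s+opinion\b/i;
// Guard: the agent says it obtained the review itself, so nothing is being deferred.
const SELF_FETCHED = /\bI\s+(?:got|fetched|ran|requested)\b[^.]*\b(?:second\s+opinion|review)\b/i;

// Returns the signals found in the text. It never blocks; blocking is left
// to the gate, which the PR names as the single block authority.
function detectDeferralSignals(text: string): Signal[] {
  const found: Signal[] = [];
  if (NEEDS_HUMAN.test(text)) found.push("needs_human_to");
  if (REVERSE_ENG.test(text)) found.push("reverse_engineering");
  if (SECOND_OPINION.test(text) && !SELF_FETCHED.test(text)) {
    found.push("second_opinion");
  }
  return found;
}
```

Keeping the detector a pure function that only returns labels mirrors the split stated in the design highlights: the hook primes the checklist, and the decision to hold a message stays in one place.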