Feature hasn't been suggested before.
(Closest prior issues, and how this differs, are listed at the bottom — this proposal is distinct in mechanism.)
Describe the enhancement you want to request
Problem. opencode's permission model is a fixed per-rule decision: `allow` (auto-approve), `ask` (prompt every time), or `deny`. In agentic sessions there's no middle ground: nothing inspects an actual `bash` command, webfetch, or MCP call and decides "this specific one is safe, don't prompt." Users end up approving constantly, or allow-listing broadly and losing the guardrail.
Proposal. An opt-in classifier (modeled after Claude Code's "auto mode") that gates only the would-auto-approve path: when a rule resolves to `allow`, consult a model first. The classifier's answer is handled as follows:
- allow → proceed silently (as today)
- block → deny-and-continue: the call fails with a tool error the agent can react to (the session does not halt)
- classifier error / escalation → fail closed (fall back to the human prompt)
It never overrides an explicit user `deny` or `ask` rule.
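A minimal TypeScript sketch of the gate described above, assuming a hypothetical `gate()` helper that receives the already-resolved rule action plus an async `classify` callback. `RuleAction`, `Verdict`, `ToolCall`, `Outcome`, and the tool names in `READ_ONLY` are illustrative, not opencode's actual types. The point is the ordering: explicit `deny`/`ask` rules and read-only tools return before any model call, and every classifier failure falls back to the human prompt rather than to `allow`.

```typescript
// Illustrative types only; opencode's real permission types are not assumed here.
type RuleAction = "allow" | "ask" | "deny";
type Verdict = "allow" | "block" | "escalate";

interface ToolCall {
  tool: string;
  input: unknown;
}

type Outcome =
  | { kind: "proceed" } // run the tool without prompting
  | { kind: "prompt"; reason: string } // fall back to the human approval prompt
  | { kind: "deny"; reason: string }; // return a tool error the agent can react to

// Assumption: these names stand in for whatever opencode treats as read-only.
const READ_ONLY = new Set(["read", "glob", "grep", "list"]);

async function gate(
  action: RuleAction,
  call: ToolCall,
  classify: (call: ToolCall) => Promise<Verdict>,
): Promise<Outcome> {
  // Explicit user rules always win; the classifier never sees these calls.
  if (action === "deny") return { kind: "deny", reason: "denied by rule" };
  if (action === "ask") return { kind: "prompt", reason: "rule requires approval" };

  // Read-only tools short-circuit before any model call.
  if (READ_ONLY.has(call.tool)) return { kind: "proceed" };

  try {
    const verdict = await classify(call);
    if (verdict === "allow") return { kind: "proceed" };
    if (verdict === "block") return { kind: "deny", reason: "blocked by classifier" };
    return { kind: "prompt", reason: "classifier escalated" };
  } catch {
    // Fail closed: a classifier error becomes a human prompt, never an allow.
    return { kind: "prompt", reason: "classifier error" };
  }
}
```

Returning a `deny` outcome for a block (instead of halting) is what makes it deny-and-continue in this sketch: the caller can surface the reason as a tool error and let the agent choose another approach.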
Design points (implemented in the reference branch):
- Pluggable backend. The default uses the user's own configured model via the AI SDK (a single pass answering `<block>yes/no</block>`). The backend is pluggable so a local/self-hosted model or a hosted guardrails model can be slotted in.
- Reasoning-blind transcript. The classifier sees only user messages plus the bare tool-call payload, never assistant prose or prior tool output (a defense against prompt injection and against the agent rationalizing its own call).
- Full interception path via `Permission.ask`: covers edits, webfetch, MCP, task/subagent, and external-dir; read-only tools short-circuit before any model call.
- Escalation backstop. Per-session denial counters (3 consecutive / 20 total), reset each user turn, so a false positive can't loop forever.
- Config. A new `classifier` config block (sibling of `permission`); off by default.
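The fail-closed parsing behind the Pluggable backend point could look roughly like this. It assumes `<block>yes</block>` means "block this call" and `<block>no</block>` means "allow it"; `ClassifierBackend`, `parseVerdict`, and `classifyWith` are hypothetical names. Anything other than exactly one well-formed tag, and any backend exception, maps to `escalate`, i.e. the human prompt.

```typescript
type Verdict = "allow" | "block" | "escalate";

// Hypothetical backend shape: any model (user-configured, local, or a hosted
// guardrails model) that maps the classifier prompt to raw output text.
type ClassifierBackend = (prompt: string) => Promise<string>;

// Assumed convention: <block>yes</block> = block the call, <block>no</block> = allow it.
function parseVerdict(output: string): Verdict {
  const tags = output.match(/<block>[\s\S]*?<\/block>/gi) ?? [];
  // Fail closed: a missing tag or several (possibly conflicting) tags is not trusted.
  if (tags.length !== 1) return "escalate";
  const answer = tags[0].replace(/<\/?block>/gi, "").trim().toLowerCase();
  if (answer === "yes") return "block";
  if (answer === "no") return "allow";
  return "escalate"; // e.g. <block>maybe</block>
}

async function classifyWith(backend: ClassifierBackend, prompt: string): Promise<Verdict> {
  try {
    return parseVerdict(await backend(prompt));
  } catch {
    // Timeouts, network or quota errors also fail closed.
    return "escalate";
  }
}
```

With this shape, the default backend could wrap the AI SDK's `generateText` and return its `text`, while a self-hosted or guardrails model only has to satisfy the same `(prompt) => Promise<string>` signature.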
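The reasoning-blind transcript amounts to a filter over the session history. This sketch assumes a simplified, hypothetical `Message` union (opencode's real message parts are richer) and a hypothetical `buildClassifierInput` helper; the yes/no instruction line reuses the assumed `<block>` convention. What matters is what gets dropped: assistant text and tool results never reach the classifier, so neither instructions injected into fetched content nor the agent's own justification can sway the verdict.

```typescript
// Simplified, hypothetical history entries; not opencode's actual message schema.
type Message =
  | { role: "user"; text: string }
  | { role: "assistant"; text: string }
  | { role: "tool_result"; tool: string; output: string };

interface PendingCall {
  tool: string;
  input: unknown;
}

function buildClassifierInput(history: Message[], call: PendingCall): string {
  // Keep only what the user actually asked for.
  const userTurns = history
    .filter((m): m is Extract<Message, { role: "user" }> => m.role === "user")
    .map((m, i) => `User message ${i + 1}:\n${m.text}`);

  // The call under review is passed as a bare payload, with no assistant
  // explanation attached.
  const payload = JSON.stringify({ tool: call.tool, input: call.input });

  return userTurns
    .concat([
      `Pending tool call:\n${payload}`,
      "Should this call be blocked? Answer only <block>yes</block> or <block>no</block>.",
    ])
    .join("\n\n");
}
```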
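A sketch of the escalation backstop, using the 3-consecutive / 20-total thresholds from the proposal. `DenialBackstop` and its methods are hypothetical. Following the text literally, both counters reset at the start of each user turn; once either limit is reached, the caller would stop denying and escalate to the human prompt, so a false positive cannot keep the agent retrying indefinitely.

```typescript
// Hypothetical per-session backstop; one instance per session.
class DenialBackstop {
  private consecutive = 0;
  private total = 0;
  private readonly maxConsecutive: number;
  private readonly maxTotal: number;

  constructor(maxConsecutive = 3, maxTotal = 20) {
    this.maxConsecutive = maxConsecutive;
    this.maxTotal = maxTotal;
  }

  // Called when a new user message arrives.
  resetForUserTurn(): void {
    this.consecutive = 0;
    this.total = 0;
  }

  // Any call that proceeds breaks the consecutive streak.
  recordAllow(): void {
    this.consecutive = 0;
  }

  // Records a classifier block. Returns true when a limit is reached and the
  // call should escalate to the human prompt instead of being denied again.
  recordBlock(): boolean {
    this.consecutive += 1;
    this.total += 1;
    return this.consecutive >= this.maxConsecutive || this.total >= this.maxTotal;
  }
}
```

Whether the total counter should reset per turn or persist for the whole session is worth confirming against the reference branch; the sketch simply follows the issue text.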
Reference implementation: `openguardrails/opencode@feat/auto-mode-classifier` (draft PR #33586). `bun run typecheck` is clean; unit tests cover reasoning-blindness, fail-closed parsing, the allowlist, and the policy slots.
Closest prior issues (and how this differs):
- … `soft_deny` policy slot fed to the classifier, not a new action.
- … `DeniedError` reason. Same hook point (`permission.ask`); this is a model-driven gate rather than plugin-state scoping, and could share the structured-deny-reason mechanism.

Happy to align with any of these and to the team's preferred shape (naming, config location, experimental-flag gating).