[FEATURE]: LLM command-approval classifier ("auto mode") for permission gating #33585

### Feature hasn't been suggested before.

- [x] I have verified this feature I'm about to request hasn't been suggested before.

(Closest prior issues, and how this differs, are listed at the bottom — this proposal is distinct in mechanism.)

### Describe the enhancement you want to request

**Problem.** opencode's permission model is static and per-rule: each rule resolves to `allow` (auto-approve), `ask` (prompt every time), or `deny`. In agentic sessions there is no middle ground: nothing inspects the actual `bash` command, webfetch, or MCP call and decides "this specific one is safe, don't prompt." Users end up either approving constantly or allow-listing broadly and losing the guardrail.

**Proposal.** Add an opt-in classifier (modeled on Claude Code's "auto mode") that gates **only the would-auto-approve path**: when a rule resolves to `allow`, consult a model first. The result maps to one of three outcomes:

- allow → proceed silently (as today)
- block → deny-and-continue: the call returns a tool error that the agent can react to; the session does not halt
- classifier error / escalation → fail closed (human prompt)

It never overrides an explicit user `deny`/`ask`.
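The decision flow above could be sketched roughly as follows. This is a minimal illustration, not the reference branch's code: `gate`, `parseVerdict`, `Rule`, `Outcome`, and `Classifier` are hypothetical names, and reading `<block>yes</block>` as "block" and `<block>no</block>` as "allow" is an assumption about the verdict format.

```typescript
// Hypothetical types; names are illustrative and not taken from opencode.
type Rule = "allow" | "ask" | "deny"
type Outcome =
  | { kind: "proceed" }
  | { kind: "prompt" } // hand off to the human
  | { kind: "deny" } // explicit user deny
  | { kind: "tool_error"; message: string } // deny-and-continue

type Verdict = "allow" | "block" | "unknown"
type Classifier = (payload: string) => Promise<string>

// Assumed format: <block>yes</block> means block, <block>no</block> means allow.
// Anything else (missing tag, several tags, other text) is treated as "unknown".
function parseVerdict(raw: string): Verdict {
  const matches = [...raw.matchAll(/<block>\s*(yes|no)\s*<\/block>/gi)]
  if (matches.length !== 1) return "unknown"
  return matches[0][1].toLowerCase() === "yes" ? "block" : "allow"
}

async function gate(rule: Rule, payload: string, classify: Classifier): Promise<Outcome> {
  // Explicit deny/ask rules are never overridden by the classifier.
  if (rule === "deny") return { kind: "deny" }
  if (rule === "ask") return { kind: "prompt" }
  try {
    const verdict = parseVerdict(await classify(payload))
    if (verdict === "allow") return { kind: "proceed" }
    if (verdict === "block") return { kind: "tool_error", message: "Blocked by classifier" }
    return { kind: "prompt" } // unparseable output: fail closed
  } catch {
    return { kind: "prompt" } // classifier error: fail closed
  }
}
```

Note that the classifier is only ever consulted on the `allow` branch, so enabling it can make the session stricter than today but never more permissive than the user's rules.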

Design points (implemented in the reference branch):

- **Pluggable backend.** Default uses the user's own configured model via the AI SDK (single-pass `<block>yes/no</block>`). Pluggable so a local/self-hosted or hosted guardrails model can be slotted in.
- **Reasoning-blind transcript** — the classifier sees only user messages + the bare tool-call payload, never assistant prose or prior tool output (prompt-injection + anti-rationalization defense).
- **Full interception path** via `Permission.ask` — covers edits, webfetch, MCP, task/subagent, external-dir; read-only tools short-circuit before any model call.
- **Escalation backstop** — per-session denial counters (3 consecutive / 20 total), reset each user turn, so a false positive can't loop forever.
- `classifier` config block (sibling of `permission`); **off by default**.
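Two of the design points above, the reasoning-blind transcript and the escalation backstop, might look something like the sketch below. The `Message` and `ToolCall` shapes, `buildClassifierInput`, and `DenialCounter` are hypothetical and do not mirror opencode's real types. The sketch also assumes that both counters reset on a new user turn and that escalation triggers on reaching (not exceeding) each threshold; the proposal text does not spell out either detail.

```typescript
// Hypothetical message shape; opencode's real message types differ.
type Message =
  | { role: "user"; text: string }
  | { role: "assistant"; text: string }
  | { role: "tool"; output: string }

interface ToolCall {
  tool: string
  input: unknown
}

// Reasoning-blind view: keep user messages plus the bare tool-call payload,
// and drop assistant prose and prior tool output.
function buildClassifierInput(history: Message[], call: ToolCall): string {
  const userTurns = history
    .filter((m): m is Extract<Message, { role: "user" }> => m.role === "user")
    .map((m) => `user: ${m.text}`)
  return [...userTurns, `tool_call: ${JSON.stringify(call)}`].join("\n")
}

// Thresholds taken from the proposal: 3 consecutive / 20 total.
const MAX_CONSECUTIVE = 3
const MAX_TOTAL = 20

class DenialCounter {
  private consecutive = 0
  private total = 0

  // Record a classifier block; returns true when a human should be prompted.
  recordBlock(): boolean {
    this.consecutive++
    this.total++
    return this.consecutive >= MAX_CONSECUTIVE || this.total >= MAX_TOTAL
  }

  // An allowed call breaks the consecutive streak (assumption).
  recordAllow(): void {
    this.consecutive = 0
  }

  // Assumption: a new user turn resets both counters.
  resetForUserTurn(): void {
    this.consecutive = 0
    this.total = 0
  }
}
```

Keeping the transcript builder a pure function makes the reasoning-blindness property easy to unit test: feed it a history containing assistant text and assert that none of it appears in the output.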

**Reference implementation:** `openguardrails/opencode@feat/auto-mode-classifier` (draft PR #33586). `bun run typecheck` clean; unit tests cover reasoning-blindness, fail-closed parsing, the allowlist, and the policy slots.

**Closest prior issues (and how this differs):**

- #20298 (open) — auto-approve safe bash via **tree-sitter** classification. Same goal, different mechanism: a deterministic parser vs an LLM classifier covering all tools. Complementary — tree-sitter could be a fast pre-filter ahead of the model.
- #9651 (closed) — a "soft deny" permission *action*. This reuses the concept as a `soft_deny` *policy slot* fed to the classifier, not a new action.
- #22558 (open) — dynamic permission scoping via plugin state + structured `DeniedError` reason. Same hook point (`permission.ask`); this is a model-driven gate rather than plugin-state scoping, and could share the structured-deny-reason mechanism.

Happy to align with any of these and with the team's preferred shape (naming, config location, `experimental`-flag gating).

