
fix: resolve merge conflict and re-apply all guard fixes on finalize_…#3

Closed
sergiobayona wants to merge 1 commit into main from feat/guardrails

@sergiobayona
Owner

…run refactor

Main branch introduced a finalize_run helper (chatwoot#56) that centralised the exit path. Our guardrails branch had inline output guards + manual finalization. This commit resolves the conflict and re-applies all three guard fixes on top of the new architecture.

Changes re-applied:

  1. GuardRunner.run now tracks any_rewrite across the chain and returns action: :rewrite when any guard rewrote content. Previously it always returned :pass, which made callers' .rewrite? checks dead code.

  2. last_message_matches? moved below the input guard block so dedup compares against post-guard (potentially rewritten) input. Prevents stale dedup from silently discarding guard rewrites.

  3. Output guards serialize Hash/Array content (from response_schema) to JSON before the guard chain and deserialize after rewrite, so guards always operate on Strings.
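
To illustrate item 3, here is a minimal sketch of the serialize/deserialize round trip. It is not the code from this PR: `guard_output` and `REDACT_EMAILS` are hypothetical names, and the guard chain is reduced to a single callable that takes and returns a String.

```ruby
require "json"

# Hypothetical helper (not the PR's implementation): runs a guard chain over
# output that is either a String or structured content (Hash/Array).
# `chain` is assumed to be any callable taking a String and returning a String.
def guard_output(content, chain)
  structured = content.is_a?(Hash) || content.is_a?(Array)

  # Guards only ever see Strings, so structured content is serialized first.
  text = structured ? JSON.generate(content) : content.to_s

  guarded = chain.call(text)

  # Parse back only when the original content was structured.
  structured ? JSON.parse(guarded) : guarded
end

# Example chain: redact anything that looks like an email address.
REDACT_EMAILS = ->(text) { text.gsub(/[\w.+-]+@[\w-]+\.[\w.]+/, "[REDACTED]") }

guard_output({ "reply" => "Contact bob@example.com" }, REDACT_EMAILS)
guard_output("Contact bob@example.com", REDACT_EMAILS)
```

One caveat of this approach: `JSON.parse` returns String keys, so a Hash that went in with Symbol keys comes back with String keys unless the caller converts them. A guard that rewrites the text into invalid JSON would also make `JSON.parse` raise, which the real code presumably has to handle.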

Conflict resolution: output guards run before finalize_run, and the guarded final_output is passed to the helper. The Tripwire rescue remains separate because it needs the guardrail_tripwire field, which finalize_run does not support.

Tests: 10 new examples covering input guard rewrites, output guard rewrites, structured output guards (redact/tripwire/pass-through), and dedup regression.

432 examples, 0 failures, 98.24% line coverage.

@sergiobayona force-pushed the feat/guardrails branch 5 times, most recently from 462e78e to a002424, March 18, 2026 01:54

Introduce a guardrail layer that intercepts content before it reaches
an agent (input guards) and before it returns to the caller (output
guards). Guards are composable, ordered, and follow the same
thread-safe, stateless design as Tools.

A guard's `call` method returns one of three outcomes:
- **pass** (nil or GuardResult.pass): content proceeds unchanged
- **rewrite** (GuardResult.rewrite): content is replaced before
  continuing to the next guard or the LLM
- **tripwire** (GuardResult.tripwire): the run is aborted immediately
  with a dedicated error and metadata on the RunResult
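
The three outcomes can be pictured with a small value object. This is a sketch under assumptions, not the contents of lib/agents/guard_result.rb: the class name `SketchGuardResult`, its attributes, and the `SsnGuard` example are all hypothetical.

```ruby
# Hypothetical three-outcome result object; the real GuardResult may differ.
class SketchGuardResult
  attr_reader :action, :content, :reason

  def initialize(action, content: nil, reason: nil)
    @action = action
    @content = content
    @reason = reason
    freeze # immutable value object, safe to share across threads
  end

  def self.pass
    new(:pass)
  end

  def self.rewrite(content)
    new(:rewrite, content: content)
  end

  def self.tripwire(reason)
    new(:tripwire, reason: reason)
  end

  def pass?
    action == :pass
  end

  def rewrite?
    action == :rewrite
  end

  def tripwire?
    action == :tripwire
  end
end

# A stateless example guard: returns nil (pass) or a rewrite result.
class SsnGuard
  PATTERN = /\b\d{3}-\d{2}-\d{4}\b/

  def call(content)
    return nil unless content.match?(PATTERN)

    SketchGuardResult.rewrite(content.gsub(PATTERN, "[SSN]"))
  end
end
```

Because the guard holds no instance state and the result is frozen, a single guard instance can be reused across concurrent runs, which is the property the description attributes to Tools.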

Key design decisions:
- Guards are agent-scoped (`input_guards:` / `output_guards:` kwargs),
  not global, enabling fine-grained per-agent policies
- Fail-open by default: a guard that raises an unexpected exception
  logs and passes. `strict: true` converts exceptions to tripwires
- Input guards run once before the first LLM call; output guards run
  only on the final response (not intermediate tool-call turns)
- Guard chains execute in array order; each guard sees the output of
  the previous guard's potential rewrite
- Structured output (Hash/Array from response_schema) is serialized
  to JSON before the guard chain and deserialized back after rewrite
- GuardRunner.run tracks rewrites across the chain and returns
  action: :rewrite so callers can detect changes
- Dedup check (last_message_matches?) runs after input guards so
  rewritten input is compared against history
- Tripwire rescue uses finalize_run with guardrail_tripwire kwarg;
  StandardError rescue has a safety-net re-raise for Tripwire
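
The chain semantics listed above (array order, each guard seeing the previous rewrite, fail-open versus `strict: true`, and rewrite tracking) can be sketched as follows. All names here (`SketchGuardRunner`, `Outcome`, `SketchTripwire`, and the example guards) are hypothetical stand-ins, and the real GuardRunner may be structured differently.

```ruby
require "logger"

# Hypothetical stand-ins for the PR's result object and Tripwire exception.
Outcome = Struct.new(:action, :content, :reason, keyword_init: true)
class SketchTripwire < StandardError; end

module SketchGuardRunner
  # Runs guards in array order and reports whether any of them rewrote content.
  def self.run(guards, content, strict: false, logger: Logger.new($stderr))
    any_rewrite = false

    guards.each do |guard|
      begin
        result = guard.call(content)
      rescue SketchTripwire
        raise # a deliberate tripwire is never swallowed
      rescue StandardError => e
        # strict: unexpected errors abort the run; otherwise fail open.
        raise SketchTripwire, "guard failed: #{e.message}" if strict
        logger.warn("guard error ignored: #{e.message}")
        next
      end

      next if result.nil? || result.action == :pass
      raise SketchTripwire, result.reason.to_s if result.action == :tripwire

      content = result.content # the next guard sees the rewritten content
      any_rewrite = true
    end

    Outcome.new(action: any_rewrite ? :rewrite : :pass, content: content)
  end
end

# Example guards as plain callables.
NOOP    = ->(_text) { nil }
UPCASE  = ->(text) { Outcome.new(action: :rewrite, content: text.upcase) }
EXCLAIM = ->(text) { Outcome.new(action: :rewrite, content: "#{text}!") }
BROKEN  = ->(_text) { raise ArgumentError, "boom" }

SketchGuardRunner.run([NOOP, UPCASE, EXCLAIM], "hi")
```

In this sketch, returning `:rewrite` from the runner is what lets a caller decide whether to replace the input before running the dedup check against history; a runner that always reported `:pass` would hide the change from the caller.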

New files:
- lib/agents/guard.rb         — base class, Tripwire exception, DSL
- lib/agents/guard_result.rb  — value object (pass/rewrite/tripwire)
- lib/agents/guard_runner.rb  — ordered chain executor

Integration points:
- Agent: accepts input_guards/output_guards, propagated through clone
- Runner: input guards before LLM, output guards before finalize_run,
  Guard::Tripwire rescue with guardrail_tripwire metadata on RunResult
- RunResult: new `guardrail_tripwire` field and `tripwired?` predicate
- CallbackManager: new `guard_triggered` event type
- AgentRunner: `on_guard_triggered` callback registration
- Instrumentation: `agents.run.guard.*` OTel spans with phase/action
  attributes, compatible with Langfuse

Tests: 12 new examples covering input guard rewrites, output guard
rewrites, structured output guards (redact/tripwire/pass-through),
dedup regression, and tripwire metadata and callback emission.
Existing specs updated to stub the new guard attributes.