fix: resolve merge conflict and re-apply all guard fixes on finalize_… by sergiobayona · Pull Request #3 · sergiobayona/ai-agents

sergiobayona · 2026-03-17T22:12:00Z

…run refactor

Main branch introduced a finalize_run helper (chatwoot#56) that centralised the exit path. Our guardrails branch had inline output guards + manual finalization. This commit resolves the conflict and re-applies all three guard fixes on top of the new architecture.

Changes re-applied:

- GuardRunner.run now tracks any_rewrite across the chain and returns action: :rewrite when any guard rewrote content. Previously it always returned :pass, which made the callers' .rewrite? checks dead code.
- last_message_matches? moved below the input guard block so that dedup compares against the post-guard (potentially rewritten) input. This prevents a stale dedup match from silently discarding guard rewrites.
- Output guards serialize Hash/Array content (from response_schema) to JSON before the guard chain and deserialize it after a rewrite, so guards always operate on Strings.
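
The first fix can be illustrated with a minimal sketch. `SketchResult` and `SketchGuardRunner` are hypothetical stand-ins, not the gem's actual `GuardResult` or `GuardRunner`, and guards are modelled as plain lambdas that return nil (pass) or a replacement String (rewrite). The point is only that the chain remembers a rewrite even when a later guard passes:

```ruby
# Hypothetical stand-in for the chain result; not the gem's actual class.
SketchResult = Struct.new(:action, :content) do
  def rewrite?
    action == :rewrite
  end
end

module SketchGuardRunner
  # Runs guards in array order. Each guard is a callable that returns nil
  # (pass) or a replacement String (rewrite).
  def self.run(guards, content)
    any_rewrite = false
    current = content

    guards.each do |guard|
      replacement = guard.call(current)
      next if replacement.nil?

      any_rewrite = true
      current = replacement # later guards see the rewritten content
    end

    SketchResult.new(any_rewrite ? :rewrite : :pass, current)
  end
end

redact_email = ->(text) { text.gsub(/\S+@\S+/, "[email]") if text.match?(/\S+@\S+/) }
allow_all = ->(_text) { nil }

result = SketchGuardRunner.run([redact_email, allow_all], "mail me at a@b.io")
puts result.action   # rewrite
puts result.content  # mail me at [email]
```

If the runner reported only the last guard's outcome (or a hard-coded :pass), `result.rewrite?` would be false here and the caller would keep the unredacted content.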

Conflict resolution: output guards run before finalize_run, passing the guarded final_output to the helper. Tripwire rescue remains separate (needs guardrail_tripwire field not supported by finalize_run).
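
A rough sketch of the output-guard step that runs before finalization, assuming a hypothetical `guard_structured_output` helper (the name and signature are illustrative, and `finalize_run` itself is not modelled). It shows the JSON round trip for structured output:

```ruby
require "json"

# Hypothetical helper; the name and signature are illustrative only.
# `guard_chain` is any callable that takes a String and returns a String.
def guard_structured_output(output, guard_chain)
  structured = output.is_a?(Hash) || output.is_a?(Array)
  text = structured ? JSON.generate(output) : output.to_s

  guarded = guard_chain.call(text)

  # Keep the original object when no guard changed anything.
  return output if guarded == text

  # Parse back only when the original output was structured.
  structured ? JSON.parse(guarded) : guarded
end

redact_ssn = ->(text) { text.gsub(/\d{3}-\d{2}-\d{4}/, "[REDACTED]") }

final_output = guard_structured_output({ "name" => "Ada", "ssn" => "123-45-6789" }, redact_ssn)
puts final_output["ssn"] # [REDACTED]
```

In this sketch the guarded `final_output` is what would then be handed to the finalization helper. Note that `JSON.parse` returns String keys and raises `JSON::ParserError` if a guard rewrites the text into invalid JSON; how the real code handles either case is not stated in this PR.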

Tests: 10 new examples covering input guard rewrites, output guard rewrites, structured output guards (redact/tripwire/pass-through), and dedup regression.

432 examples, 0 failures, 98.24% line coverage.

Introduce a guardrail layer that intercepts content before it reaches an agent (input guards) and before it returns to the caller (output guards). Guards are composable, ordered, and follow the same thread-safe, stateless design as Tools.

A guard's `call` method returns one of three outcomes:

- **pass** (nil or GuardResult.pass): content proceeds unchanged
- **rewrite** (GuardResult.rewrite): content is replaced before continuing to the next guard or the LLM
- **tripwire** (GuardResult.tripwire): the run is aborted immediately with a dedicated error and metadata on the RunResult

Key design decisions:

- Guards are agent-scoped (`input_guards:` / `output_guards:` kwargs), not global, enabling fine-grained per-agent policies
- Fail-open by default: a guard that raises an unexpected exception logs and passes. `strict: true` converts exceptions to tripwires
- Input guards run once before the first LLM call; output guards run only on the final response (not intermediate tool-call turns)
- Guard chains execute in array order; each guard sees the output of the previous guard's potential rewrite
- Structured output (Hash/Array from response_schema) is serialized to JSON before the guard chain and deserialized back after rewrite
- GuardRunner.run tracks rewrites across the chain and returns action: :rewrite so callers can detect changes
- Dedup check (last_message_matches?) runs after input guards so rewritten input is compared against history
- Tripwire rescue uses finalize_run with guardrail_tripwire kwarg; StandardError rescue has a safety-net re-raise for Tripwire

New files:

- lib/agents/guard.rb: base class, Tripwire exception, DSL
- lib/agents/guard_result.rb: value object (pass/rewrite/tripwire)
- lib/agents/guard_runner.rb: ordered chain executor

Integration points:

- Agent: accepts input_guards/output_guards, propagated through clone
- Runner: input guards before LLM, output guards before finalize_run, Guard::Tripwire rescue with guardrail_tripwire metadata on RunResult
- RunResult: new `guardrail_tripwire` field and `tripwired?` predicate
- CallbackManager: new `guard_triggered` event type
- AgentRunner: `on_guard_triggered` callback registration
- Instrumentation: `agents.run.guard.*` OTel spans with phase/action attributes, compatible with Langfuse

Tests: 12 new examples covering input guard rewrites, output guard rewrites, structured output guards (redact/tripwire/pass-through), dedup regression, and tripwire metadata and callback emission. Existing specs updated to stub the new guard attributes.
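
The three outcomes and the fail-open / `strict: true` behaviour described above can be sketched as follows. Everything here is an assumption made for illustration: `SketchTripwire`, `SketchOutcome` and `call_guard` are hypothetical names, and guards are modelled as callables returning nil, `[:rewrite, content]` or `[:tripwire, reason]` rather than the gem's `GuardResult` objects.

```ruby
# Hypothetical stand-ins; the gem's real Guard, GuardResult and
# Guard::Tripwire classes may look different.
class SketchTripwire < StandardError; end

SketchOutcome = Struct.new(:action, :content)

# Calls one guard (any callable) and normalises its return value:
#   nil                      -> pass
#   [:rewrite, new_content]  -> rewrite
#   [:tripwire, reason]      -> raises SketchTripwire
def call_guard(guard, content, strict: false)
  kind, payload = guard.call(content)
  case kind
  when nil then SketchOutcome.new(:pass, content)
  when :rewrite then SketchOutcome.new(:rewrite, payload)
  when :tripwire then raise SketchTripwire, payload
  else raise ArgumentError, "unknown guard outcome: #{kind.inspect}"
  end
rescue SketchTripwire
  raise # deliberate tripwires always propagate
rescue StandardError => e
  # strict: unexpected errors abort the run; otherwise fail open.
  raise SketchTripwire, "guard failed: #{e.message}" if strict

  warn "guard error ignored (fail-open): #{e.message}"
  SketchOutcome.new(:pass, content)
end

flaky = ->(_content) { raise "classifier timeout" }

puts call_guard(flaky, "hello").action # pass

begin
  call_guard(flaky, "hello", strict: true)
rescue SketchTripwire => e
  puts e.message # guard failed: classifier timeout
end
```

Rescuing the tripwire class before `StandardError` plays the same role as the safety-net re-raise mentioned above: because the tripwire is itself a `StandardError`, a generic rescue would otherwise swallow it and turn an intentional abort into a pass.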

sergiobayona force-pushed the feat/guardrails branch 5 times, most recently from 462e78e to a002424 Compare March 18, 2026 01:54

sergiobayona force-pushed the feat/guardrails branch from a002424 to ad5d962 Compare March 18, 2026 02:11

sergiobayona closed this Mar 18, 2026

sergiobayona wants to merge 1 commit into main from feat/guardrails
