fix: resolve merge conflict and re-apply all guard fixes on finalize_…#3
Closed
sergiobayona wants to merge 1 commit into
462e78e to a002424
Introduce a guardrail layer that intercepts content before it reaches an agent (input guards) and before it returns to the caller (output guards). Guards are composable, ordered, and follow the same thread-safe, stateless design as Tools.

A guard's `call` method returns one of three outcomes:

- **pass** (nil or GuardResult.pass): content proceeds unchanged
- **rewrite** (GuardResult.rewrite): content is replaced before continuing to the next guard or the LLM
- **tripwire** (GuardResult.tripwire): the run is aborted immediately with a dedicated error and metadata on the RunResult

Key design decisions:

- Guards are agent-scoped (`input_guards:` / `output_guards:` kwargs), not global, enabling fine-grained per-agent policies
- Fail-open by default: a guard that raises an unexpected exception logs and passes. `strict: true` converts exceptions to tripwires
- Input guards run once before the first LLM call; output guards run only on the final response (not intermediate tool-call turns)
- Guard chains execute in array order; each guard sees the output of the previous guard's potential rewrite
- Structured output (Hash/Array from response_schema) is serialized to JSON before the guard chain and deserialized back after rewrite
- GuardRunner.run tracks rewrites across the chain and returns `action: :rewrite` so callers can detect changes
- Dedup check (`last_message_matches?`) runs after input guards so rewritten input is compared against history
- Tripwire rescue uses `finalize_run` with `guardrail_tripwire` kwarg; StandardError rescue has a safety-net re-raise for Tripwire

New files:

- lib/agents/guard.rb: base class, Tripwire exception, DSL
- lib/agents/guard_result.rb: value object (pass/rewrite/tripwire)
- lib/agents/guard_runner.rb: ordered chain executor

Integration points:

- Agent: accepts input_guards/output_guards, propagated through clone
- Runner: input guards before LLM, output guards before finalize_run, Guard::Tripwire rescue with guardrail_tripwire metadata on RunResult
- RunResult: new `guardrail_tripwire` field and `tripwired?` predicate
- CallbackManager: new `guard_triggered` event type
- AgentRunner: `on_guard_triggered` callback registration
- Instrumentation: `agents.run.guard.*` OTel spans with phase/action attributes, compatible with Langfuse

Tests: 12 new examples covering input guard rewrites, output guard rewrites, structured output guards (redact/tripwire/pass-through), dedup regression, and tripwire metadata and callback emission. Existing specs updated to stub the new guard attributes.
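The chain semantics described above (array order, rewrite tracking, fail-open versus strict) can be illustrated with a small standalone sketch. It does not use the PR's actual classes: `Result`, `Tripwire`, and `run_chain` are hypothetical stand-ins invented for this example, and the real `GuardRunner` may differ in signature and detail.

```ruby
# Hypothetical, simplified stand-ins for the guard classes named above.
# They are NOT the library's real implementation.
Result = Struct.new(:action, :content, :message) do
  def rewrite?
    action == :rewrite
  end
end

class Tripwire < StandardError; end

# Runs guards in array order; each guard sees the previous guard's rewrite.
def run_chain(guards, content, strict: false)
  any_rewrite = false

  guards.each do |guard|
    begin
      result = guard.call(content)
    rescue Tripwire
      raise
    rescue StandardError => e
      # Strict mode converts an unexpected exception into a tripwire.
      raise Tripwire, "guard failed: #{e.message}" if strict
      # Fail-open: log the error and let the content pass.
      warn "guard raised, passing: #{e.message}"
      next
    end

    # nil and :pass both mean "content proceeds unchanged".
    next if result.nil? || result.action == :pass

    case result.action
    when :rewrite
      content = result.content
      any_rewrite = true
    when :tripwire
      raise Tripwire, result.message
    end
  end

  # Report :rewrite if any guard in the chain changed the content.
  Result.new(any_rewrite ? :rewrite : :pass, content, nil)
end

redact_email = lambda do |text|
  cleaned = text.gsub(/\S+@\S+/, "[email]")
  cleaned == text ? nil : Result.new(:rewrite, cleaned, nil)
end

block_secret = lambda do |text|
  Result.new(:tripwire, nil, "secret detected") if text.include?("SECRET")
end

outcome = run_chain([redact_email, block_secret], "mail bob@example.com")
puts outcome.action   # rewrite
puts outcome.content  # mail [email]
```

Returning the aggregate action from the chain is what lets a caller's `rewrite?` check do useful work; a chain that always reports `:pass` would hide the change made by `redact_email`.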
a002424 to ad5d962
…run refactor
Main branch introduced a `finalize_run` helper (chatwoot#56) that centralised the exit path. Our guardrails branch had inline output guards + manual finalization. This commit resolves the conflict and re-applies all three guard fixes on top of the new architecture.

Changes re-applied:

- GuardRunner.run now tracks `any_rewrite` across the chain and returns `action: :rewrite` when any guard rewrote content. Previously always returned `:pass`, making caller `.rewrite?` checks dead code.
- `last_message_matches?` moved below the input guard block so dedup compares against post-guard (potentially rewritten) input. Prevents stale dedup from silently discarding guard rewrites.
- Output guards serialize Hash/Array content (from response_schema) to JSON before the guard chain and deserialize after rewrite, so guards always operate on Strings.

Conflict resolution: output guards run before `finalize_run`, passing the guarded `final_output` to the helper. Tripwire rescue remains separate (needs `guardrail_tripwire` field not supported by `finalize_run`).

Tests: 10 new examples covering input guard rewrites, output guard rewrites, structured output guards (redact/tripwire/pass-through), and dedup regression.
432 examples, 0 failures, 98.24% line coverage.
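The structured-output handling described above (serialize to JSON, run the guards on a String, deserialize after a rewrite) might look roughly like the following. This is a sketch under stated assumptions: `guard_structured_output` and the `chain` callable are hypothetical names for illustration and are not taken from the PR's code.

```ruby
require "json"

# Hypothetical helper: wraps a guard chain so guards only ever see Strings.
# `chain` is any callable that takes a String and returns a (possibly
# rewritten) String; it stands in for the real guard runner.
def guard_structured_output(output, chain)
  structured = output.is_a?(Hash) || output.is_a?(Array)
  text = structured ? JSON.generate(output) : output.to_s

  guarded = chain.call(text)

  # Only parse back when the original value was structured.
  structured ? JSON.parse(guarded) : guarded
end

redact_ssn = ->(text) { text.gsub(/\d{3}-\d{2}-\d{4}/, "[redacted]") }

result = guard_structured_output({ "name" => "Ada", "ssn" => "123-45-6789" }, redact_ssn)
puts result["ssn"]   # [redacted]
puts result["name"]  # Ada
```

One consequence of the JSON round trip is that symbol keys come back as string keys, since `JSON.parse` produces string keys by default. A guard that rewrites the text into invalid JSON would also make the parse step raise, so a real implementation would need to decide how to handle that case.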