diff --git a/CLAUDE.md b/CLAUDE.md index 9a75dd0..4ff2d0c 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -30,6 +30,9 @@ This project is a Ruby SDK for building multi-agent AI workflows. It allows deve - `lib/agents/tool.rb`: Defines the `Tool` class, the base for creating custom tools for agents. - `lib/agents/agent_runner.rb`: Thread-safe agent execution manager for multi-agent conversations. - `lib/agents/runner.rb`: Internal orchestrator that handles individual conversation turns. + - `lib/agents/guard.rb`: Base class for guardrails — stateless input/output validators. + - `lib/agents/guard_result.rb`: Value object for guard outcomes (pass/rewrite/tripwire). + - `lib/agents/guard_runner.rb`: Ordered chain executor for guards with fail-open/closed modes. - `spec/`: Contains the RSpec tests for the project. - `examples/`: Includes example implementations of multi-agent systems, such as an ISP customer support demo. - `Gemfile`: Manages the project's Ruby dependencies. @@ -65,7 +68,9 @@ This will start a command-line interface where you can interact with the multi-a - **Handoff**: The process of transferring a conversation from one agent to another. This is a core feature of the SDK. - **Runner**: Internal component that manages individual conversation turns (used by AgentRunner). - **Context**: A shared state object that stores conversation history and agent information, fully serializable for persistence. -- **Callbacks**: Event hooks for monitoring agent execution, including agent thinking, tool start/complete, and handoffs. +- **Callbacks**: Event hooks for monitoring agent execution, including agent thinking, tool start/complete, handoffs, and guard triggers. +- **Guard**: A stateless validator that intercepts content before (input) or after (output) agent execution. Returns pass, rewrite (modify content), or tripwire (abort run). +- **GuardRunner**: Executes an ordered chain of guards. Supports fail-open (default) and fail-closed (strict) error handling. 
## Development Commands @@ -118,6 +123,9 @@ ruby examples/isp-support/interactive.rb - **Agents::Context**: Shared state management across agent interactions - **Agents::Handoff**: Manages seamless transfers between agents - **Agents::CallbackManager**: Centralized event handling for real-time monitoring +- **Agents::Guard**: Base class for guardrails (input/output content validation) +- **Agents::GuardResult**: Value object for guard outcomes (pass/rewrite/tripwire) +- **Agents::GuardRunner**: Ordered guard chain executor with fail-open/closed modes ### Key Design Principles @@ -143,6 +151,9 @@ lib/agents/ ├── tool_context.rb # Tool execution context ├── tool_wrapper.rb # Thread-safe tool wrapping ├── callback_manager.rb # Centralized callback event handling +├── guard.rb # Base class for guardrails (input/output validators) +├── guard_result.rb # Value object for guard outcomes (pass/rewrite/tripwire) +├── guard_runner.rb # Ordered guard chain executor ├── message_extractor.rb # Conversation history processing └── version.rb # Gem version ``` @@ -231,6 +242,7 @@ The SDK includes a comprehensive callback system for monitoring agent execution - `on_tool_start`: Triggered when a tool begins execution - `on_tool_complete`: Triggered when a tool finishes execution - `on_agent_handoff`: Triggered when control transfers between agents +- `on_guard_triggered`: Triggered when a guard produces a non-pass result (rewrite or tripwire) ### Callback Integration diff --git a/docs/concepts/guardrails.md b/docs/concepts/guardrails.md new file mode 100644 index 0000000..3b65176 --- /dev/null +++ b/docs/concepts/guardrails.md @@ -0,0 +1,205 @@ +--- +layout: default +title: Guardrails +parent: Concepts +nav_order: 8 +--- + +# Guardrails + +Guardrails are composable validation layers that intercept content before it reaches an agent (input guards) and before it returns to the caller (output guards). 
They allow you to enforce policies, redact sensitive data, and abort runs when content violates your rules. + +## How Guards Work + +A guard is a stateless class that receives content and returns one of three outcomes: + +- **Pass** (return `nil` or `Agents::GuardResult.pass`): Content is acceptable, continue execution. +- **Rewrite** (`Agents::GuardResult.rewrite`): Replace the content with a modified version. +- **Tripwire** (`Agents::GuardResult.tripwire`): Abort the run immediately with an error. + +```ruby +class PiiRedactor < Agents::Guard + guard_name "pii_redactor" + description "Redacts Social Security numbers from content" + + def call(content, context) + redacted = content.gsub(/\b\d{3}-\d{2}-\d{4}\b/, "[REDACTED]") + # Fully qualified: GuardResult lives in Agents, not in Agents::Guard + Agents::GuardResult.rewrite(redacted, message: "SSN redacted") if redacted != content + end +end +``` + +## Input Guards vs Output Guards + +**Input guards** run before the first LLM call. They validate or transform the user's message before the agent sees it. Use them for prompt injection detection, input sanitization, or content filtering. + +**Output guards** run on the agent's final response before it returns to the caller. They validate or transform what the agent says back. Use them for PII redaction, topic fencing, or response quality checks. + +```ruby +agent = Agents::Agent.new( + name: "Support", + instructions: "You are a helpful support agent.", + input_guards: [PromptInjectionGuard.new], + output_guards: [PiiRedactor.new, TopicFence.new] +) +``` + +Guards execute in array order. Each guard receives the content as the preceding guards left it, including any rewrite, so the array forms a processing pipeline. 
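The pipeline behaviour described above can be sketched without the SDK. The snippet below is a simplified stand-in, not the real `GuardRunner`: `SketchResult`, `REDACT_SSN`, `COLLAPSE_SPACES`, and `run_chain` are hypothetical names used only for illustration, and tripwires, callbacks, and error handling are left out.

```ruby
# Simplified stand-in for a guard chain (assumption: not the SDK's GuardRunner).
SketchResult = Struct.new(:action, :content)

# Each "guard" is a callable that returns nil (pass) or a SketchResult.
REDACT_SSN = lambda do |content|
  redacted = content.gsub(/\b\d{3}-\d{2}-\d{4}\b/, "[REDACTED]")
  SketchResult.new(:rewrite, redacted) if redacted != content
end

COLLAPSE_SPACES = lambda do |content|
  collapsed = content.gsub(/\s+/, " ")
  SketchResult.new(:rewrite, collapsed) if collapsed != content
end

# Runs guards in array order; each guard sees the previous guard's rewrite.
def run_chain(guards, content)
  guards.reduce(content) do |current, guard|
    result = guard.call(current)
    result&.action == :rewrite ? result.content : current
  end
end

puts run_chain([REDACT_SSN, COLLAPSE_SPACES], "SSN:   123-45-6789")
```

Because the second guard operates on the already-redacted string, the final content reflects both rewrites.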
+ +## Writing a Guard + +Extend `Agents::Guard` and implement the `call` method: + +```ruby +class MaxLengthGuard < Agents::Guard + guard_name "max_length" + description "Tripwires if content exceeds maximum length" + + def initialize(max:) + super() + @max = max + end + + def call(content, context) + if content.length > @max + Agents::GuardResult.tripwire( + message: "Content exceeds #{@max} characters", + metadata: { length: content.length, max: @max } + ) + end + end +end +``` + +Guards follow the same thread-safety principles as Tools: +- No execution state in instance variables (only configuration like `@max` above) +- All shared state flows through the `context` parameter +- Guard instances are immutable after creation + +## Tripwires + +When a guard tripwires, the run aborts immediately. The result includes structured metadata about what happened: + +```ruby +result = runner.run("Tell me a secret") + +if result.tripwired? + puts result.guardrail_tripwire[:guard_name] # => "content_policy" + puts result.guardrail_tripwire[:message] # => "Response violates content policy" + puts result.guardrail_tripwire[:metadata] # => { category: "secrets" } +end +``` + +Tripwires short-circuit the guard chain. If guard 1 tripwires, guards 2 and 3 never run. + +## Fail-Open vs Fail-Closed + +By default, guards are **fail-open**: if a guard raises an unexpected exception (anything other than `Guard::Tripwire`), the error is logged and the guard is treated as a pass. This prevents a buggy guard from breaking your entire application. + +For high-security contexts, `GuardRunner` also supports a **fail-closed** (strict) mode, enabled with `strict: true`. 
In strict mode, any unexpected guard exception is converted to a tripwire: + +```ruby +# Fail-open (default): buggy guard is skipped, run continues +agent = Agents::Agent.new( + name: "Support", + input_guards: [PotentiallyBuggyGuard.new] +) + +# Fail-closed: any guard error aborts the run +# (configured via GuardRunner strict: true, typically set at the runner level) +``` + +## Structured Output + +When an agent uses `response_schema`, the LLM returns structured data (a Hash). Output guards still receive a String: the SDK serializes the Hash to JSON before the guard chain and deserializes it back after any rewrite. This means your guards always operate on Strings regardless of output format. + +```ruby +# This guard works on both plain text and structured output +class ContentFilter < Agents::Guard + guard_name "content_filter" + + def call(content, context) + # content is always a String (JSON for structured output) + if content.include?("forbidden") + Agents::GuardResult.tripwire(message: "Forbidden content detected") + end + end +end +``` + +## Guards Across Handoffs + +Guards are agent-scoped. When agent A hands off to agent B: + +- Agent A's **input guards** ran once on the original user input (before the handoff decision). +- Agent A's **output guards** do NOT run, because the handoff interrupts before a final response. +- Agent B's **input guards** run on the most recent user message once the handoff completes; a tripwire there aborts the run. +- Agent B's **output guards** run on agent B's final response. + +This means each agent enforces its own policies independently. + +## Callbacks and Instrumentation + +Guard activity is observable through the callback system: + +```ruby +runner = Agents::Runner.with_agents(agent) + .on_guard_triggered { |guard_name, phase, action, message, ctx| + puts "Guard #{guard_name} (#{phase}): #{action} - #{message}" + } +``` + +The callback fires for every non-pass result (rewrites and tripwires). It does not fire when guards pass. 
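The firing rule above (non-pass results only) can be illustrated with a small, SDK-free sketch. Everything here is hypothetical: `SketchOutcome`, `collect_guard_events`, and `SAMPLE_OUTCOMES` are illustrative names, and the real dispatch goes through the callback manager rather than a plain method.

```ruby
# Hypothetical stand-in for callback dispatch (assumption: not the SDK's CallbackManager).
SketchOutcome = Struct.new(:action, :message)

# Returns the events a guard_triggered-style callback would receive.
# `outcomes` maps guard names to nil (implicit pass) or a SketchOutcome.
def collect_guard_events(outcomes)
  events = []
  outcomes.each do |guard_name, outcome|
    # Passing guards, whether nil or an explicit :pass, produce no event.
    next if outcome.nil? || outcome.action == :pass

    events << [guard_name, outcome.action, outcome.message]
  end
  events
end

SAMPLE_OUTCOMES = {
  "length_check" => nil,
  "explicit_pass" => SketchOutcome.new(:pass, ""),
  "pii_redactor" => SketchOutcome.new(:rewrite, "SSN redacted"),
  "content_policy" => SketchOutcome.new(:tripwire, "blocked")
}.freeze

collect_guard_events(SAMPLE_OUTCOMES).each do |name, action, message|
  puts "Guard #{name}: #{action} (#{message})"
end
```

Only the rewrite and the tripwire produce events; both forms of pass are silent.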
+ +If OpenTelemetry instrumentation is installed, guard events produce `agents.run.guard.*` spans with attributes for guard name, phase (input/output), action (rewrite/tripwire), and message. + +## Complete Example + +```ruby +class PromptInjectionGuard < Agents::Guard + guard_name "prompt_injection" + description "Detects common prompt injection patterns" + + def call(content, context) + patterns = [ + /ignore\s+(all\s+)?previous\s+instructions/i, + /you\s+are\s+now\s+a/i, + /disregard\s+(all\s+)?prior/i + ] + + if patterns.any? { |p| content.match?(p) } + Agents::GuardResult.tripwire( + message: "Potential prompt injection detected", + metadata: { input_length: content.length } + ) + end + end +end + +class PiiRedactor < Agents::Guard + guard_name "pii_redactor" + description "Redacts SSNs and email addresses" + + def call(content, context) + redacted = content + .gsub(/\b\d{3}-\d{2}-\d{4}\b/, "[SSN REDACTED]") + .gsub(/\b[\w.+-]+@[\w-]+\.[\w.]+\b/, "[EMAIL REDACTED]") + + Agents::GuardResult.rewrite(redacted, message: "PII redacted") if redacted != content + end +end + +agent = Agents::Agent.new( + name: "Support", + instructions: "You are a helpful customer support agent.", + input_guards: [PromptInjectionGuard.new], + output_guards: [PiiRedactor.new] +) + +runner = Agents::Runner.with_agents(agent) + .on_guard_triggered { |name, phase, action, msg| + Rails.logger.info("Guard #{name} (#{phase}): #{action}") + } + +result = runner.run("What is my email?") +# Output PII is automatically redacted before reaching the user +``` diff --git a/lib/agents.rb b/lib/agents.rb index f00f596..d45ceb5 100644 --- a/lib/agents.rb +++ b/lib/agents.rb @@ -112,6 +112,9 @@ def configured? 
require_relative "agents/tool" require_relative "agents/handoff" require_relative "agents/helpers" +require_relative "agents/guard_result" +require_relative "agents/guard" +require_relative "agents/guard_runner" require_relative "agents/agent" # Execution components diff --git a/lib/agents/agent.rb b/lib/agents/agent.rb index 371c15d..7df2cef 100644 --- a/lib/agents/agent.rb +++ b/lib/agents/agent.rb @@ -50,7 +50,8 @@ # ) module Agents class Agent - attr_reader :name, :instructions, :model, :tools, :handoff_agents, :temperature, :response_schema, :headers, :params + attr_reader :name, :instructions, :model, :tools, :handoff_agents, :temperature, :response_schema, :headers, :params, + :input_guards, :output_guards # Initialize a new Agent instance # @@ -64,7 +65,7 @@ class Agent # @param headers [Hash, nil] Default HTTP headers applied to LLM requests # @param params [Hash, nil] Default provider-specific parameters applied to LLM requests (e.g., service_tier) def initialize(name:, instructions: nil, model: "gpt-4.1-mini", tools: [], handoff_agents: [], temperature: 0.7, - response_schema: nil, headers: nil, params: nil) + response_schema: nil, headers: nil, params: nil, input_guards: [], output_guards: []) @name = name @instructions = instructions @model = model @@ -74,6 +75,8 @@ def initialize(name:, instructions: nil, model: "gpt-4.1-mini", tools: [], hando @response_schema = response_schema @headers = Helpers::HashNormalizer.normalize(headers, label: "headers", freeze_result: true) @params = Helpers::HashNormalizer.normalize(params, label: "params", freeze_result: true) + @input_guards = input_guards.dup.freeze + @output_guards = output_guards.dup.freeze # Mutex for thread-safe handoff registration # While agents are typically configured at startup, we want to ensure @@ -170,7 +173,9 @@ def clone(**changes) temperature: changes.fetch(:temperature, @temperature), response_schema: changes.fetch(:response_schema, @response_schema), headers: changes.fetch(:headers, 
@headers), - params: changes.fetch(:params, @params) + params: changes.fetch(:params, @params), + input_guards: changes.fetch(:input_guards, @input_guards), + output_guards: changes.fetch(:output_guards, @output_guards) ) end diff --git a/lib/agents/agent_runner.rb b/lib/agents/agent_runner.rb index a45ed22..b5950b4 100644 --- a/lib/agents/agent_runner.rb +++ b/lib/agents/agent_runner.rb @@ -54,7 +54,8 @@ def initialize(agents) agent_thinking: [], agent_handoff: [], llm_call_complete: [], - chat_created: [] + chat_created: [], + guard_triggered: [] } end @@ -195,6 +196,18 @@ def on_chat_created(&block) self end + # Register a callback for guard triggered events. + # Called when a guardrail produces a non-pass result (rewrite or tripwire). + # + # @param block [Proc] Callback block that receives (guard_name, phase, action, message, context_wrapper) + # @return [self] For method chaining + def on_guard_triggered(&block) + return self unless block + + @callbacks_mutex.synchronize { @callbacks[:guard_triggered] << block } + self + end + private # Build agent registry from provided agents only. diff --git a/lib/agents/callback_manager.rb b/lib/agents/callback_manager.rb index 12f318a..4c0c50d 100644 --- a/lib/agents/callback_manager.rb +++ b/lib/agents/callback_manager.rb @@ -22,6 +22,7 @@ class CallbackManager agent_handoff llm_call_complete chat_created + guard_triggered ].freeze def initialize(callbacks = {}) diff --git a/lib/agents/guard.rb b/lib/agents/guard.rb new file mode 100644 index 0000000..d4d0ce8 --- /dev/null +++ b/lib/agents/guard.rb @@ -0,0 +1,98 @@ +# frozen_string_literal: true + +# Guard is the base class for all guardrails, providing a stateless interface for +# validating and transforming input/output content before and after agent execution. +# +# ## Thread-Safe Design +# Guards follow the same thread-safety principles as Tools: +# 1. No execution state in instance variables - only configuration +# 2. 
All state passed through parameters - RunContext provides shared state +# 3. Immutable guard instances - create once, use everywhere +# 4. Stateless call methods - pure functions with context input +# +# ## Guard Actions +# A guard's `call` method returns nil (pass) or a GuardResult: +# - **pass**: Content is acceptable, continue execution +# - **rewrite**: Replace the content with a modified version +# - **tripwire**: Abort the run immediately with an error +# +# @example Detecting prompt injection +# class PromptInjectionGuard < Agents::Guard +# guard_name "prompt_injection_detector" +# description "Detects common prompt injection patterns" +# +# def call(content, context) +# return if content.nil? +# +# if content.match?(/ignore\s+(all\s+)?previous\s+instructions/i) +# Agents::GuardResult.tripwire(message: "Potential prompt injection detected") +# end +# end +# end +# +# @example Redacting PII from output +# class PiiRedactor < Agents::Guard +# guard_name "pii_redactor" +# description "Redacts SSNs from output" +# +# def call(content, context) +# return if content.nil? +# redacted = content.gsub(/\b\d{3}-\d{2}-\d{4}\b/, "[REDACTED]") +# Agents::GuardResult.rewrite(redacted, message: "PII redacted") if redacted != content +# end +# end +module Agents + # Base class for all guardrails. See top-of-file comment for design details. + class Guard + # Exception raised when a guard tripwires, aborting the run. + class Tripwire < StandardError + attr_reader :guard_name, :metadata + + def initialize(message, guard_name:, metadata: {}) + @guard_name = guard_name + @metadata = metadata + super(message) + end + end + + # Evaluate content against this guard. + # Subclasses must implement this method. 
+ # + # @param content [String] The input or output text being validated + # @param context [Agents::RunContext] The current execution context + # @return [GuardResult, nil] nil means pass; GuardResult for rewrite/tripwire + def call(content, context) + raise NotImplementedError, "Guards must implement #call(content, context)" + end + + # DSL method to set or get the guard's name. + # Defaults to the class name's last segment if not explicitly set. + # + # @param value [String, nil] The guard name to set, or nil to get + # @return [String] The guard's name + def self.guard_name(value = nil) + if value + @guard_name = value + else + @guard_name || name&.split("::")&.last + end + end + + # DSL method to set or get the guard's description. + # + # @param value [String, nil] The description to set, or nil to get + # @return [String, nil] The guard's description + def self.description(value = nil) + if value + @description = value + else + @description + end + end + + # Instance-level name accessor, delegates to class method. + def name + self.class.guard_name + end + end +end diff --git a/lib/agents/guard_result.rb b/lib/agents/guard_result.rb new file mode 100644 index 0000000..56eb906 --- /dev/null +++ b/lib/agents/guard_result.rb @@ -0,0 +1,56 @@ +# frozen_string_literal: true + +module Agents + # Value object representing the outcome of a guard evaluation. + # A guard can pass (no action), rewrite content, or tripwire (abort the run). 
+ # + # @example Passing (no issues found) + # GuardResult.pass + # + # @example Rewriting content + # GuardResult.rewrite("redacted output", message: "PII removed") + # + # @example Tripwiring (aborting the run) + # GuardResult.tripwire(message: "Prompt injection detected") + class GuardResult + attr_reader :action, :content, :output, :message, :metadata + + # @param action [Symbol] :pass, :rewrite, or :tripwire + # @param content [String, nil] Rewritten content as a String (only meaningful for :rewrite) + # @param output [Object, nil] Type-preserved output — matches original input type (Hash/Array/String). + # Set by GuardRunner when structured content is serialized for guards and deserialized back. + # Defaults to content when not explicitly provided. + # @param message [String] Human-readable explanation of the guard's decision + # @param metadata [Hash] Arbitrary data for logging/instrumentation + def initialize(action:, content: nil, output: nil, message: "", metadata: {}) + @action = action + @content = content + @output = output || content + @message = message + @metadata = metadata + end + + def pass? = action == :pass + def rewrite? = action == :rewrite + def tripwire? = action == :tripwire + + # Create a passing result (no action needed). + def self.pass(message: "", metadata: {}) + new(action: :pass, message: message, metadata: metadata) + end + + # Create a rewrite result with replacement content. + # + # @param content [String] The rewritten content to use instead of the original + def self.rewrite(content, message: "", metadata: {}) + new(action: :rewrite, content: content, message: message, metadata: metadata) + end + + # Create a tripwire result that aborts the run. 
+ # + # @param message [String] Explanation of why the run was aborted + def self.tripwire(message:, metadata: {}) + new(action: :tripwire, message: message, metadata: metadata) + end + end +end diff --git a/lib/agents/guard_runner.rb b/lib/agents/guard_runner.rb new file mode 100644 index 0000000..0cf2eb0 --- /dev/null +++ b/lib/agents/guard_runner.rb @@ -0,0 +1,115 @@ +# frozen_string_literal: true + +module Agents + # Executes an ordered chain of guards against content. + # Guards run in array order, each seeing the (potentially rewritten) output of the previous guard. + # A tripwire short-circuits the chain immediately. + # + # ## Structured Output + # When content is a Hash or Array (e.g. from response_schema), it is serialized to JSON + # before the guard chain so guards always receive a String. If any guard rewrites, the + # final result deserializes back to the original type. Access `result.output` to get + # the type-preserved value. + # + # ## Fail-Open vs Fail-Closed + # By default, if a guard raises an unexpected exception, it is logged and treated as a pass + # (fail-open). With `strict: true`, unexpected exceptions become tripwires (fail-closed). + # + # @example Running guards + # result = GuardRunner.run( + # [PiiRedactor.new, TopicFence.new], + # "Some content with 123-45-6789", + # context_wrapper, + # phase: :output + # ) + # result.content # => "Some content with [REDACTED]" + class GuardRunner + # Run a chain of guards against content. 
+ # + # @param guards [Array] Ordered list of guards to execute + # @param content [String, Hash, Array, nil] The content to validate + # @param context [RunContext] Execution context + # @param phase [Symbol] :input or :output (used in callbacks/instrumentation) + # @param strict [Boolean] If true, guard exceptions become tripwires instead of being swallowed + # @return [GuardResult] Final result after all guards run (content may have been rewritten) + # @raise [Guard::Tripwire] If any guard tripwires + def self.run(guards, content, context, phase:, strict: false) + return GuardResult.new(action: :pass, content: content) if guards.empty? || content.nil? + + # Serialize structured content so guards always receive a String + structured = content.is_a?(Hash) || content.is_a?(Array) + current_content = structured ? content.to_json : content + any_rewrite = false + + guards.each do |guard| + result = safe_execute(guard, current_content, context, strict: strict) + next if result.nil? || result.pass? + + any_rewrite = true if result.rewrite? + current_content = apply_result(result, guard, phase, context) + end + + action = any_rewrite ? :rewrite : :pass + output = resolve_output(any_rewrite, structured, current_content, content) + GuardResult.new(action: action, content: current_content, output: output) + end + + # Resolves the final output value after the guard chain completes. + # Handles structured content deserialization back to Hash/Array when guards rewrote it. + def self.resolve_output(any_rewrite, structured, current_content, original_content) + return original_content unless any_rewrite + return current_content unless structured + + JSON.parse(current_content) + rescue JSON::ParserError => e + raise JSON::ParserError, + "Guard chain produced invalid JSON for structured output: #{e.message}" + end + + # Emits a callback and applies the guard result (rewrite or tripwire). 
+ # @return [String] The (possibly rewritten) content + # @raise [Guard::Tripwire] If the result is a tripwire + def self.apply_result(result, guard, phase, context) + context.callback_manager.emit_guard_triggered( + guard.name, phase, result.action, result.message, context + ) + + if result.tripwire? + raise Guard::Tripwire.new( + result.message, + guard_name: guard.name, + metadata: result.metadata + ) + end + + result.content + end + + # Execute a single guard with error handling. + # + # @param guard [Guard] The guard to execute + # @param content [String] Content to validate + # @param context [RunContext] Execution context + # @param strict [Boolean] Whether to fail-closed on errors + # @return [GuardResult, nil] The guard's result, or nil on swallowed errors + # @raise [Guard::Tripwire] On tripwires (always) or on errors in strict mode + def self.safe_execute(guard, content, context, strict: false) + result = guard.call(content, context) + return result if result.nil? || result.is_a?(GuardResult) + + raise TypeError, "Guard #{guard.name} must return nil or GuardResult, got #{result.class}" + rescue Guard::Tripwire + raise # Always re-raise tripwires + rescue StandardError => e + if strict + raise Guard::Tripwire.new( + "Guard #{guard.name} failed: #{e.message}", + guard_name: guard.name, + metadata: { original_error: e.class.name } + ) + end + Agents.logger&.warn("Guard #{guard.name} error (non-strict, passing): #{e.message}") + nil # Fail open + end + end +end diff --git a/lib/agents/instrumentation/constants.rb b/lib/agents/instrumentation/constants.rb index ef36de1..fd285ab 100644 --- a/lib/agents/instrumentation/constants.rb +++ b/lib/agents/instrumentation/constants.rb @@ -30,6 +30,12 @@ module Constants ATTR_LANGFUSE_OBS_TYPE = "langfuse.observation.type" ATTR_LANGFUSE_OBS_INPUT = "langfuse.observation.input" ATTR_LANGFUSE_OBS_OUTPUT = "langfuse.observation.output" + + # Guard attributes + ATTR_GUARD_NAME = "agents.guard.name" + ATTR_GUARD_PHASE = 
"agents.guard.phase" + ATTR_GUARD_ACTION = "agents.guard.action" + ATTR_GUARD_MESSAGE = "agents.guard.message" end end end diff --git a/lib/agents/instrumentation/tracing_callbacks.rb b/lib/agents/instrumentation/tracing_callbacks.rb index 5b9e6e5..59c870f 100644 --- a/lib/agents/instrumentation/tracing_callbacks.rb +++ b/lib/agents/instrumentation/tracing_callbacks.rb @@ -121,6 +121,22 @@ def on_agent_handoff(from_agent, to_agent, reason, context_wrapper) ) end + def on_guard_triggered(guard_name, phase, action, message, context_wrapper) + tracing = tracing_state(context_wrapper) + return unless tracing + + parent = parent_context(tracing) + attributes = { + ATTR_GUARD_NAME => guard_name.to_s, + ATTR_GUARD_PHASE => phase.to_s, + ATTR_GUARD_ACTION => action.to_s + } + attributes[ATTR_GUARD_MESSAGE] = message if message && !message.empty? + + span = @tracer.start_span("#{@trace_name}.guard.#{guard_name}", with_parent: parent, attributes: attributes) + span.finish + end + def on_run_complete(_agent_name, result, context_wrapper) tracing = tracing_state(context_wrapper) return unless tracing diff --git a/lib/agents/result.rb b/lib/agents/result.rb index 4c1ca23..2d37c29 100644 --- a/lib/agents/result.rb +++ b/lib/agents/result.rb @@ -1,7 +1,7 @@ # frozen_string_literal: true module Agents - RunResult = Struct.new(:output, :messages, :usage, :error, :context, keyword_init: true) do + RunResult = Struct.new(:output, :messages, :usage, :error, :context, :guardrail_tripwire, keyword_init: true) do def success? error.nil? && !output.nil? end @@ -9,5 +9,9 @@ def success? def failed? !success? end + + def tripwired? + !guardrail_tripwire.nil? 
+ end end end diff --git a/lib/agents/runner.rb b/lib/agents/runner.rb index 5fceb81..d6b2feb 100644 --- a/lib/agents/runner.rb +++ b/lib/agents/runner.rb @@ -110,9 +110,17 @@ def run(starting_agent, input, context: {}, registry: {}, max_turns: DEFAULT_MAX apply_params(chat, current_params) configure_chat_for_agent(chat, current_agent, context_wrapper, replace: false) restore_conversation_history(chat, context_wrapper) - input_already_in_history = last_message_matches?(chat, input) context_wrapper.callback_manager.emit_chat_created(chat, current_agent.name, current_agent.model, context_wrapper) + # Run input guards before the first LLM call + input_guard_result = GuardRunner.run( + current_agent.input_guards, input, context_wrapper, phase: :input + ) + input = input_guard_result.output if input_guard_result.rewrite? + + # Check dedup AFTER guards so rewritten input is compared against history + input_already_in_history = last_message_matches?(chat, input) + loop do current_turn += 1 raise MaxTurnsExceeded, "Exceeded maximum turns: #{max_turns}" if current_turn > max_turns @@ -175,27 +183,56 @@ def run(starting_agent, input, context: {}, registry: {}, max_turns: DEFAULT_MAX chat, current_agent.name, current_agent.model, context_wrapper ) + # Run Agent B's input guards on the conversation context + # The last user message is the relevant input for the new agent's guards + last_user_msg = chat.messages.reverse.find { |m| m.role == :user }&.content.to_s + unless last_user_msg.empty? 
+ handoff_guard_result = GuardRunner.run( + current_agent.input_guards, last_user_msg, context_wrapper, phase: :input + ) + # If Agent B's input guard tripwires, the rescue below handles it + end + # Force the new agent to respond to the conversation context # This ensures the user gets a response from the new agent input = nil next end - # Handle non-handoff halts - return the halt content as final response + # Handle non-handoff halts - run output guards before returning if response.is_a?(RubyLLM::Tool::Halt) - return finalize_run(chat, context_wrapper, current_agent, output: response.content) + halt_output = response.content + halt_guard_result = GuardRunner.run( + current_agent.output_guards, halt_output, context_wrapper, phase: :output + ) + halt_output = halt_guard_result.output if halt_guard_result.rewrite? + return finalize_run(chat, context_wrapper, current_agent, output: halt_output) end # If tools were called, continue the loop to let them execute next if response.tool_call? # If no tools were called, we have our final response - return finalize_run(chat, context_wrapper, current_agent, output: response.content) + + # Run output guards before returning + final_output = response.content + output_guard_result = GuardRunner.run( + current_agent.output_guards, final_output, context_wrapper, phase: :output + ) + final_output = output_guard_result.output if output_guard_result.rewrite? 
+ + return finalize_run(chat, context_wrapper, current_agent, output: final_output) end + rescue Guard::Tripwire => e + finalize_run(chat, context_wrapper, current_agent, + output: nil, error: e, + guardrail_tripwire: { guard_name: e.guard_name, message: e.message, metadata: e.metadata }) rescue MaxTurnsExceeded => e finalize_run(chat, context_wrapper, current_agent, output: "Conversation ended: #{e.message}", error: e) rescue StandardError => e + raise if e.is_a?(Guard::Tripwire) # safety net — should be caught above + finalize_run(chat, context_wrapper, current_agent, output: nil, error: e) end @@ -210,7 +247,7 @@ def run(starting_agent, input, context: {}, registry: {}, max_turns: DEFAULT_MAX # @param output [String, nil] The output text for the result # @param error [StandardError, nil] Optional error to attach to the result # @return [RunResult] - def finalize_run(chat, context_wrapper, current_agent, output:, error: nil) + def finalize_run(chat, context_wrapper, current_agent, output:, error: nil, guardrail_tripwire: nil) save_conversation_state(chat, context_wrapper, current_agent) if chat result = RunResult.new( @@ -218,7 +255,8 @@ def finalize_run(chat, context_wrapper, current_agent, output:, error: nil) messages: chat ? 
Helpers::MessageExtractor.extract_messages(chat, current_agent) : [], usage: context_wrapper.usage, error: error, - context: context_wrapper.context + context: context_wrapper.context, + guardrail_tripwire: guardrail_tripwire ) context_wrapper.callback_manager.emit_agent_complete(current_agent.name, result, error, context_wrapper) diff --git a/spec/agents/agent_runner_spec.rb b/spec/agents/agent_runner_spec.rb index 8c892d6..38236b7 100644 --- a/spec/agents/agent_runner_spec.rb +++ b/spec/agents/agent_runner_spec.rb @@ -623,7 +623,8 @@ agent_thinking: [], agent_handoff: [], llm_call_complete: [], - chat_created: [] + chat_created: [], + guard_triggered: [] } ) ) diff --git a/spec/agents/guard_result_spec.rb b/spec/agents/guard_result_spec.rb new file mode 100644 index 0000000..169c0d6 --- /dev/null +++ b/spec/agents/guard_result_spec.rb @@ -0,0 +1,112 @@ +# frozen_string_literal: true + +require_relative "../../lib/agents" + +RSpec.describe Agents::GuardResult do + describe ".pass" do + it "creates a passing result" do + result = described_class.pass + expect(result.pass?).to be true + expect(result.rewrite?).to be false + expect(result.tripwire?).to be false + end + + it "defaults message to empty string" do + result = described_class.pass + expect(result.message).to eq("") + end + + it "accepts optional message and metadata" do + result = described_class.pass(message: "all good", metadata: { score: 0.99 }) + expect(result.message).to eq("all good") + expect(result.metadata).to eq({ score: 0.99 }) + end + + it "has nil content" do + result = described_class.pass + expect(result.content).to be_nil + end + end + + describe ".rewrite" do + it "creates a rewrite result with replacement content" do + result = described_class.rewrite("cleaned output") + expect(result.rewrite?).to be true + expect(result.pass?).to be false + expect(result.tripwire?).to be false + expect(result.content).to eq("cleaned output") + end + + it "accepts optional message and metadata" do + 
result = described_class.rewrite("redacted", message: "PII found", metadata: { count: 2 }) + expect(result.message).to eq("PII found") + expect(result.metadata).to eq({ count: 2 }) + end + + it "allows empty string as content" do + result = described_class.rewrite("") + expect(result.rewrite?).to be true + expect(result.content).to eq("") + end + end + + describe ".tripwire" do + it "creates a tripwire result" do + result = described_class.tripwire(message: "blocked") + expect(result.tripwire?).to be true + expect(result.pass?).to be false + expect(result.rewrite?).to be false + end + + it "stores the message" do + result = described_class.tripwire(message: "Prompt injection detected") + expect(result.message).to eq("Prompt injection detected") + end + + it "has nil content" do + result = described_class.tripwire(message: "blocked") + expect(result.content).to be_nil + end + + it "accepts optional metadata" do + result = described_class.tripwire(message: "blocked", metadata: { pattern: "sql_injection" }) + expect(result.metadata).to eq({ pattern: "sql_injection" }) + end + end + + describe "#initialize" do + it "creates result with all fields" do + result = described_class.new( + action: :rewrite, + content: "new content", + message: "rewritten", + metadata: { key: "value" } + ) + expect(result.action).to eq(:rewrite) + expect(result.content).to eq("new content") + expect(result.message).to eq("rewritten") + expect(result.metadata).to eq({ key: "value" }) + end + + it "defaults metadata to empty hash" do + result = described_class.new(action: :pass) + expect(result.metadata).to eq({}) + end + + it "defaults message to empty string" do + result = described_class.new(action: :pass) + expect(result.message).to eq("") + end + + it "defaults output to content when not provided" do + result = described_class.new(action: :rewrite, content: "text") + expect(result.output).to eq("text") + end + + it "stores output separately when provided" do + result = 
described_class.new(action: :rewrite, content: '{"a":1}', output: { "a" => 1 }) + expect(result.content).to eq('{"a":1}') + expect(result.output).to eq({ "a" => 1 }) + end + end +end diff --git a/spec/agents/guard_runner_spec.rb b/spec/agents/guard_runner_spec.rb new file mode 100644 index 0000000..edb444f --- /dev/null +++ b/spec/agents/guard_runner_spec.rb @@ -0,0 +1,240 @@ +# frozen_string_literal: true + +require_relative "../../lib/agents" + +RSpec.describe Agents::GuardRunner do + let(:callback_manager) { instance_double(Agents::CallbackManager) } + let(:context_wrapper) do + instance_double(Agents::RunContext, callback_manager: callback_manager, context: {}) + end + + before do + allow(callback_manager).to receive(:emit_guard_triggered) + end + + def build_guard(name: "test_guard", &block) + guard_class = Class.new(Agents::Guard) do + guard_name name + define_method(:call, &block) + end + guard_class.new + end + + describe ".run" do + context "with no guards" do + it "returns a passing result with original content" do + result = described_class.run([], "hello", context_wrapper, phase: :input) + expect(result.pass?).to be true + expect(result.content).to eq("hello") + end + end + + context "with a single passing guard" do + it "returns a passing result" do + guard = build_guard { |_content, _ctx| nil } + result = described_class.run([guard], "hello", context_wrapper, phase: :input) + expect(result.pass?).to be true + expect(result.content).to eq("hello") + end + + it "does not emit a callback" do + guard = build_guard { |_content, _ctx| nil } + described_class.run([guard], "hello", context_wrapper, phase: :input) + expect(callback_manager).not_to have_received(:emit_guard_triggered) + end + end + + context "with a guard that returns GuardResult.pass" do + it "treats explicit pass the same as nil" do + guard = build_guard { |_content, _ctx| Agents::GuardResult.pass } + result = described_class.run([guard], "hello", context_wrapper, phase: :input) + 
expect(result.pass?).to be true + expect(callback_manager).not_to have_received(:emit_guard_triggered) + end + end + + context "with a single rewriting guard" do + it "returns result with rewritten content" do + guard = build_guard { |content, _ctx| Agents::GuardResult.rewrite(content.upcase) } + result = described_class.run([guard], "hello", context_wrapper, phase: :output) + expect(result.content).to eq("HELLO") + end + + it "emits a callback" do + guard = build_guard(name: "uppercaser") do |content, _ctx| + Agents::GuardResult.rewrite(content.upcase, message: "uppercased") + end + described_class.run([guard], "hello", context_wrapper, phase: :output) + expect(callback_manager).to have_received(:emit_guard_triggered) + .with("uppercaser", :output, :rewrite, "uppercased", context_wrapper) + end + end + + context "with a tripwire guard" do + it "raises Guard::Tripwire with correct metadata" do + guard = build_guard(name: "blocker") do |_content, _ctx| + Agents::GuardResult.tripwire(message: "blocked", metadata: { reason: "test" }) + end + + error = nil + begin + described_class.run([guard], "hello", context_wrapper, phase: :input) + rescue Agents::Guard::Tripwire => e + error = e + end + + expect(error).not_to be_nil + expect(error.message).to eq("blocked") + expect(error.guard_name).to eq("blocker") + expect(error.metadata).to eq({ reason: "test" }) + end + + it "emits a callback before raising" do + guard = build_guard(name: "blocker") do |_content, _ctx| + Agents::GuardResult.tripwire(message: "blocked") + end + + expect do + described_class.run([guard], "hello", context_wrapper, phase: :input) + end.to raise_error(Agents::Guard::Tripwire) + + expect(callback_manager).to have_received(:emit_guard_triggered) + .with("blocker", :input, :tripwire, "blocked", context_wrapper) + end + end + + context "with chained guards" do + it "applies rewrites in order" do + guard1 = build_guard { |content, _ctx| Agents::GuardResult.rewrite("#{content}!") } + guard2 = build_guard 
{ |content, _ctx| Agents::GuardResult.rewrite(content.upcase) } + + result = described_class.run([guard1, guard2], "hello", context_wrapper, phase: :output) + expect(result.content).to eq("HELLO!") + end + + it "tripwire sees rewritten content from earlier guard" do + seen_content = nil + guard1 = build_guard { |_content, _ctx| Agents::GuardResult.rewrite("REDACTED") } + guard2 = build_guard(name: "checker") do |content, _ctx| + seen_content = content + Agents::GuardResult.tripwire(message: "still bad") + end + + expect do + described_class.run([guard1, guard2], "secret 123-45-6789", context_wrapper, phase: :output) + end.to raise_error(Agents::Guard::Tripwire) + + expect(seen_content).to eq("REDACTED") + end + + it "short-circuits on tripwire -- subsequent guards do not run" do + guard2_called = false + guard1 = build_guard(name: "blocker") do |_content, _ctx| + Agents::GuardResult.tripwire(message: "blocked") + end + guard2 = build_guard do |_content, _ctx| + guard2_called = true + nil + end + + expect do + described_class.run([guard1, guard2], "hello", context_wrapper, phase: :input) + end.to raise_error(Agents::Guard::Tripwire) + + expect(guard2_called).to be false + end + + it "passes between rewrites do not reset content" do + guard1 = build_guard { |content, _ctx| Agents::GuardResult.rewrite("#{content}!") } + guard2 = build_guard { |_content, _ctx| nil } # pass + guard3 = build_guard { |content, _ctx| Agents::GuardResult.rewrite(content.upcase) } + + result = described_class.run([guard1, guard2, guard3], "hello", context_wrapper, phase: :output) + expect(result.content).to eq("HELLO!") + end + end + + context "with fail-open error handling (default)" do + it "swallows unexpected errors and passes" do + guard = build_guard { |_content, _ctx| raise "boom" } + + result = described_class.run([guard], "hello", context_wrapper, phase: :input) + expect(result.pass?).to be true + expect(result.content).to eq("hello") + end + + it "still raises Guard::Tripwire even 
in non-strict mode" do + guard = build_guard do |_content, _ctx| + raise Agents::Guard::Tripwire.new("abort", guard_name: "test") + end + + expect do + described_class.run([guard], "hello", context_wrapper, phase: :input) + end.to raise_error(Agents::Guard::Tripwire) + end + + it "continues to subsequent guards after a swallowed error" do + guard1 = build_guard { |_content, _ctx| raise "boom" } + guard2 = build_guard { |content, _ctx| Agents::GuardResult.rewrite(content.upcase) } + + result = described_class.run([guard1, guard2], "hello", context_wrapper, phase: :input) + expect(result.content).to eq("HELLO") + end + end + + context "with fail-closed error handling (strict: true)" do + it "converts unexpected errors to tripwires" do + guard = build_guard(name: "failing_guard") { |_content, _ctx| raise "boom" } + + error = nil + begin + described_class.run([guard], "hello", context_wrapper, phase: :input, strict: true) + rescue Agents::Guard::Tripwire => e + error = e + end + + expect(error).not_to be_nil + expect(error.guard_name).to eq("failing_guard") + expect(error.message).to include("boom") + expect(error.metadata[:original_error]).to eq("RuntimeError") + end + end + + context "with invalid guard return type" do + it "raises TypeError in fail-open mode (caught by safe_execute)" do + guard = build_guard { |_content, _ctx| "not a GuardResult" } + # TypeError is a StandardError, so fail-open swallows it and logs + result = described_class.run([guard], "hello", context_wrapper, phase: :input) + expect(result.pass?).to be true + end + + it "raises Guard::Tripwire with clear message in strict mode" do + guard = build_guard(name: "bad_guard") { |_content, _ctx| "not a GuardResult" } + + expect do + described_class.run([guard], "hello", context_wrapper, phase: :input, strict: true) + end.to raise_error(Agents::Guard::Tripwire, /must return nil or GuardResult, got String/) + end + end + + context "with adversarial inputs" do + it "handles nil content gracefully" do + 
guard = build_guard { |content, _ctx| content.nil? ? nil : Agents::GuardResult.pass } + result = described_class.run([guard], nil, context_wrapper, phase: :input) + expect(result.pass?).to be true + end + + it "handles empty string content" do + guard = build_guard { |_content, _ctx| Agents::GuardResult.rewrite("replaced") } + result = described_class.run([guard], "", context_wrapper, phase: :output) + expect(result.content).to eq("replaced") + end + + it "handles rewrite to empty string" do + guard = build_guard { |_content, _ctx| Agents::GuardResult.rewrite("") } + result = described_class.run([guard], "secret data", context_wrapper, phase: :output) + expect(result.content).to eq("") + end + end + end +end diff --git a/spec/agents/guard_spec.rb b/spec/agents/guard_spec.rb new file mode 100644 index 0000000..6258b80 --- /dev/null +++ b/spec/agents/guard_spec.rb @@ -0,0 +1,113 @@ +# frozen_string_literal: true + +require_relative "../../lib/agents" + +RSpec.describe Agents::Guard do + describe "#call" do + it "raises NotImplementedError when not implemented" do + guard = described_class.new + expect { guard.call("content", nil) }.to raise_error(NotImplementedError, /Guards must implement/) + end + end + + describe ".guard_name" do + it "defaults to the class name's last segment" do + stub_const("MyApp::PiiRedactor", Class.new(described_class)) + expect(MyApp::PiiRedactor.guard_name).to eq("PiiRedactor") + end + + it "can be set explicitly" do + guard_class = Class.new(described_class) do + guard_name "custom_guard" + end + expect(guard_class.guard_name).to eq("custom_guard") + end + + it "is accessible via instance #name" do + guard_class = Class.new(described_class) do + guard_name "my_guard" + end + expect(guard_class.new.name).to eq("my_guard") + end + end + + describe ".description" do + it "defaults to nil" do + guard_class = Class.new(described_class) + expect(guard_class.description).to be_nil + end + + it "can be set explicitly" do + guard_class = 
Class.new(described_class) do + description "Detects prompt injection" + end + expect(guard_class.description).to eq("Detects prompt injection") + end + end + + describe "subclass implementation" do + let(:passing_guard_class) do + Class.new(described_class) do + guard_name "passing_guard" + + def call(_content, _context) + nil # pass + end + end + end + + let(:rewriting_guard_class) do + Class.new(described_class) do + guard_name "rewriter" + + def call(content, _context) + Agents::GuardResult.rewrite(content.upcase, message: "uppercased") + end + end + end + + let(:tripwire_guard_class) do + Class.new(described_class) do + guard_name "blocker" + + def call(_content, _context) + Agents::GuardResult.tripwire(message: "blocked") + end + end + end + + it "can return nil to pass" do + result = passing_guard_class.new.call("hello", nil) + expect(result).to be_nil + end + + it "can return a rewrite result" do + result = rewriting_guard_class.new.call("hello", nil) + expect(result.rewrite?).to be true + expect(result.content).to eq("HELLO") + end + + it "can return a tripwire result" do + result = tripwire_guard_class.new.call("hello", nil) + expect(result.tripwire?).to be true + end + end + + describe Agents::Guard::Tripwire do + it "is a StandardError" do + expect(described_class.superclass).to eq(StandardError) + end + + it "stores guard_name and metadata" do + error = described_class.new("blocked", guard_name: "pii_guard", metadata: { score: 0.95 }) + expect(error.message).to eq("blocked") + expect(error.guard_name).to eq("pii_guard") + expect(error.metadata).to eq({ score: 0.95 }) + end + + it "defaults metadata to empty hash" do + error = described_class.new("blocked", guard_name: "test") + expect(error.metadata).to eq({}) + end + end +end diff --git a/spec/agents/instrumentation/tracing_callbacks_spec.rb b/spec/agents/instrumentation/tracing_callbacks_spec.rb index 668ee5d..25f0997 100644 --- a/spec/agents/instrumentation/tracing_callbacks_spec.rb +++ 
b/spec/agents/instrumentation/tracing_callbacks_spec.rb @@ -873,6 +873,51 @@ class Context; end end end + describe "#on_guard_triggered" do + let(:guard_span) { instance_double(OpenTelemetry::Trace::Span) } + + before do + allow(guard_span).to receive_messages(set_attribute: nil, finish: nil) + allow(tracer).to receive(:start_span).and_return(root_span) + callbacks.on_run_start("TestAgent", "Hello", context_wrapper) + end + + it "creates a guard span with correct attributes" do + allow(tracer).to receive(:start_span).and_return(guard_span) + + callbacks.on_guard_triggered("pii_redactor", :output, :rewrite, "SSN redacted", context_wrapper) + + expect(tracer).to have_received(:start_span).with( + "agents.run.guard.pii_redactor", + with_parent: anything, + attributes: hash_including( + "agents.guard.name" => "pii_redactor", + "agents.guard.phase" => "output", + "agents.guard.action" => "rewrite", + "agents.guard.message" => "SSN redacted" + ) + ) + expect(guard_span).to have_received(:finish) + end + + it "omits message attribute when message is empty" do + allow(tracer).to receive(:start_span).and_return(guard_span) + + callbacks.on_guard_triggered("blocker", :input, :tripwire, "", context_wrapper) + + expect(tracer).to have_received(:start_span).with( + "agents.run.guard.blocker", + with_parent: anything, + attributes: hash_not_including("agents.guard.message") + ) + end + + it "does nothing without tracing state" do + fresh_context = instance_double(Agents::RunContext, context: {}) + expect { callbacks.on_guard_triggered("test", :input, :pass, "ok", fresh_context) }.not_to raise_error + end + end + describe "tracing state isolation" do it "stores tracing state per context_wrapper" do context1 = instance_double(Agents::RunContext, context: {}) diff --git a/spec/agents/runner_spec.rb b/spec/agents/runner_spec.rb index 74d1140..32e62e6 100644 --- a/spec/agents/runner_spec.rb +++ b/spec/agents/runner_spec.rb @@ -25,7 +25,9 @@ response_schema: nil, get_system_prompt: 
"You are a helpful assistant", headers: {}, - params: {}) + params: {}, + input_guards: [], + output_guards: []) end let(:handoff_agent) do @@ -38,7 +40,9 @@ response_schema: nil, get_system_prompt: "You are a specialist", headers: {}, - params: {}) + params: {}, + input_guards: [], + output_guards: []) end let(:test_tool) do @@ -854,7 +858,9 @@ response_schema: nil, get_system_prompt: "You route users to specialists", headers: {}, - params: {}) + params: {}, + input_guards: [], + output_guards: []) end before do @@ -1034,7 +1040,9 @@ response_schema: schema, get_system_prompt: "You provide structured responses", headers: {}, - params: {}) + params: {}, + input_guards: [], + output_guards: []) end it "includes response_schema in API request" do @@ -1107,7 +1115,9 @@ response_schema: nil, get_system_prompt: "You are an agent with tools", headers: {}, - params: {}) + params: {}, + input_guards: [], + output_guards: []) end it "wraps regular tools in ToolWrapper" do @@ -1218,7 +1228,9 @@ response_schema: nil, get_system_prompt: "You route users", headers: {}, - params: {}) + params: {}, + input_guards: [], + output_guards: []) stub_chat_sequence( { tool_calls: [{ name: "handoff_to_handoffagent", arguments: "{}" }] }, @@ -1250,7 +1262,9 @@ response_schema: nil, get_system_prompt: "You route users", headers: {}, - params: {}) + params: {}, + input_guards: [], + output_guards: []) stub_request(:post, "https://api.openai.com/v1/chat/completions") .to_return( @@ -1297,5 +1311,618 @@ expect(run_complete_call[1]).to eq("TriageAgent") end end + + context "with input guards" do + let(:rewriting_guard) do + guard_class = Class.new(Agents::Guard) do + guard_name "uppercaser" + + def call(content, _context) + Agents::GuardResult.rewrite(content.upcase, message: "uppercased") + end + end + guard_class.new + end + + let(:guarded_agent) do + instance_double(Agents::Agent, + name: "GuardedAgent", + model: "gpt-4o", + tools: [], + handoff_agents: [], + temperature: 0.7, + 
response_schema: nil, + get_system_prompt: "You are a guarded assistant", + headers: {}, + params: {}, + input_guards: [rewriting_guard], + output_guards: []) + end + + it "sends rewritten input to the LLM" do + stub_simple_chat("I received your message") + + result = runner.run(guarded_agent, "hello") + + expect(result.success?).to be true + user_message = result.messages.find { |m| m[:role] == :user } + expect(user_message[:content]).to eq("HELLO") + end + + it "rewritten input defeats stale dedup from conversation history" do + context_with_history = { + conversation_history: [ + { role: :user, content: "hello" } + ], + current_agent: "GuardedAgent" + } + + stub_simple_chat("Got your updated message") + + result = runner.run(guarded_agent, "hello", context: context_with_history) + + expect(result.success?).to be true + user_messages = result.messages.select { |m| m[:role] == :user } + expect(user_messages.last[:content]).to eq("HELLO") + end + end + + context "with output guards" do + let(:redacting_guard) do + guard_class = Class.new(Agents::Guard) do + guard_name "redactor" + + def call(content, _context) + redacted = content.gsub(/\d{3}-\d{2}-\d{4}/, "[REDACTED]") + Agents::GuardResult.rewrite(redacted, message: "PII redacted") if redacted != content + end + end + guard_class.new + end + + let(:output_guarded_agent) do + instance_double(Agents::Agent, + name: "OutputGuardedAgent", + model: "gpt-4o", + tools: [], + handoff_agents: [], + temperature: 0.7, + response_schema: nil, + get_system_prompt: "You are a guarded assistant", + headers: {}, + params: {}, + input_guards: [], + output_guards: [redacting_guard]) + end + + it "rewrites output before returning" do + stub_simple_chat("Your SSN is 123-45-6789") + + result = runner.run(output_guarded_agent, "What is my SSN?") + + expect(result.success?).to be true + expect(result.output).to eq("Your SSN is [REDACTED]") + end + + it "passes output through unchanged when guard returns nil" do + stub_simple_chat("No 
PII here") + + result = runner.run(output_guarded_agent, "Hello") + + expect(result.success?).to be true + expect(result.output).to eq("No PII here") + end + end + + context "with output guards on structured output" do + let(:schema) do + { + type: "object", + properties: { + answer: { type: "string" }, + ssn: { type: "string" } + }, + required: %w[answer ssn] + } + end + + let(:json_redacting_guard) do + guard_class = Class.new(Agents::Guard) do + guard_name "json_redactor" + + def call(content, _context) + redacted = content.gsub(/\d{3}-\d{2}-\d{4}/, "[REDACTED]") + Agents::GuardResult.rewrite(redacted, message: "PII redacted") if redacted != content + end + end + guard_class.new + end + + let(:tripwire_guard) do + guard_class = Class.new(Agents::Guard) do + guard_name "json_blocker" + + def call(content, _context) + Agents::GuardResult.tripwire(message: "blocked structured output") if content.include?("secret") + end + end + guard_class.new + end + + let(:passing_guard) do + guard_class = Class.new(Agents::Guard) do + guard_name "noop_guard" + + def call(_content, _context) + nil + end + end + guard_class.new + end + + let(:structured_guarded_agent) do + instance_double(Agents::Agent, + name: "StructuredGuardedAgent", + model: "gpt-4o", + tools: [], + handoff_agents: [], + temperature: 0.7, + response_schema: schema, + get_system_prompt: "You provide structured responses", + headers: {}, + params: {}, + input_guards: [], + output_guards: [json_redacting_guard]) + end + + it "redacts values inside structured output" do + stub_simple_chat('{"answer": "here you go", "ssn": "123-45-6789"}') + + result = runner.run(structured_guarded_agent, "What is my SSN?") + + expect(result.success?).to be true + expect(result.output).to be_a(Hash) + expect(result.output["ssn"]).to eq("[REDACTED]") + expect(result.output["answer"]).to eq("here you go") + end + + it "tripwires on structured output" do + tripwire_agent = instance_double(Agents::Agent, + name: 
"TripwireStructuredAgent", + model: "gpt-4o", + tools: [], + handoff_agents: [], + temperature: 0.7, + response_schema: schema, + get_system_prompt: "You provide structured responses", + headers: {}, + params: {}, + input_guards: [], + output_guards: [tripwire_guard]) + + stub_simple_chat('{"answer": "secret data", "ssn": "000-00-0000"}') + + result = runner.run(tripwire_agent, "Give me the secret") + + expect(result.tripwired?).to be true + expect(result.guardrail_tripwire[:guard_name]).to eq("json_blocker") + end + + it "preserves Hash type when guard passes" do + pass_agent = instance_double(Agents::Agent, + name: "PassStructuredAgent", + model: "gpt-4o", + tools: [], + handoff_agents: [], + temperature: 0.7, + response_schema: schema, + get_system_prompt: "You provide structured responses", + headers: {}, + params: {}, + input_guards: [], + output_guards: [passing_guard]) + + stub_simple_chat('{"answer": "clean", "ssn": "none"}') + + result = runner.run(pass_agent, "Hello") + + expect(result.success?).to be true + expect(result.output).to be_a(Hash) + expect(result.output["answer"]).to eq("clean") + end + end + + context "with output guard that corrupts structured JSON" do + it "returns a failed RunResult with JSON::ParserError" do + corrupting_guard = Class.new(Agents::Guard) do + guard_name "corruptor" + + def call(content, _context) + Agents::GuardResult.rewrite(content[0..5], message: "truncated") + end + end.new + + schema = { type: "object", properties: { answer: { type: "string" } }, required: ["answer"] } + corrupt_agent = instance_double(Agents::Agent, + name: "CorruptAgent", + model: "gpt-4o", + tools: [], + handoff_agents: [], + temperature: 0.7, + response_schema: schema, + get_system_prompt: "You provide structured responses", + headers: {}, + params: {}, + input_guards: [], + output_guards: [corrupting_guard]) + + stub_simple_chat('{"answer": "hello world"}') + + result = runner.run(corrupt_agent, "Hello") + + expect(result.failed?).to be true + 
expect(result.error).to be_a(JSON::ParserError) + end + end + + context "with input guard tripwire" do + let(:input_tripwire_guard) do + guard_class = Class.new(Agents::Guard) do + guard_name "input_blocker" + + def call(content, _context) + Agents::GuardResult.tripwire(message: "banned input", metadata: { pattern: "evil" }) if content.include?("evil") + end + end + guard_class.new + end + + let(:input_tripwire_agent) do + instance_double(Agents::Agent, + name: "InputTripwireAgent", + model: "gpt-4o", + tools: [], + handoff_agents: [], + temperature: 0.7, + response_schema: nil, + get_system_prompt: "You are guarded", + headers: {}, + params: {}, + input_guards: [input_tripwire_guard], + output_guards: []) + end + + it "returns a tripwired RunResult with metadata" do + stub_simple_chat("should not reach here") + + result = runner.run(input_tripwire_agent, "do something evil") + + expect(result.tripwired?).to be true + expect(result.success?).to be false + expect(result.output).to be_nil + expect(result.guardrail_tripwire[:guard_name]).to eq("input_blocker") + expect(result.guardrail_tripwire[:message]).to eq("banned input") + expect(result.guardrail_tripwire[:metadata]).to eq({ pattern: "evil" }) + end + + it "emits agent_complete and run_complete callbacks on tripwire" do + stub_simple_chat("should not reach here") + + callbacks_called = [] + callbacks = { + run_start: [], + run_complete: [proc { |*args| callbacks_called << [:run_complete, *args] }], + agent_complete: [proc { |*args| callbacks_called << [:agent_complete, *args] }], + agent_thinking: [], + tool_start: [], + tool_complete: [], + agent_handoff: [], + llm_call_complete: [], + chat_created: [], + guard_triggered: [proc { |*args| callbacks_called << [:guard_triggered, *args] }] + } + + runner.run(input_tripwire_agent, "do something evil", callbacks: callbacks) + + guard_event = callbacks_called.find { |c| c[0] == :guard_triggered } + expect(guard_event).not_to be_nil + expect(guard_event[1]).to 
eq("input_blocker") + + complete_event = callbacks_called.find { |c| c[0] == :agent_complete } + expect(complete_event).not_to be_nil + + run_event = callbacks_called.find { |c| c[0] == :run_complete } + expect(run_event).not_to be_nil + end + end + + context "with guards across handoffs" do + let(:pii_redactor) do + guard_class = Class.new(Agents::Guard) do + guard_name "pii_redactor" + + def call(content, _context) + redacted = content.gsub(/\d{3}-\d{2}-\d{4}/, "[REDACTED]") + Agents::GuardResult.rewrite(redacted, message: "PII redacted") if redacted != content + end + end + guard_class.new + end + + let(:specialist_tripwire) do + guard_class = Class.new(Agents::Guard) do + guard_name "specialist_blocker" + + def call(content, _context) + Agents::GuardResult.tripwire(message: "blocked") if content.include?("specialist") + end + end + guard_class.new + end + + let(:uppercasing_guard) do + guard_class = Class.new(Agents::Guard) do + guard_name "uppercaser" + + def call(content, _context) + Agents::GuardResult.rewrite(content.upcase, message: "uppercased") + end + end + guard_class.new + end + + it "applies agent B's output guards after handoff" do + specialist = instance_double(Agents::Agent, + name: "HandoffAgent", + model: "gpt-4o", + tools: [], + handoff_agents: [], + temperature: 0.7, + response_schema: nil, + get_system_prompt: "You are a specialist", + headers: {}, + params: {}, + input_guards: [], + output_guards: [pii_redactor]) + + triage = instance_double(Agents::Agent, + name: "TriageAgent", + model: "gpt-4o", + tools: [], + handoff_agents: [specialist], + temperature: 0.7, + response_schema: nil, + get_system_prompt: "You route users", + headers: {}, + params: {}, + input_guards: [], + output_guards: []) + + stub_chat_sequence( + { tool_calls: [{ name: "handoff_to_handoffagent", arguments: "{}" }] }, + "Your SSN is 123-45-6789" + ) + + registry = { "TriageAgent" => triage, "HandoffAgent" => specialist } + result = runner.run(triage, "What is my SSN?", 
registry: registry) + + expect(result.success?).to be true + expect(result.output).to eq("Your SSN is [REDACTED]") + expect(result.context[:current_agent]).to eq("HandoffAgent") + end + + it "does NOT apply agent A's output guards after handoff" do + triage_with_tripwire = instance_double(Agents::Agent, + name: "TriageAgent", + model: "gpt-4o", + tools: [], + handoff_agents: [handoff_agent], + temperature: 0.7, + response_schema: nil, + get_system_prompt: "You route users", + headers: {}, + params: {}, + input_guards: [], + output_guards: [specialist_tripwire]) + + stub_chat_sequence( + { tool_calls: [{ name: "handoff_to_handoffagent", arguments: "{}" }] }, + "I'm the specialist, how can I help?" + ) + + registry = { "TriageAgent" => triage_with_tripwire, "HandoffAgent" => handoff_agent } + result = runner.run(triage_with_tripwire, "Help me", registry: registry) + + expect(result.success?).to be true + expect(result.output).to eq("I'm the specialist, how can I help?") + expect(result.tripwired?).to be false + end + + it "applies agent A's input guards before handoff occurs" do + triage_with_input_guard = instance_double(Agents::Agent, + name: "TriageAgent", + model: "gpt-4o", + tools: [], + handoff_agents: [handoff_agent], + temperature: 0.7, + response_schema: nil, + get_system_prompt: "You route users", + headers: {}, + params: {}, + input_guards: [uppercasing_guard], + output_guards: []) + + stub_chat_sequence( + { tool_calls: [{ name: "handoff_to_handoffagent", arguments: "{}" }] }, + "Specialist here to help" + ) + + registry = { "TriageAgent" => triage_with_input_guard, "HandoffAgent" => handoff_agent } + result = runner.run(triage_with_input_guard, "help me", registry: registry) + + expect(result.success?).to be true + # The user message should be the uppercased version from agent A's input guard + user_message = result.messages.find { |m| m[:role] == :user } + expect(user_message[:content]).to eq("HELP ME") + end + end + + context "with guards on handoff 
target agent" do + let(:target_tripwire_guard) do + guard_class = Class.new(Agents::Guard) do + guard_name "target_input_blocker" + + def call(content, _context) + Agents::GuardResult.tripwire(message: "blocked by target") if content.include?("blocked") + end + end + guard_class.new + end + + it "runs Agent B's input guards after handoff" do + specialist = instance_double(Agents::Agent, + name: "HandoffAgent", + model: "gpt-4o", + tools: [], + handoff_agents: [], + temperature: 0.7, + response_schema: nil, + get_system_prompt: "You are a specialist", + headers: {}, + params: {}, + input_guards: [target_tripwire_guard], + output_guards: []) + + triage = instance_double(Agents::Agent, + name: "TriageAgent", + model: "gpt-4o", + tools: [], + handoff_agents: [specialist], + temperature: 0.7, + response_schema: nil, + get_system_prompt: "You route users", + headers: {}, + params: {}, + input_guards: [], + output_guards: []) + + stub_chat_sequence( + { tool_calls: [{ name: "handoff_to_handoffagent", arguments: "{}" }] }, + "Specialist here to help" + ) + + registry = { "TriageAgent" => triage, "HandoffAgent" => specialist } + result = runner.run(triage, "this should be blocked", registry: registry) + + expect(result.tripwired?).to be true + expect(result.guardrail_tripwire[:guard_name]).to eq("target_input_blocker") + end + end + + context "with output guards on halt response" do + let(:halt_redacting_guard) do + guard_class = Class.new(Agents::Guard) do + guard_name "halt_redactor" + + def call(content, _context) + redacted = content.gsub(/\d{3}-\d{2}-\d{4}/, "[REDACTED]") + Agents::GuardResult.rewrite(redacted, message: "PII redacted") if redacted != content + end + end + guard_class.new + end + + it "runs output guards on halt content before returning" do + halt_guarded_agent = instance_double(Agents::Agent, + name: "HaltGuardedAgent", + model: "gpt-4o", + tools: [], + handoff_agents: [], + temperature: 0.7, + response_schema: nil, + get_system_prompt: "You are 
guarded", + headers: {}, + params: {}, + input_guards: [], + output_guards: [halt_redacting_guard]) + + mock_chat = instance_double(RubyLLM::Chat) + mock_halt = instance_double(RubyLLM::Tool::Halt, content: "Your SSN is 123-45-6789") + + allow(mock_halt).to receive(:is_a?).with(RubyLLM::Tool::Halt).and_return(true) + allow(RubyLLM::Chat).to receive(:new).and_return(mock_chat) + allow(runner).to receive_messages( + configure_chat_for_agent: mock_chat, + restore_conversation_history: nil, + save_conversation_state: nil + ) + allow(mock_chat).to receive(:ask).and_return(mock_halt) + + result = runner.run(halt_guarded_agent, "What is my SSN?") + + expect(result.success?).to be true + expect(result.output).to eq("Your SSN is [REDACTED]") + end + end + + context "with output guards on Array structured output" do + let(:array_redacting_guard) do + guard_class = Class.new(Agents::Guard) do + guard_name "array_redactor" + + def call(content, _context) + redacted = content.gsub(/\d{3}-\d{2}-\d{4}/, "[REDACTED]") + Agents::GuardResult.rewrite(redacted, message: "PII redacted") if redacted != content + end + end + guard_class.new + end + + it "redacts values inside Array structured output and preserves Array type" do + schema = { + type: "array", + items: { type: "object", properties: { ssn: { type: "string" } } } + } + array_agent = instance_double(Agents::Agent, + name: "ArrayAgent", + model: "gpt-4o", + tools: [], + handoff_agents: [], + temperature: 0.7, + response_schema: schema, + get_system_prompt: "You provide arrays", + headers: {}, + params: {}, + input_guards: [], + output_guards: [array_redacting_guard]) + + stub_simple_chat('[{"ssn":"123-45-6789"},{"ssn":"987-65-4321"}]') + + result = runner.run(array_agent, "List SSNs") + + expect(result.success?).to be true + expect(result.output).to be_a(Array) + expect(result.output[0]["ssn"]).to eq("[REDACTED]") + expect(result.output[1]["ssn"]).to eq("[REDACTED]") + end + end + + context "without guards" do + it "dedup still 
works when input matches history" do + context_with_history = { + conversation_history: [ + { role: :user, content: "hello" }, + { role: :assistant, content: "Hi there" } + ], + current_agent: "TestAgent" + } + + stub_simple_chat("Continued response") + + result = runner.run(agent, "hello", context: context_with_history) + + expect(result.success?).to be true + end + end end end
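Review note: the chain semantics exercised by `guard_runner_spec.rb` above (a `nil` return counts as a pass, rewrites accumulate in order, a tripwire short-circuits, errors are swallowed by default and converted to tripwires under `strict: true`) can be sketched in isolation roughly as follows. This is a minimal illustration, not the SDK's implementation: `SketchTripwire`, `SketchResult`, and `run_guard_chain` are hypothetical names, guards are modelled as `[name, callable]` pairs rather than `Agents::Guard` subclasses, callbacks are omitted, and the function returns the final content string where the real runner returns a `GuardResult`.

```ruby
# Hypothetical stand-ins for illustration only; none of these names exist in the SDK.
class SketchTripwire < StandardError
  attr_reader :guard_name, :metadata

  def initialize(message, guard_name:, metadata: {})
    super(message)
    @guard_name = guard_name
    @metadata = metadata
  end
end

SketchResult = Struct.new(:action, :content, :message, :metadata, keyword_init: true) do
  def self.rewrite(content, message: "")
    new(action: :rewrite, content: content, message: message, metadata: {})
  end

  def self.tripwire(message:, metadata: {})
    new(action: :tripwire, content: nil, message: message, metadata: metadata)
  end
end

# Runs [name, callable] guard pairs in order and returns the final content.
def run_guard_chain(guards, content, strict: false)
  current = content

  guards.each do |name, guard|
    begin
      result = guard.call(current)
      unless result.nil? || result.is_a?(SketchResult)
        raise TypeError, "guard must return nil or SketchResult, got #{result.class}"
      end
    rescue SketchTripwire
      raise # a tripwire raised by the guard itself always propagates
    rescue StandardError => e
      # Fail-closed: convert the unexpected error into a tripwire.
      if strict
        raise SketchTripwire.new(e.message, guard_name: name,
                                            metadata: { original_error: e.class.name })
      end
      next # fail-open: skip this guard and keep the current content
    end

    next if result.nil? # nil means pass; content is left untouched

    case result.action
    when :rewrite
      current = result.content # later guards see the rewritten content
    when :tripwire
      raise SketchTripwire.new(result.message, guard_name: name, metadata: result.metadata)
    end
  end

  current
end

append = ["append", ->(text) { SketchResult.rewrite("#{text}!") }]
noop   = ["noop",   ->(_text) { nil }]
upcase = ["upcase", ->(text) { SketchResult.rewrite(text.upcase) }]

puts run_guard_chain([append, noop, upcase], "hello") # prints HELLO!
```

Keeping the tripwire `raise` outside the `begin`/`rescue` block is what lets a guard's tripwire result escape in both modes, while ordinary errors from `guard.call` are handled according to `strict`.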