feat(guardrail): add conversation monitoring for AgentSession #4105

Hormold · 2025-11-26T22:11:45Z

Supervisor aka Observer aka Guardrails sub-agent

This PR introduces a background sub-agent that watches conversations and injects advice to the agent in real time.

Background: While working on outbound calling (especially cold calling), I hit a wall. Fast TTFT (time to first token) is critical to sounding human, so the primary agent must be lightweight: a compact prompt plus a blazing-fast (and therefore slightly dumb) LLM. Complex rules overload the prompt and slow it down. The pattern I discovered: offload monitoring to a separate thinking model that runs in parallel.

Background thinking observer watching every conversation
Catches what the realtime agent misses
Keeps agent prompt clean, monitoring logic separate and non-blocking

How it works:

Listens to conversation_item_added events
Every N agent responses, asynchronously evaluates the transcript with the observer LLM
If an issue is detected, injects advice into the agent's chat_ctx
The agent sees it on its next response

Async - evaluation takes 1-2s but runs in background, no latency impact
After agent response - ensures complete exchange, avoids partial eval during barge-in
Injects as system message - agent sees it in context, uses it on next turn
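
The flow above can be sketched with plain asyncio. This is a minimal illustration, not the PR's implementation: `ObserverSketch`, its `evaluate` callback, and the list-of-tuples chat context are hypothetical stand-ins for the session, the observer LLM, and `chat_ctx`.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Optional

Transcript = list[tuple[str, str]]  # (role, text) pairs; hypothetical chat_ctx stand-in


@dataclass
class ObserverSketch:
    # Hypothetical callback standing in for the observer LLM call.
    evaluate: Callable[[Transcript], Awaitable[Optional[str]]]
    eval_interval: int = 3
    chat_ctx: Transcript = field(default_factory=list)
    _agent_turns: int = 0
    _tasks: set = field(default_factory=set)

    def on_conversation_item_added(self, role: str, text: str) -> None:
        """Synchronous handler: record the item and maybe schedule an evaluation."""
        self.chat_ctx.append((role, text))
        if role != "assistant":
            return  # only completed agent responses can trigger an evaluation
        self._agent_turns += 1
        if self._agent_turns % self.eval_interval == 0:
            snapshot = list(self.chat_ctx)  # evaluate a complete exchange
            task = asyncio.create_task(self._evaluate_and_inject(snapshot))
            self._tasks.add(task)  # keep a reference so the task is not collected
            task.add_done_callback(self._tasks.discard)

    async def _evaluate_and_inject(self, transcript: Transcript) -> None:
        advice = await self.evaluate(transcript)  # slow call, off the hot path
        if advice:
            self.chat_ctx.append(("system", f"[GUARDRAIL ADVISOR]: {advice}"))


async def fake_observer_llm(transcript: Transcript) -> Optional[str]:
    await asyncio.sleep(0.01)  # stands in for the 1-2s evaluation
    return 'Mention that "terms may vary".'


async def demo():
    obs = ObserverSketch(evaluate=fake_observer_llm, eval_interval=2)
    obs.on_conversation_item_added("user", "How much is it?")
    obs.on_conversation_item_added("assistant", "It is $10 a month.")
    obs.on_conversation_item_added("user", "OK.")
    obs.on_conversation_item_added("assistant", "Great.")  # 2nd agent turn
    items_before = len(obs.chat_ctx)  # handler returned without waiting
    await asyncio.gather(*obs._tasks)  # let the background evaluation finish
    return items_before, obs.chat_ctx[-1]
```

The handler returns immediately, so the agent's reply path never waits on the observer; the advice only appears in the context once the background task completes.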

Example

# Outbound sales calls - agent pitches product, guardrail watches for compliance
session = AgentSession(
    llm="openai/gpt-4o-mini",  # fast for natural conversation
    guardrail=Guardrail(
        llm="deepseek-ai/deepseek-v3",  # reasons about context
        instructions="""
        Sales compliance monitor:
        - If discussing pricing, agent must mention "terms may vary"
        - If customer sounds hesitant, don't push - offer to send info instead
        - If customer mentions they're on fixed income, flag as vulnerable
        - Never let agent promise specific savings without disclaimer

        These rules are too nuanced for the sales script. Watch and intervene.
        """,
    ),
)

Configuration

instructions (required) - what to watch for, what advice to give
llm (required) - model for evaluation
eval_interval=3 - check every N agent responses (balance cost vs responsiveness)
max_interventions=5 - cap per session (prevent advice spam)
cooldown=10.0 - min seconds between advice (natural pacing)
inject_role="system" - role for injected message (see open question 1)
inject_prefix="[GUARDRAIL ADVISOR]:" - prefix that frames the advice, e.g. "A supervisor is watching this conversation and suggests: ...". Lets you control how the agent perceives the advice
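
A rough sketch of how `max_interventions`, `cooldown`, `inject_role`, and `inject_prefix` might interact when advice is ready to inject. `GuardrailConfigSketch` and `InterventionGate` are hypothetical names for illustration only; the defaults mirror the list above, and the dict message shape is an assumption.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class GuardrailConfigSketch:
    # Hypothetical mirror of the options listed above, with the same defaults.
    instructions: str
    llm: str
    eval_interval: int = 3
    max_interventions: int = 5
    cooldown: float = 10.0
    inject_role: str = "system"
    inject_prefix: str = "[GUARDRAIL ADVISOR]:"


class InterventionGate:
    """Decides whether a piece of advice may be injected right now."""

    def __init__(self, cfg: GuardrailConfigSketch,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.cfg = cfg
        self.clock = clock  # injectable so the gate can be tested without sleeping
        self.interventions = 0
        self.last_injected_at: Optional[float] = None

    def try_inject(self, advice: str) -> Optional[dict]:
        """Return the message to inject, or None if the advice is suppressed."""
        now = self.clock()
        if self.interventions >= self.cfg.max_interventions:
            return None  # per-session cap reached
        if (self.last_injected_at is not None
                and now - self.last_injected_at < self.cfg.cooldown):
            return None  # too soon after the previous advice
        self.interventions += 1
        self.last_injected_at = now
        return {
            "role": self.cfg.inject_role,
            "content": f"{self.cfg.inject_prefix} {advice}",
        }
```

Passing a fake `clock` makes the cooldown behaviour easy to unit test; in this sketch, suppressed advice is simply dropped rather than queued.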

Open questions

1. Do all LLMs support multiple system messages, or system messages in the middle of a conversation? The "developer" role was perfect for that, but most LLMs dropped it some time ago.
2. Should the primary realtime agent be aware of observer injections?
3. How does it perform in real time? Has anyone tested the same pattern at scale?
4. RAG integration: allow the user to hook contextual retrieval before evaluation? The observer could pull relevant docs/policies in the background based on the conversation topic.
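
On the first question, one possible fallback (untested, and not part of this PR) is to re-role any system message that appears after the conversation has started before sending the context to a provider that rejects it. `fold_late_system_messages` is a hypothetical helper, and the list-of-dicts message shape is an assumption.

```python
def fold_late_system_messages(messages, fallback_role="user"):
    """Keep leading system messages; re-role any system message that comes later.

    `messages` is assumed to be a list of {"role": ..., "content": ...} dicts.
    Returns a new list and leaves the input untouched.
    """
    out = []
    conversation_started = False
    for msg in messages:
        if msg["role"] != "system":
            conversation_started = True
            out.append(dict(msg))
        elif not conversation_started:
            out.append(dict(msg))  # normal system prompt at the top
        else:
            # Mid-conversation injection: same content, different role.
            out.append({"role": fallback_role, "content": msg["content"]})
    return out
```

With this approach `inject_prefix` matters more, since the prefix is the only thing that tells the agent the text came from the observer rather than the caller.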

Synthetic Benchmarks Test Results (9 Scenarios, 4 Models):

Ran ~90 tests covering different LLMs and scenarios, focused on booking, support, and front-desk receptionist use cases, with an LLM in each of 4 roles: customer, realtime agent, observer, and judge.

deepseek-v3: 86% precision, 100% recall, 86% follow rate, ~$0.06/100 evals
gpt-4o-mini: 86% precision, 100% recall, 86% follow rate, ~$0.03/100 evals
gpt-4o: 86% precision, 100% recall, 86% follow rate, ~$0.55/100 evals
gpt-5: 100% precision, 67% recall, 50% follow rate, ~$2/100 evals

Findings: cheap models work the same as expensive ones. gpt-5 is actually the worst (it overthinks).
Best result: deepseek-v3 or gpt-4o-mini as the observer. And they are not even thinking models :-)
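
For readers comparing the numbers, this is how the three metrics could be computed from judge verdicts. Precision and recall are the standard definitions; treating "follow rate" as the share of injected advice the agent acted on is an assumption, and the counts in the usage note are made up for illustration, not taken from the benchmark.

```python
def observer_metrics(tp: int, fp: int, fn: int, followed: int) -> dict:
    """Compute observer quality metrics from judge verdicts.

    tp: interventions on a real issue
    fp: interventions where no issue existed
    fn: real issues the observer missed
    followed: interventions the agent acted on (assumed definition)
    """
    interventions = tp + fp
    real_issues = tp + fn
    return {
        "precision": tp / interventions if interventions else 0.0,
        "recall": tp / real_issues if real_issues else 0.0,
        "follow_rate": followed / interventions if interventions else 0.0,
    }
```

For example, `observer_metrics(tp=8, fp=2, fn=0, followed=7)` describes an observer that catches every issue but raises two false alarms, and whose advice the agent follows 7 times out of 10.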


Hormold added 3 commits November 26, 2025 13:36

add initial Guardrail functionality for conversation monitoring, wip

b694e0b

Update guardrail evaluation logic to trigger after assistant response, ensuring complete context for evaluation

b0bbfef

type fixes

ca2084a
