Redesign policy callback context (leaky grounding + missing edge identity) #58

@anormang1992

Background

Surfaced during the codebase-audit cleanup pass (branch chore/audit-cleanup). Two related concerns about the policy-callback API came up — both touching PolicyCallContext. They're being deferred so the audit-cleanup PR can stay narrow, and so the redesign happens as one coherent pass instead of two.

Concern 1 — Leaky grounding: GroundingResult

PolicyCallContext currently exposes the full GroundingResult:

# src/vre/core/policy/callback.py
class PolicyCallContext(BaseModel):
    tool_name: str
    grounding: GroundingResult            # ← full epistemic trace, gaps, internals
    call_args: tuple[Any, ...]
    call_kwargs: dict[str, Any]

This gives user-supplied callbacks access to .grounding.trace.result.primitives, .gaps, .pathway, etc. — the whole internal trace shape. Refactoring grounding (e.g. changing the trace structure) could silently break callbacks.

No callbacks in src/, tests/, or examples/ currently read context.grounding — the leak is forward-looking. A real callback-side use case did surface in design discussion, though: a policy on edge E may want to behave differently depending on which other root concepts were grounded alongside it (e.g. allow Delete→File unless protected was also grounded in the same call). That argues for some grounding signal, not zero.

Minimum useful surface, scoped to the actual use case:

class GroundingContext(BaseModel):
    agent_id: UUID | None = None
    resolved_concepts: list[str]   # canonical names grounded in this call

Notably dropped from the audit's original suggestion: is_grounded and gap_count. By the time a callback fires, grounding has already succeeded; those fields are invariant and dead weight.

Open question: should the facade also let callbacks inspect primitive properties of co-occurring concepts (e.g. get_primitive(name).depths[3].properties["sensitivity"])? My lean is no until a real callback needs it — the facade is non-breaking to extend.

Concern 2 — Same call_context reused across all triggering edges

PolicyGate._collect_violations walks (primitive, depth, relatum, policy) tuples and invokes the callback once per match — but builds call_context once at the top and reuses the same instance for every iteration:

# src/vre/core/policy/gate.py
for primitive in response.result.primitives:
    for depth in primitive.depths:
        for relatum in depth.relata:
            for policy in relatum.policies:
                cb_result = cb(call_context)   # same call_context every time

A callback registered on multiple edges has no way to know which edge triggered the current invocation. The callback only sees tool_name, call_args, call_kwargs, and (today) the full GroundingResult.
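
The effect is easy to reproduce in isolation. The following sketch is not the real gate: `Context`, `EDGES`, `audit_log`, and the tool name are hypothetical stand-ins that only mirror the loop shape above, with one shared context reused for every edge.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Context:
    """Hypothetical stand-in for today's PolicyCallContext (grounding omitted)."""
    tool_name: str
    call_args: tuple[Any, ...]
    call_kwargs: dict[str, Any]


# Hypothetical edges that all carry the same callback.
EDGES = [("Delete", "File"), ("Modify", "File"), ("Read", "File")]

log_lines: list[str] = []


def audit_log(context: Context) -> None:
    # Nothing on the context says which edge triggered this invocation.
    log_lines.append(f"{context.tool_name} args={context.call_args}")


call_context = Context("manage_file", ("/tmp/report.txt",), {})  # built once
for _source, _target in EDGES:
    audit_log(call_context)  # same instance every time, as in the gate loop

# Three invocations, but only one distinct log line.
print(len(log_lines), len(set(log_lines)))
```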

Three patterns this blocks:

  1. Edge-aware logging. An audit_log callback on Delete→File, Modify→File, Read→File can fire three times per call and produce three indistinguishable log lines.
  2. Source-specific rate limiting. A rate_limit callback on Send→Email (5/hr) and Send→SMS (20/hr) can't pick the right counter — forcing two callbacks instead of one parameterized one.
  3. Depth-discriminating policies. Same callback on a primitive's D2 (CAPABILITIES) edge vs. its D3 (CONSTRAINTS) edge can't tell them apart.

Workaround today: write one callback per edge and let the dotted-path callback string carry the discrimination. Works, but pushes complexity into graph configuration.

Proposed combined shape (sketch)

class GroundingContext(BaseModel):
    agent_id: UUID | None = None
    resolved_concepts: list[str]

class TriggeringEdge(BaseModel):
    source_name: str
    target_name: str
    source_depth: DepthLevel
    target_depth: DepthLevel

class PolicyCallContext(BaseModel):
    tool_name: str
    grounding: GroundingContext
    call_args: tuple[Any, ...]
    call_kwargs: dict[str, Any]
    triggering_edge: TriggeringEdge   # new — built per iteration in gate.py

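Under this shape, PolicyGate._collect_violations would build a fresh PolicyCallContext for each (primitive, depth, relatum) tuple, with triggering_edge filled in, instead of reusing one instance for the whole call. Every policy on the same edge would then see the same edge identity.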

Open design questions

  • Should Policy itself (or policy.metadata) be on the context? Useful for the rate-limiting case ("stuff the limit into policy.metadata, read it from the callback").
  • Pass triggering edge via the context, or as a second positional arg to the callback (would change the PolicyCallback Protocol signature — breaking for any existing callbacks)?
  • Per-iteration PolicyCallContext construction cost — cheap, but worth measuring on a graph with many policy edges.
  • Backwards-compatibility story: is this a 1.0-blocker breaking change, or do we ship it as a 0.5.x bump with a migration note?
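
If the second-positional-argument option were chosen, one conceivable migration path is an arity check at the call site, so existing one-argument callbacks keep working. This is only a sketch of that idea, not a proposal from the discussion above: `invoke_callback`, `legacy`, and `edge_aware` are hypothetical names, and the check uses the standard `inspect.signature`.

```python
import inspect
from typing import Any, Callable


def invoke_callback(cb: Callable[..., Any], context: Any, edge: Any) -> Any:
    """Call cb(context, edge) if cb takes two positional args, else cb(context)."""
    params = list(inspect.signature(cb).parameters.values())
    positional = [
        p
        for p in params
        if p.kind
        in (inspect.Parameter.POSITIONAL_ONLY, inspect.Parameter.POSITIONAL_OR_KEYWORD)
    ]
    has_varargs = any(p.kind is inspect.Parameter.VAR_POSITIONAL for p in params)
    if has_varargs or len(positional) >= 2:
        return cb(context, edge)
    return cb(context)


def legacy(context):
    # Existing-style callback: context only.
    return ("legacy", context)


def edge_aware(context, edge):
    # New-style callback: context plus triggering edge.
    return ("edge_aware", context, edge)


print(invoke_callback(legacy, "ctx", "edge"))
print(invoke_callback(edge_aware, "ctx", "edge"))
```

The weakness of this approach is that a legacy callback with an optional second parameter would be misread as edge-aware, which is one argument for carrying the edge on the context instead.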

Notes

  • PolicyCallContext and PolicyCallback are exported from vre.__all__ (public API).
  • We're at 0.4.x — pre-1.0 latitude is available but worth using deliberately.
  • Policy wizard (core/policy/wizard.py) and any UI work (mentioned: a future "VRE Workstation UI" for policy creation) should be considered when shaping the final API.
