Background
Surfaced during the codebase-audit cleanup pass (branch chore/audit-cleanup). Two related concerns about the policy-callback API came up — both touching PolicyCallContext. They're being deferred so the audit-cleanup PR can stay narrow, and so the redesign happens as one coherent pass instead of two.
Concern 1 — Leaky grounding: GroundingResult
PolicyCallContext currently exposes the full GroundingResult:
# src/vre/core/policy/callback.py
class PolicyCallContext(BaseModel):
    tool_name: str
    grounding: GroundingResult  # ← full epistemic trace, gaps, internals
    call_args: tuple[Any, ...]
    call_kwargs: dict[str, Any]
This gives user-supplied callbacks access to .grounding.trace.result.primitives, .gaps, .pathway, etc. — the whole internal trace shape. Refactoring grounding (e.g. changing the trace structure) could silently break callbacks.
No callbacks in src/, tests/, or examples/ currently read context.grounding — the leak is forward-looking. A real callback-side use case did surface in design discussion, though: a policy on edge E may want to behave differently depending on which other root concepts were grounded alongside it (e.g. allow Delete→File unless protected was also grounded in the same call). That argues for some grounding signal, not zero.
Minimum useful surface, scoped to the actual use case:
class GroundingContext(BaseModel):
    agent_id: UUID | None = None
    resolved_concepts: list[str]  # canonical names grounded in this call
Notably dropped from the audit's original suggestion: is_grounded and gap_count. By the time a callback fires, grounding has already succeeded; those fields are invariant and dead weight.
Open question: should the facade also let callbacks inspect primitive properties of co-occurring concepts (e.g. get_primitive(name).depths[3].properties["sensitivity"])? My lean is no until a real callback needs it — the facade is non-breaking to extend.
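The co-occurrence use case can be sketched against the narrow facade. This is a minimal illustration, not the library's implementation: stdlib dataclasses stand in for the pydantic models so the snippet runs on its own, the callback name allow_delete_unless_protected is hypothetical, and returning True for "allow" is an assumed convention (the real PolicyCallback return type isn't shown in this note).

```python
from dataclasses import dataclass, field
from typing import Any, Optional
from uuid import UUID


@dataclass(frozen=True)
class GroundingContext:
    """Stand-in for the proposed facade (a dataclass, not the real BaseModel)."""
    resolved_concepts: list[str]
    agent_id: Optional[UUID] = None


@dataclass(frozen=True)
class PolicyCallContext:
    """Stand-in carrying the context fields with the narrowed grounding."""
    tool_name: str
    grounding: GroundingContext
    call_args: tuple[Any, ...] = ()
    call_kwargs: dict[str, Any] = field(default_factory=dict)


def allow_delete_unless_protected(context: PolicyCallContext) -> bool:
    """Hypothetical policy on Delete→File: refuse when 'protected' co-occurs.

    Returning True for "allow" is an assumption made for this sketch.
    """
    return "protected" not in context.grounding.resolved_concepts


plain = PolicyCallContext("delete_file", GroundingContext(["Delete", "File"]))
guarded = PolicyCallContext(
    "delete_file", GroundingContext(["Delete", "File", "protected"])
)
```

The callback only touches resolved_concepts, so a refactor of the internal trace structure would not reach it.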
Concern 2 — Same call_context reused across all triggering edges
PolicyGate._collect_violations walks (primitive, depth, relatum, policy) tuples and invokes the callback once per match — but builds call_context once at the top and reuses the same instance for every iteration:
# src/vre/core/policy/gate.py
for primitive in response.result.primitives:
    for depth in primitive.depths:
        for relatum in depth.relata:
            for policy in relatum.policies:
                cb_result = cb(call_context)  # same call_context every time
A callback registered on multiple edges has no way to know which edge triggered the current invocation. The callback only sees tool_name, call_args, call_kwargs, and (today) the full GroundingResult.
Three patterns this blocks:
- Edge-aware logging. An audit_log callback on Delete→File, Modify→File, Read→File can fire three times per call and produce three indistinguishable log lines.
- Source-specific rate limiting. A rate_limit callback on Send→Email (5/hr) and Send→SMS (20/hr) can't pick the right counter, forcing two callbacks instead of one parameterized one.
- Depth-discriminating policies. Same callback on a primitive's D2 (CAPABILITIES) edge vs. its D3 (CONSTRAINTS) edge can't tell them apart.
Workaround today: write one callback per edge and let the dotted-path callback string carry the discrimination. Works, but pushes complexity into graph configuration.
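The edge-aware logging problem can be reproduced in isolation. The sketch below is a simplified stand-in, not the gate's code: a dataclass replaces the real model, the grounding field is omitted, and the tool name and path are made up for illustration. It mirrors only the relevant behaviour, namely one context built up front and handed to the callback once per edge.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class PolicyCallContext:
    """Stand-in with today's edge-agnostic fields (grounding omitted)."""
    tool_name: str
    call_args: tuple[Any, ...]
    call_kwargs: dict[str, Any]


log_lines: list[str] = []


def audit_log(context: PolicyCallContext) -> None:
    """Hypothetical callback: it can only log what the context exposes."""
    log_lines.append(
        f"policy fired: tool={context.tool_name} args={context.call_args}"
    )


# Edges that carry the audit_log policy (names from the list above).
edges = [("Delete", "File"), ("Modify", "File"), ("Read", "File")]

# Mirrors the current behaviour: one context built once, reused per edge.
call_context = PolicyCallContext("file_tool", ("/tmp/report.txt",), {})
for _source, _target in edges:
    audit_log(call_context)
```

After the loop, log_lines holds three entries with identical text, because nothing in the context varies between iterations.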
Proposed combined shape (sketch)
class GroundingContext(BaseModel):
    agent_id: UUID | None = None
    resolved_concepts: list[str]

class TriggeringEdge(BaseModel):
    source_name: str
    target_name: str
    source_depth: DepthLevel
    target_depth: DepthLevel

class PolicyCallContext(BaseModel):
    tool_name: str
    grounding: GroundingContext
    call_args: tuple[Any, ...]
    call_kwargs: dict[str, Any]
    triggering_edge: TriggeringEdge  # new: built per iteration in gate.py
Under this sketch, PolicyGate._collect_violations would build a fresh PolicyCallContext per (primitive, depth, relatum) tuple instead of reusing one.
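A rough illustration of the per-edge construction, under several assumptions: dataclasses stand in for the pydantic models, DepthLevel is a placeholder enum listing only the two levels named in this note (the real enum may differ), the grounding field is left out for brevity, and invoke_per_edge is a hypothetical helper rather than the gate's actual loop.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Any, Callable, Iterable


class DepthLevel(IntEnum):
    """Stand-in enum; only the two levels named in this note are listed."""
    CAPABILITIES = 2
    CONSTRAINTS = 3


@dataclass(frozen=True)
class TriggeringEdge:
    source_name: str
    target_name: str
    source_depth: DepthLevel
    target_depth: DepthLevel


@dataclass(frozen=True)
class PolicyCallContext:
    """Stand-in; the grounding field is omitted to keep the sketch short."""
    tool_name: str
    call_args: tuple[Any, ...]
    call_kwargs: dict[str, Any]
    triggering_edge: TriggeringEdge


log_lines: list[str] = []


def audit_log(context: PolicyCallContext) -> None:
    """Hypothetical callback that can now name the edge that fired it."""
    edge = context.triggering_edge
    log_lines.append(
        f"{context.tool_name}: {edge.source_name}->{edge.target_name}"
    )


def invoke_per_edge(
    tool_name: str,
    call_args: tuple[Any, ...],
    call_kwargs: dict[str, Any],
    edges: Iterable[TriggeringEdge],
    callback: Callable[[PolicyCallContext], None],
) -> None:
    """Build a fresh context for each edge instead of sharing one instance."""
    for edge in edges:
        callback(PolicyCallContext(tool_name, call_args, call_kwargs, edge))


d2 = DepthLevel.CAPABILITIES
edges = [TriggeringEdge(src, "File", d2, d2) for src in ("Delete", "Modify", "Read")]
invoke_per_edge("file_tool", ("/tmp/report.txt",), {}, edges, audit_log)
```

With the edge on the context, the same audit_log callback produces one distinguishable line per triggering edge.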
Open design questions
- Should Policy itself (or policy.metadata) be on the context? Useful for the rate-limiting case ("stuff the limit into policy.metadata, read it from the callback").
- Pass triggering edge via the context, or as a second positional arg to the callback (would change the PolicyCallback Protocol signature, which is breaking for any existing callbacks)?
- Per-iteration PolicyCallContext construction cost: cheap, but worth measuring on a graph with many policy edges.
- Backwards-compatibility story: is this a 1.0-blocker breaking change, or do we ship it as a 0.5.x bump with a migration note?
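To make the first question concrete, here is one way a single parameterized rate_limit callback might read its limit from policy metadata. Everything in it is an assumption for illustration: the edge_key and policy_metadata fields, the limit_per_hour key, the RateLimiter class, and the True/False return convention are not part of the current API, and the hourly window reset is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class PolicyCallContext:
    """Stand-in with two hypothetical fields: edge_key and policy_metadata."""
    tool_name: str
    edge_key: tuple[str, str]
    policy_metadata: dict[str, Any] = field(default_factory=dict)


class RateLimiter:
    """One parameterized callback; counters are kept per triggering edge."""

    def __init__(self) -> None:
        self.counts: dict[tuple[str, str], int] = {}

    def __call__(self, context: PolicyCallContext) -> bool:
        # The limit travels with the policy rather than with the callback.
        limit = context.policy_metadata["limit_per_hour"]
        used = self.counts.get(context.edge_key, 0)
        if used >= limit:
            return False  # assumed convention: False means "deny"
        self.counts[context.edge_key] = used + 1
        return True


rate_limit = RateLimiter()
email = PolicyCallContext("send", ("Send", "Email"), {"limit_per_hour": 5})
sms = PolicyCallContext("send", ("Send", "SMS"), {"limit_per_hour": 20})

# Six calls on each edge: the email edge hits its limit, the SMS edge does not.
email_results = [rate_limit(email) for _ in range(6)]
sms_results = [rate_limit(sms) for _ in range(6)]
```

The sixth email call is denied while all six SMS calls pass, using one callback and two independent counters.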
Notes
- PolicyCallContext and PolicyCallback are exported from vre.__all__ (public API).
- We're at 0.4.x — pre-1.0 latitude is available but worth using deliberately.
- Policy wizard (core/policy/wizard.py) and any UI work (mentioned: a future "VRE Workstation UI" for policy creation) should be considered when shaping the final API.