Inconsistent naming conventions for Primitive data objects #48

@anormang1992

Summary

Most JSON-serialized blobs on Primitive nodes and outgoing relationships use the *_json suffix to signal "this is a stringified JSON payload, not a native Neo4j scalar." Two properties — provenance (on nodes and relationships) and policies (on relationships) — break this convention. Both were added after the original schema (provenance in 41b9d77, policies later) and did not conform to the existing pattern.

The result is a storage layer where the property name no longer reliably tells a reader whether they're looking at a native value or a JSON blob that needs json.loads() before use.

Problem Statement

Current property naming on stored primitives and relationships:

Property        Storage form            Suffix follows convention?
depths_json     JSON string             yes
metrics_json    JSON string             yes
metadata_json   JSON string (on rel)    yes
provenance      JSON string             no (node + rel)
policies        JSON string (on rel)    no

Concretely from src/vre/core/graph.py:

  • save_primitive() writes p.provenance = $provenance (line 328) where $provenance is the output of _dump_model_json() — a JSON string.
  • Relationship CREATE writes provenance: $provenance and policies: $policies (lines 360–361), both JSON strings.
  • The hydration path then has to _parse_json_field() these properties to get usable dicts (lines 198–201, 217–220).

Why this matters:

  1. Reader cognitive load: p.metrics_json clearly needs decoding; p.provenance does not visually advertise that it does too. Future contributors querying the graph in Cypher (debugging, ad-hoc reporting) will be surprised.
  2. Schema drift signal: the inconsistency hints that the convention isn't load-bearing, which makes future additions less likely to follow it.
  3. Documentation accuracy: the docstring at the top of the module describes "embedded depth JSON" but doesn't enumerate the schema; users reading the code rely on naming to understand storage shape.
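
To make the first point concrete, here is a minimal sketch of what a reader who trusts the suffix would do. The helper decode_by_suffix and the sample property values are hypothetical and are not taken from graph.py; they only illustrate how provenance slips through as a raw string.

```python
import json


def decode_by_suffix(props):
    """Decode only the properties whose name ends in `_json`.

    Hypothetical helper: it mirrors what a reader relying on the naming
    convention would do, not anything that exists in graph.py.
    """
    decoded = {}
    for key, value in props.items():
        if key.endswith("_json") and isinstance(value, str):
            decoded[key] = json.loads(value)
        else:
            decoded[key] = value
    return decoded


# Made-up property map, shaped like a node as it is stored today.
node_props = {
    "metrics_json": '{"uses": 3}',
    "provenance": '{"source": "seed"}',
}

decoded = decode_by_suffix(node_props)
print(type(decoded["metrics_json"]).__name__)  # dict
print(type(decoded["provenance"]).__name__)    # str (still an undecoded JSON string)
```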

Proposed Solution

Standardize on the *_json suffix for every property whose value is a JSON-serialized Pydantic model.

Renames required:

  • Node property: p.provenance → p.provenance_json
  • Relationship property: r.provenance → r.provenance_json
  • Relationship property: r.policies → r.policies_json

Files affected (all in src/vre/core/graph.py):

  • save_primitive() (lines 326–371), write side: rename $provenance → $provenance_json and update the SET/CREATE clauses to match.
  • _record_to_node_data() (lines 134–145): rename the dict key returned to hydration.
  • _record_to_relationships() (lines 147–166): rename provenance, and apply the same *_json style to policies if we also rename the local dict shape.
  • _hydrate_primitive() (lines 168–238), read side: pull from the renamed fields.
  • find_by_id() Cypher (lines 470–513): RETURN p.provenance_json AS provenance_json, r.provenance_json, r.policies_json in the collect({...}).
  • find_by_name() Cypher (lines 515–559): same rename.
  • resolve_subgraph() Cypher (lines 578–685): same rename in both the per-node projection and the per-edge projection.

No changes outside graph.py are needed: the Pydantic field names on Primitive, Depth, and Relatum remain provenance and policies — only the on-disk Neo4j property names change.
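
As a rough illustration of that boundary, the sketch below keeps the model field names untouched and applies the *_json names only when converting to and from stored properties. RelatumSketch, STORAGE_NAMES, to_storage and from_storage are all hypothetical stand-ins; the real code uses Pydantic models together with _dump_model_json() and _parse_json_field(), whose signatures are not shown in this issue.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RelatumSketch:
    """Stand-in for the real Pydantic Relatum; the field shapes are illustrative."""
    provenance: dict = field(default_factory=dict)
    policies: list = field(default_factory=list)


# Model field name -> Neo4j property name after the rename (assumed mapping).
STORAGE_NAMES = {"provenance": "provenance_json", "policies": "policies_json"}


def to_storage(relatum):
    """Serialize each field to a JSON string under its *_json property name."""
    return {
        STORAGE_NAMES[name]: json.dumps(value)
        for name, value in asdict(relatum).items()
    }


def from_storage(props):
    """Inverse of to_storage: decode the *_json properties back into the model."""
    return RelatumSketch(
        **{name: json.loads(props[stored]) for name, stored in STORAGE_NAMES.items()}
    )


rel = RelatumSketch(provenance={"source": "seed"}, policies=[{"name": "confirm"}])
props = to_storage(rel)
print(sorted(props))  # ['policies_json', 'provenance_json']
```

Callers keep using rel.provenance and rel.policies; only the keys of the stored property map change.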

Migration of existing graphs

This is a breaking change for any existing Neo4j database that was written before the rename. A one-shot Cypher migration is required:

// Nodes
MATCH (p:Primitive)
WHERE p.provenance IS NOT NULL
SET p.provenance_json = p.provenance
REMOVE p.provenance;

// Relationships: the rename applies to every relationship type, so match without a type filter
MATCH ()-[r]->()
WHERE r.provenance IS NOT NULL
SET r.provenance_json = r.provenance
REMOVE r.provenance;

MATCH ()-[r]->()
WHERE r.policies IS NOT NULL
SET r.policies_json = r.policies
REMOVE r.policies;

Ship this as a script under scripts/migrate_property_names.py (matching the style of clear_graph.py and seed_all.py) so users can run it once against their graph.
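
One possible shape for that script is sketched below. It rebuilds the three statements above from a small table and runs them through a driver object passed in by the caller. The names RENAMES, build_statement and migrate are assumptions made for illustration, and the layout is not taken from clear_graph.py or seed_all.py.

```python
"""Sketch of scripts/migrate_property_names.py (hypothetical layout)."""

# (match pattern, variable, old property, new property)
RENAMES = [
    ("(p:Primitive)", "p", "provenance", "provenance_json"),
    ("()-[r]->()", "r", "provenance", "provenance_json"),
    ("()-[r]->()", "r", "policies", "policies_json"),
]


def build_statement(pattern, var, old, new):
    """Return one Cypher rename statement.

    The IS NOT NULL guard makes the statement safe to re-run: once the old
    property has been removed, the MATCH no longer selects anything.
    """
    return (
        f"MATCH {pattern} "
        f"WHERE {var}.{old} IS NOT NULL "
        f"SET {var}.{new} = {var}.{old} "
        f"REMOVE {var}.{old}"
    )


def migrate(driver):
    """Run every rename and return the statements that were executed.

    `driver` is assumed to behave like a neo4j.Driver: session() returns a
    context manager whose run() result can be consumed.
    """
    statements = [build_statement(*rename) for rename in RENAMES]
    with driver.session() as session:
        for statement in statements:
            # consume() waits for the statement to finish before moving on.
            session.run(statement).consume()
    return statements
```

With the official neo4j Python driver this could be called as migrate(GraphDatabase.driver(uri, auth=(user, password))), where the URI and credentials are placeholders for the user's own connection settings. On very large graphs the renames may need batching; that is outside this sketch.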

VRE Design Alignment

This is purely a persistence-layer cleanup. It does not affect:

  • The agent–VRE contract — agents never see Neo4j property names; they work with Primitive, Depth, Relatum Python objects.
  • Grounding, policy evaluation, or gap detection — all read hydrated Pydantic models.
  • Epistemic semantics — provenance still attaches to nodes, depths, and relata exactly as before.

The change reinforces a principle that's already in the codebase: storage-shape information should be visible at the call site. Epistemic honesty isn't directly at stake, but consistency makes the storage layer easier to audit and extend.

Acceptance Criteria

  • All Cypher writes in save_primitive() and update_metrics() use provenance_json / policies_json for the renamed properties.
  • All Cypher reads (find_by_id, find_by_name, resolve_subgraph, batch_read_metrics) project the renamed properties.
  • _hydrate_primitive, _record_to_node_data, and _record_to_relationships read from the renamed keys.
  • No occurrences of p.provenance, r.provenance, or r.policies remain as Cypher property accesses (Pydantic field accesses like relatum.provenance are unchanged).
  • A migration script (scripts/migrate_property_names.py) renames properties on existing graphs and is documented in README.md upgrade notes.
  • All existing tests pass without modification (Pydantic surface unchanged); persistence round-trip tests cover the new property names.
  • Release notes call out the breaking change and link to the migration script.
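
The "no occurrences remain" criterion could be checked mechanically. The sketch below is one hedged way to do it: a regex scan over source text for the old dotted accesses. STALE_ACCESS and find_stale_accesses are hypothetical names, and the scan assumes the Cypher aliases stay p and r.

```python
import re

# Matches p.provenance, r.provenance and r.policies, but not their *_json
# successors and not attribute access such as relatum.provenance.
# Limitation: inline map keys like `{provenance: $provenance}` in a CREATE
# clause are not dotted accesses and are not detected by this pattern.
STALE_ACCESS = re.compile(r"\b(?:[pr]\.provenance|r\.policies)\b")


def find_stale_accesses(source):
    """Return (line_number, stripped_line) pairs that still use an old name."""
    return [
        (number, line.strip())
        for number, line in enumerate(source.splitlines(), start=1)
        if STALE_ACCESS.search(line)
    ]


sample = "\n".join([
    "RETURN p.provenance_json AS provenance_json",
    "SET p.provenance = $provenance",
    "edge = relatum.provenance",
    "RETURN r.policies AS policies",
])

print(find_stale_accesses(sample))
# [(2, 'SET p.provenance = $provenance'), (4, 'RETURN r.policies AS policies')]
```

A Python variable that happens to be named p or r would be flagged too, so hits still need a quick manual look.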

Open Questions

  • Should this rename ride along with another schema-touching change (e.g. issue #50, "Atomic metric updates for concurrent safety") so users only run one migration? Or ship standalone with a clear minor-version bump?
  • Is there appetite to go further and rename the Pydantic fields too (e.g. drop the .policies list from Relatum in favor of a different shape)? Out of scope for this issue, but worth flagging if any redesign is planned.

Dependencies
