
Populate examples on entity schemas for in-context LLM anchors #2

@mgoldsborough


Context

upjack 0.5.1 added pass-through of the top-level examples field on entity JSON schemas to the auto-generated MCP tool schemas. The pass-through also strips framework-managed fields (id, type, version, created_at, etc.) from examples. Infrastructure is ready; our schemas don't use it yet.

Problem

An LLM calling create_deal or create_contact for the first time in a session sees only the schema (property names, types, descriptions), with no worked example of a valid call. When the field names aren't obvious (e.g., is it name or title? email or email_address?), the LLM guesses and wastes a turn on schema-shape errors.

Adding 1-2 realistic examples per entity gives the LLM an in-context anchor alongside the schema, so first-try calls should succeed more often.

Fix

Add a top-level examples array to each entity schema file:

  • schemas/deal.schema.json: example deals covering a typical early stage and a closing stage
  • schemas/contact.schema.json: one example with email, phone, role, company
  • schemas/interaction.schema.json: one email and one meeting with follow_up_date
  • Any other entity schemas under schemas/
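To see which schema files still need the array, a small audit script may help. This is only a sketch: the helper name schemas_missing_examples is hypothetical, and it assumes every entity schema lives directly under schemas/ and is named *.schema.json, as the paths above suggest.

```python
import json
from pathlib import Path


def schemas_missing_examples(schema_dir):
    """Return names of *.schema.json files without a non-empty top-level examples array."""
    missing = []
    for path in sorted(Path(schema_dir).glob("*.schema.json")):
        schema = json.loads(path.read_text(encoding="utf-8"))
        examples = schema.get("examples")
        # Treat a missing key, a non-list value, and an empty list the same way.
        if not isinstance(examples, list) or not examples:
            missing.append(path.name)
    return missing


if __name__ == "__main__":
    for name in schemas_missing_examples("schemas"):
        print(f"missing examples: {name}")
```

Run from the repo root, it prints one line per schema file that still has no examples.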

Shape:

{
  "$schema": "...",
  "title": "CRM Deal",
  "allOf": [...],
  "properties": {...},
  "required": [...],
  "examples": [
    {
      "title": "Acme Q2 pilot",
      "stage": "qualified",
      "value": 25000,
      "contact_name": "Alice Chen"
    }
  ]
}

upjack handles the rest: it strips id, type, and the other base fields from examples before publishing, passes them through to the create_* tool schemas, and prepends {entity}_id for update_* examples.
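For intuition, the behavior described above can be approximated as follows. This is an illustration, not upjack's code: the function names are hypothetical, BASE_FIELDS holds only the fields this issue names (the real list is longer), and the placeholder id value used for update_* examples is an assumption.

```python
# Only the base fields named in this issue; upjack's actual list is not reproduced here.
BASE_FIELDS = {"id", "type", "version", "created_at"}


def create_examples(examples):
    """Drop framework-managed fields from each author-written example."""
    return [
        {key: value for key, value in example.items() if key not in BASE_FIELDS}
        for example in examples
    ]


def update_examples(entity, examples, placeholder_id="<existing-id>"):
    """Put {entity}_id first, followed by the stripped example fields."""
    return [
        {f"{entity}_id": placeholder_id, **example}
        for example in create_examples(examples)
    ]
```

The practical consequence for schema authors is that examples only need the user-facing fields. Including an id or created_at by accident should be harmless, since those are stripped before publishing.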

Effort

~5 minutes per schema. Six-ish schemas. ~30 min total for the whole app.

Verification

  • uv run pytest tests/ still green
  • Confirm tools/list output for create_deal contains the examples array (inspect via a quick script against the server)
  • Ideally: spin up against a local agent and repeat the prompts from conv_30f049cdb75d464f — first-try success on create_deal should be cleaner
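The tools/list check could look roughly like the sketch below. It takes an already-fetched tools/list result (a dict with a tools array whose entries carry name and inputSchema) and leaves fetching it from the server to whatever client is at hand. The helper names are hypothetical, and it assumes upjack places examples at the top level of inputSchema, which should be confirmed against real output.

```python
# Only the framework-managed fields this issue names; the full list is longer.
FRAMEWORK_FIELDS = {"id", "type", "version", "created_at"}


def tool_examples(tools_list_result, tool_name):
    """Return the examples array on the named tool's inputSchema, or [] if absent."""
    for tool in tools_list_result.get("tools", []):
        if tool.get("name") == tool_name:
            return tool.get("inputSchema", {}).get("examples", [])
    raise KeyError(f"tool not found: {tool_name}")


def check_tool(tools_list_result, tool_name):
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    examples = tool_examples(tools_list_result, tool_name)
    if not examples:
        problems.append(f"{tool_name}: no examples in inputSchema")
    for index, example in enumerate(examples):
        leaked = sorted(FRAMEWORK_FIELDS.intersection(example))
        if leaked:
            problems.append(f"{tool_name}: example {index} still contains {leaked}")
    return problems
```

Calling check_tool(result, "create_deal") on the parsed response returns an empty list when the examples came through and no framework-managed field leaked into them.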

Out of scope

  • synapse-todo-board needs the mirror treatment — tracked separately in that repo
  • Changing example generation logic in upjack — no, author-controlled is the right design

Severity

Low priority, high-ROI-per-effort. Good item for the next time someone's in the repo for any reason.
