Editorial: hybrid citation system — Metis-backed claims and external references with visible distinction #109

@0b00101111

Problem

Today, PGB articles cite a mix of sources — most pulled from the open web (AAO, USPSTF, PMC, HealthyChildren.org, StatPearls). The Metis library currently has 26 PDFs (mostly WHO/CDC/AAP preventive guidance) producing 6,821 atoms with verbatim quoted spans.

We want every factual claim to have solid provenance, but a coverage audit on eyes-overview.md (15 representative claims, see results below) shows that the current Metis library backs only ~27% of claims strongly (4 of 15) and ~20% partially (3 of 15), leaving ~53% (8 of 15) with no backing at all. The gap isn't fixable by one more PDF; it's structural. Specialist clinical content (prevalence stats, treatment protocols, USPSTF/AAO position statements, RCT outcomes) lives in journal articles and clinical web pages that won't ever be in the Metis library at scale.

So "all claims Metis-backed" isn't realistic for articles as broad as PGB. The editorial decision is to embrace a hybrid: every claim gets a citation, but the citation's class is visible. Library-backed claims are auditable to the byte. External-reference claims are clearly marked as such.

Citation taxonomy

Two tiers, with deliberately different visual weight:

Tier 1 — Library-backed (Metis atom):

Tier 2 — External reference:

  • Marker in markdown: [^ext:<ref-id>] (current [Source: srcN] markers map here)
  • Resolves to a plain citation in the article frontmatter or a `refs.yml` sidecar
  • Footnote shows: title, author, year, URL (no verbatim quote, no provenance hash)
  • Visual: "external reference" badge per Design: inline provenance for cited claims without hurting readability #108 — clearly different from Tier 1

Forbidden: uncited factual claims. Tone/transition/framing sentences are fine bare; anything making a factual assertion needs a Tier 1 or Tier 2 marker.
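
To make the two marker forms concrete, here is a minimal sketch of pulling them out of an article body. The `[^atom:X]` and `[^ext:Y]` shapes come from this issue; the set of characters allowed in an id, and the names `CitationMarker` and `extractMarkers`, are assumptions for illustration only.

```typescript
// Hypothetical marker grammar inferred from the `[^atom:X]` / `[^ext:Y]`
// forms above. The allowed id characters are an assumption.
type Tier = "atom" | "ext";

interface CitationMarker {
  tier: Tier;
  id: string;
  offset: number; // character offset of the marker in the markdown source
}

function extractMarkers(markdown: string): CitationMarker[] {
  // A fresh regex per call keeps the global `lastIndex` state local.
  const re = /\[\^(atom|ext):([A-Za-z0-9._-]+)\]/g;
  const markers: CitationMarker[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(markdown)) !== null) {
    markers.push({ tier: m[1] as Tier, id: m[2] ?? "", offset: m.index });
  }
  return markers;
}

const sample =
  "Outdoor time reduces myopia.[^atom:u12] Amblyopia affects 1-5%.[^ext:src3]";
const found = extractMarkers(sample);
```

Note that this also matches a footnote definition line that starts with the same marker, so a real implementation would probably want to separate references from definitions.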

Workflow

For new articles:

  1. Pick topic. Run `metis apply "" --format kx --output content/citations/.kx.json` to fetch relevant atoms.
  2. Outline the article around the available atoms. For claims atoms can support, cite as Tier 1.
  3. For claims that need external sources, cite as Tier 2 (plain URL/citation).
  4. Build-time validator: every `[^atom:X]` resolves to a unit in the sidecar. Every `[^ext:Y]` resolves to an entry in refs.yml. Build fails on dangling references.
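
Step 4 could be sketched roughly as below. This assumes the sidecar unit ids and the `refs.yml` entry ids have already been loaded into sets; the function names and the error message format are hypothetical, not an existing PGB or Metis API.

```typescript
type Tier = "atom" | "ext";

interface DanglingReference {
  tier: Tier;
  id: string;
}

// Returns every marker whose id is missing from the matching lookup set.
function findDanglingReferences(
  markdown: string,
  sidecarUnitIds: ReadonlySet<string>, // ids of units in the KX sidecar (assumed preloaded)
  refIds: ReadonlySet<string>, // ids of entries in refs.yml (assumed preloaded)
): DanglingReference[] {
  const re = /\[\^(atom|ext):([A-Za-z0-9._-]+)\]/g;
  const dangling: DanglingReference[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(markdown)) !== null) {
    const tier = m[1] as Tier;
    const id = m[2] ?? "";
    const known = tier === "atom" ? sidecarUnitIds : refIds;
    if (!known.has(id)) dangling.push({ tier, id });
  }
  return dangling;
}

// Throws so that a build step calling this fails on dangling references.
function assertNoDanglingReferences(
  file: string,
  markdown: string,
  sidecarUnitIds: ReadonlySet<string>,
  refIds: ReadonlySet<string>,
): void {
  const dangling = findDanglingReferences(markdown, sidecarUnitIds, refIds);
  if (dangling.length > 0) {
    const list = dangling.map((d) => `[^${d.tier}:${d.id}]`).join(", ");
    throw new Error(`${file}: dangling citation markers: ${list}`);
  }
}
```

Keeping the check as a pure function over strings and sets means the same logic can run from a build plugin, a CLI, or CI without Metis installed.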

For existing articles:

  • Migrate one article at a time (probably high-traffic ones first)
  • Run audit: which existing `[Source: srcN]` claims have a matching atom in Metis? Convert those to Tier 1.
  • Leave the rest as Tier 2 with refreshed plain-citation format.
  • Track migration % in repo README or a simple dashboard.
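
The per-article conversion could look something like the sketch below. It assumes the audit step produces a map from legacy source id to a matching atom id, and that a legacy `srcN` id can be reused directly as the Tier 2 ref id; both are assumptions, and `migrateLegacyMarkers` is a hypothetical helper name.

```typescript
// Rewrites legacy `[Source: srcN]` markers.
// - If the audit found a matching atom, emit a Tier 1 marker.
// - Otherwise keep the source as a Tier 2 external reference.
function migrateLegacyMarkers(
  markdown: string,
  atomForSource: ReadonlyMap<string, string>, // srcN -> atom id (assumed audit output)
): string {
  return markdown.replace(
    /\[Source:\s*(src\d+)\]/g,
    (_whole: string, srcId: string) => {
      const atomId = atomForSource.get(srcId);
      return atomId !== undefined ? `[^atom:${atomId}]` : `[^ext:${srcId}]`;
    },
  );
}

// Counts legacy markers still present, which is one way to track progress.
function countLegacyMarkers(markdown: string): number {
  const matches = markdown.match(/\[Source:\s*src\d+\]/g);
  return matches === null ? 0 : matches.length;
}

const before = "Red reflex check [Source: src1]. Patching study [Source: src2].";
const after = migrateLegacyMarkers(before, new Map([["src1", "u7"]]));
// after === "Red reflex check [^atom:u7]. Patching study [^ext:src2]."
```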

Audit data — eyes-overview.md (2026-04-19)

Audited 15 representative factual claims against the current Metis library:

| Verdict | Count | Examples |
| --- | --- | --- |
| ✅ Strong (Tier 1 viable) | 4 | red reflex detects cataract/RB/glaucoma; AAP red reflex at every well-child visit; outdoor time reduces myopia; 80–120 min/day outdoor recommendation |
| 🟡 Partial (Tier 1 with caveat) | 3 | newborn focus 8–12 in (similar atom but ranked 5/5); 6–12mo eye exam (cited as Canadian Optometrists not AOA); USPSTF 3–5 screening (similar atom but cites NCCVEH not USPSTF) |
| ❌ None (Tier 2 only) | 8 | color vision 4–5mo milestone; amblyopia critical period 7–9; amblyopia 1–5% prevalence; strabismus 2–4% prevalence; 2hr=6hr patching; AAO blue-light glasses position; 30K sports eye injuries/year; 90% preventable with eyewear |

Library coverage is strong on preventive/population health (WHO, AAP Bright Futures, CDC) and weak on specialist clinical content (prevalence, RCTs, position statements).
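
For reference, the percentages quoted in the Problem section follow directly from these counts. A small sketch of the arithmetic, with `toPercentages` as a hypothetical helper:

```typescript
// Verdict counts copied from the eyes-overview.md audit above.
const verdictCounts: Record<string, number> = { strong: 4, partial: 3, none: 8 };

// Converts raw counts into whole-number percentages of the total.
function toPercentages(counts: Record<string, number>): Record<string, number> {
  const keys = Object.keys(counts);
  const total = keys.reduce((sum, key) => sum + (counts[key] ?? 0), 0);
  const result: Record<string, number> = {};
  for (const key of keys) {
    result[key] = total === 0 ? 0 : Math.round(((counts[key] ?? 0) / total) * 100);
  }
  return result;
}

const auditPercentages = toPercentages(verdictCounts);
// 4/15 -> 27, 3/15 -> 20, 8/15 -> 53
```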

Two retrieval issues found during audit (separate work, not blocking this issue)

  1. Ranking is poor on borderline matches. C1 had a strong-match atom in slot 5/5, behind unrelated pincer-grasp atoms. Investigate hybrid BM25+vector+RRF retrieval tuning in metis.
  2. `--top-k 3` returns 5 units. Either the flag isn't honored or it's a different stage's k.

File these as separate issues in the metis repo if confirmed.
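
As background for the first item, here is a minimal sketch of textbook reciprocal rank fusion (RRF), where each result scores the sum of 1 / (k + rank) across the input rankings. This is not Metis's actual implementation, which we haven't inspected; the function name, the default constant of 60 (a commonly used value), and the `topK` parameter are all assumptions. It also shows how a fusion constant and a result cutoff are two different "k" values, which may or may not be what is happening in the second item.

```typescript
interface FusedResult {
  id: string;
  score: number;
}

// Textbook RRF: score(id) = sum over rankings of 1 / (rrfK + rank), rank starting at 1.
function reciprocalRankFusion(
  rankings: string[][], // each inner array is one retriever's ids, best first
  rrfK = 60, // smoothing constant, not a result count
  topK?: number, // optional cutoff applied after fusion
): FusedResult[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (rrfK + rank));
    });
  }
  const fused = Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score,
  );
  return topK === undefined ? fused : fused.slice(0, topK);
}

const bm25Ranking = ["a", "b", "c"];
const vectorRanking = ["c", "a", "d"];
const fusedAll = reciprocalRankFusion([bm25Ranking, vectorRanking]);
const fusedTop3 = reciprocalRankFusion([bm25Ranking, vectorRanking], 60, 3);
```

In this toy input, "a" ranks first because it sits near the top of both lists, while an id that only one retriever returns is pushed toward the bottom.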

Phased plan

Phase 1 — Schema & validator (1 day, no UX work)

  • TypeScript types for KX docs (subset PGB cares about: `units[].id`, `provenance.quotedSpans[].text`, `source.ref`, `meta.sources[]`)
  • `refs.yml` schema for Tier 2 citations
  • Astro/Vite plugin: validate `[^atom:X]` resolves to sidecar unit; validate `[^ext:Y]` resolves to refs entry; fail build on dangling
  • No rendering yet
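
A first pass at the Phase 1 types might look like the sketch below. The field paths are the ones listed above, but how they nest is an assumption: here `provenance` and `source` hang off each unit and `meta` is top-level. The element type of `meta.sources[]` is unknown, so it is left as `unknown`. `ExternalRef` mirrors the Tier 2 footnote fields (title, author, year, URL); its exact key names are hypothetical.

```typescript
interface KxQuotedSpan {
  text: string;
}

// Assumed nesting: provenance and source belong to each unit.
interface KxUnit {
  id: string;
  provenance: { quotedSpans: KxQuotedSpan[] };
  source: { ref: string };
}

interface KxDoc {
  units: KxUnit[];
  meta: { sources: unknown[] };
}

// Hypothetical shape of one refs.yml entry for a Tier 2 citation.
interface ExternalRef {
  title: string;
  author: string;
  year: number;
  url: string;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

function isKxUnit(value: unknown): value is KxUnit {
  if (!isRecord(value) || typeof value["id"] !== "string") return false;
  const provenance = value["provenance"];
  const source = value["source"];
  if (!isRecord(provenance) || !isRecord(source)) return false;
  const spans = provenance["quotedSpans"];
  if (!Array.isArray(spans)) return false;
  const spansOk = spans.every(
    (span: unknown) => isRecord(span) && typeof span["text"] === "string",
  );
  return spansOk && typeof source["ref"] === "string";
}

// Runtime guard for a parsed sidecar JSON file.
function isKxDoc(value: unknown): value is KxDoc {
  if (!isRecord(value)) return false;
  const units = value["units"];
  const meta = value["meta"];
  if (!Array.isArray(units) || !units.every(isKxUnit)) return false;
  return isRecord(meta) && Array.isArray(meta["sources"]);
}

// Ids the validator would check `[^atom:X]` markers against.
function unitIds(doc: KxDoc): Set<string> {
  return new Set(doc.units.map((unit) => unit.id));
}
```

A runtime guard is worth having next to the types because the sidecars are committed JSON, and a type annotation alone won't catch a file that drifts from the expected shape.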

Phase 2 — Rendering + UX (depends on #108 design decision, ~3 days)

Phase 3 — Audit + migration (ongoing)

  • Build a CLI: `pgb audit-citations ` runs the same check this issue's audit data came from
  • Pick 3–5 high-traffic articles; migrate each; learn what hurts
  • Eventually: every article in repo has 100% citation coverage (mix of Tier 1 + Tier 2), enforced by CI

Phase 4 — Library expansion (lower priority, can run in parallel)

  • For each article topic where Tier 1 coverage is <50%, identify what would need to be added to Metis library
  • Most won't be free PDFs; some (USPSTF rec PDFs, CDC reports) are. Cherry-pick the high-leverage ones.
  • This will never get us to 100% Tier 1; the goal is to nudge the ratio.

Out of scope (for now)

  • Auto-generating articles from Metis atoms (separate product question)
  • Bidirectional traceability (article → atom → source PDF byte offset)
  • Live Metis querying at build time (Phase 1 commits sidecars instead — simpler, works in CI without Metis installed)

Related

Decision deferred

This proposal is not for execution today. Filing for visibility and as a starting point when we're ready to commit. Plan should be revisited after:

  • More Metis library sources added (Phase B of metis-library/parentguidebook ingestion)
  • Decision on whether PGB stays a broad "trustworthy parenting guide" or narrows to "only what we can verify" (different products)
