Skip to content

[recipes] Universal ingest primitives#99

Open
alanshurafa wants to merge 3 commits intoNateBJones-Projects:mainfrom
alanshurafa:contrib/alanshurafa/universal-ingest-primitives
Open

[recipes] Universal ingest primitives#99
alanshurafa wants to merge 3 commits intoNateBJones-Projects:mainfrom
alanshurafa:contrib/alanshurafa/universal-ingest-primitives

Conversation

@alanshurafa
Copy link
Copy Markdown
Contributor

Summary

  • Shared parseStructuredCapture() for bracket-syntax type/topic hints
  • Portable prepareThoughtPayload() with strict override precedence
  • Pure functions — no network calls, no provider env lookups, no LLM classification
  • Accepts optional precomputed inputs (embedding, fingerprint, extracted metadata)

Why

Multiple ingest paths (MCP, REST, import scripts, smart-ingest) need consistent type resolution, metadata merging, and payload normalization. This primitive provides one shared contract so every path produces identically shaped payloads. References the already-merged Content Fingerprint Dedup primitive (#54).

Override precedence

  1. Structured capture hint (from content brackets)
  2. Explicit caller override (opts.metadata.type, etc.)
  3. Extracted metadata (from LLM, passed in)
  4. Defaults (type: idea, importance: 3, quality_score: 50)

Test plan

  • Bracketed structured capture parses type/topic/next-step correctly
  • Precedence is stable across all 4 levels
  • No provider/network dependency for the primitive's happy path
  • Two different callers produce same payload for same input

🤖 Generated with Claude Code

alanshurafa and others added 2 commits March 21, 2026 22:09
Shared structured capture parser and thought payload preparation contract
for consistent ingestion across all capture paths. Pure portable functions
with no provider/network dependencies.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add Prerequisites section required by OB1 automated review
- Replace relative links to schemas/integrations (not in this branch)
  with plain text references to avoid broken link check failures

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@justfinethanku justfinethanku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: Universal Ingest Primitives

Overview

This PR introduces shared TypeScript functions for normalizing thought ingestion across different capture paths. The implementation is well-designed with clean separation of concerns, clear precedence rules, and comprehensive documentation.


What's Good

Excellent code quality

  • Pure functions with no side effects (no network calls, no env lookups)
  • Clear type definitions and comprehensive JSDoc comments
  • Well-tested precedence model with explicit override hierarchy
  • Structured capture syntax adds real user value

Strong documentation

  • README meets all required sections (Prerequisites, Steps, Expected Outcome, Troubleshooting)
  • Clear code examples for each common pattern
  • Override precedence is thoroughly explained
  • 890+ words (well above the 200-word primitive requirement)

Metadata compliance

  • All required fields present and valid
  • Proper semantic versioning
  • Appropriate difficulty level (intermediate)
  • Accurate time estimate

Technical approach

  • Portable design that accepts precomputed values
  • No embedded policy decisions (embeddings, sensitivity detection)
  • Good use of TypeScript's type system
  • Proper validation and sanitization

Critical Issue: Primitive Curation Requirement

Does not meet primitive curation standard

Per CONTRIBUTING.md line 28:

Primitives are reusable concept guides that get referenced by multiple extensions. [...] A primitive should be referenced by at least 2 extensions.

The README's "Extensions That Use This" section lists:

  • Smart Ingest — which is an integration, not an extension
  • "Any MCP capture tool or REST endpoint" — aspirational, not concrete

Current state:

  • 6 extensions exist in the repo: family-calendar, home-maintenance, household-knowledge, job-hunt, meal-planning, professional-crm
  • None of these currently use this primitive
  • Smart Ingest (PR #100) is categorized under , not

Why this matters:
Primitives are curated specifically to avoid premature abstraction. They should emerge from proven patterns across multiple extensions, not be created speculatively. This is a well-designed library, but it doesn't meet the repo's curation bar yet.


Recommendations

Option 1: Convert to a recipe (fastest path)
Move this to where it can live as a standalone contribution without the 2+ extension requirement. Recipes are open for community contributions and don't have the curation constraint.

Option 2: Integration-first approach

  1. Merge smart-ingest (PR #100) first as an integration using this code inline
  2. Build a second integration or extension that needs the same pattern
  3. Once 2+ integrations/extensions share the code, extract it as a primitive (this PR)

Option 3: Wait for extension adoption
If upcoming extensions in your roadmap will use this (e.g., if email-history-import or chatgpt-conversation-import plan to adopt it), coordinate with maintainers to time this PR with those updates.


Additional Observations

No technical blockers:

  • No credentials, API keys, or secrets detected
  • No SQL files (not applicable for TypeScript primitives)
  • No binary blobs
  • PR title follows format correctly
  • All files in correct directory structure
  • No destructive operations

Minor polish opportunity:
The README's "Extensions That Use This" section currently references smart-ingest as if it were an extension. Update this to accurately reflect that it's an integration, or wait until actual extensions adopt it.

Code quality note:
The implementation itself is production-ready. The precedence model (structured capture → caller override → extracted → defaults) is clear and well-documented. No code changes needed.


Verdict: Significant changes needed

This is high-quality work, but it doesn't meet the primitive category's curation requirement. The primitive should either be:

  1. Reclassified as a recipe (immediate path forward), or
  2. Deferred until 2+ extensions actually use it (maintains curation standard)

I recommend Option 1 (recipe reclassification) as the fastest path to getting this merged while maintaining the repo's curation standards.

Next steps:

  • Decide between recipe vs. deferred primitive with maintainers
  • If recipe: update category field to "recipes" and move folder
  • If deferred: coordinate with extension authors to adopt this pattern first

Happy to discuss further on Discord or in the PR thread. The code itself is solid — this is purely about category placement and curation timing.

Copy link
Copy Markdown
Collaborator

@justfinethanku justfinethanku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: Universal Ingest Primitives

Overview

This PR introduces shared TypeScript functions for normalizing thought ingestion across different capture paths. The implementation is well-designed with clean separation of concerns, clear precedence rules, and comprehensive documentation.


What's Good

Excellent code quality

  • Pure functions with no side effects (no network calls, no env lookups)
  • Clear type definitions and comprehensive JSDoc comments
  • Well-tested precedence model with explicit override hierarchy
  • Structured capture syntax adds real user value

Strong documentation

  • README meets all required sections (Prerequisites, Steps, Expected Outcome, Troubleshooting)
  • Clear code examples for each common pattern
  • Override precedence is thoroughly explained
  • 890+ words (well above the 200-word primitive requirement)

Metadata compliance

  • All required fields present and valid
  • Proper semantic versioning
  • Appropriate difficulty level (intermediate)
  • Accurate time estimate

Technical approach

  • Portable design that accepts precomputed values
  • No embedded policy decisions (embeddings, sensitivity detection)
  • Good use of TypeScript's type system
  • Proper validation and sanitization

Critical Issue: Primitive Curation Requirement

Does not meet primitive curation standard

Per CONTRIBUTING.md line 28:

Primitives are reusable concept guides that get referenced by multiple extensions. [...] A primitive should be referenced by at least 2 extensions.

The README's "Extensions That Use This" section lists:

  • Smart Ingest — which is an integration, not an extension
  • "Any MCP capture tool or REST endpoint" — aspirational, not concrete

Current state:

  • 6 extensions exist in the repo: family-calendar, home-maintenance, household-knowledge, job-hunt, meal-planning, professional-crm
  • None of these currently use this primitive
  • Smart Ingest (PR #100) is categorized under integrations/, not extensions/

Why this matters:
Primitives are curated specifically to avoid premature abstraction. They should emerge from proven patterns across multiple extensions, not be created speculatively. This is a well-designed library, but it doesn't meet the repo's curation bar yet.


Recommendations

Option 1: Convert to a recipe (fastest path)
Move this to recipes/universal-ingest-primitives where it can live as a standalone contribution without the 2+ extension requirement. Recipes are open for community contributions and don't have the curation constraint.

Option 2: Integration-first approach

  1. Merge smart-ingest (PR #100) first as an integration using this code inline
  2. Build a second integration or extension that needs the same pattern
  3. Once 2+ integrations/extensions share the code, extract it as a primitive (this PR)

Option 3: Wait for extension adoption
If upcoming extensions in your roadmap will use this (e.g., if email-history-import or chatgpt-conversation-import plan to adopt it), coordinate with maintainers to time this PR with those updates.


Additional Observations

No technical blockers:

  • No credentials, API keys, or secrets detected
  • No SQL files (not applicable for TypeScript primitives)
  • No binary blobs
  • PR title follows format correctly
  • All files in correct directory structure
  • No destructive operations

Minor polish opportunity:
The README's "Extensions That Use This" section currently references smart-ingest as if it were an extension. Update this to accurately reflect that it's an integration, or wait until actual extensions adopt it.

Code quality note:
The implementation itself is production-ready. The precedence model (structured capture → caller override → extracted → defaults) is clear and well-documented. No code changes needed.


Verdict: Significant changes needed

This is high-quality work, but it doesn't meet the primitive category's curation requirement. The primitive should either be:

  1. Reclassified as a recipe (immediate path forward), or
  2. Deferred until 2+ extensions actually use it (maintains curation standard)

I recommend Option 1 (recipe reclassification) as the fastest path to getting this merged while maintaining the repo's curation standards.

Next steps:

  • Decide between recipe vs. deferred primitive with maintainers
  • If recipe: update metadata.json category field to "recipes" and move folder
  • If deferred: coordinate with extension authors to adopt this pattern first

Happy to discuss further on Discord or in the PR thread. The code itself is solid — this is purely about category placement and curation timing.

@justfinethanku
Copy link
Copy Markdown
Collaborator

Code Review: Universal Ingest Primitives

Overview

This PR introduces shared TypeScript functions for normalizing thought ingestion across different capture paths. The implementation is well-designed with clean separation of concerns, clear precedence rules, and comprehensive documentation.


What's Good

Excellent code quality

  • Pure functions with no side effects (no network calls, no env lookups)
  • Clear type definitions and comprehensive JSDoc comments
  • Well-tested precedence model with explicit override hierarchy
  • Structured capture syntax adds real user value

Strong documentation

  • README meets all required sections (Prerequisites, Steps, Expected Outcome, Troubleshooting)
  • Clear code examples for each common pattern
  • Override precedence is thoroughly explained
  • 890+ words (well above the 200-word primitive requirement)

Metadata compliance

  • All required fields present and valid
  • Proper semantic versioning
  • Appropriate difficulty level (intermediate)
  • Accurate time estimate

Technical approach

  • Portable design that accepts precomputed values
  • No embedded policy decisions (embeddings, sensitivity detection)
  • Good use of TypeScript's type system
  • Proper validation and sanitization

Critical Issue: Primitive Curation Requirement

Does not meet primitive curation standard

Per CONTRIBUTING.md line 28:

Primitives are reusable concept guides that get referenced by multiple extensions. [...] A primitive should be referenced by at least 2 extensions.

The README's "Extensions That Use This" section lists:

  • Smart Ingest — which is an integration, not an extension
  • "Any MCP capture tool or REST endpoint" — aspirational, not concrete

Current state:

  • 6 extensions exist in the repo: family-calendar, home-maintenance, household-knowledge, job-hunt, meal-planning, professional-crm
  • None of these currently use this primitive
  • Smart Ingest (PR [integrations] Smart ingest edge function #100) is categorized under integrations/, not extensions/

Why this matters:
Primitives are curated specifically to avoid premature abstraction. They should emerge from proven patterns across multiple extensions, not be created speculatively. This is a well-designed library, but it doesn't meet the repo's curation bar yet.


Recommendations

Option 1: Convert to a recipe (fastest path)
Move this to recipes/universal-ingest-primitives where it can live as a standalone contribution without the 2+ extension requirement. Recipes are open for community contributions and don't have the curation constraint.

Option 2: Integration-first approach

  1. Merge smart-ingest (PR [integrations] Smart ingest edge function #100) first as an integration using this code inline
  2. Build a second integration or extension that needs the same pattern
  3. Once 2+ integrations/extensions share the code, extract it as a primitive (this PR)

Option 3: Wait for extension adoption
If upcoming extensions in your roadmap will use this (e.g., if email-history-import or chatgpt-conversation-import plan to adopt it), coordinate with maintainers to time this PR with those updates.


Additional Observations

No technical blockers:

  • No credentials, API keys, or secrets detected
  • No SQL files (not applicable for TypeScript primitives)
  • No binary blobs
  • PR title follows format correctly
  • All files in correct directory structure
  • No destructive operations

Minor polish opportunity:
The README's "Extensions That Use This" section currently references smart-ingest as if it were an extension. Update this to accurately reflect that it's an integration, or wait until actual extensions adopt it.

Code quality note:
The implementation itself is production-ready. The precedence model (structured capture → caller override → extracted → defaults) is clear and well-documented. No code changes needed.


Verdict: Significant changes needed

This is high-quality work, but it doesn't meet the primitive category's curation requirement. The primitive should either be:

  1. Reclassified as a recipe (immediate path forward), or
  2. Deferred until 2+ extensions actually use it (maintains curation standard)

I recommend Option 1 (recipe reclassification) as the fastest path to getting this merged while maintaining the repo's curation standards.

Next steps:

  • Decide between recipe vs. deferred primitive with maintainers
  • If recipe: update metadata.json category field to "recipes" and move folder
  • If deferred: coordinate with extension authors to adopt this pattern first

Happy to discuss further on Discord or in the PR thread. The code itself is solid — this is purely about category placement and curation timing.

Primitives require 2+ extension references per CONTRIBUTING.md.
Moving to recipes per maintainer review feedback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@alanshurafa alanshurafa changed the title [primitives] Universal ingest primitives [recipes] Universal ingest primitives Mar 25, 2026
@matthallett1
Copy link
Copy Markdown
Collaborator

@claude review this PR

@github-actions
Copy link
Copy Markdown

Claude Code is working…

I'll analyze this and get back to you.

View job run

@matthallett1
Copy link
Copy Markdown
Collaborator

@claude review this PR

@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 30, 2026

Claude finished @matthallett1's task in 1m 56s —— View job


PR Review: Universal Ingest Primitives

  • Read CLAUDE.md and CONTRIBUTING.md for project context
  • Read changed files (README, metadata.json, TypeScript source)
  • Assess alignment, clarity, completeness, safety, and value
  • Post review findings

Previous Review Status

The prior review by @justfinethanku correctly flagged that this code didn't belong in primitives/ and recommended moving it to recipes/. That has been done — the current PR places the contribution under recipes/universal-ingest-primitives/. That concern is resolved.


What's Good

Alignment — This fits squarely in OB1's mission. Any project that builds multiple ingest paths (MCP, REST, import scripts) will need exactly this kind of normalization layer. The recipe category is the right home.

Code quality — Both TypeScript files are well-written: clean types, comprehensive JSDoc, explicit validation and sanitization helpers (asNumber, asInteger, normalizeStringArray), no magic numbers (all constants are named).

Documentation — README meets all required recipe sections. Code examples for all four common patterns are copy-paste ready and accurate. The troubleshooting section anticipates real user friction.

Safety — No credentials, no destructive SQL, no local MCP servers, no binary blobs. Clean.

metadata.json — All required fields present, category correctly set to "recipes".


Issues Found

1. Importance precedence is inconsistent with documented order (medium)

The PR description and README both say the precedence is:

  1. Structured capture hint → 2. Explicit caller override → 3. Extracted → 4. Defaults

For type, this is correct — structured.typeHint || callerType || extracted?.type (line 176–178).

But for importance (lines 224–232), it's reversed: caller override wins over structured capture:

const callerImportance = extraMetadata.importance !== undefined ? ... : null;
const structuredImportance = structured.matched ? STRUCTURED_CAPTURE_IMPORTANCE : null;
const importance = callerImportance ?? structuredImportance ?? DEFAULT_IMPORTANCE;
// callerImportance takes priority — structured hint only applies if caller didn't specify

If a caller passes metadata: { importance: 1 } along with a [decision] structured capture, they get importance 1, not the documented level-1 structured hint value of 4. The same inconsistency applies to confidence resolution (lines 238–246). The documented contract doesn't match the implementation.

Fix this →

2. DEFAULT_QUALITY_SCORE doesn't match computed default (minor)

The constant at line 75 says DEFAULT_QUALITY_SCORE = 50. The code comments and PR body also say the default is 50. But when no caller quality is supplied, the actual value computed is:

Math.round(confidence * 70 + 20)  // with DEFAULT_CONFIDENCE = 0.55 → 59, not 50

A user expecting the documented default of 50 would see 59 instead. Either the formula or the documented default should be updated to be consistent.

Fix this →

3. integrations/smart-ingest doesn't exist in the repo (minor)

The "Works Well With" section references integrations/smart-ingest, but this directory doesn't exist. The integrations directory currently has only discord-capture and slack-capture. If that PR hasn't merged yet, this cross-reference will produce a broken internal link (caught by CI rule 13). Suggest softening the language to something like "Smart Ingest (pending contribution)" or removing the directory reference.

4. VALID_TYPES is defined in both files (minor)

parse-structured-capture.ts (line 35) and prepare-thought-payload.ts (line 81) each define the same VALID_TYPES set independently. If a new thought type is added to OB1, it needs to be updated in two places. Since prepare-thought-payload.ts already imports from parse-structured-capture.ts, exporting VALID_TYPES from the parser and re-using it in the payload preparer would be cleaner.


Minor Notes

  • The Troubleshooting section mentions that [idea] My thought (single bracket) won't match — this is genuinely surprising and worth a > [!NOTE] callout near the format description, not just in Troubleshooting.
  • normalizeStringArray JSDoc says "sorted" but the function only deduplicates; it preserves insertion order.
  • Both created and updated in metadata.json are 2026-03-22; the updated date could reflect when the category was changed to recipes.

Verdict

The recipe-category placement is correct and this is genuinely useful shared infrastructure. Two issues warrant attention before merge: the importance/confidence precedence inconsistency (#1) and the broken internal link to a non-existent integration (#3). The quality score mismatch (#2) is worth fixing to avoid confusing adopters.

The code itself is solid — no blockers on the logic, safety, or documentation quality. Once those items are addressed this should be ready to merge.

Copy link
Copy Markdown
Collaborator

@matthallett1 matthallett1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — Universal Ingest Primitives

This is well-designed. Pure functions, no side effects, clear precedence rules, good type safety. The structured capture parser is a smart pattern for giving users inline control over classification.

One issue

Single-bracket input silently drops hints (parse-structured-capture.ts:85-87)
The regex requires both [type] and [topic] brackets. If a user writes [idea] My thought, the brackets are treated as literal text in the content. This will confuse users since single-bracket is the intuitive first attempt.

Options:

  • Add a second regex branch for single-bracket format: [type] content
  • Or document this more prominently in the README (the troubleshooting section mentions it, but users won't read that first)

Recommend adding single-bracket support since the code is clean enough to handle it.

Minor notes

  • asString and asInteger helpers are clean, proper clamping
  • normalizeStringArray dedup + lowercase is the right call
  • The precedence order (structured > caller > extracted > default) is well-documented and correctly implemented
  • mergeUniqueStrings correctly handles the Set-based dedup

One small thing: normalizeTypeHint is exported from parse-structured-capture.ts and imported in prepare-thought-payload.ts, but I don't see it actually called in the prepare function. The type resolution uses sanitizeType instead. Is normalizeTypeHint only used via the structured parser internally? If so, no need to re-import it.

Fix the single-bracket case and this is good to merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants