[integrations] Consolidation workers (bio + metadata) #14

Open
alanshurafa wants to merge 7 commits into main from contrib/alanshurafa/consolidation-workers

@alanshurafa
Owner

Summary

  • Bio worker (bio/index.ts): Synthesizes canonical biographical profiles from person_note, decision, and journal thoughts via LLM. Updates existing profiles in place on subsequent runs.
  • Metadata normalization worker (metadata-norm/index.ts): Finds thoughts with weak metadata (catch-all type, default importance, low confidence) and reclassifies via LLM with materiality and confidence guards (> 0.8 confidence, material change required).
  • Both workers use OpenRouter-first three-tier LLM fallback, support dry-run mode, and log all operations to consolidation_log.
  • Shared helpers (_shared/) copied from the enhanced-mcp integration (PR 5).
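
The confidence and materiality guards described for the metadata normalization worker can be sketched as a small predicate. This is a minimal illustration, not the code in metadata-norm/index.ts: the `Classification` and `Proposal` shapes, the `shouldApply` name, and the definition of "material" (type or importance differs) are all assumptions made for the example. Only the 0.8 threshold comes from the summary.

```typescript
// Hypothetical shapes: field names mirror the columns named in this PR
// (type, importance) plus an assumed LLM-reported confidence score.
interface Classification {
  type: string;
  importance: number;
}

interface Proposal extends Classification {
  confidence: number; // assumed to lie in the range 0..1
}

// Threshold stated in the PR summary ("> 0.8 confidence").
const MIN_CONFIDENCE = 0.8;

// Assumption: a change counts as "material" when type or importance differs.
// The actual worker may define materiality differently.
function isMaterialChange(current: Classification, proposed: Classification): boolean {
  return current.type !== proposed.type || current.importance !== proposed.importance;
}

// Apply a reclassification only if it is both confident and material.
function shouldApply(current: Classification, proposed: Proposal): boolean {
  return proposed.confidence > MIN_CONFIDENCE && isMaterialChange(current, proposed);
}
```

Keeping the two guards in one predicate makes it straightforward to log why a proposal was skipped, which fits the dry-run mode mentioned above.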

Dependencies

  • schemas/enhanced-thoughts (PR 1) — for type, importance, sensitivity_tier columns
  • schemas/knowledge-graph (PR 4) — for consolidation_log table

Gate compliance

| Rule | Status |
| --- | --- |
| 2: README + metadata.json | Pass |
| 3: Valid metadata schema | Pass |
| 6: .ts code files | Pass (4 files) |
| 9: Prerequisites, numbered steps, expected outcome | Pass |
| 12: All files in contribution folder | Pass |
| 13: Relative links resolve | Pass |
| 14: No local MCP patterns | Pass |
| 15: Contains "05-tool-audit" | Pass |

Test plan

  • CI gate passes all 15 checks
  • Deploy bio worker with ?dry_run=true and verify profile output
  • Deploy metadata-norm worker with ?dry_run=true&limit=5 and verify reclassification results
  • Verify consolidation_log entries are created on non-dry-run execution
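
The `?dry_run=true&limit=5` query string used in the test plan could be read along the following lines. This is a sketch under assumptions: `parseRunOptions` and `RunOptions` are hypothetical names, and the workers may parse or default these parameters differently.

```typescript
// Hypothetical options object; the parameter names dry_run and limit
// are taken from the test plan, everything else is illustrative.
interface RunOptions {
  dryRun: boolean;
  limit: number | null; // null means "no explicit limit requested"
}

function parseRunOptions(requestUrl: string): RunOptions {
  const params = new URL(requestUrl).searchParams;

  // Only the literal string "true" enables dry-run mode in this sketch.
  const dryRun = params.get("dry_run") === "true";

  // Accept only positive integers for limit; anything else is ignored.
  const rawLimit = params.get("limit");
  const parsed = rawLimit === null ? NaN : Number.parseInt(rawLimit, 10);
  const limit = Number.isInteger(parsed) && parsed > 0 ? parsed : null;

  return { dryRun, limit };
}
```

Treating anything other than the literal `true` as a live run is a design choice of this sketch. A more cautious worker might default to dry-run and require an explicit opt-in before writing.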

Bio worker synthesizes canonical biographical profiles from person_note,
decision, and journal thoughts. Metadata normalization worker reclassifies
thoughts with weak metadata (catch-all type, default importance, low
confidence) via LLM with materiality and confidence guards.

Both workers use OpenRouter-first three-tier LLM fallback, support dry-run
mode, log to consolidation_log, and use wildcard CORS for flexible deployment.

Depends on: schemas/enhanced-thoughts, schemas/knowledge-graph

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add blank lines around headings (MD022), fenced code blocks (MD031),
and between adjacent blockquotes (MD028). Fix broken link fragment
(MD051) and remove extra blank line (MD012). No content changes.

These errors were blocking CI on all open PRs since the lint check
runs repo-wide.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions bot added the documentation and recipe labels on Apr 6, 2026
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a487268b56

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +133 to +137
const { data: decisions } = await supabase
  .from("thoughts")
  .select("id, content, type, importance, created_at")
  .eq("type", "decision")
  .gte("importance", 4)

P1: Restrict non-note sources by requested person name

When ?name= is provided, gatherSourceThoughts still pulls all high-importance decision rows (and the journal query below is also unconditional), so the profile prompt can include unrelated people/context and produce an inaccurate "Who is X" profile. This breaks the targeted-person mode whenever the brain contains mixed-person data.
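
One way to address this is to narrow the decision and journal rows to those that mention the requested person before they reach the prompt. The sketch below is an assumption-laden illustration, not the PR's code: `Thought`, `mentionsPerson`, and `restrictToPerson` are hypothetical names, and a case-insensitive substring match on `content` is a deliberately crude stand-in for whatever person-linking the schema actually supports.

```typescript
// Hypothetical row shape based on the columns selected in the snippet above.
interface Thought {
  id: string;
  content: string;
  type: string;
  importance: number;
}

// Crude assumption: a thought is "about" a person if its content mentions
// the name, ignoring case and surrounding whitespace in the name.
function mentionsPerson(thought: Thought, name: string): boolean {
  return thought.content.toLowerCase().includes(name.trim().toLowerCase());
}

// When no name is requested, keep the current untargeted behaviour.
// When a name is requested, drop rows that never mention that person.
function restrictToPerson(thoughts: Thought[], name?: string): Thought[] {
  if (!name || name.trim() === "") return thoughts;
  return thoughts.filter((t) => mentionsPerson(t, name));
}
```

The same restriction could likely be pushed into the query itself, for example with a supabase-js `.ilike("content", ...)` filter applied only when `?name=` is present, which would avoid fetching unrelated rows at all.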


Comment on lines +148 to +151
for (const { name, fn } of providers) {
  try {
    return await fn();
  } catch (err) {

P2: Fall through to next LLM on empty primary output

The fallback loop returns immediately on the first provider that does not throw, even if it returns an empty or unusable body. In that case reclassifyThought gets empty/non-JSON text and the thought is skipped or errors out, while configured fallback providers are never attempted, reducing reclassification reliability.
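
A fallback loop that treats empty output as a failure might look like the following. This is a minimal sketch, assuming each provider is a `{ name, fn }` pair as in the snippet above; `Provider`, `isUsable`, and `callWithFallback` are hypothetical names, and "usable" here only means non-blank (a stricter check could also try `JSON.parse`).

```typescript
// Assumed provider shape, matching the destructuring in the reviewed loop.
interface Provider {
  name: string;
  fn: () => Promise<string>;
}

// Minimal usability check: reject empty or whitespace-only bodies.
function isUsable(text: string): boolean {
  return text.trim().length > 0;
}

// Try providers in order; move on when one throws OR returns unusable text.
async function callWithFallback(providers: Provider[]): Promise<string> {
  const failures: string[] = [];
  for (const { name, fn } of providers) {
    try {
      const text = await fn();
      if (isUsable(text)) return text;
      failures.push(`${name}: empty output`);
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      failures.push(`${name}: ${reason}`);
    }
  }
  // Every provider failed or returned nothing usable.
  throw new Error(`All providers failed (${failures.join("; ")})`);
}
```

Collecting the per-provider failure reasons keeps the final error informative, so a skipped thought can be traced back to which tier returned nothing.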


Comment on lines +181 to +182
  .eq("metadata->>generated_by", "consolidation-bio")
  .eq("metadata->>artifact_type", "biographical_profile")

P2: Scope existing-profile lookup to the requested person

The existing-profile query ignores the name filter and always returns the latest biographical profile globally. If operators run consolidation-bio for different ?name= values, each run updates the same row and overwrites the previous person’s profile instead of keeping separate artifacts.
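
The scoping this comment asks for can be illustrated with an in-memory lookup that keys profiles by person. This is a sketch only: the `subject_name` metadata key is hypothetical (the PR does not say where, or whether, the person's name is stored on the artifact), and `ProfileRow`, `normalizeName`, and `findExistingProfile` are names invented for the example.

```typescript
// Hypothetical row shape. generated_by and artifact_type appear in the
// reviewed query; subject_name is an ASSUMED key for the profiled person.
interface ProfileRow {
  id: string;
  created_at: string; // ISO 8601 timestamp, assumed to share one format
  metadata: {
    generated_by?: string;
    artifact_type?: string;
    subject_name?: string;
  };
}

function normalizeName(name: string): string {
  return name.trim().toLowerCase();
}

// Return the newest biographical profile for this person, if any.
function findExistingProfile(rows: ProfileRow[], name: string): ProfileRow | undefined {
  const key = normalizeName(name);
  return rows
    .filter(
      (r) =>
        r.metadata.generated_by === "consolidation-bio" &&
        r.metadata.artifact_type === "biographical_profile" &&
        r.metadata.subject_name !== undefined &&
        normalizeName(r.metadata.subject_name) === key,
    )
    // Newest first; lexicographic order matches time order for
    // same-format ISO 8601 strings.
    .sort((a, b) => b.created_at.localeCompare(a.created_at))[0];
}
```

In the query itself, the equivalent would presumably be one more `.eq(...)` filter on whichever metadata key holds the person's name, applied when `?name=` is given, so that each person's profile is updated in its own row.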



Labels

documentation, integration, recipe

1 participant