[integrations] Smart ingest edge function by alanshurafa · Pull Request #10 · alanshurafa/OB1

alanshurafa · 2026-04-06T16:16:00Z

Summary

Standalone Supabase Edge Function for LLM-powered atomic thought extraction from raw text
Ported from ExoCortex production smart-ingest pipeline (1,369 completed jobs) with OB1 adaptations
Depends on schemas/smart-ingest-tables (PR #4, "[schemas] Smart ingest pipeline tables") for the ingestion_jobs and ingestion_items tables

What It Does

Accepts raw text via HTTP POST and extracts atomic thoughts using an LLM (OpenRouter primary, OpenAI/Anthropic fallback). Each thought is then deduplicated against existing brain content using both SHA-256 content fingerprinting and pgvector semantic similarity, and is assigned one of four reconciliation actions: add, skip, append_evidence, or create_revision.
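To illustrate the fingerprint half of the dedup step, here is a minimal sketch of SHA-256 content fingerprinting with the Web Crypto API, which is available in Deno. The function name `contentFingerprint` and the normalization rules (trim, lowercase, collapse whitespace) are assumptions for illustration; the actual helper in `_shared/helpers.ts` may normalize differently.

```typescript
// Hypothetical helper, not the PR's actual implementation.
// Normalizes text, then returns its SHA-256 digest as a lowercase hex string.
async function contentFingerprint(text: string): Promise<string> {
  // Assumed normalization: trim, lowercase, collapse runs of whitespace.
  const normalized = text.trim().toLowerCase().replace(/\s+/g, " ");
  const bytes = new TextEncoder().encode(normalized);
  const digest = await crypto.subtle.digest("SHA-256", bytes);
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}
```

Under this normalization, two captures that differ only in case or spacing produce the same fingerprint, so an exact-duplicate check can run before any embedding call is made.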

Key Features

Dry-run mode — preview extractions without writing to the database
Job execution — commit dry-run results via /execute endpoint
Quality gate — minimum 30 chars, minimum importance 3
Fingerprint + semantic dedup — 0.85 match threshold, 0.92 skip threshold
Source metadata threading — import_key session dedup, capture provenance
Text chunking — handles long documents (5000 word limit per LLM call)
Sensitivity pre-flight — blocks restricted content from cloud processing
Entity extraction trigger — optional, best-effort (non-fatal if worker not deployed)
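The numeric limits in the list above can be sketched as three small pure functions. All names here are hypothetical, and the mapping of the two similarity thresholds to outcomes is an assumption: the description gives the values 0.85 and 0.92 but not whether comparisons are inclusive, nor how a match is split between append_evidence and create_revision, so the middle band is simply labelled `match`.

```typescript
// Values taken from the PR description; names are illustrative only.
const MIN_CHARS = 30;
const MIN_IMPORTANCE = 3;
const MATCH_THRESHOLD = 0.85;
const SKIP_THRESHOLD = 0.92;
const MAX_WORDS_PER_CALL = 5000;

interface Candidate {
  content: string;
  importance: number;
}

// Quality gate: reject thoughts that are too short or not important enough.
function passesQualityGate(c: Candidate): boolean {
  return c.content.trim().length >= MIN_CHARS && c.importance >= MIN_IMPORTANCE;
}

type SimilarityBand = "new" | "match" | "near_duplicate";

// Assumed interpretation: at or above the skip threshold the thought is
// treated as a near duplicate; at or above the match threshold it is
// reconciled against the existing thought; otherwise it is new.
function classifySimilarity(similarity: number): SimilarityBand {
  if (similarity >= SKIP_THRESHOLD) return "near_duplicate";
  if (similarity >= MATCH_THRESHOLD) return "match";
  return "new";
}

// Split long input into chunks of at most maxWords whitespace-separated words.
function chunkByWords(text: string, maxWords: number = MAX_WORDS_PER_CALL): string[] {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += maxWords) {
    chunks.push(words.slice(i, i + maxWords).join(" "));
  }
  return chunks;
}
```

Splitting on a fixed word count is the simplest option; a real pipeline might prefer paragraph or sentence boundaries so that a thought is not cut in half between LLM calls.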

OB1 Adaptations

OpenRouter-first LLM provider order (reversed from ExoCortex)
Wildcard CORS for generic deployments
Model constants from _shared/config.ts (consistent with enhanced-mcp)
_shared/ helpers copied from enhanced-mcp (PR #6, "[integrations] Enhanced MCP server with alpha tool suite") for consistency

Files

All within integrations/smart-ingest/:

File	Lines	Purpose
`index.ts`	1094	Edge function with extraction, dedup, and execution logic
`_shared/helpers.ts`	770	Shared utilities (embedding, fingerprint, sensitivity, payload prep)
`_shared/config.ts`	204	Constants, types, prompts
`README.md`	225	Setup guide with prerequisites, steps, API reference, troubleshooting
`metadata.json`	18	OB1 contribution metadata
`deno.json`	5	Deno import map

Test plan

Verify all 15 gate checks pass via gh pr checks
Validate metadata.json against .github/metadata.schema.json
Confirm README contains: "prerequisites", numbered steps, "expected outcome"
Confirm "05-tool-audit" string appears in README
Confirm all relative links resolve (../../docs/01-getting-started.md, ../../docs/05-tool-audit.md)
Confirm no files outside integrations/smart-ingest/
Deploy to test Supabase project and smoke-test dry-run + execute flow

🤖 Generated with Claude Code

Port ExoCortex production smart-ingest pipeline to OB1 as a standalone Supabase Edge Function for LLM-powered atomic thought extraction from raw text. Features: dry-run preview, fingerprint + semantic dedup (0.85/0.92 thresholds), evidence append, job execution, quality gate, source metadata threading, import_key session dedup, chunking for long texts. OB1 adaptations: OpenRouter-first provider order, wildcard CORS, model constants from _shared/config.ts, optional entity extraction trigger, _shared/ helpers copied from enhanced-mcp (PR 5). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 661fe55dc6


chatgpt-codex-connector · 2026-04-06T16:19:59Z

integrations/smart-ingest/_shared/helpers.ts

+    if (!response.ok) {
+      throw new Error(`OpenRouter embedding failed (${response.status}): ${await response.text()}`);
+    }


Fall back to OpenAI when OpenRouter embedding fails

embedText advertises OpenRouter-primary/OpenAI-fallback behavior, but this branch throws immediately on any OpenRouter non-2xx response, so the OpenAI branch is never attempted when both keys are configured. In production, transient OpenRouter 429/5xx errors will cause ingestion reconciliation to fail (or lose embeddings) even though a healthy fallback provider is available; catch this failure and continue to the OpenAI path instead of hard-failing here.
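One way to structure the suggested fix is to try providers in order and fall through on failure, rather than throwing on the first non-2xx response. The sketch below is not the PR's `embedText`; the `NamedEmbedder` type and `embedWithFallback` function are hypothetical, and the provider calls are injected so the control flow can be shown without real network requests.

```typescript
// Hypothetical types: each provider wraps its own HTTP call and throws on failure.
type Embedder = (text: string) => Promise<number[]>;

interface NamedEmbedder {
  name: string;
  embed: Embedder;
}

// Tries each provider in order. A failure is recorded and the next provider
// is attempted; an error is thrown only if every provider fails.
async function embedWithFallback(
  text: string,
  providers: NamedEmbedder[],
): Promise<{ provider: string; embedding: number[] }> {
  const errors: string[] = [];
  for (const p of providers) {
    try {
      const embedding = await p.embed(text);
      return { provider: p.name, embedding };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      errors.push(`${p.name}: ${message}`);
    }
  }
  throw new Error(`All embedding providers failed: ${errors.join("; ")}`);
}
```

With this shape, a transient 429 from the first provider is recorded in the aggregated error message but does not stop ingestion as long as a later provider succeeds.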


chatgpt-codex-connector · 2026-04-06T16:19:59Z

integrations/smart-ingest/index.ts

+    : null;
+
+  for (const item of items) {
+    if (item.action === "skip") { skippedCount++; continue; }


Mark skipped dry-run items executed in /execute

In handleExecuteJob, skip actions are counted and immediately continued, but the corresponding ingestion_items row is never updated. Because dry-run persistence stores pending items as ready, these rows stay ready even after the job is marked complete, leaving job state inconsistent and potentially misleading any UI/automation that interprets ready as unprocessed. Update skipped rows to executed before continuing (as the immediate-execution path already does).
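The suggested change can be sketched as a loop that updates the row status for skipped items before continuing. This is an illustration, not the PR's `handleExecuteJob`: the `IngestionItem` shape and the `markExecuted` and `applyAction` callbacks are hypothetical stand-ins for the real table rows and database calls.

```typescript
type Action = "add" | "skip" | "append_evidence" | "create_revision";

// Hypothetical row shape; the real ingestion_items columns may differ.
interface IngestionItem {
  id: number;
  action: Action;
}

// Processes dry-run items. Skipped items are also marked executed so that
// no row is left in the "ready" state once the job completes.
async function executeItems(
  items: IngestionItem[],
  markExecuted: (id: number) => Promise<void>,
  applyAction: (item: IngestionItem) => Promise<void>,
): Promise<{ skipped: number; applied: number }> {
  let skipped = 0;
  let applied = 0;
  for (const item of items) {
    if (item.action === "skip") {
      await markExecuted(item.id); // the step reported missing in the review
      skipped++;
      continue;
    }
    await applyAction(item);
    await markExecuted(item.id);
    applied++;
  }
  return { skipped, applied };
}
```

Marking skipped rows keeps the per-item state consistent with the job state, so a UI or automation that treats `ready` as unprocessed does not report stale work.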


Add blank lines around headings (MD022), fenced code blocks (MD031), and between adjacent blockquotes (MD028). Fix broken link fragment (MD051) and remove extra blank line (MD012). No content changes. These errors were blocking CI on all open PRs since the lint check runs repo-wide. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>


github-actions bot added the integration label Apr 6, 2026

chatgpt-codex-connector bot reviewed Apr 6, 2026


alanshurafa and others added 2 commits April 6, 2026 13:32

fix: renumber ordered lists in thought-enrichment README for MD029

5c64d67

Each section's numbered list now restarts at 1 instead of continuing the global count (3-14), satisfying markdownlint MD029/ol-prefix rule. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions bot added the documentation and recipe labels Apr 6, 2026

alanshurafa and others added 4 commits April 6, 2026 13:49

[integrations] Fix smart ingest Deno type errors

e1655e2

[integrations] Add tool audit link to Slack capture README

84b4bbf

[recipes] Remove secret-like placeholders from thought enrichment README

e196e9c

fix: atomic job execution CAS and graceful embedding fallback

be2136a

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>


alanshurafa wants to merge 7 commits into main from contrib/alanshurafa/smart-ingest
