
[integrations] Smart ingest edge function #10

Open
alanshurafa wants to merge 7 commits into main from contrib/alanshurafa/smart-ingest

@alanshurafa
Owner

Summary

  • Standalone Supabase Edge Function for LLM-powered atomic thought extraction from raw text
  • Ported from ExoCortex production smart-ingest pipeline (1,369 completed jobs) with OB1 adaptations
  • Depends on schemas/smart-ingest-tables (PR #4, "[schemas] Smart ingest pipeline tables") for the ingestion_jobs and ingestion_items tables

What It Does

Accepts raw text via HTTP POST, extracts atomic thoughts using an LLM (OpenRouter primary, OpenAI/Anthropic fallback), then deduplicates each thought against existing brain content using both SHA-256 content fingerprinting and pgvector semantic similarity. Four reconciliation actions: add, skip, append_evidence, create_revision.
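The reconciliation step can be sketched as a small decision function. This is a minimal illustration, not the code in index.ts: the PR states the two thresholds and the four actions but not how they map to each other, so the mapping below is an assumption, and the `Candidate` shape and the `updatesExisting` flag are hypothetical names introduced only for this example.

```typescript
// Hypothetical sketch: the threshold-to-action mapping is assumed, not taken from the PR.
type Action = "add" | "skip" | "append_evidence" | "create_revision";

interface Candidate {
  fingerprintMatch: boolean; // SHA-256 fingerprint equals that of an existing thought
  similarity: number;        // best pgvector similarity against existing thoughts, 0..1
  updatesExisting: boolean;  // hypothetical flag: the new thought changes the matched one
}

const MATCH_THRESHOLD = 0.85; // "match threshold" from the PR description
const SKIP_THRESHOLD = 0.92;  // "skip threshold" from the PR description

function reconcile(c: Candidate): Action {
  // Exact duplicates are caught by the fingerprint before any similarity check.
  if (c.fingerprintMatch) return "skip";
  // Near-identical content is also skipped.
  if (c.similarity >= SKIP_THRESHOLD) return "skip";
  // Related content either revises the existing thought or adds supporting evidence.
  if (c.similarity >= MATCH_THRESHOLD) {
    return c.updatesExisting ? "create_revision" : "append_evidence";
  }
  // Nothing similar enough exists, so store it as a new thought.
  return "add";
}
```

Keeping the decision pure (no database or network access) makes it straightforward to unit-test separately from the embedding and lookup calls that produce its inputs.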

Key Features

  • Dry-run mode — preview extractions without writing to the database
  • Job execution — commit dry-run results via /execute endpoint
  • Quality gate — minimum 30 chars, minimum importance 3
  • Fingerprint + semantic dedup — 0.85 match threshold, 0.92 skip threshold
  • Source metadata threading — import_key session dedup, capture provenance
  • Text chunking — handles long documents (5000 word limit per LLM call)
  • Sensitivity pre-flight — blocks restricted content from cloud processing
  • Entity extraction trigger — optional, best-effort (non-fatal if worker not deployed)
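
As a rough illustration of two of the features above, the quality gate and the chunking limit could look like the following. The numeric limits come from the list; the `Thought` shape, the function names, and the plain whitespace split are assumptions made for this sketch, and the actual function may split on paragraph or sentence boundaries instead.

```typescript
// Hypothetical sketch of the quality gate and word-based chunking.
const MIN_CHARS = 30;            // minimum thought length from the feature list
const MIN_IMPORTANCE = 3;        // minimum importance from the feature list
const MAX_WORDS_PER_CALL = 5000; // word limit per LLM call from the feature list

interface Thought {
  content: string;
  importance: number;
}

// A thought passes only if it is long enough and important enough.
function passesQualityGate(t: Thought): boolean {
  return t.content.trim().length >= MIN_CHARS && t.importance >= MIN_IMPORTANCE;
}

// Split text into chunks of at most maxWords whitespace-separated words.
function chunkByWords(text: string, maxWords: number = MAX_WORDS_PER_CALL): string[] {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += maxWords) {
    chunks.push(words.slice(i, i + maxWords).join(" "));
  }
  return chunks;
}
```

With a small limit for demonstration, `chunkByWords("a b c d e", 2)` yields three chunks: `"a b"`, `"c d"`, and `"e"`.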

OB1 Adaptations

Files

All within integrations/smart-ingest/:

| File | Lines | Purpose |
| --- | --- | --- |
| index.ts | 1094 | Edge function with extraction, dedup, and execution logic |
| _shared/helpers.ts | 770 | Shared utilities (embedding, fingerprint, sensitivity, payload prep) |
| _shared/config.ts | 204 | Constants, types, prompts |
| README.md | 225 | Setup guide with prerequisites, steps, API reference, troubleshooting |
| metadata.json | 18 | OB1 contribution metadata |
| deno.json | 5 | Deno import map |

Test plan

  • Verify all 15 gate checks pass via gh pr checks
  • Validate metadata.json against .github/metadata.schema.json
  • Confirm README contains: "prerequisites", numbered steps, "expected outcome"
  • Confirm "05-tool-audit" string appears in README
  • Confirm all relative links resolve (../../docs/01-getting-started.md, ../../docs/05-tool-audit.md)
  • Confirm no files outside integrations/smart-ingest/
  • Deploy to test Supabase project and smoke-test dry-run + execute flow

🤖 Generated with Claude Code

Port ExoCortex production smart-ingest pipeline to OB1 as a standalone
Supabase Edge Function for LLM-powered atomic thought extraction from
raw text.

Features: dry-run preview, fingerprint + semantic dedup (0.85/0.92
thresholds), evidence append, job execution, quality gate, source
metadata threading, import_key session dedup, chunking for long texts.

OB1 adaptations: OpenRouter-first provider order, wildcard CORS,
model constants from _shared/config.ts, optional entity extraction
trigger, _shared/ helpers copied from enhanced-mcp (PR 5).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 661fe55dc6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +114 to +116
if (!response.ok) {
  throw new Error(`OpenRouter embedding failed (${response.status}): ${await response.text()}`);
}

P1: Fall back to OpenAI when OpenRouter embedding fails

embedText advertises OpenRouter-primary/OpenAI-fallback behavior, but this branch throws immediately on any OpenRouter non-2xx response, so the OpenAI branch is never attempted when both keys are configured. In production, transient OpenRouter 429/5xx errors will cause ingestion reconciliation to fail (or lose embeddings) even though a healthy fallback provider is available; catch this failure and continue to the OpenAI path instead of hard-failing here.
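
One way to get the behavior described here is to try each configured provider in order and throw only when all of them fail. The sketch below is an assumption about shape, not the code in _shared/helpers.ts: `Provider`, `EmbedFn`, and `embedWithFallback` are hypothetical names, and the real `embedText` would wrap its existing OpenRouter and OpenAI requests as the `embed` functions.

```typescript
// Hypothetical sketch: ordered provider fallback for embeddings.
type EmbedFn = (text: string) => Promise<number[]>;

interface Provider {
  name: string;   // e.g. "openrouter" or "openai"
  embed: EmbedFn; // throws on non-2xx responses or network errors
}

async function embedWithFallback(
  text: string,
  providers: Provider[],
): Promise<{ provider: string; embedding: number[] }> {
  const errors: string[] = [];
  for (const p of providers) {
    try {
      const embedding = await p.embed(text);
      return { provider: p.name, embedding };
    } catch (err) {
      // Record the failure and continue to the next provider instead of hard-failing.
      errors.push(`${p.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  // Only reached when every configured provider failed (or none was configured).
  throw new Error(`All embedding providers failed: ${errors.join("; ")}`);
}
```

Collecting the per-provider messages keeps the original OpenRouter status and body visible in the final error, so a fallback that also fails does not hide the first cause.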

Useful? React with 👍 / 👎.

: null;

for (const item of items) {
  if (item.action === "skip") { skippedCount++; continue; }

P2: Mark skipped dry-run items executed in /execute

In handleExecuteJob, skip actions are counted and immediately continued, but the corresponding ingestion_items row is never updated. Because dry-run persistence stores pending items as ready, these rows stay ready even after the job is marked complete, leaving job state inconsistent and potentially misleading any UI/automation that interprets ready as unprocessed. Update skipped rows to executed before continuing (as the immediate-execution path already does).
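
A minimal sketch of the suggested change follows, using an in-memory stand-in for the ingestion_items update. The item shape, `executeJobItems`, and the `markExecuted` callback are hypothetical; in the real handler the callback would be an awaited database update rather than a synchronous function.

```typescript
// Hypothetical sketch: every item, including skips, ends up marked executed.
type ItemAction = "add" | "skip" | "append_evidence" | "create_revision";

interface IngestionItem {
  id: number;
  action: ItemAction;
  status: "ready" | "executed";
}

function executeJobItems(
  items: IngestionItem[],
  markExecuted: (item: IngestionItem) => void,
): { skippedCount: number; appliedCount: number } {
  let skippedCount = 0;
  let appliedCount = 0;
  for (const item of items) {
    if (item.action === "skip") {
      // The step the review asks for: update the row before continuing.
      markExecuted(item);
      skippedCount++;
      continue;
    }
    // The add / append_evidence / create_revision work would happen here.
    markExecuted(item);
    appliedCount++;
  }
  return { skippedCount, appliedCount };
}
```

After a run, no item should remain in the ready state, which keeps the item rows consistent with a job marked complete.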

Useful? React with 👍 / 👎.

alanshurafa and others added 2 commits April 6, 2026 13:32
Add blank lines around headings (MD022), fenced code blocks (MD031),
and between adjacent blockquotes (MD028). Fix broken link fragment
(MD051) and remove extra blank line (MD012). No content changes.

These errors were blocking CI on all open PRs since the lint check
runs repo-wide.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each section's numbered list now restarts at 1 instead of continuing
the global count (3-14), satisfying markdownlint MD029/ol-prefix rule.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added the documentation and recipe labels Apr 6, 2026

Labels

documentation, integration, recipe
