
[schemas] Smart ingest pipeline tables #4

Open
alanshurafa wants to merge 5 commits into main from contrib/alanshurafa/smart-ingest-schema


@alanshurafa
Owner

Summary

  • Adds ingestion_jobs and ingestion_items tables for tracking the extract-deduplicate-execute lifecycle of bulk text ingestion
  • Installs append_thought_evidence RPC for idempotent evidence accumulation on thoughts
  • Part of the OB1 alpha milestone (Wave 2 schema; depends on PR #1, "[schemas] Enhanced thoughts columns and utility RPCs" (enhanced-thoughts))

What's included

| Object | Type | Purpose |
| --- | --- | --- |
| ingestion_jobs | table | One row per ingest invocation with status lifecycle and per-action counters |
| ingestion_items | table | Individual extracted thoughts with action codes, dedup reasons, and execution results |
| ingestion_items_job_idx | index | Fast job-to-item lookups |
| append_thought_evidence() | RPC | Idempotent evidence append using SHA256 identity hashing |
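
To make the "idempotent evidence append" idea concrete, here is a minimal Python sketch of the logic, not the RPC itself. It assumes the identity is a hex SHA256 over source_label, excerpt, and thought_id. The separator byte, the `identity` key, and both function names are hypothetical, because the function body is not shown on this page.

```python
import hashlib


def evidence_identity(source_label: str, excerpt: str, thought_id: int) -> str:
    """SHA256 over (source_label, excerpt, thought_id), hex-encoded."""
    # Assumption: fields are joined with a unit-separator character so that
    # ("ab", "c") and ("a", "bc") hash differently. The RPC's real
    # concatenation rule is not shown in this PR.
    payload = "\x1f".join([source_label, excerpt, str(thought_id)])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_evidence(evidence, thought_id, source_label, excerpt):
    """Return a new list with the entry appended unless its identity exists."""
    identity = evidence_identity(source_label, excerpt, thought_id)
    if any(entry.get("identity") == identity for entry in evidence):
        return evidence  # replayed call: nothing changes
    new_entry = {
        "identity": identity,  # hypothetical key name
        "source_label": source_label,
        "excerpt": excerpt,
    }
    return evidence + [new_entry]
```

Calling `append_evidence` twice with the same arguments leaves the list unchanged after the first call, which is the property that lets an ingest job be retried safely.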

Design decisions

  • Idempotent DDL: All DDL uses CREATE TABLE IF NOT EXISTS and CREATE OR REPLACE FUNCTION, so the migration is safe to run multiple times
  • Dry-run lifecycle: Jobs progress through pending → extracting → dry_run_complete → executing → complete, supporting human-in-the-loop review before mutation
  • Content-hash dedup: input_hash unique constraint on jobs prevents duplicate processing of the same text
  • Evidence identity: append_thought_evidence uses SHA256 of (source_label + excerpt + thought_id) to prevent duplicate evidence entries
  • Cascade cleanup: Deleting a job automatically removes its items via ON DELETE CASCADE
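
The dry-run lifecycle above can be read as a small state machine. The following Python sketch models only the five statuses listed in this PR. The `ALLOWED` table and `advance` helper are hypothetical, and any failure or cancel states the real schema may define are deliberately left out.

```python
# Forward-only transitions taken from the lifecycle listed above.
# Assumption: no skipping and no going back; the real schema may be
# more permissive or may include error states not shown here.
ALLOWED = {
    "pending": {"extracting"},
    "extracting": {"dry_run_complete"},
    "dry_run_complete": {"executing"},  # human review happens before this step
    "executing": {"complete"},
    "complete": set(),
}


def advance(current: str, target: str) -> str:
    """Return the new status, or raise if the transition is not allowed."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target


status = "pending"
for step in ("extracting", "dry_run_complete", "executing", "complete"):
    status = advance(status, step)
```

Because `executing` is reachable only from `dry_run_complete`, a job cannot mutate thoughts until the dry-run results exist to be reviewed.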

Dependencies

Test plan

  • Paste schema.sql into Supabase SQL Editor and run — no errors
  • Confirm ingestion_jobs and ingestion_items tables appear in Table Editor
  • Confirm append_thought_evidence function appears in Database > Functions
  • Run SELECT count(*) FROM ingestion_jobs; — returns 0
  • Re-run the migration — no errors (idempotency check)
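
For a quick local rehearsal of the idempotency, dedup, and cascade checks, here is a hedged sketch that uses Python's built-in sqlite3 as a stand-in for Postgres. The column lists are hypothetical and heavily simplified, since schema.sql itself is not reproduced on this page, and SQLite behaviour is not a substitute for running the real migration in Supabase.

```python
import sqlite3

# Hypothetical, simplified schema. Only the table names, the index name,
# the unique input_hash, and ON DELETE CASCADE come from this PR.
SCHEMA = """
CREATE TABLE IF NOT EXISTS ingestion_jobs (
    id INTEGER PRIMARY KEY,
    input_hash TEXT NOT NULL UNIQUE,
    status TEXT NOT NULL DEFAULT 'pending'
);
CREATE TABLE IF NOT EXISTS ingestion_items (
    id INTEGER PRIMARY KEY,
    job_id INTEGER NOT NULL REFERENCES ingestion_jobs(id) ON DELETE CASCADE,
    content TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS ingestion_items_job_idx ON ingestion_items (job_id);
"""


def migrate(conn: sqlite3.Connection) -> None:
    """Apply the schema; safe to call repeatedly."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this for cascades
    conn.executescript(SCHEMA)


def count(conn: sqlite3.Connection, table: str) -> int:
    return conn.execute(f"SELECT count(*) FROM {table}").fetchone()[0]


conn = sqlite3.connect(":memory:")
migrate(conn)
migrate(conn)  # second run must not raise (idempotency check)
```

After the second `migrate` call, `count(conn, "ingestion_jobs")` is still 0, mirroring the last two steps of the test plan.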

🤖 Generated with Claude Code

Add ingestion_jobs and ingestion_items tables for tracking the
extract-deduplicate-execute lifecycle of bulk text ingestion.
Install append_thought_evidence RPC for idempotent evidence
accumulation on thoughts. Part of the OB1 alpha milestone.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
github-actions bot added the schema label on Apr 6, 2026

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3abe49a057


Comment on lines +113 to +117
    UPDATE public.thoughts
    SET metadata = jsonb_set(
        coalesce(metadata, '{}'::jsonb),
        '{evidence}',
        v_current_evidence || jsonb_build_object(

P1: Serialize evidence updates to avoid lost writes

append_thought_evidence reads metadata->'evidence' into v_current_evidence and later writes back v_current_evidence || ..., so concurrent calls on the same thought can clobber each other. If two workers append different evidence at the same time, the second write can overwrite the first and drop one entry. This breaks the function’s stated idempotent accumulation behavior for multi-item ingestion. Fix it by locking the row (FOR UPDATE) or by performing the append against the current row value inside a single UPDATE expression.
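
The lost-write scenario can be modelled outside the database. The Python sketch below is illustrative only: `ThoughtRow` is a hypothetical in-memory stand-in for one row of public.thoughts, and a `threading.Lock` plays the role that a row lock (FOR UPDATE) would play in Postgres.

```python
import threading


class ThoughtRow:
    """Hypothetical in-memory stand-in for one row of public.thoughts."""

    def __init__(self):
        self.evidence = []
        self.lock = threading.Lock()  # plays the role of a row lock


def unsafe_interleaving(row, first, second):
    """Replay the race step by step: both workers read before either writes."""
    snapshot_a = list(row.evidence)  # worker A reads
    snapshot_b = list(row.evidence)  # worker B reads the same state
    row.evidence = snapshot_a + [first]  # A writes
    row.evidence = snapshot_b + [second]  # B overwrites A's entry


def locked_append(row, entry):
    """Read and write under the lock, so each append sees the latest value."""
    with row.lock:
        row.evidence = row.evidence + [entry]


def run_locked(entries):
    """Append every entry from its own thread and return the final row."""
    row = ThoughtRow()
    threads = [
        threading.Thread(target=locked_append, args=(row, e)) for e in entries
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return row
```

In the unsafe interleaving only the second entry survives; with the lock held across the read and the write, every entry is kept regardless of thread scheduling.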


Comment on lines +145 to +146
    GRANT EXECUTE ON FUNCTION public.append_thought_evidence(bigint, jsonb)
        TO authenticated, anon, service_role;

P1: Restrict SECURITY DEFINER RPC from anon/auth roles

This migration grants EXECUTE on a SECURITY DEFINER function to authenticated and anon. Callers can therefore mutate public.thoughts through the function even when row-level access is intended to be service-role-only (as configured in docs/01-getting-started.md with a service-role policy on thoughts). In Supabase deployments where anon keys are client-visible, this exposes an authorization bypass for arbitrary thought_id updates. Limit the grant to service_role, or enforce caller ownership checks inside the function.


alanshurafa and others added 2 commits April 6, 2026 10:41
Add blank lines around headings (MD022), fenced code blocks (MD031),
and between adjacent blockquotes (MD028). Fix broken link fragment
(MD051) and remove extra blank line (MD012). No content changes.

These errors were blocking CI on all open PRs since the lint check
runs repo-wide.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
alanshurafa and others added 2 commits April 6, 2026 13:53
SECURITY DEFINER function was granted to authenticated/anon, allowing
RLS bypass. Now restricted to service_role only. Added FOR UPDATE to
prevent concurrent evidence appends from losing writes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Labels

documentation, integration, recipe, schema

1 participant