[recipes] Sensitivity tiers for content access control by alanshurafa · Pull Request #110 · NateBJones-Projects/OB1

alanshurafa · 2026-03-23T21:56:23Z

Summary

Three-tier content sensitivity classification: restricted, personal, standard
Regex pattern matching detects SSNs, passport numbers, API keys, passwords, credit cards, medication dosages, health data, and financial details
Includes migration SQL (ALTER TABLE thoughts ADD COLUMN sensitivity_tier), reusable TypeScript detection function, and JSON pattern config
No external dependencies — pure regex, instant classification
Patterns are customizable via sensitivity-patterns.json

Files

File	Purpose
`sensitivity-patterns.json`	Regex patterns for restricted (7) and personal (6) tiers
`detect-sensitivity.ts`	TypeScript function: `detectSensitivity(text, patterns) → { tier, reasons }`
`migration.sql`	Adds `sensitivity_tier` column with index and check constraint
`metadata.json`	OB1 metadata
`README.md`	Full guide with integration examples and backfill instructions

Test plan

Run migration SQL against a test Supabase instance
Import detect-sensitivity.ts and test against sample content
Verify restricted patterns short-circuit on first match
Verify personal patterns collect all matches
Verify standard tier returned when no patterns match

🤖 Generated with Claude Code

Three-tier classification (restricted/personal/standard) using regex pattern matching. Detects SSNs, API keys, passwords, credit cards, medications, health data, and financial details. Includes migration SQL, reusable TypeScript detection function, and pattern config. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

justfinethanku

Code Review for Sensitivity Tiers Primitive

Overall Assessment

This is a well-executed contribution that adds real value to the Open Brain ecosystem. The primitive introduces a clean, dependency-free content classification system using pure regex pattern matching.

What's Good

1. Solid technical approach

Pure TypeScript with no external dependencies keeps the solution lightweight
Short-circuit logic for restricted patterns (performance optimization)
Collects all personal-tier matches (helpful for debugging/audit trails)
Customizable patterns via JSON configuration (maintainable and extensible)

2. Complete implementation

Migration SQL with proper index and check constraint
Reusable detection function with clear type definitions
Well-documented README with integration examples
Proper metadata.json following repo standards

3. Strong documentation

Clear explanation of the three-tier system
Multiple integration examples (Deno, Node.js)
Backfill guidance for existing data
Troubleshooting section with common issues

4. Safety-conscious

No credentials or secrets
No destructive SQL operations
Proper use of IF NOT EXISTS and check constraints
Handles existing columns gracefully

Required Changes

1. Primitives category validation (blocking)

Per CONTRIBUTING.md section "Extensions vs Primitives vs Recipes":

Primitives are reusable concept guides that get referenced by multiple extensions. They teach a pattern... once, so extensions can link to them instead of re-explaining. A primitive should be referenced by at least 2 extensions.

This is currently a standalone concept with no existing extensions referencing it. You have two paths forward:

Option A (Recommended): Move this to category

Change folder location to
Update category field to
Update PR title to
This makes it immediately available to the community while allowing future promotion to primitive if 2+ extensions adopt it

Option B: Keep as primitive but defer merge until referenced by extensions

Open an issue proposing which extensions should adopt this pattern
Submit those extension updates to reference this primitive
Merge this PR after 2+ extension references exist

2. README: Add "Extensions That Use This" section (blocking if staying as primitive)

Per CONTRIBUTING.md primitive requirements:

Primitives additionally require: "Extensions That Use This" section listing which extensions reference this primitive

Since no extensions currently use this, Option A (moving to recipes) is cleaner.

Suggested Improvements (non-blocking)

1. Pattern accuracy considerations

The regex patterns are good starting points but may need refinement:

SSN pattern () will match dates like "2024-01-15" or invoice numbers
Credit card pattern doesn't validate Luhn checksum (will flag random 16-digit sequences)
Bank account pattern is very broad and may catch non-sensitive number references

Consider adding a comment in acknowledging the tradeoff between false positives and false negatives, with guidance for users to tune patterns based on their content.

2. TypeScript export structure

The current export is clean, but consider adding a convenience function:

export function detectSensitivitySimple(text: string): SensitivityTier {
  const patterns = compilePatterns(patternsJson);
  return detectSensitivity(text, patterns).tier;
}

This would simplify the most common use case (just need the tier, not the reasons).

3. Migration safety note

Add a note in migration.sql about the performance impact of backfilling large datasets:

-- NOTE: For large existing datasets (100K+ thoughts), consider running
-- the backfill in batches with LIMIT/OFFSET to avoid long-running transactions

4. Consider a pattern test suite

A JSON file with test cases would help users validate custom patterns:

{
  "restricted_tests": [
    {"input": "My SSN is 123-45-6789", "should_match": true},
    {"input": "Invoice #123-45-6789", "should_match": false}
  ]
}

Automated Review Status

All 14 automated checks should pass once the category is adjusted. The word count (155 words in main sections, 310 total) exceeds the 200-word requirement for primitives.

Verdict

Verdict: Minor fixes needed

Primary blocker: Primitives must be referenced by 2+ extensions per repo standards. Recommend moving to category (simple folder rename + metadata update) to unblock immediate merge while keeping future promotion path open.

The implementation quality is excellent. Once the category is corrected, this is ready to merge.

Action Items:

Move to folder
Update : change to
Update PR title to
(Optional) Consider adding test cases and the suggested improvements above

Looking forward to seeing this contribution land! The pattern matching approach is clean and the documentation is thorough.

justfinethanku

Code Review for Sensitivity Tiers Primitive

Overall Assessment

This is a well-executed contribution that adds real value to the Open Brain ecosystem. The primitive introduces a clean, dependency-free content classification system using pure regex pattern matching.

What's Good

1. Solid technical approach

Pure TypeScript with no external dependencies keeps the solution lightweight
Short-circuit logic for restricted patterns (performance optimization)
Collects all personal-tier matches (helpful for debugging/audit trails)
Customizable patterns via JSON configuration (maintainable and extensible)

2. Complete implementation

Migration SQL with proper index and check constraint
Reusable detection function with clear type definitions
Well-documented README with integration examples
Proper metadata.json following repo standards

3. Strong documentation

Clear explanation of the three-tier system
Multiple integration examples (Deno, Node.js)
Backfill guidance for existing data
Troubleshooting section with common issues

4. Safety-conscious

No credentials or secrets
No destructive SQL operations
Proper use of IF NOT EXISTS and check constraints
Handles existing columns gracefully

Required Changes

1. Primitives category validation (blocking)

Per CONTRIBUTING.md section "Extensions vs Primitives vs Recipes":

Primitives are reusable concept guides that get referenced by multiple extensions. They teach a pattern... once, so extensions can link to them instead of re-explaining. A primitive should be referenced by at least 2 extensions.

This is currently a standalone concept with no existing extensions referencing it. You have two paths forward:

Option A (Recommended): Move this to recipes/ category

Change folder location to recipes/sensitivity-tiers/
Update metadata.json category field to "recipes"
Update PR title to [recipes] Sensitivity tiers for content access control
This makes it immediately available to the community while allowing future promotion to primitive if 2+ extensions adopt it

Option B: Keep as primitive but defer merge until referenced by extensions

Open an issue proposing which extensions should adopt this pattern
Submit those extension updates to reference this primitive
Merge this PR after 2+ extension references exist

2. README: Add "Extensions That Use This" section (blocking if staying as primitive)

Per CONTRIBUTING.md primitive requirements:

Primitives additionally require: "Extensions That Use This" section listing which extensions reference this primitive

Since no extensions currently use this, Option A (moving to recipes) is cleaner.

Suggested Improvements (non-blocking)

1. Pattern accuracy considerations

The regex patterns are good starting points but may need refinement:

SSN pattern will match dates like "2024-01-15" or invoice numbers
Credit card pattern doesn't validate Luhn checksum (will flag random 16-digit sequences)
Bank account pattern is very broad and may catch non-sensitive number references

Consider adding a comment in sensitivity-patterns.json acknowledging the tradeoff between false positives and false negatives, with guidance for users to tune patterns based on their content.

2. TypeScript export structure

The current export is clean, but consider adding a convenience function for the most common use case (just need the tier, not the reasons).

3. Migration safety note

Add a note in migration.sql about the performance impact of backfilling large datasets - consider running backfill in batches with LIMIT/OFFSET to avoid long-running transactions for 100K+ thoughts.

4. Consider a pattern test suite

A JSON file with test cases would help users validate custom patterns and understand false positive scenarios.

Automated Review Status

All 14 automated checks should pass once the category is adjusted. The word count (155 words in main sections, 310 total) exceeds the 200-word requirement for primitives.

Verdict

Verdict: Minor fixes needed

Primary blocker: Primitives must be referenced by 2+ extensions per repo standards. Recommend moving to recipes/ category (simple folder rename + metadata update) to unblock immediate merge while keeping future promotion path open.

The implementation quality is excellent. Once the category is corrected, this is ready to merge.

Action Items:

Move to recipes/sensitivity-tiers/ folder
Update metadata.json: change "category": "primitives" to "category": "recipes"
Update PR title to [recipes] Sensitivity tiers for content access control
(Optional) Consider adding test cases and the suggested improvements above

Looking forward to seeing this contribution land! The pattern matching approach is clean and the documentation is thorough.

justfinethanku · 2026-03-24T01:46:20Z

Code Review for Sensitivity Tiers Primitive

Overall Assessment

This is a well-executed contribution that adds real value to the Open Brain ecosystem. The primitive introduces a clean, dependency-free content classification system using pure regex pattern matching.

What's Good

1. Solid technical approach

Pure TypeScript with no external dependencies keeps the solution lightweight
Short-circuit logic for restricted patterns (performance optimization)
Collects all personal-tier matches (helpful for debugging/audit trails)
Customizable patterns via JSON configuration (maintainable and extensible)

2. Complete implementation

Migration SQL with proper index and check constraint
Reusable detection function with clear type definitions
Well-documented README with integration examples
Proper metadata.json following repo standards

3. Strong documentation

Clear explanation of the three-tier system
Multiple integration examples (Deno, Node.js)
Backfill guidance for existing data
Troubleshooting section with common issues

4. Safety-conscious

No credentials or secrets
No destructive SQL operations
Proper use of IF NOT EXISTS and check constraints
Handles existing columns gracefully

Required Changes

1. Primitives category validation (blocking)

Per CONTRIBUTING.md section "Extensions vs Primitives vs Recipes":

Primitives are reusable concept guides that get referenced by multiple extensions. They teach a pattern... once, so extensions can link to them instead of re-explaining. A primitive should be referenced by at least 2 extensions.

This is currently a standalone concept with no existing extensions referencing it. You have two paths forward:

Option A (Recommended): Move this to recipes/ category

Change folder location to recipes/sensitivity-tiers/
Update metadata.json category field to "recipes"
Update PR title to [recipes] Sensitivity tiers for content access control
This makes it immediately available to the community while allowing future promotion to primitive if 2+ extensions adopt it

Option B: Keep as primitive but defer merge until referenced by extensions

Open an issue proposing which extensions should adopt this pattern
Submit those extension updates to reference this primitive
Merge this PR after 2+ extension references exist

2. README: Add "Extensions That Use This" section (blocking if staying as primitive)

Per CONTRIBUTING.md primitive requirements:

Primitives additionally require: "Extensions That Use This" section listing which extensions reference this primitive

Since no extensions currently use this, Option A (moving to recipes) is cleaner.

Suggested Improvements (non-blocking)

1. Pattern accuracy considerations

The regex patterns are good starting points but may need refinement:

SSN pattern will match dates like "2024-01-15" or invoice numbers
Credit card pattern doesn't validate Luhn checksum (will flag random 16-digit sequences)
Bank account pattern is very broad and may catch non-sensitive number references

Consider adding a comment in sensitivity-patterns.json acknowledging the tradeoff between false positives and false negatives, with guidance for users to tune patterns based on their content.

2. TypeScript export structure

The current export is clean, but consider adding a convenience function for the most common use case (just need the tier, not the reasons).

3. Migration safety note

Add a note in migration.sql about the performance impact of backfilling large datasets - consider running backfill in batches with LIMIT/OFFSET to avoid long-running transactions for 100K+ thoughts.

4. Consider a pattern test suite

A JSON file with test cases would help users validate custom patterns and understand false positive scenarios.

Automated Review Status

All 14 automated checks should pass once the category is adjusted. The word count (155 words in main sections, 310 total) exceeds the 200-word requirement for primitives.

Verdict

Verdict: Minor fixes needed

Primary blocker: Primitives must be referenced by 2+ extensions per repo standards. Recommend moving to recipes/ category (simple folder rename + metadata update) to unblock immediate merge while keeping future promotion path open.

The implementation quality is excellent. Once the category is corrected, this is ready to merge.

Action Items:

Move to recipes/sensitivity-tiers/ folder
Update metadata.json: change "category": "primitives" to "category": "recipes"
Update PR title to [recipes] Sensitivity tiers for content access control
(Optional) Consider adding test cases and the suggested improvements above

Looking forward to seeing this contribution land! The pattern matching approach is clean and the documentation is thorough.

Primitives require 2+ extension references per CONTRIBUTING.md. Moving to recipes per maintainer review feedback. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

resolveSensitivityTier() ensures overrides can only escalate (standard → personal → restricted), never downgrade. Unrecognized values normalize to "personal" as a safe default. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

matthallett1 · 2026-03-30T19:14:23Z

@claude review this PR

github-actions · 2026-03-30T19:14:39Z

Claude Code is working…

I'll analyze this and get back to you.

View job run

matthallett1 · 2026-03-30T19:23:04Z

@claude review this PR

github-actions · 2026-03-30T19:23:16Z

Claude Code is working…

I'll analyze this and get back to you.

View job run

matthallett1 · 2026-03-30T20:14:08Z

@claude review this PR

github-actions · 2026-03-30T20:14:25Z

Claude finished @matthallett1's task in 1m 28s —— View job

PR Review: [recipes] Sensitivity tiers for content access control

Read PR context and existing comments
Read changed files
Review for alignment, clarity, completeness, safety, value, and design patterns
Post final review

Context

The previous review by @justfinethanku flagged a blocking issue (primitives category requiring 2+ extension references). That's been resolved — the contribution is now correctly placed in recipes/sensitivity-tiers/ with "category": "recipes" in metadata.json. The PR title is updated accordingly. Good work addressing that feedback.

Alignment

Fits OB1's mission well. This directly extends the thoughts table (adds a column, doesn't touch existing ones) and provides a client-side utility that integrates into capture/ingest pipelines. Zero-dependency, no external services, no independent deployment. Exactly what a recipe should be.

Clarity

The README is clear and well-structured. The step ordering is logical: migrate → integrate → backfill → query. Code examples cover both Deno and Node.js. The "No credentials needed" callout is a nice touch that reduces cognitive overhead for the reader.

One gap: Step 2 introduces resolveSensitivityTier() inline without warning the reader this is optional. The escalation-only semantics section sits between Steps 2 and 3, which may confuse readers who just want the basic case. Consider moving it to an "Advanced Usage" section below Step 4, or adding a note that it's only needed when accepting external overrides.

Completeness

All required README sections present: What it does ✅, Prerequisites ✅, Step-by-step instructions ✅, Expected Outcome ✅, Troubleshooting ✅. metadata.json has all required fields ✅.

Safety

No credentials or secrets ✅
No destructive SQL ✅
ALTER TABLE only adds a new column, doesn't touch existing ones ✅
No local MCP servers ✅
IF NOT EXISTS on column and index ✅

One Real Issue: Migration is Not Fully Idempotent

migration.sql line 19 (ADD CONSTRAINT chk_sensitivity_tier) has no equivalent of IF NOT EXISTS. PostgreSQL doesn't support ADD CONSTRAINT IF NOT EXISTS, so re-running the migration will fail with ERROR: constraint "chk_sensitivity_tier" for relation "thoughts" already exists.

The troubleshooting section addresses the column case but not the constraint case. This will bite users who run the migration twice or restore from a backup that already has the column.

Suggested fix:

DO $$ BEGIN
  ALTER TABLE thoughts
    ADD CONSTRAINT chk_sensitivity_tier
    CHECK (sensitivity_tier IN ('standard', 'personal', 'restricted'));
EXCEPTION
  WHEN duplicate_object THEN NULL;
END $$;

Or simply add a troubleshooting entry: Fix this →

Non-Blocking Observations

SSN false positives (flagged in prior review, still present): \b\d{3}-?\d{2}-?\d{4}\b matches invoice numbers, part numbers, and other hyphenated numeric strings. The sensitivity-patterns.json README note about false positives is a good safety valve, but the patterns file itself doesn't have any comment about this tradeoff. Since JSON doesn't support comments, a sentence in the README's Customizing Patterns section like "The SSN pattern has a known false-positive rate on invoice and part numbers; consider word-boundary context if this matters for your dataset" would help users calibrate expectations.

Backfill batching: The example backfill loop fetches all thoughts in one query with no LIMIT. For datasets over ~10K thoughts this could cause timeout or memory issues. A note (not a code rewrite) in Step 3 recommending batch processing for large datasets would be useful.

resolveSensitivityTier escalates unrecognized strings to personal (detect-sensitivity.ts:117–120). This is intentional ("safe default") and documented in the JSDoc, but it's a surprising behavior. If a caller passes "restricted" as a string override and there's a typo like "Restricted" (capital R), it would escalate to personal instead of restricted. The normalization on line 113 (toLowerCase()) handles this case, but a note in the README about this behavior would help.

Value

This is genuinely useful and different from anything else in the repo. Sensitivity classification is a real gap — without it, all thoughts are treated equally regardless of whether they contain credentials or health data. The escalation-only semantics and the separation of pattern config from logic are good design decisions.

Verdict

Approve with one requested fix: Make the constraint addition idempotent (either via DO block or adding a troubleshooting note for the constraint duplicate error). Everything else is solid. The quality of this implementation and documentation is well above the bar for a recipe contribution.

matthallett1

The concept and documentation quality are strong — sensitivity tiers fill a real gap. But Codex review (gpt-5.4) found a critical issue and several pattern gaps:

P1: MCP retrieval doesn't actually filter restricted thoughts. The README claims restricted thoughts are excluded from MCP search, but neither server/index.ts nor the Edge Function accepts exclude_restricted, and match_thoughts can only filter on metadata, not the new sensitivity_tier column. After following this recipe, restricted thoughts still show up in search results. The advertised privacy behavior isn't delivered.

This is the main blocker. Options:

Add sensitivity_tier filtering to match_thoughts (would need a schema PR for the RPC change)
Store sensitivity_tier in metadata JSONB instead of a top-level column, so existing metadata filters work
Remove the MCP filtering claim from the README and document it as a capture-time classification only (honest about current capability)

P2: Non-idempotent constraint. ADD CONSTRAINT chk_sensitivity_tier fails on re-run. Wrap in DO $$ BEGIN ... EXCEPTION WHEN duplicate_object THEN NULL; END $$;

P2: Pattern gaps in restricted detection:

sk-proj-... (OpenAI dashed API keys) — regex only allows alphanumerics after prefix, misses modern dashed formats
15-digit AmEx card numbers — pattern only matches 4x4 digit groups

The classification logic and escalation semantics are well-designed. These are fixable issues, not architectural problems.

Replace misleading claim that MCP tools filter restricted thoughts with honest documentation: match_thoughts requires MCP-layer post-query filtering, search_thoughts_text supports exclude_restricted at RPC level. Wrap CHECK constraint in DO/EXCEPTION block for re-run idempotency. Fix API key pattern to match dashed formats (sk-proj-xxx-xxx) and add 15-digit AmEx card number pattern. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

alanshurafa · 2026-04-06T02:25:36Z

Review feedback from @matthallett1 addressed in commit 50f46b3:

P1 — MCP filtering claim fixed: Replaced the misleading exclude_restricted=true guidance with honest docs. match_thoughts returns sensitivity_tier but requires MCP-layer post-query filtering. search_thoughts_text supports exclude_restricted natively. Added code examples for both approaches.
P2 — Idempotent constraint: Wrapped ADD CONSTRAINT chk_sensitivity_tier in DO/EXCEPTION block to handle re-runs.
P2 — Regex patterns: API key pattern now matches dashed formats (sk-proj-[A-Za-z0-9_-]+). Added 15-digit AmEx card pattern (3[47]xx-xxxxxx-xxxxx).

Ready for re-review.

alanshurafa · 2026-04-06T14:01:05Z

Closing this PR as part of the OB1 Alpha Milestone consolidation. This feature is being rebuilt as one of 12 clean, gate-compliant PRs that together form the alpha upgrade path. The consolidated PRs will be submitted once verified on the fork. See the full plan for details.

alanshurafa · 2026-04-06T15:02:57Z

Closing as part of the OB1 Alpha Milestone consolidation. This feature is being rebuilt as one of 12 clean, gate-compliant PRs that together form the alpha upgrade path. The consolidated PRs will be submitted once verified on the fork.

alanshurafa mentioned this pull request Mar 23, 2026

[dashboards] Full-featured web dashboard for thought management #111

Merged

9 tasks

justfinethanku reviewed Mar 24, 2026

View reviewed changes

[recipes] Move sensitivity tiers from primitives to recipes

0137b48

Primitives require 2+ extension references per CONTRIBUTING.md. Moving to recipes per maintainer review feedback. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

alanshurafa changed the title ~~[primitives] Sensitivity tiers for content access control~~ [recipes] Sensitivity tiers for content access control Mar 25, 2026

matthallett1 requested changes Mar 30, 2026

View reviewed changes

shansen3 mentioned this pull request Mar 31, 2026

[dashboards] Link Next.js dashboard to open-brain-rest dependency #130

Closed

10 tasks

github-actions bot added the recipe Contribution: step-by-step recipe label Apr 6, 2026

alanshurafa closed this Apr 6, 2026

alanshurafa reopened this Apr 6, 2026

alanshurafa closed this Apr 6, 2026

Conversation

alanshurafa commented Mar 23, 2026

Summary

Files

Test plan

Uh oh!

justfinethanku left a comment

Choose a reason for hiding this comment

Code Review for Sensitivity Tiers Primitive

Overall Assessment

What's Good

Required Changes

Suggested Improvements (non-blocking)

Automated Review Status

Verdict

Uh oh!

justfinethanku left a comment

Choose a reason for hiding this comment

Code Review for Sensitivity Tiers Primitive

Overall Assessment

What's Good

Required Changes

Suggested Improvements (non-blocking)

Automated Review Status

Verdict

Uh oh!

justfinethanku commented Mar 24, 2026

Code Review for Sensitivity Tiers Primitive

Overall Assessment

What's Good

Required Changes

Suggested Improvements (non-blocking)

Automated Review Status

Verdict

Uh oh!

matthallett1 commented Mar 30, 2026

Uh oh!

github-actions bot commented Mar 30, 2026

Uh oh!

matthallett1 commented Mar 30, 2026

Uh oh!

github-actions bot commented Mar 30, 2026

Uh oh!

matthallett1 commented Mar 30, 2026

Uh oh!

github-actions bot commented Mar 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: [recipes] Sensitivity tiers for content access control

Context

Alignment

Clarity

Completeness

Safety

One Real Issue: Migration is Not Fully Idempotent

Non-Blocking Observations

Value

Verdict

Uh oh!

matthallett1 left a comment

Choose a reason for hiding this comment

Uh oh!

alanshurafa commented Apr 6, 2026

Uh oh!

alanshurafa commented Apr 6, 2026

Uh oh!

alanshurafa commented Apr 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

github-actions bot commented Mar 30, 2026 •

edited

Loading