Skip to content

[recipes] Sensitivity tiers for content access control#110

Closed
alanshurafa wants to merge 4 commits intoNateBJones-Projects:mainfrom
alanshurafa:contrib/alanshurafa/sensitivity-tiers
Closed

[recipes] Sensitivity tiers for content access control#110
alanshurafa wants to merge 4 commits intoNateBJones-Projects:mainfrom
alanshurafa:contrib/alanshurafa/sensitivity-tiers

Conversation

@alanshurafa
Copy link
Copy Markdown
Contributor

Summary

  • Three-tier content sensitivity classification: restricted, personal, standard
  • Regex pattern matching detects SSNs, passport numbers, API keys, passwords, credit cards, medication dosages, health data, and financial details
  • Includes migration SQL (ALTER TABLE thoughts ADD COLUMN sensitivity_tier), reusable TypeScript detection function, and JSON pattern config
  • No external dependencies — pure regex, instant classification
  • Patterns are customizable via sensitivity-patterns.json

Files

File Purpose
sensitivity-patterns.json Regex patterns for restricted (7) and personal (6) tiers
detect-sensitivity.ts TypeScript function: detectSensitivity(text, patterns) → { tier, reasons }
migration.sql Adds sensitivity_tier column with index and check constraint
metadata.json OB1 metadata
README.md Full guide with integration examples and backfill instructions

Test plan

  • Run migration SQL against a test Supabase instance
  • Import detect-sensitivity.ts and test against sample content
  • Verify restricted patterns short-circuit on first match
  • Verify personal patterns collect all matches
  • Verify standard tier returned when no patterns match

🤖 Generated with Claude Code

Three-tier classification (restricted/personal/standard) using regex
pattern matching. Detects SSNs, API keys, passwords, credit cards,
medications, health data, and financial details. Includes migration
SQL, reusable TypeScript detection function, and pattern config.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@justfinethanku justfinethanku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review for Sensitivity Tiers Primitive

Overall Assessment

This is a well-executed contribution that adds real value to the Open Brain ecosystem. The primitive introduces a clean, dependency-free content classification system using pure regex pattern matching.

What's Good

1. Solid technical approach

  • Pure TypeScript with no external dependencies keeps the solution lightweight
  • Short-circuit logic for restricted patterns (performance optimization)
  • Collects all personal-tier matches (helpful for debugging/audit trails)
  • Customizable patterns via JSON configuration (maintainable and extensible)

2. Complete implementation

  • Migration SQL with proper index and check constraint
  • Reusable detection function with clear type definitions
  • Well-documented README with integration examples
  • Proper metadata.json following repo standards

3. Strong documentation

  • Clear explanation of the three-tier system
  • Multiple integration examples (Deno, Node.js)
  • Backfill guidance for existing data
  • Troubleshooting section with common issues

4. Safety-conscious

  • No credentials or secrets
  • No destructive SQL operations
  • Proper use of IF NOT EXISTS and check constraints
  • Handles existing columns gracefully

Required Changes

1. Primitives category validation (blocking)

Per CONTRIBUTING.md section "Extensions vs Primitives vs Recipes":

Primitives are reusable concept guides that get referenced by multiple extensions. They teach a pattern... once, so extensions can link to them instead of re-explaining. A primitive should be referenced by at least 2 extensions.

This is currently a standalone concept with no existing extensions referencing it. You have two paths forward:

Option A (Recommended): Move this to category

  • Change folder location to
  • Update category field to
  • Update PR title to
  • This makes it immediately available to the community while allowing future promotion to primitive if 2+ extensions adopt it

Option B: Keep as primitive but defer merge until referenced by extensions

  • Open an issue proposing which extensions should adopt this pattern
  • Submit those extension updates to reference this primitive
  • Merge this PR after 2+ extension references exist

2. README: Add "Extensions That Use This" section (blocking if staying as primitive)

Per CONTRIBUTING.md primitive requirements:

Primitives additionally require: "Extensions That Use This" section listing which extensions reference this primitive

Since no extensions currently use this, Option A (moving to recipes) is cleaner.

Suggested Improvements (non-blocking)

1. Pattern accuracy considerations

The regex patterns are good starting points but may need refinement:

  • SSN pattern () will match dates like "2024-01-15" or invoice numbers
  • Credit card pattern doesn't validate Luhn checksum (will flag random 16-digit sequences)
  • Bank account pattern is very broad and may catch non-sensitive number references

Consider adding a comment in acknowledging the tradeoff between false positives and false negatives, with guidance for users to tune patterns based on their content.

2. TypeScript export structure

The current export is clean, but consider adding a convenience function:

export function detectSensitivitySimple(text: string): SensitivityTier {
  const patterns = compilePatterns(patternsJson);
  return detectSensitivity(text, patterns).tier;
}

This would simplify the most common use case (just need the tier, not the reasons).

3. Migration safety note

Add a note in migration.sql about the performance impact of backfilling large datasets:

-- NOTE: For large existing datasets (100K+ thoughts), consider running
-- the backfill in batches with LIMIT/OFFSET to avoid long-running transactions

4. Consider a pattern test suite

A JSON file with test cases would help users validate custom patterns:

{
  "restricted_tests": [
    {"input": "My SSN is 123-45-6789", "should_match": true},
    {"input": "Invoice #123-45-6789", "should_match": false}
  ]
}

Automated Review Status

All 14 automated checks should pass once the category is adjusted. The word count (155 words in main sections, 310 total) exceeds the 200-word requirement for primitives.

Verdict

Verdict: Minor fixes needed

Primary blocker: Primitives must be referenced by 2+ extensions per repo standards. Recommend moving to category (simple folder rename + metadata update) to unblock immediate merge while keeping future promotion path open.

The implementation quality is excellent. Once the category is corrected, this is ready to merge.


Action Items:

  1. Move to folder
  2. Update : change to
  3. Update PR title to
  4. (Optional) Consider adding test cases and the suggested improvements above

Looking forward to seeing this contribution land! The pattern matching approach is clean and the documentation is thorough.

Copy link
Copy Markdown
Collaborator

@justfinethanku justfinethanku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review for Sensitivity Tiers Primitive

Overall Assessment

This is a well-executed contribution that adds real value to the Open Brain ecosystem. The primitive introduces a clean, dependency-free content classification system using pure regex pattern matching.

What's Good

1. Solid technical approach

  • Pure TypeScript with no external dependencies keeps the solution lightweight
  • Short-circuit logic for restricted patterns (performance optimization)
  • Collects all personal-tier matches (helpful for debugging/audit trails)
  • Customizable patterns via JSON configuration (maintainable and extensible)

2. Complete implementation

  • Migration SQL with proper index and check constraint
  • Reusable detection function with clear type definitions
  • Well-documented README with integration examples
  • Proper metadata.json following repo standards

3. Strong documentation

  • Clear explanation of the three-tier system
  • Multiple integration examples (Deno, Node.js)
  • Backfill guidance for existing data
  • Troubleshooting section with common issues

4. Safety-conscious

  • No credentials or secrets
  • No destructive SQL operations
  • Proper use of IF NOT EXISTS and check constraints
  • Handles existing columns gracefully

Required Changes

1. Primitives category validation (blocking)

Per CONTRIBUTING.md section "Extensions vs Primitives vs Recipes":

Primitives are reusable concept guides that get referenced by multiple extensions. They teach a pattern... once, so extensions can link to them instead of re-explaining. A primitive should be referenced by at least 2 extensions.

This is currently a standalone concept with no existing extensions referencing it. You have two paths forward:

Option A (Recommended): Move this to recipes/ category

  • Change folder location to recipes/sensitivity-tiers/
  • Update metadata.json category field to "recipes"
  • Update PR title to [recipes] Sensitivity tiers for content access control
  • This makes it immediately available to the community while allowing future promotion to primitive if 2+ extensions adopt it

Option B: Keep as primitive but defer merge until referenced by extensions

  • Open an issue proposing which extensions should adopt this pattern
  • Submit those extension updates to reference this primitive
  • Merge this PR after 2+ extension references exist

2. README: Add "Extensions That Use This" section (blocking if staying as primitive)

Per CONTRIBUTING.md primitive requirements:

Primitives additionally require: "Extensions That Use This" section listing which extensions reference this primitive

Since no extensions currently use this, Option A (moving to recipes) is cleaner.

Suggested Improvements (non-blocking)

1. Pattern accuracy considerations

The regex patterns are good starting points but may need refinement:

  • SSN pattern will match dates like "2024-01-15" or invoice numbers
  • Credit card pattern doesn't validate Luhn checksum (will flag random 16-digit sequences)
  • Bank account pattern is very broad and may catch non-sensitive number references

Consider adding a comment in sensitivity-patterns.json acknowledging the tradeoff between false positives and false negatives, with guidance for users to tune patterns based on their content.

2. TypeScript export structure

The current export is clean, but consider adding a convenience function for the most common use case (just need the tier, not the reasons).

3. Migration safety note

Add a note in migration.sql about the performance impact of backfilling large datasets - consider running backfill in batches with LIMIT/OFFSET to avoid long-running transactions for 100K+ thoughts.

4. Consider a pattern test suite

A JSON file with test cases would help users validate custom patterns and understand false positive scenarios.

Automated Review Status

All 14 automated checks should pass once the category is adjusted. The word count (155 words in main sections, 310 total) exceeds the 200-word requirement for primitives.

Verdict

Verdict: Minor fixes needed

Primary blocker: Primitives must be referenced by 2+ extensions per repo standards. Recommend moving to recipes/ category (simple folder rename + metadata update) to unblock immediate merge while keeping future promotion path open.

The implementation quality is excellent. Once the category is corrected, this is ready to merge.


Action Items:

  1. Move to recipes/sensitivity-tiers/ folder
  2. Update metadata.json: change "category": "primitives" to "category": "recipes"
  3. Update PR title to [recipes] Sensitivity tiers for content access control
  4. (Optional) Consider adding test cases and the suggested improvements above

Looking forward to seeing this contribution land! The pattern matching approach is clean and the documentation is thorough.

@justfinethanku
Copy link
Copy Markdown
Collaborator

Code Review for Sensitivity Tiers Primitive

Overall Assessment

This is a well-executed contribution that adds real value to the Open Brain ecosystem. The primitive introduces a clean, dependency-free content classification system using pure regex pattern matching.

What's Good

1. Solid technical approach

  • Pure TypeScript with no external dependencies keeps the solution lightweight
  • Short-circuit logic for restricted patterns (performance optimization)
  • Collects all personal-tier matches (helpful for debugging/audit trails)
  • Customizable patterns via JSON configuration (maintainable and extensible)

2. Complete implementation

  • Migration SQL with proper index and check constraint
  • Reusable detection function with clear type definitions
  • Well-documented README with integration examples
  • Proper metadata.json following repo standards

3. Strong documentation

  • Clear explanation of the three-tier system
  • Multiple integration examples (Deno, Node.js)
  • Backfill guidance for existing data
  • Troubleshooting section with common issues

4. Safety-conscious

  • No credentials or secrets
  • No destructive SQL operations
  • Proper use of IF NOT EXISTS and check constraints
  • Handles existing columns gracefully

Required Changes

1. Primitives category validation (blocking)

Per CONTRIBUTING.md section "Extensions vs Primitives vs Recipes":

Primitives are reusable concept guides that get referenced by multiple extensions. They teach a pattern... once, so extensions can link to them instead of re-explaining. A primitive should be referenced by at least 2 extensions.

This is currently a standalone concept with no existing extensions referencing it. You have two paths forward:

Option A (Recommended): Move this to recipes/ category

  • Change folder location to recipes/sensitivity-tiers/
  • Update metadata.json category field to "recipes"
  • Update PR title to [recipes] Sensitivity tiers for content access control
  • This makes it immediately available to the community while allowing future promotion to primitive if 2+ extensions adopt it

Option B: Keep as primitive but defer merge until referenced by extensions

  • Open an issue proposing which extensions should adopt this pattern
  • Submit those extension updates to reference this primitive
  • Merge this PR after 2+ extension references exist

2. README: Add "Extensions That Use This" section (blocking if staying as primitive)

Per CONTRIBUTING.md primitive requirements:

Primitives additionally require: "Extensions That Use This" section listing which extensions reference this primitive

Since no extensions currently use this, Option A (moving to recipes) is cleaner.

Suggested Improvements (non-blocking)

1. Pattern accuracy considerations

The regex patterns are good starting points but may need refinement:

  • SSN pattern will match dates like "2024-01-15" or invoice numbers
  • Credit card pattern doesn't validate Luhn checksum (will flag random 16-digit sequences)
  • Bank account pattern is very broad and may catch non-sensitive number references

Consider adding a comment in sensitivity-patterns.json acknowledging the tradeoff between false positives and false negatives, with guidance for users to tune patterns based on their content.

2. TypeScript export structure

The current export is clean, but consider adding a convenience function for the most common use case (just need the tier, not the reasons).

3. Migration safety note

Add a note in migration.sql about the performance impact of backfilling large datasets - consider running backfill in batches with LIMIT/OFFSET to avoid long-running transactions for 100K+ thoughts.

4. Consider a pattern test suite

A JSON file with test cases would help users validate custom patterns and understand false positive scenarios.

Automated Review Status

All 14 automated checks should pass once the category is adjusted. The word count (155 words in main sections, 310 total) exceeds the 200-word requirement for primitives.

Verdict

Verdict: Minor fixes needed

Primary blocker: Primitives must be referenced by 2+ extensions per repo standards. Recommend moving to recipes/ category (simple folder rename + metadata update) to unblock immediate merge while keeping future promotion path open.

The implementation quality is excellent. Once the category is corrected, this is ready to merge.


Action Items:

  1. Move to recipes/sensitivity-tiers/ folder
  2. Update metadata.json: change "category": "primitives" to "category": "recipes"
  3. Update PR title to [recipes] Sensitivity tiers for content access control
  4. (Optional) Consider adding test cases and the suggested improvements above

Looking forward to seeing this contribution land! The pattern matching approach is clean and the documentation is thorough.

Primitives require 2+ extension references per CONTRIBUTING.md.
Moving to recipes per maintainer review feedback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@alanshurafa alanshurafa changed the title [primitives] Sensitivity tiers for content access control [recipes] Sensitivity tiers for content access control Mar 25, 2026
resolveSensitivityTier() ensures overrides can only escalate
(standard → personal → restricted), never downgrade. Unrecognized
values normalize to "personal" as a safe default.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@matthallett1
Copy link
Copy Markdown
Collaborator

@claude review this PR

@github-actions
Copy link
Copy Markdown

Claude Code is working…

I'll analyze this and get back to you.

View job run

@matthallett1
Copy link
Copy Markdown
Collaborator

@claude review this PR

@github-actions
Copy link
Copy Markdown

Claude Code is working…

I'll analyze this and get back to you.

View job run

@matthallett1
Copy link
Copy Markdown
Collaborator

@claude review this PR

@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 30, 2026

Claude finished @matthallett1's task in 1m 28s —— View job


PR Review: [recipes] Sensitivity tiers for content access control

  • Read PR context and existing comments
  • Read changed files
  • Review for alignment, clarity, completeness, safety, value, and design patterns
  • Post final review

Context

The previous review by @justfinethanku flagged a blocking issue (primitives category requiring 2+ extension references). That's been resolved — the contribution is now correctly placed in recipes/sensitivity-tiers/ with "category": "recipes" in metadata.json. The PR title is updated accordingly. Good work addressing that feedback.


Alignment

Fits OB1's mission well. This directly extends the thoughts table (adds a column, doesn't touch existing ones) and provides a client-side utility that integrates into capture/ingest pipelines. Zero-dependency, no external services, no independent deployment. Exactly what a recipe should be.


Clarity

The README is clear and well-structured. The step ordering is logical: migrate → integrate → backfill → query. Code examples cover both Deno and Node.js. The "No credentials needed" callout is a nice touch that reduces cognitive overhead for the reader.

One gap: Step 2 introduces resolveSensitivityTier() inline without warning the reader this is optional. The escalation-only semantics section sits between Steps 2 and 3, which may confuse readers who just want the basic case. Consider moving it to an "Advanced Usage" section below Step 4, or adding a note that it's only needed when accepting external overrides.


Completeness

All required README sections present: What it does ✅, Prerequisites ✅, Step-by-step instructions ✅, Expected Outcome ✅, Troubleshooting ✅. metadata.json has all required fields ✅.


Safety

  • No credentials or secrets ✅
  • No destructive SQL ✅
  • ALTER TABLE only adds a new column, doesn't touch existing ones ✅
  • No local MCP servers ✅
  • IF NOT EXISTS on column and index ✅

One Real Issue: Migration is Not Fully Idempotent

migration.sql line 19 (ADD CONSTRAINT chk_sensitivity_tier) has no equivalent of IF NOT EXISTS. PostgreSQL doesn't support ADD CONSTRAINT IF NOT EXISTS, so re-running the migration will fail with ERROR: constraint "chk_sensitivity_tier" for relation "thoughts" already exists.

The troubleshooting section addresses the column case but not the constraint case. This will bite users who run the migration twice or restore from a backup that already has the column.

Suggested fix:

DO $$ BEGIN
  ALTER TABLE thoughts
    ADD CONSTRAINT chk_sensitivity_tier
    CHECK (sensitivity_tier IN ('standard', 'personal', 'restricted'));
EXCEPTION
  WHEN duplicate_object THEN NULL;
END $$;

Or simply add a troubleshooting entry: Fix this →


Non-Blocking Observations

SSN false positives (flagged in prior review, still present): \b\d{3}-?\d{2}-?\d{4}\b matches invoice numbers, part numbers, and other hyphenated numeric strings. The sensitivity-patterns.json README note about false positives is a good safety valve, but the patterns file itself doesn't have any comment about this tradeoff. Since JSON doesn't support comments, a sentence in the README's Customizing Patterns section like "The SSN pattern has a known false-positive rate on invoice and part numbers; consider word-boundary context if this matters for your dataset" would help users calibrate expectations.

Backfill batching: The example backfill loop fetches all thoughts in one query with no LIMIT. For datasets over ~10K thoughts this could cause timeout or memory issues. A note (not a code rewrite) in Step 3 recommending batch processing for large datasets would be useful.

resolveSensitivityTier escalates unrecognized strings to personal (detect-sensitivity.ts:117–120). This is intentional ("safe default") and documented in the JSDoc, but it's a surprising behavior. If a caller passes "restricted" as a string override and there's a typo like "Restricted" (capital R), it would escalate to personal instead of restricted. The normalization on line 113 (toLowerCase()) handles this case, but a note in the README about this behavior would help.


Value

This is genuinely useful and different from anything else in the repo. Sensitivity classification is a real gap — without it, all thoughts are treated equally regardless of whether they contain credentials or health data. The escalation-only semantics and the separation of pattern config from logic are good design decisions.


Verdict

Approve with one requested fix: Make the constraint addition idempotent (either via DO block or adding a troubleshooting note for the constraint duplicate error). Everything else is solid. The quality of this implementation and documentation is well above the bar for a recipe contribution.

Copy link
Copy Markdown
Collaborator

@matthallett1 matthallett1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The concept and documentation quality are strong — sensitivity tiers fill a real gap. But Codex review (gpt-5.4) found a critical issue and several pattern gaps:

P1: MCP retrieval doesn't actually filter restricted thoughts. The README claims restricted thoughts are excluded from MCP search, but neither server/index.ts nor the Edge Function accepts exclude_restricted, and match_thoughts can only filter on metadata, not the new sensitivity_tier column. After following this recipe, restricted thoughts still show up in search results. The advertised privacy behavior isn't delivered.

This is the main blocker. Options:

  1. Add sensitivity_tier filtering to match_thoughts (would need a schema PR for the RPC change)
  2. Store sensitivity_tier in metadata JSONB instead of a top-level column, so existing metadata filters work
  3. Remove the MCP filtering claim from the README and document it as a capture-time classification only (honest about current capability)

P2: Non-idempotent constraint. ADD CONSTRAINT chk_sensitivity_tier fails on re-run. Wrap in DO $$ BEGIN ... EXCEPTION WHEN duplicate_object THEN NULL; END $$;

P2: Pattern gaps in restricted detection:

  • sk-proj-... (OpenAI dashed API keys) — regex only allows alphanumerics after prefix, misses modern dashed formats
  • 15-digit AmEx card numbers — pattern only matches 4x4 digit groups

The classification logic and escalation semantics are well-designed. These are fixable issues, not architectural problems.

Replace misleading claim that MCP tools filter restricted thoughts
with honest documentation: match_thoughts requires MCP-layer post-query
filtering, search_thoughts_text supports exclude_restricted at RPC level.

Wrap CHECK constraint in DO/EXCEPTION block for re-run idempotency.

Fix API key pattern to match dashed formats (sk-proj-xxx-xxx) and add
15-digit AmEx card number pattern.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added the recipe Contribution: step-by-step recipe label Apr 6, 2026
@alanshurafa
Copy link
Copy Markdown
Contributor Author

Review feedback from @matthallett1 addressed in commit 50f46b3:

  1. P1 — MCP filtering claim fixed: Replaced the misleading exclude_restricted=true guidance with honest docs. match_thoughts returns sensitivity_tier but requires MCP-layer post-query filtering. search_thoughts_text supports exclude_restricted natively. Added code examples for both approaches.
  2. P2 — Idempotent constraint: Wrapped ADD CONSTRAINT chk_sensitivity_tier in DO/EXCEPTION block to handle re-runs.
  3. P2 — Regex patterns: API key pattern now matches dashed formats (sk-proj-[A-Za-z0-9_-]+). Added 15-digit AmEx card pattern (3[47]xx-xxxxxx-xxxxx).

Ready for re-review.

@alanshurafa
Copy link
Copy Markdown
Contributor Author

Closing this PR as part of the OB1 Alpha Milestone consolidation. This feature is being rebuilt as one of 12 clean, gate-compliant PRs that together form the alpha upgrade path. The consolidated PRs will be submitted once verified on the fork. See the full plan for details.

@alanshurafa alanshurafa closed this Apr 6, 2026
@alanshurafa alanshurafa reopened this Apr 6, 2026
@alanshurafa
Copy link
Copy Markdown
Contributor Author

Closing as part of the OB1 Alpha Milestone consolidation. This feature is being rebuilt as one of 12 clean, gate-compliant PRs that together form the alpha upgrade path. The consolidated PRs will be submitted once verified on the fork.

@alanshurafa alanshurafa closed this Apr 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

recipe Contribution: step-by-step recipe

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants