[recipes] Sensitivity tiers for content access control #110
alanshurafa wants to merge 4 commits into NateBJones-Projects:main from
Conversation
Three-tier classification (restricted/personal/standard) using regex pattern matching. Detects SSNs, API keys, passwords, credit cards, medications, health data, and financial details. Includes migration SQL, reusable TypeScript detection function, and pattern config. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
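The three-tier regex classification described in this summary can be sketched as follows. This is an illustrative sketch only, not the PR's actual source: the pattern lists, regexes, and the exact shape of `detectSensitivity` are assumptions based on the description above (restricted patterns short-circuit; personal matches are collected as reasons).

```typescript
// Hypothetical sketch of the detection flow; patterns here are examples,
// not the PR's sensitivity-patterns.json contents.
type SensitivityTier = "restricted" | "personal" | "standard";

const RESTRICTED_PATTERNS: [string, RegExp][] = [
  ["ssn", /\b\d{3}-\d{2}-\d{4}\b/],
  ["api_key", /\bsk-[A-Za-z0-9]{20,}\b/],
];

const PERSONAL_PATTERNS: [string, RegExp][] = [
  ["medication", /\b(ibuprofen|metformin|lisinopril)\b/i],
];

function detectSensitivity(text: string): { tier: SensitivityTier; reasons: string[] } {
  // Short-circuit: any restricted hit decides the tier immediately.
  for (const [name, re] of RESTRICTED_PATTERNS) {
    if (re.test(text)) return { tier: "restricted", reasons: [name] };
  }
  // Collect all personal-tier matches, useful for debugging and audit trails.
  const reasons = PERSONAL_PATTERNS.filter(([, re]) => re.test(text)).map(([n]) => n);
  return reasons.length > 0
    ? { tier: "personal", reasons }
    : { tier: "standard", reasons: [] };
}
```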
justfinethanku
left a comment
Code Review for Sensitivity Tiers Primitive
Overall Assessment
This is a well-executed contribution that adds real value to the Open Brain ecosystem. The primitive introduces a clean, dependency-free content classification system using pure regex pattern matching.
What's Good
1. Solid technical approach
- Pure TypeScript with no external dependencies keeps the solution lightweight
- Short-circuit logic for restricted patterns (performance optimization)
- Collects all personal-tier matches (helpful for debugging/audit trails)
- Customizable patterns via JSON configuration (maintainable and extensible)
2. Complete implementation
- Migration SQL with proper index and check constraint
- Reusable detection function with clear type definitions
- Well-documented README with integration examples
- Proper metadata.json following repo standards
3. Strong documentation
- Clear explanation of the three-tier system
- Multiple integration examples (Deno, Node.js)
- Backfill guidance for existing data
- Troubleshooting section with common issues
4. Safety-conscious
- No credentials or secrets
- No destructive SQL operations
- Proper use of IF NOT EXISTS and check constraints
- Handles existing columns gracefully
Required Changes
1. Primitives category validation (blocking)
Per CONTRIBUTING.md section "Extensions vs Primitives vs Recipes":
Primitives are reusable concept guides that get referenced by multiple extensions. They teach a pattern... once, so extensions can link to them instead of re-explaining. A primitive should be referenced by at least 2 extensions.
This is currently a standalone concept with no existing extensions referencing it. You have two paths forward:
Option A (Recommended): Move this to the recipes/ category
- Change folder location to recipes/sensitivity-tiers/
- Update metadata.json category field to "recipes"
- Update PR title to [recipes] Sensitivity tiers for content access control
- This makes it immediately available to the community while allowing future promotion to primitive if 2+ extensions adopt it
Option B: Keep as primitive but defer merge until referenced by extensions
- Open an issue proposing which extensions should adopt this pattern
- Submit those extension updates to reference this primitive
- Merge this PR after 2+ extension references exist
2. README: Add "Extensions That Use This" section (blocking if staying as primitive)
Per CONTRIBUTING.md primitive requirements:
Primitives additionally require: "Extensions That Use This" section listing which extensions reference this primitive
Since no extensions currently use this, Option A (moving to recipes) is cleaner.
Suggested Improvements (non-blocking)
1. Pattern accuracy considerations
The regex patterns are good starting points but may need refinement:
- SSN pattern will match dates like "2024-01-15" or invoice numbers
- Credit card pattern doesn't validate Luhn checksum (will flag random 16-digit sequences)
- Bank account pattern is very broad and may catch non-sensitive number references
Consider adding a comment in sensitivity-patterns.json acknowledging the tradeoff between false positives and false negatives, with guidance for users to tune patterns based on their content.
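The Luhn concern above can be addressed with a small post-match validator that discards candidates failing the checksum. This is an illustrative sketch, not code from the PR:

```typescript
// Luhn checksum validator: run it on candidate credit-card matches to
// reject random 16-digit sequences and cut false positives.
function luhnValid(digits: string): boolean {
  const clean = digits.replace(/[\s-]/g, "");
  if (!/^\d{13,19}$/.test(clean)) return false;
  let sum = 0;
  // Walk right-to-left, doubling every second digit.
  for (let i = 0; i < clean.length; i++) {
    let d = Number(clean[clean.length - 1 - i]);
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}
```

A candidate would only be flagged as restricted when the regex matches and `luhnValid` returns true.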
2. TypeScript export structure
The current export is clean, but consider adding a convenience function:
export function detectSensitivitySimple(text: string): SensitivityTier {
const patterns = compilePatterns(patternsJson);
return detectSensitivity(text, patterns).tier;
}
This would simplify the most common use case (just need the tier, not the reasons).
3. Migration safety note
Add a note in migration.sql about the performance impact of backfilling large datasets:
-- NOTE: For large existing datasets (100K+ thoughts), consider running
-- the backfill in batches with LIMIT/OFFSET to avoid long-running transactions
4. Consider a pattern test suite
A JSON file with test cases would help users validate custom patterns:
{
"restricted_tests": [
{"input": "My SSN is 123-45-6789", "should_match": true},
{"input": "Invoice #123-45-6789", "should_match": false}
]
}
Automated Review Status
All 14 automated checks should pass once the category is adjusted. The word count (155 words in main sections, 310 total) exceeds the 200-word requirement for primitives.
Verdict
Verdict: Minor fixes needed
Primary blocker: Primitives must be referenced by 2+ extensions per repo standards. Recommend moving to category (simple folder rename + metadata update) to unblock immediate merge while keeping future promotion path open.
The implementation quality is excellent. Once the category is corrected, this is ready to merge.
Action Items:
- Move to recipes/sensitivity-tiers/ folder
- Update metadata.json: change "category": "primitives" to "category": "recipes"
- Update PR title to [recipes] Sensitivity tiers for content access control
- (Optional) Consider adding test cases and the suggested improvements above
Looking forward to seeing this contribution land! The pattern matching approach is clean and the documentation is thorough.
Primitives require 2+ extension references per CONTRIBUTING.md. Moving to recipes per maintainer review feedback. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
resolveSensitivityTier() ensures overrides can only escalate (standard → personal → restricted), never downgrade. Unrecognized values normalize to "personal" as a safe default. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
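The escalation-only semantics described in that commit might look like the following. This is a sketch based on the commit message, not the PR's actual implementation; the rank values and function shape are assumptions.

```typescript
type SensitivityTier = "standard" | "personal" | "restricted";

// Higher rank = more sensitive; overrides may only move the tier upward.
const RANK: Record<SensitivityTier, number> = {
  standard: 0,
  personal: 1,
  restricted: 2,
};

function resolveSensitivityTier(detected: SensitivityTier, override: string): SensitivityTier {
  // Unrecognized override values normalize to "personal" as a safe default.
  const requested: SensitivityTier =
    override in RANK ? (override as SensitivityTier) : "personal";
  // Escalate only: keep whichever tier is more sensitive.
  return RANK[requested] > RANK[detected] ? requested : detected;
}
```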
@claude review this PR
I'll analyze this and get back to you.
Claude finished @matthallett1's task in 1m 28s
PR Review: [recipes] Sensitivity tiers for content access control
Context
The previous review by @justfinethanku flagged a blocking issue (primitives category requiring 2+ extension references). That's been resolved — the contribution is now correctly placed in recipes/.
Alignment
Fits OB1's mission well. This directly extends the
Clarity
The README is clear and well-structured. The step ordering is logical: migrate → integrate → backfill → query. Code examples cover both Deno and Node.js. The "No credentials needed" callout is a nice touch that reduces cognitive overhead for the reader. One gap: Step 2 introduces
Completeness
All required README sections present: What it does ✅, Prerequisites ✅, Step-by-step instructions ✅, Expected Outcome ✅, Troubleshooting ✅.
Safety
One Real Issue: Migration is Not Fully Idempotent
The troubleshooting section addresses the column case but not the constraint case. This will bite users who run the migration twice or restore from a backup that already has the column. Suggested fix:
DO $$ BEGIN
ALTER TABLE thoughts
ADD CONSTRAINT chk_sensitivity_tier
CHECK (sensitivity_tier IN ('standard', 'personal', 'restricted'));
EXCEPTION
WHEN duplicate_object THEN NULL;
END $$;
Or simply add a troubleshooting entry: Fix this →
Non-Blocking Observations
SSN false positives (flagged in prior review, still present):
Backfill batching: The example backfill loop fetches all thoughts in one query with no LIMIT. For datasets over ~10K thoughts this could cause timeout or memory issues. A note (not a code rewrite) in Step 3 recommending batch processing for large datasets would be useful.
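The backfill-batching concern can be sketched as a keyset-paginated loop. This is a hypothetical illustration: `backfillSensitivity`, the table and column names, and the injected `query` function (standing in for any Postgres client such as node-postgres) are assumptions, not the recipe's actual code.

```typescript
// Hypothetical batched backfill: processes thoughts in fixed-size batches
// keyed by id, so no single query or transaction grows with table size.
type Row = { id: number; content: string };
type Query = (sql: string, params: unknown[]) => Promise<{ rows: Row[] }>;

async function backfillSensitivity(
  query: Query,
  detect: (text: string) => { tier: string },
  batchSize = 500
): Promise<number> {
  let lastId = 0;
  let updated = 0;
  for (;;) {
    // Keyset pagination: stable and index-friendly, unlike a growing OFFSET.
    const { rows } = await query(
      "SELECT id, content FROM thoughts WHERE id > $1 AND sensitivity_tier IS NULL ORDER BY id LIMIT $2",
      [lastId, batchSize]
    );
    if (rows.length === 0) break;
    for (const row of rows) {
      await query("UPDATE thoughts SET sensitivity_tier = $1 WHERE id = $2", [
        detect(row.content).tier,
        row.id,
      ]);
      updated++;
    }
    lastId = rows[rows.length - 1].id;
  }
  return updated;
}
```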
Value
This is genuinely useful and different from anything else in the repo. Sensitivity classification is a real gap — without it, all thoughts are treated equally regardless of whether they contain credentials or health data. The escalation-only semantics and the separation of pattern config from logic are good design decisions.
Verdict
Approve with one requested fix: Make the constraint addition idempotent (either via DO block or adding a troubleshooting note for the constraint duplicate error). Everything else is solid. The quality of this implementation and documentation is well above the bar for a recipe contribution.
matthallett1
left a comment
The concept and documentation quality are strong — sensitivity tiers fill a real gap. But Codex review (gpt-5.4) found a critical issue and several pattern gaps:
P1: MCP retrieval doesn't actually filter restricted thoughts. The README claims restricted thoughts are excluded from MCP search, but neither server/index.ts nor the Edge Function accepts exclude_restricted, and match_thoughts can only filter on metadata, not the new sensitivity_tier column. After following this recipe, restricted thoughts still show up in search results. The advertised privacy behavior isn't delivered.
This is the main blocker. Options:
- Add sensitivity_tier filtering to match_thoughts (would need a schema PR for the RPC change)
- Store sensitivity_tier in metadata JSONB instead of a top-level column, so existing metadata filters work
- Remove the MCP filtering claim from the README and document it as a capture-time classification only (honest about current capability)
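A further path, mentioned in the eventual fix commit, is filtering restricted results at the MCP layer after the query returns. A minimal sketch under stated assumptions — the `Thought` field names and `excludeRestricted` helper are hypothetical, not the project's API:

```typescript
// Illustrative MCP-layer post-query filter: drops restricted thoughts from
// search results without changing the match_thoughts RPC.
type Thought = { id: number; content: string; sensitivity_tier?: string };

function excludeRestricted(results: Thought[]): Thought[] {
  // Treat a missing tier as "standard" (e.g. rows not yet backfilled).
  return results.filter((t) => (t.sensitivity_tier ?? "standard") !== "restricted");
}
```

The tradeoff of post-query filtering is that restricted rows still leave the database and consume result-set slots before being dropped.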
P2: Non-idempotent constraint. ADD CONSTRAINT chk_sensitivity_tier fails on re-run. Wrap in DO $$ BEGIN ... EXCEPTION WHEN duplicate_object THEN NULL; END $$;
P2: Pattern gaps in restricted detection:
- sk-proj-... (OpenAI dashed API keys) — regex only allows alphanumerics after prefix, misses modern dashed formats
- 15-digit AmEx card numbers — pattern only matches 4x4 digit groups
The classification logic and escalation semantics are well-designed. These are fixable issues, not architectural problems.
Replace misleading claim that MCP tools filter restricted thoughts with honest documentation: match_thoughts requires MCP-layer post-query filtering, search_thoughts_text supports exclude_restricted at RPC level. Wrap CHECK constraint in DO/EXCEPTION block for re-run idempotency. Fix API key pattern to match dashed formats (sk-proj-xxx-xxx) and add 15-digit AmEx card number pattern. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
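The pattern fixes described in that commit might look like the following. These exact regexes are illustrative, not the PR's source:

```typescript
// API keys: allow dashed segments after the prefix (e.g. sk-proj-xxx-xxx),
// not just a single alphanumeric run.
const API_KEY_RE = /\bsk-[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*\b/;

// Cards: 16-digit 4x4 groups, or 15-digit AmEx in its 4-6-5 grouping.
const CARD_RE =
  /\b(?:\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}|\d{4}[ -]?\d{6}[ -]?\d{5})\b/;
```

Real pattern tuning would still want Luhn validation and prefix checks (AmEx numbers start with 34 or 37) to keep false positives down.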
Review feedback from @matthallett1 addressed in commit 50f46b3:
Ready for re-review.
Closing this PR as part of the OB1 Alpha Milestone consolidation. This feature is being rebuilt as one of 12 clean, gate-compliant PRs that together form the alpha upgrade path. The consolidated PRs will be submitted once verified on the fork. See the full plan for details.
Summary
- Three-tier classification: restricted, personal, standard
- Migration SQL (ALTER TABLE thoughts ADD COLUMN sensitivity_tier), reusable TypeScript detection function, and JSON pattern config sensitivity-patterns.json

Files
- sensitivity-patterns.json
- detect-sensitivity.ts — detectSensitivity(text, patterns) → { tier, reasons }
- migration.sql — sensitivity_tier column with index and check constraint
- metadata.json
- README.md

Test plan
- Run detect-sensitivity.ts and test against sample content

🤖 Generated with Claude Code
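The test plan above pairs naturally with the pattern test suite suggested in review. A runner for case files of that shape could look like this; `runPatternTests` and the case format are illustrative, assumed from the review's JSON example rather than taken from the PR:

```typescript
// Illustrative runner for pattern test cases of the form suggested in review:
// { "restricted_tests": [{ "input": "...", "should_match": true }, ...] }
type PatternCase = { input: string; should_match: boolean };

function runPatternTests(re: RegExp, cases: PatternCase[]): string[] {
  const failures: string[] = [];
  for (const c of cases) {
    if (re.test(c.input) !== c.should_match) {
      failures.push(`unexpected result for: ${c.input}`);
    }
  }
  return failures;
}
```

Running the review's two SSN cases through a naive `\b\d{3}-\d{2}-\d{4}\b` pattern would surface the "Invoice #123-45-6789" false positive as a failure, which is exactly the tuning signal the suite is meant to provide.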