Restore provider-key and private-key detector coverage in the new sensitive runtime #70

@GuthL

Summary

The new sensitive-data engine on main dropped the bundled provider-key / private-key detector coverage that master had.

master used src/gitleaks_rules.rs plus gitleaks.toml to detect provider-specific credentials and PEM/private-key material before replacing them with placeholders. origin/main replaces that with src/sensitive.rs, but the new engine currently has first-class kinds only for OpaqueToken, Password, Email, Phone, NationalId, Passport, PaymentCard, and Cvv, and its generic token path is high-entropy detection only.

That means classic structured-but-not-necessarily-high-entropy secrets from the old runtime are no longer first-class matches in the new codebase.
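To illustrate why entropy alone can miss these, here is a minimal sketch that assumes the generic token path behaves roughly like a per-character Shannon entropy check against a threshold. The names `shannon_entropy` and `looks_high_entropy` are hypothetical and are not taken from src/sensitive.rs; the real logic may differ.

```rust
use std::collections::HashMap;

/// Shannon entropy in bits per character.
/// Hypothetical helper for illustration; not the function used in src/sensitive.rs.
fn shannon_entropy(s: &str) -> f64 {
    let len = s.chars().count() as f64;
    if len == 0.0 {
        return 0.0;
    }
    // Count how often each character occurs.
    let mut counts: HashMap<char, usize> = HashMap::new();
    for c in s.chars() {
        *counts.entry(c).or_insert(0) += 1;
    }
    // Sum -p * log2(p) over the observed characters.
    counts
        .values()
        .map(|&n| {
            let p = n as f64 / len;
            -p * p.log2()
        })
        .sum()
}

/// Hypothetical entropy-only gate. The threshold is an assumption supplied by the caller.
fn looks_high_entropy(token: &str, threshold: f64) -> bool {
    shannon_entropy(token) >= threshold
}
```

Under this measure a string of n characters can never exceed log2(n) bits per character, so a 20-character AWS access key ID tops out at about 4.32 regardless of its content. A threshold tuned for long random tokens can therefore pass over it, and the fixed header and footer lines of a PEM block score lower still.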

Evidence

  • Old runtime:
    • src/gitleaks_rules.rs
    • gitleaks.toml
    • Examples already covered there include AWS access key IDs and PEM private-key blocks.
  • New runtime:
    • src/sensitive.rs
    • detect_opaque_tokens() only wraps high-entropy token detection.
    • No equivalent provider-key / private-key detector set exists in the new engine.

Why this matters

This narrows practical coverage for exactly the kinds of secrets KeyClaw is supposed to intercept in AI prompts:

  • AWS access key IDs like AKIA...
  • provider-prefixed tokens that are structured rather than purely high-entropy
  • PEM / OpenSSH / age private-key material

Those were major strengths of the old project and are worth extracting into the new architecture rather than leaving behind.

Proposed extraction

Port a curated subset of the old master detector corpus into the new sensitive.rs engine while keeping the new format-preserving placeholder design and session-scoped store.

Minimum bar:

  • Restore explicit detection for common cloud/provider credentials
  • Restore explicit detection for private-key / PEM material
  • Add regression tests showing the new runtime rewrites those cases again
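As a starting point for discussion, a structure-based detector pair could look roughly like the sketch below. Everything here is an assumption for illustration: `StructuredKind`, `StructuredMatch`, and both function names are hypothetical and do not exist in src/sensitive.rs. The AWS pattern is also deliberately simplified (the literal prefix AKIA plus 16 uppercase letters or digits); the old gitleaks.toml rules may cover more prefixes and a stricter alphabet. It uses only the standard library.

```rust
/// Kinds a restored detector set might report (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StructuredKind {
    AwsAccessKeyId,
    PrivateKeyBlock,
}

/// A match expressed as a byte range into the scanned text.
#[derive(Debug, PartialEq, Eq)]
struct StructuredMatch {
    kind: StructuredKind,
    start: usize,
    end: usize,
}

/// Simplified AWS access key ID detector: "AKIA" + 16 chars of [A-Z0-9],
/// not adjacent to other word characters.
fn find_aws_access_key_ids(text: &str) -> Vec<StructuredMatch> {
    const PREFIX: &str = "AKIA";
    const TOTAL_LEN: usize = 20;
    let bytes = text.as_bytes();
    let is_body = |b: u8| b.is_ascii_uppercase() || b.is_ascii_digit();
    let is_word = |b: u8| b.is_ascii_alphanumeric() || b == b'_';
    let mut out = Vec::new();
    let mut i = 0;
    while i + TOTAL_LEN <= bytes.len() {
        let end = i + TOTAL_LEN;
        let candidate = &bytes[i..end];
        let bounded_left = i == 0 || !is_word(bytes[i - 1]);
        let bounded_right = end == bytes.len() || !is_word(bytes[end]);
        if bounded_left
            && bounded_right
            && candidate.starts_with(PREFIX.as_bytes())
            && candidate[PREFIX.len()..].iter().all(|&b| is_body(b))
        {
            out.push(StructuredMatch { kind: StructuredKind::AwsAccessKeyId, start: i, end });
            i = end; // skip past the match
        } else {
            i += 1;
        }
    }
    out
}

/// PEM-style private-key block detector: a "-----BEGIN <label>-----" header whose
/// label ends in "PRIVATE KEY", through the matching "-----END <label>-----" footer.
fn find_private_key_blocks(text: &str) -> Vec<StructuredMatch> {
    const BEGIN: &str = "-----BEGIN ";
    const DASHES: &str = "-----";
    let mut out = Vec::new();
    let mut cursor = 0;
    while let Some(rel) = text[cursor..].find(BEGIN) {
        let start = cursor + rel;
        let label_start = start + BEGIN.len();
        // Without closing dashes there can be no complete header from here on.
        let Some(label_len) = text[label_start..].find(DASHES) else { break };
        let label = &text[label_start..label_start + label_len];
        if label.contains('\n') || !label.ends_with("PRIVATE KEY") {
            cursor = label_start;
            continue;
        }
        let body_start = label_start + label_len + DASHES.len();
        let footer = format!("-----END {}-----", label);
        match text[body_start..].find(footer.as_str()) {
            Some(n) => {
                let end = body_start + n + footer.len();
                out.push(StructuredMatch { kind: StructuredKind::PrivateKeyBlock, start, end });
                cursor = end;
            }
            None => cursor = label_start, // unterminated block: keep scanning
        }
    }
    out
}
```

The PEM scan covers labels such as RSA PRIVATE KEY and OPENSSH PRIVATE KEY, since both end in "PRIVATE KEY". age secret keys use a different, non-PEM format and would need their own detector.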

Suggested regression tests

  • AWS access key ID in a chat message is rewritten even when entropy alone would not catch it
  • PEM private key block is rewritten
  • A provider-specific prefixed token is rewritten without relying only on entropy
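The third test could take roughly the shape sketched below. `rewrite_prefixed_tokens` is a hypothetical stand-in for the runtime's real rewrite entry point, which this issue does not name. The token shape (ghp_ followed by exactly 36 alphanumerics) and the all-zero placeholder body are assumptions chosen only to keep the sketch self-contained. The body is deliberately a single repeated character, so an entropy-only detector would have nothing to flag.

```rust
/// Hypothetical stand-in for the runtime's rewrite step: keeps the "ghp_" prefix
/// and replaces a 36-character alphanumeric body with a same-length placeholder.
/// Left-boundary checks are omitted for brevity.
fn rewrite_prefixed_tokens(text: &str) -> String {
    const PREFIX: &str = "ghp_";
    const BODY_LEN: usize = 36;
    let mut out = String::with_capacity(text.len());
    let mut rest = text;
    while let Some(pos) = rest.find(PREFIX) {
        let body_start = pos + PREFIX.len();
        // Length of the alphanumeric run that follows the prefix.
        let body_len = rest[body_start..]
            .bytes()
            .take_while(|b| b.is_ascii_alphanumeric())
            .count();
        // Copy everything up to and including the prefix unchanged.
        out.push_str(&rest[..body_start]);
        if body_len == BODY_LEN {
            // Structured match: emit a placeholder body of the same length.
            out.push_str(&"0".repeat(BODY_LEN));
            rest = &rest[body_start + BODY_LEN..];
        } else {
            rest = &rest[body_start..];
        }
    }
    out.push_str(rest);
    out
}

#[test]
fn prefixed_token_is_rewritten_without_relying_on_entropy() {
    // Zero-entropy body: only the prefix and length identify this as a token.
    let secret = format!("ghp_{}", "a".repeat(36));
    let msg = format!("use token {} please", secret);
    let rewritten = rewrite_prefixed_tokens(&msg);
    assert!(!rewritten.contains(secret.as_str()));
    // The placeholder preserves the overall length and the prefix.
    assert_eq!(rewritten.len(), msg.len());
    assert!(rewritten.contains("ghp_"));
}
```

In the real test suite the assertion would go through the engine's actual rewrite path and session-scoped store rather than this stand-in.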
