
fix(security): distinguish documentation examples from real secrets #1043

Merged: kantorcodes merged 2 commits into main from capy/doc-secret-filtering on Jun 21, 2026

capy-ai (bot, Contributor) commented Jun 21, 2026

This PR improves the hardcoded secret detection logic to distinguish between real secrets and documentation examples, resolving 87 false positives in documentation-heavy repositories.

  • Add heuristic-based filtering to ignore placeholder patterns (your-api-token, sk-proj-xxxxx, etc.) across documentation, skill files, and test surfaces
  • Introduce entropy and sequence analysis to detect synthetic patterns like abcdefgh123456 in example files while preserving real secret detection
  • Refactor SECRET_PATTERNS into structured SecretPattern objects with kind classification (provider, generic, private_key) and value extraction
  • Enhance findings to include line_number for precise violation location
  • Add comprehensive test coverage for placeholder ignoring and real secret detection in documentation contexts
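
The placeholder and entropy/sequence checks described in the list above could look roughly like the sketch below. This is an illustration only, not the code in this PR: the function names, the placeholder regex, and the thresholds (`min_entropy`, `max_run`) are all assumptions chosen for the example.

```python
import math
import re
from collections import Counter

# Hypothetical placeholder markers; the PR's real pattern list is not shown in this thread.
PLACEHOLDER_RE = re.compile(
    r"(your[-_]|example|placeholder|changeme|x{4,}|<[^>]+>)", re.IGNORECASE
)


def shannon_entropy(value: str) -> float:
    """Return the Shannon entropy of `value` in bits per character."""
    if not value:
        return 0.0
    counts = Counter(value)
    total = len(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def longest_sequential_run(value: str) -> int:
    """Length of the longest run in which each character is its predecessor plus one."""
    if not value:
        return 0
    best = run = 1
    for prev, cur in zip(value, value[1:]):
        run = run + 1 if ord(cur) - ord(prev) == 1 else 1
        best = max(best, run)
    return best


def looks_synthetic(value: str, min_entropy: float = 3.0, max_run: int = 5) -> bool:
    """Treat a candidate as an example if it is a placeholder, low entropy, or sequential.

    The thresholds are illustrative assumptions, not values taken from the PR.
    """
    return (
        PLACEHOLDER_RE.search(value) is not None
        or shannon_entropy(value) < min_entropy
        or longest_sequential_run(value) > max_run
    )
```

Note that `abcdefgh123456` has fairly high per-character entropy because every character is distinct, so entropy alone would not flag it; the sequential-run check is what catches it in this sketch.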

Open POINT-022

@capy-ai capy-ai Bot added the capy Generated by capy.ai label Jun 21, 2026
@capy-ai capy-ai Bot force-pushed the capy/doc-secret-filtering branch from d276ef4 to 140e601 Compare June 21, 2026 21:19
…positives in hardcoded secret detection

Co-authored-by: kantorcodes <6068672+kantorcodes@users.noreply.github.com>
@capy-ai capy-ai Bot force-pushed the capy/doc-secret-filtering branch from 140e601 to 5e795a6 Compare June 21, 2026 21:24

greptile-apps Bot commented Jun 21, 2026


Greptile Summary

This PR refactors the hardcoded-secret scanner to reduce false positives on documentation-heavy repositories by introducing placeholder heuristics, entropy/sequence analysis for synthetic provider tokens, and a custom private-key block parser — all gated behind an _is_example_surface guard so normal source files are unaffected.
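
For readers unfamiliar with the idea of a private-key block parser, here is a minimal sketch of one way to scan PEM blocks line by line. It is not the PR's `_first_private_key_line`: the function name without the underscore, the regexes, and the rule that a body must consist only of base64-looking lines are assumptions made for illustration.

```python
import re
from typing import List, Optional

BEGIN_RE = re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----")
END_RE = re.compile(r"-----END [A-Z ]*PRIVATE KEY-----")
# Assumption: a real key body consists only of base64 characters.
BASE64_LINE_RE = re.compile(r"^[A-Za-z0-9+/=]+$")


def first_private_key_line(text: str) -> Optional[int]:
    """Return the 1-based line of the first PEM private-key block with a real-looking body.

    Blocks with an empty body, or with body lines such as "..." that are not
    base64, are treated as truncated or placeholder examples and skipped.
    Unterminated blocks are also skipped in this sketch.
    """
    begin_line: Optional[int] = None
    body: List[str] = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if begin_line is None:
            if BEGIN_RE.search(stripped):
                begin_line, body = number, []
            continue
        if END_RE.search(stripped):
            if body and all(BASE64_LINE_RE.match(b) for b in body):
                return begin_line
            begin_line = None  # placeholder block; keep scanning
            continue
        if stripped:
            body.append(stripped)
    return None
```

A body check this simple would still accept a placeholder made only of letters, so a real implementation would likely combine it with additional placeholder heuristics.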

  • Adds SecretPattern dataclass, ~15 new helper functions, and associated constants to distinguish real secrets from documentation examples; the public check_no_hardcoded_secrets API is unchanged but now also reports line_number in each Finding.
  • Introduces _first_private_key_line to handle truncated/placeholder PEM blocks independently of the SECRET_PATTERNS list, but the private_key-kind entry in that list is now dead code.
  • Adds 10 new unit tests covering placeholder ignoring, real-secret detection in docs, synthetic-token filtering, and source-vs-docs surface distinctions.
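
As a rough picture of what a structured pattern with a kind, value extraction, and line-number reporting might look like, consider the sketch below. Only the names `SecretPattern`, `SECRET_PATTERNS`, and the three kinds come from this PR; the `value_group` field, the two regexes, and the `first_match` helper are hypothetical stand-ins.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SecretPattern:
    kind: str  # "provider", "generic", or "private_key"
    regex: re.Pattern
    value_group: int = 0  # hypothetical field: regex group holding the candidate value

    def extract(self, line: str) -> Optional[str]:
        """Return the candidate secret value found on `line`, if any."""
        match = self.regex.search(line)
        return match.group(self.value_group) if match else None


# Illustrative patterns only; the real list in security.py is not reproduced here.
SECRET_PATTERNS = (
    SecretPattern("provider", re.compile(r"sk-proj-[A-Za-z0-9]{8,}")),
    SecretPattern(
        "generic",
        re.compile(r"(?i)\b(?:password|secret|token)\b\s*[:=]\s*['\"]([^'\"]{6,})['\"]"),
        value_group=1,
    ),
)


def first_match(text: str) -> Optional[Tuple[int, str, str]]:
    """Return (line_number, kind, value) for the first matching line, 1-based."""
    for line_number, line in enumerate(text.splitlines(), start=1):
        for pattern in SECRET_PATTERNS:
            value = pattern.extract(line)
            if value is not None:
                return line_number, pattern.kind, value
    return None
```

Extracting the value separately from the match lets later heuristics inspect just the candidate secret rather than the whole assignment.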

Confidence Score: 5/5

Safe to merge — the filtering heuristics are gated behind the example-surface check, leaving normal source-file detection unchanged, and the new test suite validates both the suppression and detection paths.

The core detection logic for non-example surfaces is untouched. All new filtering paths require _is_example_surface to return True before any secret match can be suppressed, so regressions on ordinary source files are not possible.

src/codex_plugin_scanner/checks/security.py — dead private_key SecretPattern and file-size overage are worth addressing before the file grows further.

Important Files Changed

| Filename | Overview |
|---|---|
| src/codex_plugin_scanner/checks/security.py | Adds placeholder-filtering heuristics, entropy/sequence analysis, and private-key scanning, correctly gated behind example-surface detection; contains dead SecretPattern code and exceeds the 500-line file-size limit at 732 lines. |
| tests/test_security.py | Adds 10 new test cases covering placeholder ignoring, real-secret detection in docs, synthetic-token filtering, truncated private keys, and source-vs-docs surface distinctions; within the 500-line limit. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[check_no_hardcoded_secrets] --> B[scan files]
    B --> C[_first_hardcoded_secret_line]
    C --> D[_first_private_key_line\nPEM block scanner]
    C --> E[iterate SECRET_PATTERNS\nskip kind==private_key]
    E --> F[_should_skip_secret_match]
    F --> G{_is_example_surface?}
    G -- No --> H[report finding]
    G -- Yes --> I[_looks_like_placeholder_secret]
    I -- True --> J[skip / no finding]
    I -- False --> K{effective_kind?}
    K -- generic --> L[_looks_like_example_generic_secret\nexact set of 8 values]
    L -- True --> J
    L -- False --> H
    K -- provider --> M{_has_illustrative_context?}
    M -- No --> H
    M -- Yes --> N[synthetic or incomplete candidate?]
    N -- True --> J
    N -- False --> H
    D --> O{body_lines exist and not placeholder?}
    O -- Yes --> H
    O -- No --> J
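
The branching in the flowchart above can be restated as a small function. This sketch takes the outcome of each heuristic as a boolean argument, so it encodes only the decision order shown in the diagram; the function signature and parameter names are assumptions and do not reproduce the PR's `_should_skip_secret_match`.

```python
def should_skip_secret_match(
    kind: str,
    *,
    is_example_surface: bool,
    looks_like_placeholder: bool,
    is_known_example_generic: bool = False,
    has_illustrative_context: bool = False,
    synthetic_or_incomplete: bool = False,
) -> bool:
    """Return True when a regex match should be suppressed (no finding)."""
    # Ordinary source files are never filtered: every match is reported.
    if not is_example_surface:
        return False
    # Obvious placeholders on docs, skill files, and tests are skipped.
    if looks_like_placeholder:
        return True
    # Generic secrets are skipped only if they are in the exact example set.
    if kind == "generic":
        return is_known_example_generic
    # Provider tokens need illustrative context AND a synthetic or incomplete value.
    if kind == "provider":
        return has_illustrative_context and synthetic_or_incomplete
    # Other kinds (e.g. private keys, handled by a separate scanner) are not skipped here.
    return False
```

Because the example-surface test comes first, none of the later arguments can suppress a finding in a regular source file.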

Reviews (2): Last reviewed commit: "Tighten example-secret suppression heuri..."

Comment thread src/codex_plugin_scanner/checks/security.py Outdated
Comment thread src/codex_plugin_scanner/checks/security.py Outdated
Comment thread src/codex_plugin_scanner/checks/security.py Outdated
Comment thread src/codex_plugin_scanner/checks/security.py
Co-authored-by: kantorcodes <6068672+kantorcodes@users.noreply.github.com>
@kantorcodes kantorcodes merged commit 4cf19a1 into main Jun 21, 2026
5 of 6 checks passed