Skip to content

Latest commit

 

History

History
175 lines (128 loc) · 9.64 KB

File metadata and controls

175 lines (128 loc) · 9.64 KB

GitExpose Detection Coverage

Last updated: v0.4

GitExpose detects credential exposure across 23 providers in 5 categories, plus supply-chain risk indicators specific to AI infrastructure. Each finding carries OWASP LLM Top 10 (attack_class) and MITRE ATLAS technique (atlas_technique) metadata.

Credential providers

LLM and AI providers

Provider Pattern Severity Source
OpenAI sk-…, sk-proj-…, sk-svcacct-… CRITICAL v0.1 + v0.2
Anthropic sk-ant-… CRITICAL v0.2
Google AI / Firebase AIzaSy… CRITICAL v0.1
Groq gsk_… CRITICAL v0.2
OpenRouter sk-or-… CRITICAL v0.2
xAI (Grok) xai-… CRITICAL v0.2
Cerebras csk-… CRITICAL v0.2
Hugging Face hf_… CRITICAL v0.2
Replicate r8_… CRITICAL v0.2
Perplexity pplx-… CRITICAL v0.2
ElevenLabs 32-hex (context-bound) CRITICAL v0.2
Voyage AI pa-… CRITICAL v0.3
Cohere co-… CRITICAL v0.3

RAG / Vector DB

Provider Pattern Severity Source
Pinecone pcsk_… CRITICAL v0.2

LLM observability

Provider Pattern Severity Source
LangSmith lsv2_pt_… and ls__… CRITICAL v0.2
Helicone sk-helicone-… HIGH v0.3
Portkey PORTKEY_API_KEY=… (context-bound) HIGH v0.3

LLM infrastructure

Provider Pattern Severity Source
Modal ak-… (token ID) + as-… (token secret) CRITICAL v0.3
Runpod RUNPOD_API_KEY=… (context-bound) HIGH v0.3

Code, cloud, payment

Provider Pattern Severity Source
AWS AKIA… + secret-key context CRITICAL v0.1
GitHub PAT ghp_…, ghs_… CRITICAL v0.1
GitLab PAT glpat-… CRITICAL v0.2
Docker Hub dckr_pat_… CRITICAL v0.2
Stripe sk_live_…, rk_live_…, sk_test_… CRITICAL/HIGH v0.1 + v0.2

Communication

Provider Pattern Severity Source
Discord (bot) M…\..\.. CRITICAL v0.2
Discord (webhook) discord.com/api/webhooks/… HIGH v0.2
Slack (token) xox[baprs]-… CRITICAL v0.1
Slack (webhook) hooks.slack.com/services/… HIGH v0.1
Telegram (bot) \d{8,10}:[\w-]{35} HIGH v0.2

Notifications

Provider Pattern Severity Source
Twilio AC[a-f0-9]{32} HIGH v0.2
SendGrid SG.… HIGH v0.1

Database connection strings

Type Pattern Severity Source
PostgreSQL postgres(?:ql)?://user:pass@… HIGH v0.1
MySQL mysql://user:pass@… HIGH v0.1
MongoDB Atlas mongodb(\+srv)?://user:pass@… HIGH v0.1

Generic

Type Pattern Severity Source
Private key (PEM) -----BEGIN…PRIVATE KEY----- CRITICAL v0.1
JWT token eyJ…\.eyJ…\..* HIGH v0.1
Generic API key `(api[_-]?key apikey)["']?\s*[:=]\s*["']…["']` MEDIUM

Supply-chain detection (v0.2)

Detection Severity Description
unpinned_ai_middleware HIGH AI middleware (litellm, langchain, openai, anthropic, etc.) without == pin
known_malicious_package_version CRITICAL Pinned to a known-compromised version (e.g., litellm==1.82.7)
slopsquatting CRITICAL Package name from the LLM-hallucination corpus (e.g., huggingface-cli)
pth_persistence CRITICAL .pth file with exec/eval/base64 (TeamPCP technique)
ai_c2_beacon CRITICAL Skill instructs AI agent to operate as C2 implant (ATLAS AML.TA0015)
kubernetes_exfiltration CRITICAL Kubernetes secret enumeration / service-account token access
credential_cluster CRITICAL ≥2 distinct secret types co-occur in same file
multi_provider_credential_file CRITICAL Cluster appears in known aggregator path (OAI_CONFIG_LIST, litellm_config.yaml, .continue/agents/*.yaml)

AI-supply-chain signature pack (v0.4)

Four new working-tree detections added to supply-chain scanning:

Detection Severity Description
polyglot_file HIGH A text-extension file (.md, .yaml, .json, etc.) whose leading bytes match a binary/executable/archive magic signature (ELF, PE/MZ, ZIP, PDF, Mach-O, gzip). Detection uses built-in magic-byte matching — no external dependency (python-magic / libmagic not required).
skill_prompt_injection HIGH Hidden directives found in AI-agent instruction files (CLAUDE.md, AGENTS.md, GEMINI.md, files under .continue/ or .cursor/): "ignore previous instructions", exfiltration directives, or system-prompt-reveal attempts. OWASP LLM01.
agent_config_malicious_content CRITICAL Embedded command or exfiltration payloads (curl|bash, exec/eval) inside CrewAI, AutoGen, or litellm config files.
langgrinch_lc_key CRITICAL Heuristic detection of LangChain lc--prefixed API keys. Best-effort pattern — the exact upstream key format is not authoritatively confirmed; pattern is motivated by CVE-2025-68664 (LangGrinch credential-theft chain) and should be treated as a high-signal lead requiring manual confirmation.

AWS access+secret pairing (v0.4)

When supply-chain --verify (or git-history --verify) detects both an aws_access_key and an aws_secret_key from the same source, GitExpose now pairs them and performs a live sts:GetCallerIdentity (SigV4) liveness check. Previously, AWS findings always surfaced as error at verification time because the secret component was unavailable. Pairing is applied automatically — no additional flags are required.

Live dependency SCA (v0.5)

supply-chain parses lock files and queries OSV.dev for live vulnerability intelligence (default on; --offline falls back to the curated list).

Lock-file format Ecosystem
requirements.txt (== pins), poetry.lock, Pipfile.lock PyPI
package-lock.json (v2/v3), yarn.lock (v1 + Berry) npm
Finding type Severity Description
vulnerable_dependency CVSS-mapped (CRITICAL/HIGH/MEDIUM/LOW) A resolved dependency matched a live OSV CVE/GHSA/MAL advisory. Carries vuln_id, fixed_version, direct, pinned, cred_co_present, known_exploited. Mapped to OWASP A06:2021 (Vulnerable & Outdated Components) / CICD-SEC-3; AI middleware additionally keeps AML.T0019.

Findings are ranked by exploitability context (credential-co-presence → known-exploited → direct → unpinned → fix-available → severity → CVSS), not raw CVSS. The CycloneDX 1.6 AI-BOM (-o cyclonedx) carries these as VEX entries; analysis.state is exploitable only when a co-present credential is --verify-confirmed live or OSV flags it known-exploited, else in_triage.

Empirical AI-tool config paths (v0.2)

GitExpose scans for these paths during URL/HTTP scans (where the path is exposed) and during local filesystem scans:

  • .continue/, .continue/agents/*.yaml, .continue/config.yaml
  • claude/.credentials.json
  • **/litellm*config*.{yaml,yml,md}
  • mcp.json, .cursor/mcp.json, **/@config.json.md
  • **/bin/Debug/**/appsettings*.json, **/bin/Release/**/appsettings*.json
  • drizzle.config.ts
  • agents.yaml, tasks.yaml, crew.yaml (CrewAI)
  • OAI_CONFIG_LIST (AutoGen)
  • **/.env.*.example, **/.env.bak, **/.env.*.bak
  • firebase-config.{js,ts}

Git history scanning (v0.4)

gitexpose git-history <path> scans all reachable commits (git log -p --all --reverse) for credentials that were committed and later removed — secrets that no longer appear in the working tree but remain accessible in repository history and may still be live.

Key behaviours:

  • Full credential matrix — the same 29-provider pattern set used by supply-chain applies to every diff hunk.
  • Deduplicated to earliest introduce — each distinct secret value is reported once, at the commit that first introduced it, to avoid alert noise from long-lived secrets touched by many commits.
  • Commit metadata — every finding carries the introducing commit SHA, author, and date.
  • Composes with --verify — pass --verify (plus the --verify* family flags) and historical secrets go through the same liveness-check path as working-tree findings. A typical result: "deleted 47 commits ago, confirmed live."
  • AWS pairing — the same access+secret pairing introduced in v0.4 applies here, so AWS keys found in history can also be verified.
  • Flags: -o/--output {console,json}, --out-file, --since <date>, --max-commits <n>, plus the full --verify family (--verify-only-severity, --verify-timeout, --verify-concurrency).

Verification status

Verification status (v0.4): Tier 1–2 providers (OpenAI, Anthropic, Groq, OpenRouter, Perplexity, xAI, Cerebras, Hugging Face, ElevenLabs, Pinecone, LangSmith, GitHub, GitLab, Docker Hub, Slack token, AWS) support --verify for live/dead status. AWS liveness checks now work reliably via access+secret pairing (v0.4). Tier 3 (Helicone, Portkey, Voyage, Cohere, Modal, Runpod) remain detection-only.

Compliance taxonomies

Every finding includes:

  • attack_class — OWASP LLM Top 10 ID (LLM05 Supply Chain, LLM06 Sensitive Info Disclosure, LLM08 Excessive Agency, etc.)
  • atlas_technique — MITRE ATLAS technique ID (e.g., AML.T0019, AML.TA0015)

These appear in JSON, SARIF (as taxonomy references), HTML (badges), CSV (columns), and console output.

The basis for new patterns and paths is public threat intelligence and real-world leak observations. No external service is queried at scan time.