# Secrets Detection Patterns

## Usage

These patterns are used by:

- `pre-edit-secrets.mjs` hook: blocks Write/Edit operations containing secrets before they reach disk
- `skill-scanner-agent`: flags skills and commands that hardcode or expose secrets

Patterns are JavaScript-compatible regex strings. Apply them with the `g` (global) and `i` (case-insensitive) flags unless noted otherwise. JavaScript's `RegExp` does not accept a leading inline modifier such as `(?i)`, so case-insensitivity must be supplied through the `i` flag.

---

## Pattern Format

Each pattern includes:

- `id`: Unique identifier for logging and suppression
- `regex`: JavaScript-compatible regex (string form, apply with `new RegExp(...)`)
- `description`: What it detects
- `severity`: `critical` / `high` / `medium` / `low`
- `false_positive_notes`: When this pattern might false-match

---

## Patterns

### 1. AWS

#### AWS Access Key ID

- **ID:** `aws-access-key-id`
- **Regex:** `\bAKIA[0-9A-Z]{16}\b`
- **Description:** AWS Access Key ID. Long-term key IDs start with `AKIA` followed by 16 uppercase alphanumeric characters; temporary (STS) key IDs start with `ASIA` and are not covered by this pattern.
- **Severity:** critical
- **False Positive Notes:** None. This prefix+length combination is highly specific to AWS, with no known false positives in practice.

#### AWS Secret Access Key

- **ID:** `aws-secret-access-key`
- **Regex:** `aws[_\-\s.]*secret[_\-\s.]*(?:access[_\-\s.]*)?key["'\s]*[:=]["'\s]*([A-Za-z0-9/+]{40})`
- **Description:** AWS Secret Access Key: a 40-character base64 string following a label like `aws_secret_key`, `AWS_SECRET_ACCESS_KEY`, etc.
- **Severity:** critical
- **False Positive Notes:** Generic 40-char base64 strings can appear in other contexts, so the pattern requires the `aws` + `secret` label context.

#### AWS Session Token

- **ID:** `aws-session-token`
- **Regex:** `aws[_\-\s.]*session[_\-\s.]*token["'\s]*[:=]["'\s]*([A-Za-z0-9/+=]{100,})`
- **Description:** Temporary AWS session token (STS). Much longer than access keys, typically 200-400 characters.
- **Severity:** critical
- **False Positive Notes:** Long base64 blobs in unrelated contexts (e.g., test fixtures, encoded images). Require the `session_token` label.
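
As a rough illustration of the pattern format, the sketch below compiles two of the AWS patterns from their string form and scans a piece of text. The `PATTERNS` array, the `scan` function, and the shape of the returned findings are assumptions made for this example, not code from either consumer. The sample credentials are the placeholder values AWS uses in its own documentation.

```javascript
// Hypothetical pattern records in the documented format (description and
// false_positive_notes omitted for brevity).
const PATTERNS = [
  {
    id: 'aws-access-key-id',
    regex: String.raw`\bAKIA[0-9A-Z]{16}\b`,
    severity: 'critical',
  },
  {
    id: 'aws-secret-access-key',
    regex: String.raw`aws[_\-\s.]*secret[_\-\s.]*(?:access[_\-\s.]*)?key["'\s]*[:=]["'\s]*([A-Za-z0-9/+]{40})`,
    severity: 'critical',
  },
];

// Scan text with every pattern. Flags go to the RegExp constructor because
// JavaScript rejects a leading inline modifier such as (?i).
function scan(text, patterns = PATTERNS) {
  const findings = [];
  for (const { id, regex, severity } of patterns) {
    const re = new RegExp(regex, 'gi'); // matchAll requires the g flag
    for (const match of text.matchAll(re)) {
      // Prefer the captured secret value when the pattern defines a group.
      findings.push({ id, severity, index: match.index, value: match[1] ?? match[0] });
    }
  }
  return findings;
}

const sample = [
  'AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE',
  'AWS_SECRET_ACCESS_KEY="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"',
].join('\n');

console.log(scan(sample).map((finding) => finding.id));
```

Using `String.raw` keeps the regex text identical to what appears in the pattern entries, without doubling every backslash.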

---

### 2. Azure

#### Azure Storage Account Key

- **ID:** `azure-storage-key`
- **Regex:** `AccountKey=([A-Za-z0-9+/]{86}==)`
- **Description:** Azure Storage Account key embedded in a connection string. Always exactly 88 characters ending in `==`.
- **Severity:** critical
- **False Positive Notes:** None. The `AccountKey=` prefix plus exact length is highly specific.

#### Azure Storage Connection String

- **ID:** `azure-storage-connstr`
- **Regex:** `DefaultEndpointsProtocol=https?;AccountName=[^;]+;AccountKey=[A-Za-z0-9+/]{86}==`
- **Description:** Full Azure Storage connection string including account name and key.
- **Severity:** critical
- **False Positive Notes:** None.

#### Azure SAS Token

- **ID:** `azure-sas-token`
- **Regex:** `(?:sv|sig|se|sp|spr|srt)=[A-Za-z0-9%+/=&]{10,}(?:&(?:sv|sig|se|sp|spr|srt)=[A-Za-z0-9%+/=&]{1,}){3,}`
- **Description:** Azure Shared Access Signature (SAS) token: a URL query string containing multiple SAS parameters.
- **Severity:** high
- **False Positive Notes:** URL-encoded query strings with similar parameter names. The regex requires at least four SAS parameters in sequence but does not check that they are distinct, so confirm that `sv`, `sig`, `se`, and `sp` are all present before reporting.

#### Azure Client Secret

- **ID:** `azure-client-secret`
- **Regex:** `client[_\-]?secret["'\s]*[:=]["'\s]*([A-Za-z0-9~._\-]{34,40})`
- **Description:** Azure AD / Entra ID application client secret: a 34-40 character string of letters, digits, and `~._-`.
- **Severity:** critical
- **False Positive Notes:** Generic password fields with similar length. Always flag and require human review.

#### Azure Service Bus Connection String

- **ID:** `azure-servicebus-connstr`
- **Regex:** `Endpoint=sb://[^;]+;SharedAccessKeyName=[^;]+;SharedAccessKey=[A-Za-z0-9+/=]{43}=`
- **Description:** Azure Service Bus connection string with shared access key.
- **Severity:** critical
- **False Positive Notes:** None. The format is highly specific.

---

### 3. Google Cloud Platform

#### GCP API Key

- **ID:** `gcp-api-key`
- **Regex:** `\bAIza[0-9A-Za-z_\-]{35}\b`
- **Description:** Google Cloud / Firebase API key. Always starts with `AIza` followed by 35 characters (letters, digits, `_`, or `-`).
- **Severity:** high
- **False Positive Notes:** None. The prefix is specific. Note: GCP API keys have varying scopes; some are safe to expose (browser-restricted keys), but flag all for review.

#### GCP Service Account JSON Marker

- **ID:** `gcp-service-account-json`
- **Regex:** `"type"\s*:\s*"service_account"`
- **Description:** Google Cloud service account JSON credential file marker. The presence of this key indicates a full service account credential object.
- **Severity:** critical
- **False Positive Notes:** Only matches within JSON credential blobs. If found alongside `private_key`, treat as confirmed credential leak.

---

### 4. GitHub

#### GitHub Personal Access Token (Classic)

- **ID:** `github-pat-classic`
- **Regex:** `\bghp_[A-Za-z0-9]{36}\b`
- **Description:** GitHub classic personal access token (PAT). Prefix `ghp_` followed by exactly 36 alphanumeric characters.
- **Severity:** critical
- **False Positive Notes:** None. The prefix is specific to GitHub.

#### GitHub Fine-Grained Personal Access Token

- **ID:** `github-pat-fine-grained`
- **Regex:** `\bgithub_pat_[A-Za-z0-9_]{82}\b`
- **Description:** GitHub fine-grained PAT introduced in 2022. Longer and more structured than classic PATs.
- **Severity:** critical
- **False Positive Notes:** None.

#### GitHub OAuth Token

- **ID:** `github-oauth-token`
- **Regex:** `\bgho_[A-Za-z0-9]{36}\b`
- **Description:** GitHub OAuth access token issued via OAuth app flow.
- **Severity:** critical
- **False Positive Notes:** None.

#### GitHub Actions / Server Token

- **ID:** `github-server-token`
- **Regex:** `\bghs_[A-Za-z0-9]{36}\b`
- **Description:** GitHub Apps installation token or Actions runner token.
- **Severity:** high
- **False Positive Notes:** None.

---

### 5. npm

#### npm Automation / Publish Token

- **ID:** `npm-token`
- **Regex:** `\bnpm_[A-Za-z0-9]{36}\b`
- **Description:** npm registry automation or publish token. Prefix `npm_` followed by 36 alphanumeric characters.
- **Severity:** critical
- **False Positive Notes:** None. The prefix is specific to npm tokens issued after 2021. Older tokens in `.npmrc` are caught by the legacy pattern below.

#### npm Legacy Auth Token (.npmrc)

- **ID:** `npm-legacy-auth`
- **Regex:** `//registry\.npmjs\.org/:_authToken\s*=\s*([a-f0-9\-]{36,})`
- **Description:** Legacy npm authentication token in `.npmrc` format.
- **Severity:** critical
- **False Positive Notes:** None.

---

### 6. Generic API Keys and Authorization Headers

#### Bearer Token in Authorization Header

- **ID:** `bearer-token`
- **Regex:** `Authorization\s*[:=]\s*["']?Bearer\s+([A-Za-z0-9\-._~+/]+=*)\b`
- **Description:** HTTP Authorization header with Bearer scheme. Common in hardcoded fetch/axios calls.
- **Severity:** high
- **False Positive Notes:** High false positive rate when the value is a variable reference like `Bearer ${token}` or an angle-bracket placeholder. Skip matches containing `$`, `<`, `>`, or `{`.

#### Generic `api_key` / `api-key` Assignment

- **ID:** `generic-api-key`
- **Regex:** `\bapi[_\-]?key\s*[:=]\s*["']([A-Za-z0-9\-._]{16,64})["']`
- **Description:** Generic API key assignment in config files, source code, or environment exports.
- **Severity:** high
- **False Positive Notes:** Placeholder values like `your-api-key-here`, `REPLACE_ME`, `xxx...`, or angle-bracket placeholders. Skip matches where the value is all-same-character or contains angle brackets.

#### OpenAI API Key (Legacy Format)

- **ID:** `openai-api-key-legacy`
- **Regex:** `\bsk-[A-Za-z0-9]{20}T3BlbkFJ[A-Za-z0-9]{20}\b`
- **Description:** OpenAI API key in the legacy format. The substring `T3BlbkFJ` is base64 for `OpenAI`.
- **Severity:** critical
- **False Positive Notes:** None for the legacy format.
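
The skip conditions noted for `bearer-token` and `generic-api-key` above could be applied to a candidate value roughly as follows. This is a minimal sketch: `looksLikePlaceholder`, `REFERENCE_CHARS`, and `PLACEHOLDER_FRAGMENTS` are hypothetical names chosen for illustration, and the fragment list covers only the placeholders mentioned in those two notes.

```javascript
// Characters that suggest a template or variable reference, not a literal.
// The regex has no g flag, so .test() keeps no state between calls.
const REFERENCE_CHARS = /[$<>{]/;

// Placeholder fragments named in the notes above (illustrative subset).
const PLACEHOLDER_FRAGMENTS = ['your-api-key', 'replace_me'];

// Return true when a candidate value should be skipped instead of reported.
function looksLikePlaceholder(value) {
  if (REFERENCE_CHARS.test(value)) return true; // e.g. ${token}
  if (/^(.)\1+$/.test(value)) return true; // one character repeated
  const lower = value.toLowerCase();
  return PLACEHOLDER_FRAGMENTS.some((fragment) => lower.includes(fragment));
}

for (const candidate of ['${token}', 'xxxxxxxxxxxxxxxx', 'REPLACE_ME', 'k3yWithMixedChars0123']) {
  console.log(candidate, looksLikePlaceholder(candidate) ? 'skip' : 'report');
}
```

The check runs on the text the caller extracts after `Bearer` or inside the quotes of an `api_key` assignment; how that text is extracted is left to the caller.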

#### OpenAI Project-Scoped Key

- **ID:** `openai-project-key`
- **Regex:** `\bsk-proj-[A-Za-z0-9\-_]{40,}\b`
- **Description:** OpenAI project-scoped API key introduced in 2024.
- **Severity:** critical
- **False Positive Notes:** None.

#### Anthropic API Key

- **ID:** `anthropic-api-key`
- **Regex:** `\bsk-ant-api03-[A-Za-z0-9\-_]{93}\b`
- **Description:** Anthropic Claude API key.
- **Severity:** critical
- **False Positive Notes:** None. The prefix plus exact length is highly specific.

---

### 7. Private Keys (PEM Format)

PEM header patterns detect private key material. The regex patterns below write the five-hyphen delimiter as `-{5}`, so they match the literal PEM markers in files at scan time without containing those markers themselves.

#### RSA Private Key Header

- **ID:** `rsa-private-key`
- **Regex:** `-{5}BEGIN RSA PRIVATE KEY-{5}`
- **Description:** PEM-encoded RSA private key. The header alone is sufficient to flag; do not require the full key body.
- **Severity:** critical
- **False Positive Notes:** Test fixtures and documentation examples sometimes include truncated PEM blocks. Flag regardless: a truncated key in committed code still indicates a process failure.

#### EC / DSA / OpenSSH Private Key Header

- **ID:** `ec-private-key`
- **Regex:** `-{5}BEGIN (?:EC|DSA|OPENSSH|ENCRYPTED) PRIVATE KEY-{5}`
- **Description:** PEM-encoded elliptic curve, DSA, OpenSSH, or encrypted PKCS#8 private key.
- **Severity:** critical
- **False Positive Notes:** Same as RSA. Flag all occurrences.

#### PKCS#8 Private Key Header

- **ID:** `pkcs8-private-key`
- **Regex:** `-{5}BEGIN PRIVATE KEY-{5}`
- **Description:** PKCS#8 encoded private key (format-agnostic, covers RSA, EC, etc.).
- **Severity:** critical
- **False Positive Notes:** None.

**Implementation note for `pre-edit-secrets.mjs`:** Build these regexes at runtime using `new RegExp('-{5}BEGIN RSA PRIVATE KEY-{5}')` rather than as regex literals, so the hook script itself is not flagged by secret scanners.

---

### 8. Database Connection Strings

#### PostgreSQL Connection String

- **ID:** `postgres-connstr`
- **Regex:** `postgres(?:ql)?://[^:]+:[^@]+@[^\s'"]+`
- **Description:** PostgreSQL connection URL with embedded credentials in the format `postgresql://user:password@host/db`.
- **Severity:** critical
- **False Positive Notes:** Matches any non-empty password portion. Skip if the password segment is `${...}`, an angle-bracket placeholder, or `*`.

#### MongoDB Connection String

- **ID:** `mongodb-connstr`
- **Regex:** `mongodb(?:\+srv)?://[^:]+:[^@]+@[^\s'"]+`
- **Description:** MongoDB Atlas or local connection string with embedded username and password.
- **Severity:** critical
- **False Positive Notes:** Same exclusions as PostgreSQL.

#### MySQL / MariaDB Connection String

- **ID:** `mysql-connstr`
- **Regex:** `mysql(?:2)?://[^:]+:[^@]+@[^\s'"]+`
- **Description:** MySQL or MariaDB connection URL with credentials.
- **Severity:** critical
- **False Positive Notes:** Same exclusions as PostgreSQL.

#### Redis Connection String with Password

- **ID:** `redis-connstr`
- **Regex:** `redis://:[^@]+@[^\s'"]+`
- **Description:** Redis connection URL with password in the format `redis://:password@host`.
- **Severity:** high
- **False Positive Notes:** Passwordless Redis (`redis://host:6379`) does not match this pattern.

#### Generic JDBC Connection String with Password

- **ID:** `jdbc-connstr`
- **Regex:** `jdbc:[a-z]+://[^\s"']+;[Pp]assword=[^;\s"']+`
- **Description:** Java JDBC connection string with a `Password=` parameter.
- **Severity:** critical
- **False Positive Notes:** None if `Password=` is present with a non-empty value.

---

### 9. Passwords in Configuration

#### `password` Assignment

- **ID:** `config-password`
- **Regex:** `(?:^|[\s,;{(])\bpass(?:word|wd)?\s*[:=]\s*["']([^"'$<>{}\s]{6,})["']`
- **Description:** Password assignment with a quoted value in config files (YAML, TOML, and similar). Matches `password: "secret"`, `passwd = 'hunter2'`, etc. Unquoted values and JSON-style quoted keys are not matched by this regex.
- **Severity:** high
- **False Positive Notes:** High false positive rate in documentation and test fixtures. Skip if the value matches common placeholders: `your-password`, `changeme`, `example`, `test`, `placeholder`, `<...>`, `***`, `xxx`.

#### `secret` Key Assignment

- **ID:** `config-secret`
- **Regex:** `(?:^|[\s,;{(])\bsecret\b\s*[:=]\s*["']([^"'$<>{}\s]{8,})["']`
- **Description:** Generic `secret` key assignment in config or environment files. Because of the `\b` after `secret`, compound names such as Django's `SECRET_KEY` are not matched here; a `SECRET_KEY` with a real value is still a valid finding and is caught by `dotenv-secret` when written as an unquoted environment assignment.
- **Severity:** high
- **False Positive Notes:** Same exclusions as `config-password`.

#### Sensitive Environment Variable Assignment

- **ID:** `dotenv-secret`
- **Regex:** `^(?:export\s+)?[A-Z][A-Z0-9_]*(?:SECRET|KEY|TOKEN|PASSWORD|PASSWD|CREDENTIAL|AUTH)[A-Z0-9_]*\s*=\s*([A-Za-z0-9+/=\-_.@!#%^&*]{8,})`
- **Description:** Environment variable with a security-sensitive name (contains SECRET, KEY, TOKEN, PASSWORD, etc.) assigned a non-empty literal value. Matches `.env` file lines; also apply the `m` flag so that `^` anchors at the start of each line. Empty and quoted values are already excluded because the value must begin with at least 8 characters from the allowed set.
- **Severity:** high
- **False Positive Notes:** Variables pointing to file paths (e.g., `KEY_FILE=/etc/ssl/key.pem`) or URLs without credentials. Skip values that are all-uppercase (likely a variable reference like `${DATABASE_URL}`).

---

### 10. JWT Tokens

#### JWT Pattern

- **ID:** `jwt-token`
- **Regex:** `\beyJ[A-Za-z0-9\-_]+\.[A-Za-z0-9\-_]+\.[A-Za-z0-9\-_]+\b`
- **Description:** JSON Web Token in its three-part base64url format (`header.payload.signature`). The header always starts with `eyJ` (base64url encoding of `{"`).
- **Severity:** medium
- **False Positive Notes:** **High false positive rate.** JWTs are frequently used in tests, documentation, and mock data. Many JWTs are intentionally short-lived or scope-limited. Flag for human review rather than hard-blocking. Skip matches in files under `tests/`, `fixtures/`, `__mocks__/`, `*.test.*`, `*.spec.*`.
  Escalate to `critical` only if the payload segment decodes to contain an `exp` claim more than one year in the future.

---

## False Positive Suppression Rules

Apply these globally before reporting any match:

1. **Placeholder values**: Skip if the matched value contains: `your-`, `<`, `>`, `example`, `placeholder`, `replace`, `changeme`, `xxx`, `***`, `TODO`, `FIXME`
2. **Variable references**: Skip if the matched value contains: `${`, `$(`, `%{`, `ENV[`, `os.environ`
3. **Test files**: Lower severity by one level for matches in: `*.test.ts`, `*.spec.js`, `fixtures/`, `__mocks__/`, `testdata/`
4. **Documentation**: Lower severity for matches in: `*.md`, `*.txt`, `docs/`, `README*`, but never suppress `critical` patterns (PEM key headers, real AWS Access Key IDs)
5. **All-same-character values**: Skip if the value is a repetition of a single character (e.g., `xxxxxxxx`, `00000000`)
6. **Short values**: Skip generic patterns if the matched secret value is fewer than 8 characters

---

## Implementation Notes for `pre-edit-secrets.mjs`

```js
// Build PEM patterns at runtime to avoid triggering hook self-detection:
const PEM_RSA = new RegExp('-{5}BEGIN RSA PRIVATE KEY-{5}');
const PEM_GENERIC = new RegExp('-{5}BEGIN (?:EC|DSA|OPENSSH|ENCRYPTED) PRIVATE KEY-{5}');
const PEM_PKCS8 = new RegExp('-{5}BEGIN PRIVATE KEY-{5}');

// No `g` flag on these: `.test()` on a global regex advances `lastIndex`,
// which can cause missed matches if a pattern object is reused.
const CRITICAL_PATTERNS = [
  { id: 'aws-access-key-id', regex: /\bAKIA[0-9A-Z]{16}\b/ },
  { id: 'github-pat-classic', regex: /\bghp_[A-Za-z0-9]{36}\b/ },
  { id: 'github-pat-fine-grained', regex: /\bgithub_pat_[A-Za-z0-9_]{82}\b/ },
  { id: 'npm-token', regex: /\bnpm_[A-Za-z0-9]{36}\b/ },
  { id: 'openai-project-key', regex: /\bsk-proj-[A-Za-z0-9\-_]{40,}\b/ },
  { id: 'anthropic-api-key', regex: /\bsk-ant-api03-[A-Za-z0-9\-_]{93}\b/ },
  { id: 'rsa-private-key', regex: PEM_RSA },
  { id: 'ec-private-key', regex: PEM_GENERIC },
  { id: 'pkcs8-private-key', regex: PEM_PKCS8 },
];

// `fileContent` is assumed to hold the text of the pending Write/Edit.
// Hard-block on any critical match:
for (const { id, regex } of CRITICAL_PATTERNS) {
  if (regex.test(fileContent)) {
    console.error(`BLOCKED: ${id} detected. Remove secret before editing.`);
    process.exit(2); // Non-zero exit blocks the Write/Edit tool use
  }
}
```

For `high`/`medium` severity patterns, emit a warning via `console.error` but exit with `0` (allow the operation to proceed with a visible warning).

---

## References

- [OWASP: Credential Stuffing](https://owasp.org/www-community/attacks/Credential_stuffing)
- [GitHub: Secret Scanning Patterns](https://docs.github.com/en/code-security/secret-scanning/secret-scanning-patterns)
- [Gitleaks Rule Definitions](https://github.com/gitleaks/gitleaks/blob/master/config/gitleaks.toml)
- [Trufflehog Detectors](https://github.com/trufflesecurity/trufflehog/tree/main/pkg/detectors)
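
As a rough sketch of how suppression rules 3 and 4 might combine with the exit-code behaviour in the implementation notes, the code below adjusts a finding's severity by file path and then picks an exit code. It reflects one reading of those rules, not the hook's actual logic: `adjustSeverity`, `exitCodeFor`, and the path regexes are hypothetical, and warning on `low` findings is an assumption because the notes only mention `high` and `medium`.

```javascript
const LEVELS = ['low', 'medium', 'high', 'critical'];

// Hypothetical path checks approximating the globs in rules 3 and 4.
const isTestPath = (p) => /\.(?:test|spec)\.|(?:^|\/)(?:fixtures|__mocks__|testdata)\//.test(p);
const isDocPath = (p) => /\.(?:md|txt)$|(?:^|\/)docs\/|(?:^|\/)README[^/]*$/.test(p);

// Lower a severity by one level, never going below 'low'.
function lowerOneLevel(severity) {
  return LEVELS[Math.max(0, LEVELS.indexOf(severity) - 1)];
}

// Test files drop one level. Documentation drops one level except for
// critical findings, which keep their severity.
function adjustSeverity(severity, filePath) {
  if (isTestPath(filePath)) return lowerOneLevel(severity);
  if (isDocPath(filePath) && severity !== 'critical') return lowerOneLevel(severity);
  return severity;
}

// Return 2 if any finding is still critical after adjustment; otherwise
// return 0 after printing a warning for each remaining finding.
function exitCodeFor(findings, filePath) {
  let code = 0;
  for (const { id, severity } of findings) {
    const adjusted = adjustSeverity(severity, filePath);
    if (adjusted === 'critical') {
      console.error(`BLOCKED: ${id} detected in ${filePath}.`);
      code = 2;
    } else {
      console.error(`WARNING: ${id} (${adjusted}) detected in ${filePath}.`);
    }
  }
  return code;
}

console.log(exitCodeFor([{ id: 'jwt-token', severity: 'medium' }], 'src/app.js'));
```

A hook would pass the result to `process.exit`. Keeping the decision in a pure function makes the policy easy to unit-test without terminating the process.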