# Secrets Detection Patterns
## Usage
These patterns are used by:
- `pre-edit-secrets.mjs` hook — blocks Write/Edit operations containing secrets before they reach disk
- `skill-scanner-agent` — flags skills and commands that hardcode or expose secrets
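
Patterns are regex source strings in JavaScript syntax, with one exception: a leading `(?i)` marks a case-insensitive pattern. JavaScript's `RegExp` rejects that inline modifier, so strip it and pass the `i` flag instead. Patterns without `(?i)` are case-sensitive (an AWS key ID, for example, must start with uppercase `AKIA`), so do not add `i` to them. Patterns anchored with `^` also need the `m` (multiline) flag when run against whole files. Add the `g` (global) flag only when collecting every match, for example with `String.prototype.matchAll`: a global regex reused with `.test()` resumes from its `lastIndex` and can miss matches.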
---
## Pattern Format
Each pattern includes:
- `id`: Unique identifier for logging and suppression
- `regex`: Regex source string (compile with `new RegExp(...)`; in a JavaScript string literal, double each backslash or use `String.raw`)
- `description`: What it detects
- `severity`: `critical` / `high` / `medium` / `low`
- `false_positive_notes`: When this pattern might false-match
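
A loader has to translate the `(?i)` prefix before handing a pattern to `RegExp`. A minimal sketch of that step, assuming a leading `(?i)` is the only inline modifier used in this file; `compilePattern` is a hypothetical name, not part of `pre-edit-secrets.mjs`.

```js
// Hypothetical loader: turns a pattern string from this file into a RegExp.
// Assumption: the only inline modifier used in this file is a leading "(?i)".
function compilePattern(source, extraFlags = '') {
  let body = source;
  let flags = extraFlags;
  if (body.startsWith('(?i)')) {
    body = body.slice('(?i)'.length); // JavaScript rejects a bare (?i) group
    if (!flags.includes('i')) flags += 'i';
  }
  return new RegExp(body, flags);
}

// String.raw keeps the backslashes, so a pattern can be pasted exactly as written here.
const azureStorageKey = compilePattern(String.raw`(?i)AccountKey=([A-Za-z0-9+/]{86}==)`);
const awsAccessKeyId = compilePattern(String.raw`\bAKIA[0-9A-Z]{16}\b`); // stays case-sensitive
```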
---
## Patterns
### 1. AWS
#### AWS Access Key ID
- **ID:** `aws-access-key-id`
- **Regex:** `\bAKIA[0-9A-Z]{16}\b`
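- **Description:** AWS long-term access key ID: `AKIA` followed by 16 uppercase alphanumeric characters. Temporary (STS) access key IDs start with `ASIA` instead and are not matched by this pattern.
- **Severity:** critical
- **False Positive Notes:** Rare; this prefix+length combination is specific to AWS. The main exception is the example key ID from AWS's own documentation (`AKIA...EXAMPLE`), which shows up in tutorials and test fixtures.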
#### AWS Secret Access Key
- **ID:** `aws-secret-access-key`
- **Regex:** `(?i)aws[_\-\s.]*secret[_\-\s.]*(?:access[_\-\s.]*)?key["'\s]*[:=]["'\s]*([A-Za-z0-9/+]{40})`
- **Description:** AWS Secret Access Key — 40-character base64 string following a label like `aws_secret_key`, `AWS_SECRET_ACCESS_KEY`, etc.
- **Severity:** critical
- **False Positive Notes:** Generic 40-char base64 strings can appear in other contexts. Require the `aws` + `secret` label context.
#### AWS Session Token
- **ID:** `aws-session-token`
- **Regex:** `(?i)aws[_\-\s.]*session[_\-\s.]*token["'\s]*[:=]["'\s]*([A-Za-z0-9/+=]{100,})`
- **Description:** Temporary AWS session token (STS). Much longer than access keys — typically 200-400 characters.
- **Severity:** critical
- **False Positive Notes:** Long base64 blobs in unrelated contexts (e.g., test fixtures, encoded images). Require the `session_token` label.
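
The label-anchored AWS patterns put the secret itself in capture group 1, so a scanner can report or redact just that value. A minimal sketch using `String.prototype.matchAll` on the `aws-secret-access-key` pattern; `findAwsSecretKeys` and its result shape are hypothetical.

```js
// aws-secret-access-key with its leading (?i) turned into the `i` flag;
// `g` is required by matchAll, which iterates over a copy of the regex.
const AWS_SECRET_ACCESS_KEY = new RegExp(
  String.raw`aws[_\-\s.]*secret[_\-\s.]*(?:access[_\-\s.]*)?key["'\s]*[:=]["'\s]*([A-Za-z0-9/+]{40})`,
  'gi',
);

// Hypothetical helper: returns each 40-character secret and where it starts in the text.
function findAwsSecretKeys(text) {
  return [...text.matchAll(AWS_SECRET_ACCESS_KEY)].map((m) => ({
    value: m[1],
    // Group 1 ends the match, so the secret starts this far into the text (after the label).
    index: m.index + m[0].length - m[1].length,
  }));
}
```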
---
### 2. Azure
#### Azure Storage Account Key
- **ID:** `azure-storage-key`
- **Regex:** `(?i)AccountKey=([A-Za-z0-9+/]{86}==)`
- **Description:** Azure Storage Account key embedded in a connection string. Always exactly 88 characters ending in `==`.
- **Severity:** critical
- **False Positive Notes:** None — the `AccountKey=` prefix plus exact length is highly specific.
#### Azure Storage Connection String
- **ID:** `azure-storage-connstr`
- **Regex:** `DefaultEndpointsProtocol=https?;AccountName=[^;]+;AccountKey=[A-Za-z0-9+/]{86}==`
- **Description:** Full Azure Storage connection string including account name and key.
- **Severity:** critical
- **False Positive Notes:** None.
#### Azure SAS Token
- **ID:** `azure-sas-token`
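- **Regex:** `(?i)(?:sv|ss|sr|srt|sp|st|se|spr|sig)=[A-Za-z0-9%+/=:\-]+(?:&(?:sv|ss|sr|srt|sp|st|se|spr|sig)=[A-Za-z0-9%+/=:\-]+){3,}`
- **Description:** Azure Shared Access Signature (SAS) token: a URL query string containing several consecutive SAS parameters. The value class includes `-` and `:` because date-valued parameters such as `sv=2022-11-02` and `se=2024-01-01T00:00:00Z` contain them.
- **Severity:** high
- **False Positive Notes:** Query strings that reuse these short parameter names. The regex counts parameter occurrences, not distinct names, so confirm after matching that at least 4 distinct SAS parameters are present (e.g., `sv`, `sig`, `se`, `sp`).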
#### Azure Client Secret
- **ID:** `azure-client-secret`
- **Regex:** `(?i)client[_\-]?secret["'\s]*[:=]["'\s]*([A-Za-z0-9~._\-]{34,40})`
- **Description:** Azure AD / Entra ID application client secret: 34-40 characters drawn from letters, digits, `~`, `.`, `_`, and `-`.
- **Severity:** critical
- **False Positive Notes:** Generic password fields of similar length, and `client_secret` values for other OAuth providers, which use the same label. Always flag and require human review.
#### Azure Service Bus Connection String
- **ID:** `azure-servicebus-connstr`
- **Regex:** `Endpoint=sb://[^;]+;SharedAccessKeyName=[^;]+;SharedAccessKey=[A-Za-z0-9+/=]{43}=`
- **Description:** Azure Service Bus connection string with shared access key.
- **Severity:** critical
- **False Positive Notes:** None — format is highly specific.
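
The "distinct SAS parameters" check for `azure-sas-token` is easier to do on the parsed query string than in the regex. A minimal sketch using the global `URLSearchParams` class; `hasDistinctSasParams` is hypothetical, and treating `sig` as mandatory (a SAS without a signature grants no access) is an assumption.

```js
// Hypothetical post-filter for azure-sas-token: the regex counts parameter
// occurrences, so this confirms at least `minimum` *distinct* SAS parameter names.
const SAS_PARAMS = new Set(['sv', 'sig', 'se', 'sp', 'spr', 'srt']);

function hasDistinctSasParams(query, minimum = 4) {
  // URLSearchParams drops a leading "?" and percent-decodes names and values.
  const names = new Set();
  for (const name of new URLSearchParams(query).keys()) {
    const lower = name.toLowerCase();
    if (SAS_PARAMS.has(lower)) names.add(lower);
  }
  // Assumption: a token without a signature (`sig`) is not worth reporting.
  return names.has('sig') && names.size >= minimum;
}
```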
---
### 3. Google Cloud Platform
#### GCP API Key
- **ID:** `gcp-api-key`
- **Regex:** `\bAIza[0-9A-Za-z_\-]{35}\b`
- **Description:** Google Cloud / Firebase API key: `AIza` followed by 35 characters from letters, digits, `_`, and `-` (39 characters in total).
- **Severity:** high
- **False Positive Notes:** None — prefix is specific. Note: GCP API keys have varying scopes; some are safe to expose (browser-restricted keys), but flag all for review.
#### GCP Service Account JSON Marker
- **ID:** `gcp-service-account-json`
- **Regex:** `"type"\s*:\s*"service_account"`
- **Description:** Google Cloud service account JSON credential file marker. The presence of this key indicates a full service account credential object.
- **Severity:** critical
- **False Positive Notes:** Only matches within JSON credential blobs. If found alongside `private_key`, treat as confirmed credential leak.
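
The two-step judgement above (marker alone versus marker plus `private_key`) can be sketched as a small classifier. `classifyGcpServiceAccount` and its `none` / `suspected` / `confirmed` labels are hypothetical.

```js
// Hypothetical classifier: the service-account marker alone is "suspected";
// the marker plus a private_key field is a confirmed credential leak.
const SERVICE_ACCOUNT_MARKER = /"type"\s*:\s*"service_account"/;
// Requires the closing quote right after the name, so "private_key_id" does not count.
const PRIVATE_KEY_FIELD = /"private_key"\s*:\s*"/;

function classifyGcpServiceAccount(text) {
  if (!SERVICE_ACCOUNT_MARKER.test(text)) return 'none';
  return PRIVATE_KEY_FIELD.test(text) ? 'confirmed' : 'suspected';
}
```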
---
### 4. GitHub
#### GitHub Personal Access Token (Classic)
- **ID:** `github-pat-classic`
- **Regex:** `\bghp_[A-Za-z0-9]{36}\b`
- **Description:** GitHub classic personal access token (PAT). Prefix `ghp_` followed by exactly 36 alphanumeric characters.
- **Severity:** critical
- **False Positive Notes:** None — prefix is specific to GitHub.
#### GitHub Fine-Grained Personal Access Token
- **ID:** `github-pat-fine-grained`
- **Regex:** `\bgithub_pat_[A-Za-z0-9_]{82}\b`
- **Description:** GitHub fine-grained PAT introduced in 2022. Longer and more structured than classic PATs.
- **Severity:** critical
- **False Positive Notes:** None.
#### GitHub OAuth Token
- **ID:** `github-oauth-token`
- **Regex:** `\bgho_[A-Za-z0-9]{36}\b`
- **Description:** GitHub OAuth access token issued via OAuth app flow.
- **Severity:** critical
- **False Positive Notes:** None.
#### GitHub Actions / Server Token
- **ID:** `github-server-token`
- **Regex:** `\bghs_[A-Za-z0-9]{36}\b`
- **Description:** GitHub App installation access token (server-to-server), including the `GITHUB_TOKEN` that GitHub Actions issues to each workflow run.
- **Severity:** high
- **False Positive Notes:** None.
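
The four GitHub formats differ only in prefix and body length, so a scanner can keep them in one table and report which token type matched and on which line. A minimal sketch; `findGithubTokens` and its finding shape (`id`, `severity`, `line`) are hypothetical.

```js
// The four GitHub patterns from this section, with the `g` flag that matchAll requires.
const GITHUB_PATTERNS = [
  { id: 'github-pat-classic', regex: /\bghp_[A-Za-z0-9]{36}\b/g, severity: 'critical' },
  { id: 'github-pat-fine-grained', regex: /\bgithub_pat_[A-Za-z0-9_]{82}\b/g, severity: 'critical' },
  { id: 'github-oauth-token', regex: /\bgho_[A-Za-z0-9]{36}\b/g, severity: 'critical' },
  { id: 'github-server-token', regex: /\bghs_[A-Za-z0-9]{36}\b/g, severity: 'high' },
];

// Hypothetical helper: one finding per match, with a 1-based line number.
function findGithubTokens(text) {
  const findings = [];
  for (const { id, regex, severity } of GITHUB_PATTERNS) {
    for (const m of text.matchAll(regex)) {
      const line = text.slice(0, m.index).split('\n').length;
      findings.push({ id, severity, line });
    }
  }
  return findings;
}
```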
---
### 5. npm
#### npm Automation / Publish Token
- **ID:** `npm-token`
- **Regex:** `\bnpm_[A-Za-z0-9]{36}\b`
- **Description:** npm registry automation or publish token. Prefix `npm_` followed by 36 alphanumeric characters.
- **Severity:** critical
- **False Positive Notes:** None — prefix is specific to npm tokens issued after 2021. Older tokens in `.npmrc` are caught by the legacy pattern below.
#### npm Legacy Auth Token (.npmrc)
- **ID:** `npm-legacy-auth`
- **Regex:** `//registry\.npmjs\.org/:_authToken\s*=\s*([a-f0-9\-]{36,})`
- **Description:** Legacy npm authentication token in `.npmrc` format.
- **Severity:** critical
- **False Positive Notes:** None.
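
In an `.npmrc`, a granular `npm_` token and a legacy UUID-style token both appear on `_authToken` lines, so both npm patterns are worth running over each line. A minimal sketch with a hypothetical `scanNpmrc` helper; an environment reference such as `${NPM_TOKEN}` matches neither pattern.

```js
// The two npm patterns from this section, without `g` since each is tested once per line.
const NPM_PATTERNS = [
  { id: 'npm-token', regex: /\bnpm_[A-Za-z0-9]{36}\b/ },
  { id: 'npm-legacy-auth', regex: /\/\/registry\.npmjs\.org\/:_authToken\s*=\s*([a-f0-9\-]{36,})/ },
];

// Hypothetical helper: returns { id, line } for every pattern that matches a line.
function scanNpmrc(text) {
  return text.split(/\r?\n/).flatMap((line, i) =>
    NPM_PATTERNS.filter(({ regex }) => regex.test(line)).map(({ id }) => ({ id, line: i + 1 })),
  );
}
```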
---
### 6. Generic API Keys and Authorization Headers
#### Bearer Token in Authorization Header
- **ID:** `bearer-token`
- **Regex:** `(?i)Authorization\s*[:=]\s*["']?Bearer\s+([A-Za-z0-9\-._~+/]+=*)\b`
- **Description:** HTTP Authorization header with Bearer scheme. Common in hardcoded fetch/axios calls.
- **Severity:** high
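- **False Positive Notes:** The captured value cannot contain `$`, `<`, `>`, or `{`, so template references such as `Bearer ${token}` and `Bearer <your-token>` do not match. Remaining false positives are bare placeholder words such as `Bearer YOUR_TOKEN`; apply the global suppression rules and review.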
#### Generic `api_key` / `api-key` Assignment
- **ID:** `generic-api-key`
- **Regex:** `(?i)\bapi[_\-]?key\s*[:=]\s*["']([A-Za-z0-9\-._]{16,64})["']`
- **Description:** Generic API key assignment in config files, source code, or environment exports.
- **Severity:** high
- **False Positive Notes:** Placeholder values like `your-api-key-here`, `<API_KEY>`, `REPLACE_ME`, `xxx...`. Skip matches where the value is all-same-character or contains angle brackets.
#### OpenAI API Key (Legacy Format)
- **ID:** `openai-api-key-legacy`
- **Regex:** `\bsk-[A-Za-z0-9]{20}T3BlbkFJ[A-Za-z0-9]{20}\b`
- **Description:** OpenAI API key in the legacy format. The substring `T3BlbkFJ` is base64 for `OpenAI`.
- **Severity:** critical
- **False Positive Notes:** None for the legacy format.
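
The `T3BlbkFJ` infix can be checked directly: encoding the six ASCII bytes of `OpenAI` as base64 yields exactly those eight characters. A two-line check in Node.js using the built-in `Buffer`:

```js
// Six bytes encode to exactly eight base64 characters, so there is no "=" padding.
const infix = Buffer.from('OpenAI', 'ascii').toString('base64');
console.log(infix); // T3BlbkFJ
```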
#### OpenAI Project-Scoped Key
- **ID:** `openai-project-key`
- **Regex:** `\bsk-proj-[A-Za-z0-9\-_]{40,}\b`
- **Description:** OpenAI project-scoped API key introduced in 2024.
- **Severity:** critical
- **False Positive Notes:** None.
#### Anthropic API Key
- **ID:** `anthropic-api-key`
- **Regex:** `\bsk-ant-api03-[A-Za-z0-9\-_]{93}\b`
- **Description:** Anthropic Claude API key.
- **Severity:** critical
- **False Positive Notes:** None — prefix plus exact length is highly specific.
---
### 7. Private Keys (PEM Format)
PEM header patterns detect private key material. The regexes below write each run of five hyphens as the quantifier `-{5}`, so neither this file nor the hook source contains a literal PEM marker, while real markers in scanned files still match.
#### RSA Private Key Header
- **ID:** `rsa-private-key`
- **Regex:** `-{5}BEGIN RSA PRIVATE KEY-{5}`
- **Description:** PEM-encoded RSA private key. The header alone is sufficient to flag — do not require the full key body.
- **Severity:** critical
- **False Positive Notes:** Test fixtures and documentation examples sometimes include truncated PEM blocks. Flag regardless — a truncated key in committed code still indicates a process failure.
#### EC / DSA / OpenSSH / Encrypted Private Key Header
- **ID:** `ec-private-key`
- **Regex:** `-{5}BEGIN (?:EC|DSA|OPENSSH|ENCRYPTED) PRIVATE KEY-{5}`
- **Description:** PEM-encoded elliptic curve, DSA, or OpenSSH private key, or an encrypted PKCS#8 private key (`ENCRYPTED PRIVATE KEY`).
- **Severity:** critical
- **False Positive Notes:** Same as RSA — flag all occurrences.
#### PKCS#8 Private Key Header
- **ID:** `pkcs8-private-key`
- **Regex:** `-{5}BEGIN PRIVATE KEY-{5}`
- **Description:** PKCS#8 encoded private key (format-agnostic, covers RSA, EC, etc.).
- **Severity:** critical
- **False Positive Notes:** None.

**Implementation note for `pre-edit-secrets.mjs`:** Keep the `-{5}` quantifier when porting these patterns, for example `new RegExp('-{5}BEGIN RSA PRIVATE KEY-{5}')`, instead of spelling out five literal hyphens. The hook's own source then never contains a PEM marker, so it is not flagged by secret scanners, including itself.
---
### 8. Database Connection Strings
#### PostgreSQL Connection String
- **ID:** `postgres-connstr`
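- **Regex:** `postgres(?:ql)?://[^:@\s/]+:[^@\s]+@[^\s'"]+`
- **Description:** PostgreSQL connection URL with embedded credentials in the format `postgresql://user:password@host/db`.
- **Severity:** critical
- **False Positive Notes:** Matches any non-empty password portion. Skip if the password segment is `${...}`, `<password>`, or `*`. The user and password segments exclude whitespace, so a credential-free URL such as `postgresql://localhost:5432/app` cannot match by pairing its port with an `@` later in the file.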
#### MongoDB Connection String
- **ID:** `mongodb-connstr`
- **Regex:** `mongodb(?:\+srv)?://[^:@\s/]+:[^@\s]+@[^\s'"]+`
- **Description:** MongoDB Atlas or local connection string with embedded username and password.
- **Severity:** critical
- **False Positive Notes:** Same exclusions as PostgreSQL.
#### MySQL / MariaDB Connection String
- **ID:** `mysql-connstr`
- **Regex:** `mysql(?:2)?://[^:@\s/]+:[^@\s]+@[^\s'"]+`
- **Description:** MySQL or MariaDB connection URL with credentials.
- **Severity:** critical
- **False Positive Notes:** Same exclusions as PostgreSQL.
#### Redis Connection String with Password
- **ID:** `redis-connstr`
- **Regex:** `redis://:[^@]+@[^\s'"]+`
- **Description:** Redis connection URL with password in the format `redis://:password@host`.
- **Severity:** high
- **False Positive Notes:** Passwordless Redis (`redis://host:6379`) does not match this pattern.
#### Generic JDBC Connection String with Password
- **ID:** `jdbc-connstr`
- **Regex:** `(?i)jdbc:[a-z]+://[^\s"']+[;?&]password=[^;&\s"']+`
- **Description:** Java JDBC connection string with a password parameter: `;password=` (SQL Server style) or `?password=` / `&password=` (MySQL and PostgreSQL style).
- **Severity:** critical
- **False Positive Notes:** None if `Password=` is present with a non-empty value.
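
The placeholder exclusions for the URL-style patterns (`${...}`, `<password>`, `*`) are easier to apply after parsing the matched URL than inside the regex. A minimal sketch, assuming Node's global WHATWG `URL` class; `classifyConnectionPassword` and its `report` / `skip` / `review` verdicts are hypothetical, and it applies to the URL-style patterns only, not to `jdbc-connstr`.

```js
// Hypothetical post-filter for postgres-connstr, mongodb-connstr, mysql-connstr and redis-connstr.
function classifyConnectionPassword(matchedUrl) {
  let password;
  try {
    // WHATWG URL percent-encodes characters such as { } < > inside the userinfo,
    // so decode before comparing against the placeholder shapes.
    password = decodeURIComponent(new URL(matchedUrl).password);
  } catch {
    return 'review'; // unparseable URL or malformed %-escape: leave it to a human
  }
  const isPlaceholder =
    password === '' ||
    password === '*' ||
    /^\$\{.*\}$/.test(password) || // ${DB_PASSWORD}
    /^<.*>$/.test(password); // <password>
  return isPlaceholder ? 'skip' : 'report';
}
```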
---
### 9. Passwords in Configuration
#### `password` Assignment
- **ID:** `config-password`
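- **Regex:** `(?i)(?:^|[\s,;{("'])pass(?:word|wd)?["']?\s*[:=]\s*["']([^"'$<>{}\s]{6,})["']`
- **Description:** Quoted password assignment in config files and source code (YAML, TOML, JSON, INI). Matches `password: "s3cr3t!"`, `"password": "s3cr3t!"`, and `passwd='hunter22'`; unquoted values are not matched. Apply with the `m` flag so `^` matches at each line start.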
- **Severity:** high
- **False Positive Notes:** High false positive rate in documentation and test fixtures. Skip if value matches common placeholders: `your-password`, `changeme`, `example`, `test`, `placeholder`, `<...>`, `***`, `xxx`.
#### `secret` Key Assignment
- **ID:** `config-secret`
- **Regex:** `(?i)(?:^|[\s,;{("'])secret(?:_key)?["']?\s*[:=]\s*["']([^"'$<>{}\s]{8,})["']`
- **Description:** Quoted `secret` or `secret_key` assignment in config or environment files. Django's `SECRET_KEY = '...'` with a real value is a valid finding.
- **Severity:** high
- **False Positive Notes:** Same exclusions as `config-password`.
#### Sensitive Environment Variable Assignment
- **ID:** `dotenv-secret`
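- **Regex:** `(?i)^(?:export\s+)?(?:[A-Z][A-Z0-9_]*)?(?:SECRET|KEY|TOKEN|PASSWORD|PASSWD|CREDENTIAL|AUTH)[A-Z0-9_]*\s*=\s*["']?([A-Za-z0-9+/=\-_.@!#%^&*]{8,})`
- **Description:** Environment variable with a security-sensitive name (contains SECRET, KEY, TOKEN, PASSWORD, etc.) assigned a literal value of at least 8 characters, optionally quoted. Matches `.env` file lines; apply with the `m` flag so `^` anchors at each line start. Empty values (`TOKEN=` or `TOKEN=""`) fall below the 8-character minimum and do not match.
- **Severity:** high
- **False Positive Notes:** Variables pointing to file paths (e.g., `KEY_FILE=/etc/ssl/key.pem`) or URLs without credentials. Skip values made only of uppercase letters and underscores, which are likely the name of another variable (e.g., `TOKEN=GITHUB_TOKEN`). Because of `(?i)`, the name part also matches lowercase identifiers that merely contain a keyword (`monkey` contains `key`).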
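
Applying `dotenv-secret` line by line makes it easy to report the variable name and to apply the uppercase-name skip. A minimal sketch; `findDotenvSecrets` is hypothetical, and its regex is an assumed variant that captures the variable name as group 1, accepts an optional opening quote, and relies on the 8-character minimum to reject empty values.

```js
// Assumed variant of dotenv-secret: group 1 is the variable name, group 2 the value.
// Applied to one line at a time, so no `m` flag is needed.
const DOTENV_SECRET =
  /^(?:export\s+)?((?:[A-Z][A-Z0-9_]*)?(?:SECRET|KEY|TOKEN|PASSWORD|PASSWD|CREDENTIAL|AUTH)[A-Z0-9_]*)\s*=\s*["']?([A-Za-z0-9+\/=\-_.@!#%^&*]{8,})/i;

// Hypothetical helper: reports sensitive assignments, skipping values made only of
// uppercase letters and underscores (likely the name of another variable).
function findDotenvSecrets(text) {
  const findings = [];
  text.split(/\r?\n/).forEach((line, i) => {
    const m = DOTENV_SECRET.exec(line);
    if (m && !/^[A-Z_]+$/.test(m[2])) findings.push({ line: i + 1, name: m[1] });
  });
  return findings;
}
```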
---
### 10. JWT Tokens
#### JWT Pattern
- **ID:** `jwt-token`
- **Regex:** `\beyJ[A-Za-z0-9\-_]+\.[A-Za-z0-9\-_]+\.[A-Za-z0-9\-_]+\b`
- **Description:** JSON Web Token in its three-part base64url format (`header.payload.signature`). A JWT header is normally a JSON object that begins with `{"` followed by a letter (as in `{"alg"`), and base64url encodes that prefix as `eyJ`.
- **Severity:** medium
- **False Positive Notes:** **High false positive rate.** JWTs are frequently used in tests, documentation, and mock data. Many JWTs are intentionally short-lived or scope-limited. Flag for human review rather than hard-blocking. Skip matches in files under `tests/`, `fixtures/`, `__mocks__/`, `*.test.*`, `*.spec.*`. Escalate to `critical` only if the payload segment decodes to contain an `exp` claim more than one year in the future.
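
The escalation rule needs the payload segment decoded. A minimal sketch using Node's `Buffer` with the `base64url` encoding (available in current Node.js releases); `jwtSeverity` is hypothetical, and treating "one year" as 365 days is an assumption. The signature is not verified; the check only reads the `exp` claim.

```js
const ONE_YEAR_SECONDS = 365 * 24 * 60 * 60; // assumption: "one year" = 365 days

// Hypothetical helper: 'critical' when the payload's `exp` lies more than a year
// past `nowSeconds`, otherwise the pattern's default 'medium'.
function jwtSeverity(token, nowSeconds = Math.floor(Date.now() / 1000)) {
  const parts = token.split('.');
  if (parts.length !== 3) return 'medium';
  let payload;
  try {
    payload = JSON.parse(Buffer.from(parts[1], 'base64url').toString('utf8'));
  } catch {
    return 'medium'; // payload is not JSON: keep the review-level default
  }
  const exp = payload && typeof payload === 'object' ? payload.exp : undefined;
  return typeof exp === 'number' && exp > nowSeconds + ONE_YEAR_SECONDS ? 'critical' : 'medium';
}
```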
---
## False Positive Suppression Rules
Apply these globally before reporting any match:
1. **Placeholder values** — Skip if the matched value contains: `your-`, `<`, `>`, `example`, `placeholder`, `replace`, `changeme`, `xxx`, `***`, `TODO`, `FIXME`
2. **Variable references** — Skip if the matched value contains: `${`, `$(`, `%{`, `ENV[`, `os.environ`
3. **Test files** — Lower severity by one level for matches in: `*.test.ts`, `*.spec.js`, `fixtures/`, `__mocks__/`, `testdata/`
4. **Documentation** — Lower severity for matches in: `*.md`, `*.txt`, `docs/`, `README*`, but never downgrade `critical` patterns (PEM key headers, real AWS Access Key IDs)
5. **All-same-character values** — Skip if the value is a repetition of a single character (e.g., `xxxxxxxx`, `00000000`)
6. **Short values** — Skip generic patterns if the matched secret value is fewer than 8 characters
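
Rules 1, 2, 5, and 6 decide whether to drop a match; rules 3 and 4 adjust its severity. A minimal sketch of both; the function names, the case-insensitive reading of rule 1, and the one-level step for rule 4 are assumptions.

```js
// Rule 1 markers, compared case-insensitively (assumption: the rule gives no case rule).
const PLACEHOLDER_MARKERS = ['your-', '<', '>', 'example', 'placeholder', 'replace', 'changeme', 'xxx', '***', 'todo', 'fixme'];
// Rule 2 markers, compared case-sensitively.
const VARIABLE_MARKERS = ['${', '$(', '%{', 'ENV[', 'os.environ'];

// Rules 1, 2, 5 and 6: should this matched value be dropped entirely?
function isSuppressed(value, { genericPattern = true } = {}) {
  const lower = value.toLowerCase();
  if (PLACEHOLDER_MARKERS.some((m) => lower.includes(m))) return true; // rule 1
  if (VARIABLE_MARKERS.some((m) => value.includes(m))) return true; // rule 2
  if (/^(.)\1*$/s.test(value)) return true; // rule 5: one repeated character
  if (genericPattern && value.length < 8) return true; // rule 6
  return false;
}

const LEVELS = ['low', 'medium', 'high', 'critical'];
const TEST_PATH = /(?:\.test\.ts|\.spec\.js)$|(?:^|\/)(?:fixtures|__mocks__|testdata)\//;
const DOC_PATH = /\.(?:md|txt)$|(?:^|\/)docs\/|(?:^|\/)README[^\/]*$/;

// Rules 3 and 4: lower by one level (never below 'low'); documentation
// matches keep 'critical' unchanged.
function adjustedSeverity(severity, filePath) {
  const lowerOne = LEVELS[Math.max(0, LEVELS.indexOf(severity) - 1)];
  if (TEST_PATH.test(filePath)) return lowerOne; // rule 3
  if (DOC_PATH.test(filePath) && severity !== 'critical') return lowerOne; // rule 4
  return severity;
}
```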
---
## Implementation Notes for `pre-edit-secrets.mjs`
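```js
// PreToolUse hook sketch: reads the pending tool call as JSON on stdin and
// blocks the Write/Edit when the new content contains a critical secret.
// Build PEM patterns at runtime with -{5} so this file never contains a literal PEM marker:
const PEM_RSA = new RegExp('-{5}BEGIN RSA PRIVATE KEY-{5}');
const PEM_GENERIC = new RegExp('-{5}BEGIN (?:EC|DSA|OPENSSH|ENCRYPTED) PRIVATE KEY-{5}');
const PEM_PKCS8 = new RegExp('-{5}BEGIN PRIVATE KEY-{5}');
// No `g` flag: test() on a global regex resumes from lastIndex and can miss matches.
const CRITICAL_PATTERNS = [
  { id: 'aws-access-key-id', regex: /\bAKIA[0-9A-Z]{16}\b/ },
  { id: 'github-pat-classic', regex: /\bghp_[A-Za-z0-9]{36}\b/ },
  { id: 'github-pat-fine-grained', regex: /\bgithub_pat_[A-Za-z0-9_]{82}\b/ },
  { id: 'npm-token', regex: /\bnpm_[A-Za-z0-9]{36}\b/ },
  { id: 'openai-project-key', regex: /\bsk-proj-[A-Za-z0-9\-_]{40,}\b/ },
  { id: 'anthropic-api-key', regex: /\bsk-ant-api03-[A-Za-z0-9\-_]{93}\b/ },
  { id: 'rsa-private-key', regex: PEM_RSA },
  { id: 'ec-private-key', regex: PEM_GENERIC },
  { id: 'pkcs8-private-key', regex: PEM_PKCS8 },
];
let raw = '';
process.stdin.setEncoding('utf8');
process.stdin.on('data', (chunk) => { raw += chunk; });
process.stdin.on('end', () => {
  const { tool_input: input = {} } = JSON.parse(raw || '{}');
  // Write sends the whole file as `content`; Edit sends the replacement as `new_string`.
  const fileContent = input.content ?? input.new_string ?? '';
  // Hard-block on any critical match:
  for (const { id, regex } of CRITICAL_PATTERNS) {
    if (regex.test(fileContent)) {
      console.error(`BLOCKED: ${id} detected. Remove secret before editing.`);
      process.exit(2); // Exit code 2 blocks the tool call; other non-zero codes do not
    }
  }
  process.exit(0);
});
```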
For `high`/`medium` severity patterns, emit a warning via `console.error` but exit with `0` (allow the operation to proceed with a visible warning).

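
The warn-or-block split can live in one pure function, which keeps the policy testable apart from stdin handling. A minimal sketch; `hookDecision` and the `{ id, severity }` finding shape are hypothetical.

```js
// Hypothetical policy helper: critical findings block with exit code 2;
// everything else only produces warning lines and exit code 0.
function hookDecision(findings) {
  const critical = findings.filter((f) => f.severity === 'critical');
  if (critical.length > 0) {
    return {
      exitCode: 2,
      messages: critical.map((f) => `BLOCKED: ${f.id} detected. Remove secret before editing.`),
    };
  }
  return {
    exitCode: 0,
    messages: findings.map((f) => `WARNING: ${f.id} (${f.severity}) detected. Review before committing.`),
  };
}

// In the hook: print every message to stderr, then exit with the chosen code.
// const { exitCode, messages } = hookDecision(findings);
// for (const message of messages) console.error(message);
// process.exit(exitCode);
```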
---
## References
- [OWASP: Credential Stuffing](https://owasp.org/www-community/attacks/Credential_stuffing)
- [GitHub: Secret Scanning Patterns](https://docs.github.com/en/code-security/secret-scanning/secret-scanning-patterns)
- [Gitleaks Rule Definitions](https://github.com/gitleaks/gitleaks/blob/master/config/gitleaks.toml)
- [Trufflehog Detectors](https://github.com/trufflesecurity/trufflehog/tree/main/pkg/detectors)