# Secrets Detection Patterns

## Usage

These patterns are used by:
- `pre-edit-secrets.mjs` hook: blocks Write/Edit operations containing secrets before they reach disk
- `skill-scanner-agent`: flags skills and commands that hardcode or expose secrets

Patterns are regex strings intended for JavaScript. Apply them with the `g` (global) and `i` (case-insensitive) flags unless noted otherwise. A leading `(?i)` is PCRE-style notation marking a pattern as case-insensitive; JavaScript's `RegExp` constructor rejects that prefix, so strip it before compiling and rely on the `i` flag.

---
## Pattern Format

Each pattern includes:
- `id`: Unique identifier for logging and suppression
- `regex`: JavaScript-compatible regex (string form, apply with `new RegExp(...)`)
- `description`: What it detects
- `severity`: `critical` / `high` / `medium` / `low`
- `false_positive_notes`: When this pattern might false-match
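
As a rough sketch of how an entry in this format might be turned into a usable `RegExp`: the `compilePattern` and `findMatches` helpers and the `samplePattern` object below are hypothetical and not taken from `pre-edit-secrets.mjs`. The sketch assumes a pattern may carry a leading `(?i)` marker, which JavaScript's `RegExp` constructor does not accept, so the marker is stripped and the `g` and `i` flags are passed instead.

```js
// Hypothetical entry using the fields listed above (regex copied from the AWS section).
const samplePattern = {
  id: 'aws-secret-access-key',
  regex: String.raw`(?i)aws[_\-\s.]*secret[_\-\s.]*(?:access[_\-\s.]*)?key["'\s]*[:=]["'\s]*([A-Za-z0-9/+]{40})`,
  description: 'AWS Secret Access Key',
  severity: 'critical',
  false_positive_notes: 'Require the aws + secret label context.',
};

// Strip a leading (?i) marker and compile with the default flags.
function compilePattern(def, flags = 'gi') {
  const source = def.regex.replace(/^\(\?i\)/, '');
  return new RegExp(source, flags);
}

// Return one finding per match; a fresh RegExp is compiled on every call,
// so no lastIndex state leaks between scans.
function findMatches(def, text) {
  const re = compilePattern(def);
  return [...text.matchAll(re)].map((m) => ({
    id: def.id,
    severity: def.severity,
    match: m[0],
  }));
}

// Usage with a synthetic value (40 repeated characters, not a real key):
const sampleText = 'AWS_SECRET_ACCESS_KEY = "' + 'A'.repeat(40) + '"';
const sampleFindings = findMatches(samplePattern, sampleText);
```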

---

## Patterns

### 1. AWS

#### AWS Access Key ID
- **ID:** `aws-access-key-id`
- **Regex:** `\bAKIA[0-9A-Z]{16}\b`
- **Description:** AWS Access Key ID. Always starts with `AKIA` followed by 16 uppercase alphanumeric characters.
- **Severity:** critical
- **False Positive Notes:** None. This prefix+length combination is highly specific to AWS, with no known false positives in practice.

#### AWS Secret Access Key
- **ID:** `aws-secret-access-key`
- **Regex:** `(?i)aws[_\-\s.]*secret[_\-\s.]*(?:access[_\-\s.]*)?key["'\s]*[:=]["'\s]*([A-Za-z0-9/+]{40})`
- **Description:** AWS Secret Access Key: a 40-character base64 string following a label like `aws_secret_key`, `AWS_SECRET_ACCESS_KEY`, etc.
- **Severity:** critical
- **False Positive Notes:** Generic 40-char base64 strings can appear in other contexts. Require the `aws` + `secret` label context.

#### AWS Session Token
- **ID:** `aws-session-token`
- **Regex:** `(?i)aws[_\-\s.]*session[_\-\s.]*token["'\s]*[:=]["'\s]*([A-Za-z0-9/+=]{100,})`
- **Description:** Temporary AWS session token (STS). Much longer than access keys, typically 200-400 characters.
- **Severity:** critical
- **False Positive Notes:** Long base64 blobs in unrelated contexts (e.g., test fixtures, encoded images). Require the `session_token` label.

---

### 2. Azure

#### Azure Storage Account Key
- **ID:** `azure-storage-key`
- **Regex:** `(?i)AccountKey=([A-Za-z0-9+/]{86}==)`
- **Description:** Azure Storage Account key embedded in a connection string. Always exactly 88 characters ending in `==`.
- **Severity:** critical
- **False Positive Notes:** None. The `AccountKey=` prefix plus exact length is highly specific.

#### Azure Storage Connection String
- **ID:** `azure-storage-connstr`
- **Regex:** `DefaultEndpointsProtocol=https?;AccountName=[^;]+;AccountKey=[A-Za-z0-9+/]{86}==`
- **Description:** Full Azure Storage connection string including account name and key.
- **Severity:** critical
- **False Positive Notes:** None.

#### Azure SAS Token
- **ID:** `azure-sas-token`
- **Regex:** `(?i)(?:sv|sig|se|sp|spr|srt)=[A-Za-z0-9%+/=&]{10,}(?:&(?:sv|sig|se|sp|spr|srt)=[A-Za-z0-9%+/=&]{1,}){3,}`
- **Description:** Azure Shared Access Signature (SAS) token: a URL query string containing multiple SAS parameters.
- **Severity:** high
- **False Positive Notes:** URL-encoded query strings with similar parameter names. Require at least 4 distinct SAS parameters (`sv`, `sig`, `se`, `sp`).

#### Azure Client Secret
- **ID:** `azure-client-secret`
- **Regex:** `(?i)client[_\-]?secret["'\s]*[:=]["'\s]*([A-Za-z0-9~._\-]{34,40})`
- **Description:** Azure AD / Entra ID application client secret: a 34-40 character string of letters, digits, and `~._-`.
- **Severity:** critical
- **False Positive Notes:** Generic password fields with similar length. Always flag and require human review.

#### Azure Service Bus Connection String
- **ID:** `azure-servicebus-connstr`
- **Regex:** `Endpoint=sb://[^;]+;SharedAccessKeyName=[^;]+;SharedAccessKey=[A-Za-z0-9+/=]{43}=`
- **Description:** Azure Service Bus connection string with shared access key.
- **Severity:** critical
- **False Positive Notes:** None. The format is highly specific.

---

### 3. Google Cloud Platform

#### GCP API Key
- **ID:** `gcp-api-key`
- **Regex:** `\bAIza[0-9A-Za-z_\-]{35}\b`
- **Description:** Google Cloud / Firebase API key. Always starts with `AIza` followed by 35 characters (letters, digits, `_`, or `-`).
- **Severity:** high
- **False Positive Notes:** None. The prefix is specific. Note: GCP API keys have varying scopes; some are safe to expose (browser-restricted keys), but flag all for review.

#### GCP Service Account JSON Marker
- **ID:** `gcp-service-account-json`
- **Regex:** `"type"\s*:\s*"service_account"`
- **Description:** Google Cloud service account JSON credential file marker. The presence of this key indicates a full service account credential object.
- **Severity:** critical
- **False Positive Notes:** Only matches within JSON credential blobs. If found alongside `private_key`, treat as confirmed credential leak.

---

### 4. GitHub

#### GitHub Personal Access Token (Classic)
- **ID:** `github-pat-classic`
- **Regex:** `\bghp_[A-Za-z0-9]{36}\b`
- **Description:** GitHub classic personal access token (PAT). Prefix `ghp_` followed by exactly 36 alphanumeric characters.
- **Severity:** critical
- **False Positive Notes:** None. The prefix is specific to GitHub.

#### GitHub Fine-Grained Personal Access Token
- **ID:** `github-pat-fine-grained`
- **Regex:** `\bgithub_pat_[A-Za-z0-9_]{82}\b`
- **Description:** GitHub fine-grained PAT introduced in 2022. Longer and more structured than classic PATs.
- **Severity:** critical
- **False Positive Notes:** None.

#### GitHub OAuth Token
- **ID:** `github-oauth-token`
- **Regex:** `\bgho_[A-Za-z0-9]{36}\b`
- **Description:** GitHub OAuth access token issued via OAuth app flow.
- **Severity:** critical
- **False Positive Notes:** None.

#### GitHub Actions / Server Token
- **ID:** `github-server-token`
- **Regex:** `\bghs_[A-Za-z0-9]{36}\b`
- **Description:** GitHub Apps installation token or Actions runner token.
- **Severity:** high
- **False Positive Notes:** None.

---

### 5. npm

#### npm Automation / Publish Token
- **ID:** `npm-token`
- **Regex:** `\bnpm_[A-Za-z0-9]{36}\b`
- **Description:** npm registry automation or publish token. Prefix `npm_` followed by 36 alphanumeric characters.
- **Severity:** critical
- **False Positive Notes:** None. The prefix is specific to npm tokens issued after 2021. Older tokens in `.npmrc` are caught by the legacy pattern below.

#### npm Legacy Auth Token (.npmrc)
- **ID:** `npm-legacy-auth`
- **Regex:** `//registry\.npmjs\.org/:_authToken\s*=\s*([a-f0-9\-]{36,})`
- **Description:** Legacy npm authentication token in `.npmrc` format.
- **Severity:** critical
- **False Positive Notes:** None.

---

### 6. Generic API Keys and Authorization Headers

#### Bearer Token in Authorization Header
- **ID:** `bearer-token`
- **Regex:** `(?i)Authorization\s*[:=]\s*["']?Bearer\s+([A-Za-z0-9\-._~+/]+=*)\b`
- **Description:** HTTP Authorization header with Bearer scheme. Common in hardcoded fetch/axios calls.
- **Severity:** high
- **False Positive Notes:** High false positive rate when the value is a variable reference like `Bearer ${token}` or `Bearer <your-token>`. Skip matches containing `$`, `<`, `>`, or `{`.

#### Generic `api_key` / `api-key` Assignment
- **ID:** `generic-api-key`
- **Regex:** `(?i)\bapi[_\-]?key\s*[:=]\s*["']([A-Za-z0-9\-._]{16,64})["']`
- **Description:** Generic API key assignment in config files, source code, or environment exports.
- **Severity:** high
- **False Positive Notes:** Placeholder values like `your-api-key-here`, `<API_KEY>`, `REPLACE_ME`, `xxx...`. Skip matches where the value is all-same-character or contains angle brackets.

#### OpenAI API Key (Legacy Format)
- **ID:** `openai-api-key-legacy`
- **Regex:** `\bsk-[A-Za-z0-9]{20}T3BlbkFJ[A-Za-z0-9]{20}\b`
- **Description:** OpenAI API key in the legacy format. The substring `T3BlbkFJ` is base64 for `OpenAI`.
- **Severity:** critical
- **False Positive Notes:** None for the legacy format.

#### OpenAI Project-Scoped Key
- **ID:** `openai-project-key`
- **Regex:** `\bsk-proj-[A-Za-z0-9\-_]{40,}\b`
- **Description:** OpenAI project-scoped API key introduced in 2024.
- **Severity:** critical
- **False Positive Notes:** None.

#### Anthropic API Key
- **ID:** `anthropic-api-key`
- **Regex:** `\bsk-ant-api03-[A-Za-z0-9\-_]{93}\b`
- **Description:** Anthropic Claude API key.
- **Severity:** critical
- **False Positive Notes:** None. The prefix plus exact length is highly specific.

---

### 7. Private Keys (PEM Format)

PEM header patterns detect private key material. The regex patterns below write each run of five hyphens as `-{5}`, so they match the literal PEM markers in files at scan time without this document containing one.

#### RSA Private Key Header
- **ID:** `rsa-private-key`
- **Regex:** `-{5}BEGIN RSA PRIVATE KEY-{5}`
- **Description:** PEM-encoded RSA private key. The header alone is sufficient to flag; do not require the full key body.
- **Severity:** critical
- **False Positive Notes:** Test fixtures and documentation examples sometimes include truncated PEM blocks. Flag regardless: a truncated key in committed code still indicates a process failure.

#### EC / DSA / OpenSSH Private Key Header
- **ID:** `ec-private-key`
- **Regex:** `-{5}BEGIN (?:EC|DSA|OPENSSH|ENCRYPTED) PRIVATE KEY-{5}`
- **Description:** PEM-encoded elliptic curve, DSA, or OpenSSH private key. The `ENCRYPTED` alternative also covers encrypted PKCS#8 keys.
- **Severity:** critical
- **False Positive Notes:** Same as RSA: flag all occurrences.

#### PKCS#8 Private Key Header
- **ID:** `pkcs8-private-key`
- **Regex:** `-{5}BEGIN PRIVATE KEY-{5}`
- **Description:** PKCS#8 encoded private key (format-agnostic, covers RSA, EC, etc.).
- **Severity:** critical
- **False Positive Notes:** None.

**Implementation note for `pre-edit-secrets.mjs`:** Build these regexes at runtime using `new RegExp('-{5}BEGIN RSA PRIVATE KEY-{5}')` rather than as regex literals, so the hook script itself is not flagged by secret scanners.

---

### 8. Database Connection Strings

#### PostgreSQL Connection String
- **ID:** `postgres-connstr`
- **Regex:** `postgres(?:ql)?://[^:]+:[^@]+@[^\s'"]+`
- **Description:** PostgreSQL connection URL with embedded credentials in the format `postgresql://user:password@host/db`.
- **Severity:** critical
- **False Positive Notes:** Matches any non-empty password portion. Skip if password segment is `${...}`, `<password>`, or `*`.

#### MongoDB Connection String
- **ID:** `mongodb-connstr`
- **Regex:** `mongodb(?:\+srv)?://[^:]+:[^@]+@[^\s'"]+`
- **Description:** MongoDB Atlas or local connection string with embedded username and password.
- **Severity:** critical
- **False Positive Notes:** Same exclusions as PostgreSQL.

#### MySQL / MariaDB Connection String
- **ID:** `mysql-connstr`
- **Regex:** `mysql(?:2)?://[^:]+:[^@]+@[^\s'"]+`
- **Description:** MySQL or MariaDB connection URL with credentials.
- **Severity:** critical
- **False Positive Notes:** Same exclusions as PostgreSQL.

#### Redis Connection String with Password
- **ID:** `redis-connstr`
- **Regex:** `redis://:[^@]+@[^\s'"]+`
- **Description:** Redis connection URL with password in the format `redis://:password@host`.
- **Severity:** high
- **False Positive Notes:** Passwordless Redis (`redis://host:6379`) does not match this pattern.

#### Generic JDBC Connection String with Password
- **ID:** `jdbc-connstr`
- **Regex:** `(?i)jdbc:[a-z]+://[^\s"']+;[Pp]assword=[^;\s"']+`
- **Description:** Java JDBC connection string with a `Password=` parameter.
- **Severity:** critical
- **False Positive Notes:** None if `Password=` is present with a non-empty value.
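
The password-segment exclusions noted for the URL-style connection strings could be applied as a post-match filter. The following is a minimal sketch, assuming hypothetical helpers `passwordSegment` and `isPlaceholderPassword` that receive the full matched URL; the names and the exact placeholder checks are illustrative rather than part of the existing hook.

```js
// Extract the password from a URL shaped like scheme://user:password@host/...
// Returns null when the string has no user:password@ part.
function passwordSegment(connStr) {
  const m = /^[a-z0-9+]+:\/\/[^:@\/]*:([^@]+)@/i.exec(connStr);
  return m ? m[1] : null;
}

// True when the password segment is one of the excluded placeholder forms.
function isPlaceholderPassword(connStr) {
  const pw = passwordSegment(connStr);
  if (pw === null) return false;
  return (
    /^\$\{[^}]*\}$/.test(pw) || // ${DB_PASSWORD}
    /^<[^>]*>$/.test(pw) ||     // <password>
    /^\*+$/.test(pw)            // * or a run of asterisks
  );
}
```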

---

### 9. Passwords in Configuration

#### `password` Assignment
- **ID:** `config-password`
- **Regex:** `(?i)(?:^|[\s,;{(])\bpass(?:word|wd)?\s*[:=]\s*["']([^"'$<>{}\s]{6,})["']`
- **Description:** Password assignment in config files (YAML, TOML, JSON, .env, INI). Matches quoted values such as `password: "secret"` or `passwd='hunter2'`; unquoted values are not matched.
- **Severity:** high
- **False Positive Notes:** High false positive rate in documentation and test fixtures. Skip if value matches common placeholders: `your-password`, `changeme`, `example`, `test`, `placeholder`, `<...>`, `***`, `xxx`.

#### `secret` Key Assignment
- **ID:** `config-secret`
- **Regex:** `(?i)(?:^|[\s,;{(])\bsecret\b\s*[:=]\s*["']([^"'$<>{}\s]{8,})["']`
- **Description:** Generic `secret` key assignment in config or environment files. The `\b` after `secret` restricts this regex to the bare key name, so compound names such as Django's `SECRET_KEY` are not matched here even though a real value there is a valid finding.
- **Severity:** high
- **False Positive Notes:** Same exclusions as `config-password`.

#### Sensitive Environment Variable Assignment
- **ID:** `dotenv-secret`
- **Regex:** `(?i)^(?:export\s+)?[A-Z][A-Z0-9_]*(?:SECRET|KEY|TOKEN|PASSWORD|PASSWD|CREDENTIAL|AUTH)[A-Z0-9_]*\s*=\s*([A-Za-z0-9+/=\-_.@!#%^&*]{8,})`
- **Description:** Environment variable with a security-sensitive name (contains SECRET, KEY, TOKEN, PASSWORD, etc.) assigned a non-empty, unquoted literal value. Matches `.env` file lines; add the `m` flag so that `^` anchors at the start of each line.
- **Severity:** high
- **False Positive Notes:** Variables pointing to file paths (e.g., `KEY_FILE=/etc/ssl/key.pem`) or URLs without credentials. Skip values that are all-uppercase (likely a variable reference like `${DATABASE_URL}`).

---

### 10. JWT Tokens

#### JWT Pattern
- **ID:** `jwt-token`
- **Regex:** `\beyJ[A-Za-z0-9\-_]+\.[A-Za-z0-9\-_]+\.[A-Za-z0-9\-_]+\b`
- **Description:** JSON Web Token in its three-part base64url format (`header.payload.signature`). The header always starts with `eyJ` (base64url encoding of `{"`).
- **Severity:** medium
- **False Positive Notes:** **High false positive rate.** JWTs are frequently used in tests, documentation, and mock data. Many JWTs are intentionally short-lived or scope-limited. Flag for human review rather than hard-blocking. Skip matches in files under `tests/`, `fixtures/`, `__mocks__/`, `*.test.*`, `*.spec.*`. Escalate to `critical` only if the payload segment decodes to contain an `exp` claim more than one year in the future.
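
The escalation rule could be checked by decoding the payload segment of a matched token. This is a minimal sketch for Node.js (it relies on the `base64url` encoding of `Buffer`, available in current Node releases); the helper names `decodeJwtPayload` and `jwtSeverity` are hypothetical, "one year" is assumed to mean 365 days, and the signature is not verified.

```js
const ONE_YEAR_SECONDS = 365 * 24 * 60 * 60;

// Decode the middle (payload) segment of a three-part token.
// Returns null if the token is malformed or the payload is not valid JSON.
function decodeJwtPayload(token) {
  const parts = token.split('.');
  if (parts.length !== 3) return null;
  try {
    return JSON.parse(Buffer.from(parts[1], 'base64url').toString('utf8'));
  } catch {
    return null;
  }
}

// 'critical' only when exp lies more than one year after nowSeconds;
// every other case keeps the default 'medium' severity.
function jwtSeverity(token, nowSeconds = Math.floor(Date.now() / 1000)) {
  const payload = decodeJwtPayload(token);
  if (
    payload &&
    typeof payload.exp === 'number' &&
    payload.exp - nowSeconds > ONE_YEAR_SECONDS
  ) {
    return 'critical';
  }
  return 'medium';
}
```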

---

## False Positive Suppression Rules

Apply these globally before reporting any match:

1. **Placeholder values:** Skip if the matched value contains: `your-`, `<`, `>`, `example`, `placeholder`, `replace`, `changeme`, `xxx`, `***`, `TODO`, `FIXME`
2. **Variable references:** Skip if the matched value contains: `${`, `$(`, `%{`, `ENV[`, `os.environ`
3. **Test files:** Lower severity by one level for matches in: `*.test.ts`, `*.spec.js`, `fixtures/`, `__mocks__/`, `testdata/`
4. **Documentation:** Lower severity for matches in: `*.md`, `*.txt`, `docs/`, `README*`, but never suppress `critical` patterns (PEM key headers, real AWS Access Key IDs)
5. **All-same-character values:** Skip if the value is a repetition of a single character (e.g., `xxxxxxxx`, `00000000`)
6. **Short values:** Skip generic patterns if the matched secret value is fewer than 8 characters
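
One way these rules could be combined into a filter is sketched below. The function names `shouldSkipValue` and `adjustSeverity` are hypothetical, and several details are assumptions made for illustration: marker comparison is case-insensitive, documentation lowers severity by one level, and the short-value rule applies only when the caller marks the pattern as generic.

```js
const PLACEHOLDER_MARKERS = ['your-', '<', '>', 'example', 'placeholder',
  'replace', 'changeme', 'xxx', '***', 'todo', 'fixme'];
const VARIABLE_MARKERS = ['${', '$(', '%{', 'ENV[', 'os.environ'];
const SEVERITY_ORDER = ['low', 'medium', 'high', 'critical'];

// Rules 1, 2, 5, 6: decide whether a matched value should be dropped entirely.
function shouldSkipValue(value, { generic = false } = {}) {
  const lower = value.toLowerCase();
  if (PLACEHOLDER_MARKERS.some((m) => lower.includes(m))) return true;
  if (VARIABLE_MARKERS.some((m) => value.includes(m))) return true;
  if (/^(.)\1+$/.test(value)) return true;      // one character repeated
  if (generic && value.length < 8) return true; // short generic value
  return false;
}

// Rules 3 and 4: lower severity by one level for test files and for
// documentation, leaving critical findings in documentation untouched.
function adjustSeverity(severity, filePath) {
  const p = filePath.replace(/\\/g, '/'); // normalize Windows separators
  const isTest = /\.test\.ts$|\.spec\.js$|(^|\/)(fixtures|__mocks__|testdata)\//.test(p);
  const isDoc = /\.(md|txt)$|(^|\/)docs\/|(^|\/)README[^/]*$/.test(p);
  if (isTest || (isDoc && severity !== 'critical')) {
    const i = SEVERITY_ORDER.indexOf(severity);
    return SEVERITY_ORDER[Math.max(0, i - 1)];
  }
  return severity;
}
```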

---

## Implementation Notes for `pre-edit-secrets.mjs`

```js
// Build PEM patterns at runtime to avoid triggering hook self-detection:
const PEM_RSA = new RegExp('-{5}BEGIN RSA PRIVATE KEY-{5}');
const PEM_GENERIC = new RegExp('-{5}BEGIN (?:EC|DSA|OPENSSH|ENCRYPTED) PRIVATE KEY-{5}');
const PEM_PKCS8 = new RegExp('-{5}BEGIN PRIVATE KEY-{5}');

// No `g` flag here: a global regex keeps `lastIndex` between `.test()` calls,
// which can cause missed matches if a pattern object is reused.
const CRITICAL_PATTERNS = [
  { id: 'aws-access-key-id', regex: /\bAKIA[0-9A-Z]{16}\b/ },
  { id: 'github-pat-classic', regex: /\bghp_[A-Za-z0-9]{36}\b/ },
  { id: 'github-pat-fine-grained', regex: /\bgithub_pat_[A-Za-z0-9_]{82}\b/ },
  { id: 'npm-token', regex: /\bnpm_[A-Za-z0-9]{36}\b/ },
  { id: 'openai-project-key', regex: /\bsk-proj-[A-Za-z0-9\-_]{40,}\b/ },
  { id: 'anthropic-api-key', regex: /\bsk-ant-api03-[A-Za-z0-9\-_]{93}\b/ },
  { id: 'rsa-private-key', regex: PEM_RSA },
  { id: 'ec-private-key', regex: PEM_GENERIC },
  { id: 'pkcs8-private-key', regex: PEM_PKCS8 },
];

// `fileContent` is assumed to hold the text the Write/Edit operation is about to store.
// Hard-block on any critical match:
for (const { id, regex } of CRITICAL_PATTERNS) {
  if (regex.test(fileContent)) {
    console.error(`BLOCKED: ${id} detected. Remove secret before editing.`);
    process.exit(2); // Exit code 2 blocks the Write/Edit tool use
  }
}
```

For `high`/`medium` severity patterns, emit a warning via `console.error` but exit with `0` (allow the operation to proceed with a visible warning).
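
The split between blocking and warning could be expressed as a small decision function. The sketch below assumes each finding carries the `id` and `severity` fields from the pattern format; `exitCodeFor` is a hypothetical name, and it returns the exit code instead of calling `process.exit` so that it can be exercised in isolation. A hook would then end with something like `process.exit(exitCodeFor(findings))`.

```js
// Map a list of findings to a hook exit code:
// any critical finding blocks (2), high/medium only warn (0).
function exitCodeFor(findings, log = console.error) {
  let blocked = false;
  for (const { id, severity } of findings) {
    if (severity === 'critical') {
      log(`BLOCKED: ${id} detected. Remove secret before editing.`);
      blocked = true;
    } else if (severity === 'high' || severity === 'medium') {
      log(`WARNING: ${id} detected (${severity}). Review before committing.`);
    }
  }
  return blocked ? 2 : 0;
}
```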

---

## References

- [OWASP: Credential Stuffing](https://owasp.org/www-community/attacks/Credential_stuffing)
- [GitHub: Secret Scanning Patterns](https://docs.github.com/en/code-security/secret-scanning/secret-scanning-patterns)
- [Gitleaks Rule Definitions](https://github.com/gitleaks/gitleaks/blob/master/config/gitleaks.toml)
- [Trufflehog Detectors](https://github.com/trufflesecurity/trufflehog/tree/main/pkg/detectors)