ktg-plugin-marketplace/plugins/llm-security-copilot/knowledge/secrets-patterns.md

# Secrets Detection Patterns

## Usage

These patterns are used by:
- `pre-edit-secrets.mjs` hook — blocks Write/Edit operations containing secrets before they reach disk
- `skill-scanner-agent` — flags skills and commands that hardcode or expose secrets

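Patterns are regex strings intended for JavaScript's `RegExp`. A leading `(?i)` marks a pattern as case-insensitive; JavaScript does not accept that inline syntax, so strip it and pass the `i` flag instead. Patterns without `(?i)` are case-sensitive. Apply the `g` (global) flag when collecting every match, and the `m` (multiline) flag to patterns anchored with `^` so they match at the start of each line.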

---

## Pattern Format

Each pattern includes:
- `id`: Unique identifier for logging and suppression
- `regex`: JavaScript-compatible regex (string form, apply with `new RegExp(...)`)
- `description`: What it detects
- `severity`: `critical` / `high` / `medium` / `low`
- `false_positive_notes`: When this pattern might false-match
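
As a rough illustration of how an entry in this format might be turned into a usable `RegExp`, the sketch below strips a leading `(?i)` (which JavaScript rejects) and maps it to the `i` flag. The helper name `compilePattern` and the entry object shape are assumptions made for this example, not part of the hook.

```js
// Hypothetical helper (not part of the hook): compile one catalogue entry.
function compilePattern(entry) {
  let source = entry.regex;
  // `g` finds every match; `m` lets ^ match at the start of each line.
  let flags = 'gm';
  if (source.startsWith('(?i)')) {
    // JavaScript has no leading (?i) syntax, so translate it into the `i` flag.
    source = source.slice(4);
    flags += 'i';
  }
  return new RegExp(source, flags);
}

// Example entry, copied from the AWS section of this file.
const awsSecret = compilePattern({
  id: 'aws-secret-access-key',
  regex: String.raw`(?i)aws[_\-\s.]*secret[_\-\s.]*(?:access[_\-\s.]*)?key["'\s]*[:=]["'\s]*([A-Za-z0-9/+]{40})`,
  severity: 'critical',
});
```

Capture group 1 of a match holds the candidate secret value, which is the part the suppression rules later inspect.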

---

## Patterns

### 1. AWS

#### AWS Access Key ID
- **ID:** `aws-access-key-id`
- **Regex:** `\bAKIA[0-9A-Z]{16}\b`
- **Description:** AWS Access Key ID for a long-term IAM credential: `AKIA` followed by 16 uppercase alphanumeric characters. Temporary (STS) access key IDs start with `ASIA` and are not matched by this pattern.
- **Severity:** critical
- **False Positive Notes:** None — this prefix+length combination is highly specific to AWS. No known false positives in practice.

#### AWS Secret Access Key
- **ID:** `aws-secret-access-key`
- **Regex:** `(?i)aws[_\-\s.]*secret[_\-\s.]*(?:access[_\-\s.]*)?key["'\s]*[:=]["'\s]*([A-Za-z0-9/+]{40})`
- **Description:** AWS Secret Access Key — 40-character base64 string following a label like `aws_secret_key`, `AWS_SECRET_ACCESS_KEY`, etc.
- **Severity:** critical
- **False Positive Notes:** Generic 40-char base64 strings can appear in other contexts. Require the `aws` + `secret` label context.

#### AWS Session Token
- **ID:** `aws-session-token`
- **Regex:** `(?i)aws[_\-\s.]*session[_\-\s.]*token["'\s]*[:=]["'\s]*([A-Za-z0-9/+=]{100,})`
- **Description:** Temporary AWS session token (STS). Much longer than access keys — typically 200-400 characters.
- **Severity:** critical
- **False Positive Notes:** Long base64 blobs in unrelated contexts (e.g., test fixtures, encoded images). Require the `session_token` label.

---

### 2. Azure

#### Azure Storage Account Key
- **ID:** `azure-storage-key`
- **Regex:** `(?i)AccountKey=([A-Za-z0-9+/]{86}==)`
- **Description:** Azure Storage Account key embedded in a connection string. Always exactly 88 characters ending in `==`.
- **Severity:** critical
- **False Positive Notes:** None — the `AccountKey=` prefix plus exact length is highly specific.

#### Azure Storage Connection String
- **ID:** `azure-storage-connstr`
- **Regex:** `DefaultEndpointsProtocol=https?;AccountName=[^;]+;AccountKey=[A-Za-z0-9+/]{86}==`
- **Description:** Full Azure Storage connection string including account name and key.
- **Severity:** critical
- **False Positive Notes:** None.

#### Azure SAS Token
- **ID:** `azure-sas-token`
- **Regex:** `(?i)(?:sv|sig|se|sp|spr|srt)=[A-Za-z0-9%+/=&]{10,}(?:&(?:sv|sig|se|sp|spr|srt)=[A-Za-z0-9%+/=&]{1,}){3,}`
- **Description:** Azure Shared Access Signature (SAS) token — URL query string containing multiple SAS parameters.
- **Severity:** high
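- **False Positive Notes:** URL-encoded query strings with similar parameter names. The regex requires four SAS-style parameters but does not check that they are distinct, so confirm in a post-filter that at least 4 distinct SAS parameters (e.g., `sv`, `sig`, `se`, `sp`) are present.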
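
The distinct-parameter requirement in the note above is hard to express in a single regex, so one option is a small post-filter applied to each match. The sketch below is only an illustration; `countDistinctSasParams` and `looksLikeSasToken` are hypothetical names, and the threshold of 4 comes from the note.

```js
// Hypothetical post-filter for azure-sas-token matches.
const SAS_PARAMS = new Set(['sv', 'sig', 'se', 'sp', 'spr', 'srt']);

// Count how many different SAS parameter names appear in the matched text.
function countDistinctSasParams(matchedText) {
  const seen = new Set();
  for (const [name] of new URLSearchParams(matchedText)) {
    const key = name.toLowerCase();
    if (SAS_PARAMS.has(key)) seen.add(key);
  }
  return seen.size;
}

// Keep the finding only when at least 4 distinct SAS parameters are present.
function looksLikeSasToken(matchedText) {
  return countDistinctSasParams(matchedText) >= 4;
}
```

A string that repeats one parameter four times satisfies the regex's repetition count but is rejected by this filter.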

#### Azure Client Secret
- **ID:** `azure-client-secret`
- **Regex:** `(?i)client[_\-]?secret["'\s]*[:=]["'\s]*([A-Za-z0-9~._\-]{34,40})`
- **Description:** Azure AD / Entra ID application client secret — 34-40 character alphanumeric string.
- **Severity:** critical
- **False Positive Notes:** Generic password fields with similar length. Always flag and require human review.

#### Azure Service Bus Connection String
- **ID:** `azure-servicebus-connstr`
- **Regex:** `Endpoint=sb://[^;]+;SharedAccessKeyName=[^;]+;SharedAccessKey=[A-Za-z0-9+/=]{43}=`
- **Description:** Azure Service Bus connection string with shared access key.
- **Severity:** critical
- **False Positive Notes:** None — format is highly specific.

---

### 3. Google Cloud Platform

#### GCP API Key
- **ID:** `gcp-api-key`
- **Regex:** `\bAIza[0-9A-Za-z_\-]{35}\b`
- **Description:** Google Cloud / Firebase API key. Always starts with `AIza` followed by 35 characters drawn from letters, digits, `_`, and `-`.
- **Severity:** high
- **False Positive Notes:** None — prefix is specific. Note: GCP API keys have varying scopes; some are safe to expose (browser-restricted keys), but flag all for review.

#### GCP Service Account JSON Marker
- **ID:** `gcp-service-account-json`
- **Regex:** `"type"\s*:\s*"service_account"`
- **Description:** Google Cloud service account JSON credential file marker. The presence of this key indicates a full service account credential object.
- **Severity:** critical
- **False Positive Notes:** Only matches within JSON credential blobs. If found alongside `private_key`, treat as confirmed credential leak.

---

### 4. GitHub

#### GitHub Personal Access Token (Classic)
- **ID:** `github-pat-classic`
- **Regex:** `\bghp_[A-Za-z0-9]{36}\b`
- **Description:** GitHub classic personal access token (PAT). Prefix `ghp_` followed by exactly 36 alphanumeric characters.
- **Severity:** critical
- **False Positive Notes:** None — prefix is specific to GitHub.

#### GitHub Fine-Grained Personal Access Token
- **ID:** `github-pat-fine-grained`
- **Regex:** `\bgithub_pat_[A-Za-z0-9_]{82}\b`
- **Description:** GitHub fine-grained PAT introduced in 2022. Longer and more structured than classic PATs.
- **Severity:** critical
- **False Positive Notes:** None.

#### GitHub OAuth Token
- **ID:** `github-oauth-token`
- **Regex:** `\bgho_[A-Za-z0-9]{36}\b`
- **Description:** GitHub OAuth access token issued via OAuth app flow.
- **Severity:** critical
- **False Positive Notes:** None.

#### GitHub Actions / Server Token
- **ID:** `github-server-token`
- **Regex:** `\bghs_[A-Za-z0-9]{36}\b`
- **Description:** GitHub Apps installation token or Actions runner token.
- **Severity:** high
- **False Positive Notes:** None.

---

### 5. npm

#### npm Automation / Publish Token
- **ID:** `npm-token`
- **Regex:** `\bnpm_[A-Za-z0-9]{36}\b`
- **Description:** npm registry automation or publish token. Prefix `npm_` followed by 36 alphanumeric characters.
- **Severity:** critical
- **False Positive Notes:** None — prefix is specific to npm tokens issued after 2021. Older tokens in `.npmrc` are caught by the legacy pattern below.

#### npm Legacy Auth Token (.npmrc)
- **ID:** `npm-legacy-auth`
- **Regex:** `//registry\.npmjs\.org/:_authToken\s*=\s*([a-f0-9\-]{36,})`
- **Description:** Legacy npm authentication token in `.npmrc` format.
- **Severity:** critical
- **False Positive Notes:** None.

---

### 6. Generic API Keys and Authorization Headers

#### Bearer Token in Authorization Header
- **ID:** `bearer-token`
- **Regex:** `(?i)Authorization\s*[:=]\s*["']?Bearer\s+([A-Za-z0-9\-._~+/]+=*)\b`
- **Description:** HTTP Authorization header with Bearer scheme. Common in hardcoded fetch/axios calls.
- **Severity:** high
- **False Positive Notes:** High false positive rate when the value is a variable reference like `Bearer ${token}` or `Bearer <your-token>`. Skip matches containing `$`, `<`, `>`, or `{`.

#### Generic `api_key` / `api-key` Assignment
- **ID:** `generic-api-key`
- **Regex:** `(?i)\bapi[_\-]?key\s*[:=]\s*["']([A-Za-z0-9\-._]{16,64})["']`
- **Description:** Generic API key assignment in config files, source code, or environment exports.
- **Severity:** high
- **False Positive Notes:** Placeholder values like `your-api-key-here`, `<API_KEY>`, `REPLACE_ME`, `xxx...`. Skip matches where the value is all-same-character or contains angle brackets.

#### OpenAI API Key (Legacy Format)
- **ID:** `openai-api-key-legacy`
- **Regex:** `\bsk-[A-Za-z0-9]{20}T3BlbkFJ[A-Za-z0-9]{20}\b`
- **Description:** OpenAI API key in the legacy format. The substring `T3BlbkFJ` is base64 for `OpenAI`.
- **Severity:** critical
- **False Positive Notes:** None for the legacy format.

#### OpenAI Project-Scoped Key
- **ID:** `openai-project-key`
- **Regex:** `\bsk-proj-[A-Za-z0-9\-_]{40,}\b`
- **Description:** OpenAI project-scoped API key introduced in 2024.
- **Severity:** critical
- **False Positive Notes:** None.

#### Anthropic API Key
- **ID:** `anthropic-api-key`
- **Regex:** `\bsk-ant-api03-[A-Za-z0-9\-_]{93}\b`
- **Description:** Anthropic Claude API key.
- **Severity:** critical
- **False Positive Notes:** None — prefix plus exact length is highly specific.

---

### 7. Private Keys (PEM Format)

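PEM header patterns detect private key material. The regex patterns below write the five leading and trailing hyphens as the quantifier `-{5}`, so they match the literal PEM markers in scanned files without this file (or the hook source) containing those markers verbatim.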

#### RSA Private Key Header
- **ID:** `rsa-private-key`
- **Regex:** `-{5}BEGIN RSA PRIVATE KEY-{5}`
- **Description:** PEM-encoded RSA private key. The header alone is sufficient to flag — do not require the full key body.
- **Severity:** critical
- **False Positive Notes:** Test fixtures and documentation examples sometimes include truncated PEM blocks. Flag regardless — a truncated key in committed code still indicates a process failure.

#### EC / DSA / OpenSSH Private Key Header
- **ID:** `ec-private-key`
- **Regex:** `-{5}BEGIN (?:EC|DSA|OPENSSH|ENCRYPTED) PRIVATE KEY-{5}`
- **Description:** PEM-encoded elliptic curve, DSA, or OpenSSH private key, plus the `ENCRYPTED PRIVATE KEY` header used for encrypted PKCS#8 keys.
- **Severity:** critical
- **False Positive Notes:** Same as RSA — flag all occurrences.

#### PKCS#8 Private Key Header
- **ID:** `pkcs8-private-key`
- **Regex:** `-{5}BEGIN PRIVATE KEY-{5}`
- **Description:** PKCS#8 encoded private key (format-agnostic, covers RSA, EC, etc.).
- **Severity:** critical
- **False Positive Notes:** None.

**Implementation note for `pre-edit-secrets.mjs`:** Build these regexes at runtime using `new RegExp('-{5}BEGIN RSA PRIVATE KEY-{5}')` rather than as regex literals, so the hook script itself is not flagged by secret scanners.

---

### 8. Database Connection Strings

#### PostgreSQL Connection String
- **ID:** `postgres-connstr`
- **Regex:** `postgres(?:ql)?://[^:]+:[^@]+@[^\s'"]+`
- **Description:** PostgreSQL connection URL with embedded credentials in the format `postgresql://user:password@host/db`.
- **Severity:** critical
- **False Positive Notes:** Matches any non-empty password portion. Skip if password segment is `${...}`, `<password>`, or `*`.
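
The placeholder exclusions above can be applied by capturing the password segment and testing it separately. The sketch below adds capture groups to the `postgres-connstr` regex purely for illustration; `isPlaceholderPassword` and `findPostgresCredentials` are hypothetical names, not functions from the hook.

```js
// Same shape as postgres-connstr, with capture groups added for this example:
// group 1 = user, group 2 = password.
const POSTGRES_URL = /postgres(?:ql)?:\/\/([^:]+):([^@]+)@[^\s'"]+/g;

// True when the password segment is a template or mask rather than a secret.
function isPlaceholderPassword(password) {
  return (
    /^\$\{[^}]*\}$/.test(password) || // ${DB_PASSWORD}
    /^<[^>]*>$/.test(password) ||     // <password>
    /^\*+$/.test(password)            // * or ***
  );
}

// Return one finding per connection URL whose password looks real.
function findPostgresCredentials(text) {
  const findings = [];
  for (const match of text.matchAll(POSTGRES_URL)) {
    if (!isPlaceholderPassword(match[2])) {
      findings.push({ id: 'postgres-connstr', index: match.index });
    }
  }
  return findings;
}
```

The same filter could be reused for the MongoDB and MySQL patterns, which share these exclusions.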

#### MongoDB Connection String
- **ID:** `mongodb-connstr`
- **Regex:** `mongodb(?:\+srv)?://[^:]+:[^@]+@[^\s'"]+`
- **Description:** MongoDB Atlas or local connection string with embedded username and password.
- **Severity:** critical
- **False Positive Notes:** Same exclusions as PostgreSQL.

#### MySQL / MariaDB Connection String
- **ID:** `mysql-connstr`
- **Regex:** `mysql(?:2)?://[^:]+:[^@]+@[^\s'"]+`
- **Description:** MySQL or MariaDB connection URL with credentials.
- **Severity:** critical
- **False Positive Notes:** Same exclusions as PostgreSQL.

#### Redis Connection String with Password
- **ID:** `redis-connstr`
- **Regex:** `redis://:[^@]+@[^\s'"]+`
- **Description:** Redis connection URL with password in the format `redis://:password@host`.
- **Severity:** high
- **False Positive Notes:** Passwordless Redis (`redis://host:6379`) does not match this pattern.

#### Generic JDBC Connection String with Password
- **ID:** `jdbc-connstr`
- **Regex:** `(?i)jdbc:[a-z]+://[^\s"']+;[Pp]assword=[^;\s"']+`
- **Description:** Java JDBC connection string with a `Password=` parameter.
- **Severity:** critical
- **False Positive Notes:** None if `Password=` is present with a non-empty value.

---

### 9. Passwords in Configuration

#### `password` Assignment
- **ID:** `config-password`
- **Regex:** `(?i)(?:^|[\s,;{(])\bpass(?:word|wd)?\s*[:=]\s*["']([^"'$<>{}\s]{6,})["']`
- **Description:** Password assignment in config files (YAML, TOML, JSON, .env, INI). Matches quoted values of at least 6 characters, such as `password: "secret"` or `passwd='hunter2'`; unquoted values are not matched.
- **Severity:** high
- **False Positive Notes:** High false positive rate in documentation and test fixtures. Skip if value matches common placeholders: `your-password`, `changeme`, `example`, `test`, `placeholder`, `<...>`, `***`, `xxx`.

#### `secret` Key Assignment
- **ID:** `config-secret`
- **Regex:** `(?i)(?:^|[\s,;{(])\bsecret\b\s*[:=]\s*["']([^"'$<>{}\s]{8,})["']`
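- **Description:** Generic `secret` key assignment in config or environment files. Because of the `\b` after `secret`, compound names such as Django's `SECRET_KEY` do not match this pattern; those are intended to be covered by `dotenv-secret`.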
- **Severity:** high
- **False Positive Notes:** Same exclusions as `config-password`.

#### Sensitive Environment Variable Assignment
- **ID:** `dotenv-secret`
- **Regex:** `(?i)^(?:export\s+)?[A-Z][A-Z0-9_]*(?:SECRET|KEY|TOKEN|PASSWORD|PASSWD|CREDENTIAL|AUTH)[A-Z0-9_]*\s*=\s*["']?([A-Za-z0-9+/=\-_.@!#%^&*]{8,})`
- **Description:** Environment variable with a security-sensitive name (contains SECRET, KEY, TOKEN, PASSWORD, etc.) assigned a non-empty literal value. Matches `.env` file lines.
- **Severity:** high
- **False Positive Notes:** Variables pointing to file paths (e.g., `KEY_FILE=/etc/ssl/key.pem`) or URLs without credentials. Skip values that are all-uppercase (likely a variable reference like `${DATABASE_URL}`).

---

### 10. JWT Tokens

#### JWT Pattern
- **ID:** `jwt-token`
- **Regex:** `\beyJ[A-Za-z0-9\-_]+\.[A-Za-z0-9\-_]+\.[A-Za-z0-9\-_]+\b`
- **Description:** JSON Web Token in its three-part base64url format (`header.payload.signature`). The header always starts with `eyJ` (base64url encoding of `{"`).
- **Severity:** medium
- **False Positive Notes:** **High false positive rate.** JWTs are frequently used in tests, documentation, and mock data. Many JWTs are intentionally short-lived or scope-limited. Flag for human review rather than hard-blocking. Skip matches in files under `tests/`, `fixtures/`, `__mocks__/`, `*.test.*`, `*.spec.*`. Escalate to `critical` only if the payload segment decodes to contain an `exp` claim more than one year in the future.
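
The escalation rule above depends on decoding the payload segment. A minimal sketch of that check follows, assuming a Node.js runtime where `Buffer` supports the `base64url` encoding; `jwtSeverity` is a hypothetical helper name, and "one year" is approximated as 365 days.

```js
const ONE_YEAR_SECONDS = 365 * 24 * 60 * 60;

// Hypothetical helper: choose a severity for a jwt-token match.
// `nowSeconds` is injectable so the check can be tested deterministically.
function jwtSeverity(token, nowSeconds = Math.floor(Date.now() / 1000)) {
  const payloadSegment = token.split('.')[1];
  try {
    const json = Buffer.from(payloadSegment, 'base64url').toString('utf8');
    const payload = JSON.parse(json);
    // `exp` is a Unix timestamp in seconds.
    if (typeof payload.exp === 'number' && payload.exp - nowSeconds > ONE_YEAR_SECONDS) {
      return 'critical';
    }
  } catch {
    // Missing segment or payload that is not valid JSON: keep the default.
  }
  return 'medium';
}
```

The signature is deliberately not verified here; the goal is only to estimate how long a leaked token would stay usable.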

---

## False Positive Suppression Rules

Apply these globally before reporting any match:

1. **Placeholder values** — Skip if the matched value contains: `your-`, `<`, `>`, `example`, `placeholder`, `replace`, `changeme`, `xxx`, `***`, `TODO`, `FIXME`
2. **Variable references** — Skip if the matched value contains: `${`, `$(`, `%{`, `ENV[`, `os.environ`
3. **Test files** — Lower severity by one level for matches in: `*.test.ts`, `*.spec.js`, `fixtures/`, `__mocks__/`, `testdata/`
4. **Documentation** — Lower severity for matches in: `*.md`, `*.txt`, `docs/`, `README*` — but never suppress `critical` patterns (PEM key headers, real AWS Access Key IDs)
5. **All-same-character values** — Skip if the value is a repetition of a single character (e.g., `xxxxxxxx`, `00000000`)
6. **Short values** — Skip generic patterns if the matched secret value is fewer than 8 characters
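
One possible way to apply rules 1, 2, 3, 5, and 6 to a single match is sketched below. It is an illustration only: `applySuppression`, the finding object shape, and the choice of which IDs count as "generic" are assumptions, and rule 4 (documentation files) is left out for brevity.

```js
const PLACEHOLDER_MARKERS = ['your-', '<', '>', 'example', 'placeholder',
  'replace', 'changeme', 'xxx', '***', 'todo', 'fixme'];
const VARIABLE_MARKERS = ['${', '$(', '%{', 'ENV[', 'os.environ'];
// Assumption: these IDs are treated as the "generic" patterns for rule 6.
const GENERIC_IDS = new Set(['generic-api-key', 'config-password',
  'config-secret', 'dotenv-secret']);
const SEVERITY_ORDER = ['low', 'medium', 'high', 'critical'];
const TEST_PATH = /\.test\.ts$|\.spec\.js$|(?:^|\/)(?:fixtures|__mocks__|testdata)\//;

// Returns null when the match should be skipped, otherwise the finding
// with its (possibly lowered) severity.
function applySuppression({ id, severity, value, filePath }) {
  const lower = value.toLowerCase();
  if (PLACEHOLDER_MARKERS.some((m) => lower.includes(m))) return null; // rule 1
  if (VARIABLE_MARKERS.some((m) => value.includes(m))) return null;    // rule 2
  if (/^(.)\1+$/.test(value)) return null;                             // rule 5
  if (GENERIC_IDS.has(id) && value.length < 8) return null;            // rule 6
  let level = SEVERITY_ORDER.indexOf(severity);
  if (TEST_PATH.test(filePath)) level = Math.max(0, level - 1);        // rule 3
  return { id, severity: SEVERITY_ORDER[level] };
}
```

Placeholder markers are compared case-insensitively so that `TODO` and `todo` behave the same, while variable markers are compared as written.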

---

## Implementation Notes for `pre-edit-secrets.mjs`

```js
import { readFileSync } from 'node:fs';

// Assumption: the content about to be written arrives on stdin. Adapt this
// line to however the hook runtime actually supplies it.
const fileContent = readFileSync(0, 'utf8');

// Build PEM patterns at runtime to avoid triggering hook self-detection:
const PEM_RSA = new RegExp('-{5}BEGIN RSA PRIVATE KEY-{5}');
const PEM_GENERIC = new RegExp('-{5}BEGIN (?:EC|DSA|OPENSSH|ENCRYPTED) PRIVATE KEY-{5}');
const PEM_PKCS8 = new RegExp('-{5}BEGIN PRIVATE KEY-{5}');

// A subset of the critical patterns, shown for illustration.
// No `g` flag here: `.test()` on a global regex advances `lastIndex`, which
// can cause missed matches if a pattern object is reused across inputs.
const CRITICAL_PATTERNS = [
  { id: 'aws-access-key-id',       regex: /\bAKIA[0-9A-Z]{16}\b/ },
  { id: 'github-pat-classic',      regex: /\bghp_[A-Za-z0-9]{36}\b/ },
  { id: 'github-pat-fine-grained', regex: /\bgithub_pat_[A-Za-z0-9_]{82}\b/ },
  { id: 'npm-token',               regex: /\bnpm_[A-Za-z0-9]{36}\b/ },
  { id: 'openai-project-key',      regex: /\bsk-proj-[A-Za-z0-9\-_]{40,}\b/ },
  { id: 'anthropic-api-key',       regex: /\bsk-ant-api03-[A-Za-z0-9\-_]{93}\b/ },
  { id: 'rsa-private-key',         regex: PEM_RSA },
  { id: 'ec-private-key',          regex: PEM_GENERIC },
  { id: 'pkcs8-private-key',       regex: PEM_PKCS8 },
];

// Hard-block on any critical match:
for (const { id, regex } of CRITICAL_PATTERNS) {
  if (regex.test(fileContent)) {
    console.error(`BLOCKED: ${id} detected. Remove secret before editing.`);
    process.exit(2); // Non-zero exit blocks the Write/Edit tool use
  }
}
```

For `high`/`medium` severity patterns, emit a warning via `console.error` but exit with `0` (allow the operation to proceed with a visible warning).
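
The warn-only path might look roughly like the sketch below. The two patterns are copied from this file as examples; `WARN_PATTERNS`, `collectWarnings`, `reportWarnings`, and the message wording are hypothetical, not the hook's actual names or output.

```js
// Example subset of non-critical patterns (assumed list, for illustration).
const WARN_PATTERNS = [
  { id: 'gcp-api-key', severity: 'high', regex: /\bAIza[0-9A-Za-z_\-]{35}\b/ },
  { id: 'jwt-token', severity: 'medium', regex: /\beyJ[A-Za-z0-9\-_]+\.[A-Za-z0-9\-_]+\.[A-Za-z0-9\-_]+\b/ },
];

// Build one warning line per pattern that matches the content.
function collectWarnings(fileContent) {
  return WARN_PATTERNS
    .filter(({ regex }) => regex.test(fileContent))
    .map(({ id, severity }) => `WARNING [${severity}]: ${id} detected. Review before committing.`);
}

// Print the warnings and return the exit code to use. It is always 0,
// so the Write/Edit operation proceeds.
function reportWarnings(fileContent, log = console.error) {
  collectWarnings(fileContent).forEach((line) => log(line));
  return 0;
}
```

A hook could end with `process.exit(reportWarnings(fileContent))` once the critical check has passed.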

---

## References

- [OWASP: Credential Stuffing](https://owasp.org/www-community/attacks/Credential_stuffing)
- [GitHub: Secret Scanning Patterns](https://docs.github.com/en/code-security/secret-scanning/secret-scanning-patterns)
- [Gitleaks Rule Definitions](https://github.com/gitleaks/gitleaks/blob/master/config/gitleaks.toml)
- [Trufflehog Detectors](https://github.com/trufflesecurity/trufflehog/tree/main/pkg/detectors)