Kjell Tore Guttormsen f418a8fe08 feat(llm-security-copilot): port llm-security v5.1.0 to GitHub Copilot CLI

Full port of llm-security plugin for internal use on Windows with GitHub
Copilot CLI. Protocol translation layer (copilot-hook-runner.mjs)
normalizes Copilot camelCase I/O to Claude Code snake_case format — all
original hook scripts run unmodified.

- 8 hooks with protocol translation (stdin/stdout/exit code)
- 18 SKILL.md skills (Agent Skills Open Standard)
- 6 .agent.md agent definitions
- 20 scanners + 14 scanner lib modules (unchanged)
- 14 knowledge files (unchanged)
- 39 test files including copilot-port-verify.mjs (17 tests)
- Windows-ready: node:path, os.tmpdir(), process.execPath, no bash

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-04-09 21:56:10 +02:00


Secrets Detection Patterns

Usage

These patterns are used by:

pre-edit-secrets.mjs hook — blocks Write/Edit operations containing secrets before they reach disk
skill-scanner-agent — flags skills and commands that hardcode or expose secrets

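Patterns are regex strings intended for JavaScript's RegExp. Entries that begin with (?i) are case-insensitive: JavaScript does not accept that inline prefix, so strip it and pass the i flag instead. Add the g (global) flag only when you need every match rather than a yes/no answer.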

Pattern Format

Each pattern includes:

id: Unique identifier for logging and suppression
regex: JavaScript-compatible regex (string form, apply with new RegExp(...))
description: What it detects
severity: critical / high / medium / low
false_positive_notes: When this pattern might false-match
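
Because several entries below write case-insensitivity as a leading (?i), a loader has to translate that prefix before calling new RegExp(...). The following is a minimal sketch, not the hook's actual loader: the compilePattern helper and its global option are hypothetical names, and it assumes a leading (?i) is the only inline modifier that appears in this file.

```javascript
// Hypothetical helper: turn a pattern string from this file into a RegExp.
// Assumption: a leading "(?i)" is the only inline modifier used.
function compilePattern(source, { global = false } = {}) {
  let flags = global ? 'g' : '';
  if (source.startsWith('(?i)')) {
    source = source.slice(4); // JavaScript has no bare (?i) prefix
    flags += 'i';
  }
  return new RegExp(source, flags);
}

// String.raw keeps the backslashes exactly as written in the pattern entry.
const awsSecret = compilePattern(
  String.raw`(?i)aws[_\-\s.]*secret[_\-\s.]*(?:access[_\-\s.]*)?key["'\s]*[:=]["'\s]*([A-Za-z0-9/+]{40})`
);

// Made-up sample line; the value is 40 repeated characters, not a real key.
const sample = 'AWS_SECRET_ACCESS_KEY = "' + 'a'.repeat(40) + '"';
const match = awsSecret.exec(sample); // match[1] holds the 40-character value
```

Leaving the g flag off by default keeps regex.test() and regex.exec() stateless, which matters when one compiled pattern is reused across several inputs.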

Patterns

1. AWS

AWS Access Key ID

ID: aws-access-key-id
Regex: \bAKIA[0-9A-Z]{16}\b
Description: AWS Access Key ID. Always starts with AKIA followed by 16 uppercase alphanumeric characters.
Severity: critical
False Positive Notes: None — this prefix+length combination is highly specific to AWS. No known false positives in practice.

AWS Secret Access Key

ID: aws-secret-access-key
Regex: (?i)aws[_\-\s.]*secret[_\-\s.]*(?:access[_\-\s.]*)?key["'\s]*[:=]["'\s]*([A-Za-z0-9/+]{40})
Description: AWS Secret Access Key — 40-character base64 string following a label like aws_secret_key, AWS_SECRET_ACCESS_KEY, etc.
Severity: critical
False Positive Notes: Generic 40-char base64 strings can appear in other contexts. Require the aws + secret label context.

AWS Session Token

ID: aws-session-token
Regex: (?i)aws[_\-\s.]*session[_\-\s.]*token["'\s]*[:=]["'\s]*([A-Za-z0-9/+=]{100,})
Description: Temporary AWS session token (STS). Much longer than access keys — typically 200-400 characters.
Severity: critical
False Positive Notes: Long base64 blobs in unrelated contexts (e.g., test fixtures, encoded images). Require the session_token label.

2. Azure

Azure Storage Account Key

ID: azure-storage-key
Regex: (?i)AccountKey=([A-Za-z0-9+/]{86}==)
Description: Azure Storage Account key embedded in a connection string. Always exactly 88 characters ending in ==.
Severity: critical
False Positive Notes: None — the AccountKey= prefix plus exact length is highly specific.

Azure Storage Connection String

ID: azure-storage-connstr
Regex: DefaultEndpointsProtocol=https?;AccountName=[^;]+;AccountKey=[A-Za-z0-9+/]{86}==
Description: Full Azure Storage connection string including account name and key.
Severity: critical
False Positive Notes: None.

Azure SAS Token

ID: azure-sas-token
Regex: (?i)(?:sv|sig|se|sp|spr|srt)=[A-Za-z0-9%+/=&]{10,}(?:&(?:sv|sig|se|sp|spr|srt)=[A-Za-z0-9%+/=&]{1,}){3,}
Description: Azure Shared Access Signature (SAS) token — URL query string containing multiple SAS parameters.
Severity: high
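False Positive Notes: URL-encoded query strings with similar parameter names. The regex requires at least four SAS-style parameters but cannot check that they are distinct, so confirm in post-processing that sv, sig, se, and sp are all present.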
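
The distinct-parameter requirement is easiest to enforce after the regex has matched. A minimal sketch, assuming the matched text is a bare query string; hasRequiredSasParams is a hypothetical helper and both sample strings are made up:

```javascript
// Hypothetical post-filter for azure-sas-token matches.
// Assumption: `candidate` is the matched query string, without a leading "?".
const REQUIRED_SAS_PARAMS = ['sv', 'sig', 'se', 'sp'];

function hasRequiredSasParams(candidate) {
  const params = new URLSearchParams(candidate);
  // Every required name must be present at least once.
  return REQUIRED_SAS_PARAMS.every((name) => params.has(name));
}

// Four different SAS parameters: passes the check.
const realistic = 'sv=2022-11-02&se=2030-01-01T00%3A00%3A00Z&sp=r&sig=AAAAAAAAAAAAAAAA';
// Four repetitions of the same parameter: satisfies the regex count but fails here.
const repeated = 'se=2030-01-01&se=2030-01-02&se=2030-01-03&se=2030-01-04';

const realisticOk = hasRequiredSasParams(realistic);
const repeatedOk = hasRequiredSasParams(repeated);
```

URLSearchParams compares names case-sensitively, so a caller that wants to mirror the case-insensitive regex would need to lowercase the candidate's parameter names first.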

Azure Client Secret

ID: azure-client-secret
Regex: (?i)client[_\-]?secret["'\s]*[:=]["'\s]*([A-Za-z0-9~._\-]{34,40})
Description: Azure AD / Entra ID application client secret: a 34-40 character string of letters, digits, and the symbols ~ . _ -.
Severity: critical
False Positive Notes: Generic password fields with similar length. Always flag and require human review.

Azure Service Bus Connection String

ID: azure-servicebus-connstr
Regex: Endpoint=sb://[^;]+;SharedAccessKeyName=[^;]+;SharedAccessKey=[A-Za-z0-9+/=]{43}=
Description: Azure Service Bus connection string with shared access key.
Severity: critical
False Positive Notes: None — format is highly specific.

3. Google Cloud Platform

GCP API Key

ID: gcp-api-key
Regex: \bAIza[0-9A-Za-z_\-]{35}\b
Description: Google Cloud / Firebase API key. Always starts with AIza followed by 35 characters (letters, digits, underscore, or hyphen).
Severity: high
False Positive Notes: None — prefix is specific. Note: GCP API keys have varying scopes; some are safe to expose (browser-restricted keys), but flag all for review.

GCP Service Account JSON Marker

ID: gcp-service-account-json
Regex: "type"\s*:\s*"service_account"
Description: Google Cloud service account JSON credential file marker. The presence of this key indicates a full service account credential object.
Severity: critical
False Positive Notes: Only matches within JSON credential blobs. If found alongside private_key, treat as confirmed credential leak.

4. GitHub

GitHub Personal Access Token (Classic)

ID: github-pat-classic
Regex: \bghp_[A-Za-z0-9]{36}\b
Description: GitHub classic personal access token (PAT). Prefix ghp_ followed by exactly 36 alphanumeric characters.
Severity: critical
False Positive Notes: None — prefix is specific to GitHub.

GitHub Fine-Grained Personal Access Token

ID: github-pat-fine-grained
Regex: \bgithub_pat_[A-Za-z0-9_]{82}\b
Description: GitHub fine-grained PAT introduced in 2022. Longer and more structured than classic PATs.
Severity: critical
False Positive Notes: None.

GitHub OAuth Token

ID: github-oauth-token
Regex: \bgho_[A-Za-z0-9]{36}\b
Description: GitHub OAuth access token issued via OAuth app flow.
Severity: critical
False Positive Notes: None.

GitHub Actions / Server Token

ID: github-server-token
Regex: \bghs_[A-Za-z0-9]{36}\b
Description: GitHub Apps installation token or Actions runner token.
Severity: high
False Positive Notes: None.

5. npm

npm Automation / Publish Token

ID: npm-token
Regex: \bnpm_[A-Za-z0-9]{36}\b
Description: npm registry automation or publish token. Prefix npm_ followed by 36 alphanumeric characters.
Severity: critical
False Positive Notes: None — prefix is specific to npm tokens issued after 2021. Older tokens in .npmrc are caught by the legacy pattern below.

npm Legacy Auth Token (.npmrc)

ID: npm-legacy-auth
Regex: //registry\.npmjs\.org/:_authToken\s*=\s*([a-f0-9\-]{36,})
Description: Legacy npm authentication token in .npmrc format.
Severity: critical
False Positive Notes: None.

6. Generic API Keys and Authorization Headers

Bearer Token in Authorization Header

ID: bearer-token
Regex: (?i)Authorization\s*[:=]\s*["']?Bearer\s+([A-Za-z0-9\-._~+/]+=*)\b
Description: HTTP Authorization header with Bearer scheme. Common in hardcoded fetch/axios calls.
Severity: high
False Positive Notes: High false positive rate when the value is a variable reference like Bearer ${token} or Bearer <your-token>. Skip matches containing $, <, >, or {.

Generic `api_key` / `api-key` Assignment

ID: generic-api-key
Regex: (?i)\bapi[_\-]?key\s*[:=]\s*["']([A-Za-z0-9\-._]{16,64})["']
Description: Generic API key assignment in config files, source code, or environment exports.
Severity: high
False Positive Notes: Placeholder values like your-api-key-here, <API_KEY>, REPLACE_ME, xxx.... Skip matches where the value is all-same-character or contains angle brackets.

OpenAI API Key (Legacy Format)

ID: openai-api-key-legacy
Regex: \bsk-[A-Za-z0-9]{20}T3BlbkFJ[A-Za-z0-9]{20}\b
Description: OpenAI API key in the legacy format. The substring T3BlbkFJ is base64 for OpenAI.
Severity: critical
False Positive Notes: None for the legacy format.

OpenAI Project-Scoped Key

ID: openai-project-key
Regex: \bsk-proj-[A-Za-z0-9\-_]{40,}\b
Description: OpenAI project-scoped API key introduced in 2024.
Severity: critical
False Positive Notes: None.

Anthropic API Key

ID: anthropic-api-key
Regex: \bsk-ant-api03-[A-Za-z0-9\-_]{93}\b
Description: Anthropic Claude API key.
Severity: critical
False Positive Notes: None — prefix plus exact length is highly specific.

7. Private Keys (PEM Format)

PEM header patterns detect private key material. The regex patterns below write the five-hyphen PEM delimiter as the quantifier -{5}, so the pattern text matches the literal PEM markers in files at scan time without itself containing one.

RSA Private Key Header

ID: rsa-private-key
Regex: -{5}BEGIN RSA PRIVATE KEY-{5}
Description: PEM-encoded RSA private key. The header alone is sufficient to flag — do not require the full key body.
Severity: critical
False Positive Notes: Test fixtures and documentation examples sometimes include truncated PEM blocks. Flag regardless — a truncated key in committed code still indicates a process failure.

EC / DSA / OpenSSH Private Key Header

ID: ec-private-key
Regex: -{5}BEGIN (?:EC|DSA|OPENSSH|ENCRYPTED) PRIVATE KEY-{5}
Description: PEM-encoded elliptic curve, DSA, OpenSSH, or encrypted PKCS#8 private key.
Severity: critical
False Positive Notes: Same as RSA — flag all occurrences.

PKCS#8 Private Key Header

ID: pkcs8-private-key
Regex: -{5}BEGIN PRIVATE KEY-{5}
Description: PKCS#8 encoded private key (format-agnostic, covers RSA, EC, etc.).
Severity: critical
False Positive Notes: None.

Implementation note for pre-edit-secrets.mjs: Build these regexes at runtime using new RegExp('-{5}BEGIN RSA PRIVATE KEY-{5}') rather than as regex literals, so the hook script itself is not flagged by secret scanners.

8. Database Connection Strings

PostgreSQL Connection String

ID: postgres-connstr
Regex: postgres(?:ql)?://[^:]+:[^@]+@[^\s'"]+
Description: PostgreSQL connection URL with embedded credentials in the format postgresql://user:password@host/db.
Severity: critical
False Positive Notes: Matches any non-empty password portion. Skip if password segment is ${...}, <password>, or *.
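
The password-segment exclusions can be applied by capturing that segment and testing it separately. This is a sketch under assumptions: the capture group around the password is added here for illustration and is not part of the pattern above, and isPlaceholderPassword and findPostgresSecret are hypothetical names.

```javascript
// Variant of postgres-connstr with an added capture group for the password.
const CONNSTR = /postgres(?:ql)?:\/\/[^:]+:([^@]+)@[^\s'"]+/;

// Hypothetical helper: true when the password segment is a placeholder.
function isPlaceholderPassword(password) {
  return (
    /^\$\{.*\}$/.test(password) || // ${DB_PASSWORD}
    /^<.*>$/.test(password) ||     // <password>
    /^\*+$/.test(password)         // one or more asterisks
  );
}

// Returns the matched connection string, or null if nothing reportable is found.
function findPostgresSecret(text) {
  const match = CONNSTR.exec(text);
  if (!match) return null;
  return isPlaceholderPassword(match[1]) ? null : match[0];
}

// Made-up examples: the first is a template, the second embeds a literal password.
const templated = findPostgresSecret('postgresql://app:${DB_PASSWORD}@db.internal/app');
const literal = findPostgresSecret('postgres://app:hunter2hunter2@db.internal:5432/app');
```

The same helper could be reused for the MongoDB and MySQL patterns, since they share the user:password@host shape.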

MongoDB Connection String

ID: mongodb-connstr
Regex: mongodb(?:\+srv)?://[^:]+:[^@]+@[^\s'"]+
Description: MongoDB Atlas or local connection string with embedded username and password.
Severity: critical
False Positive Notes: Same exclusions as PostgreSQL.

MySQL / MariaDB Connection String

ID: mysql-connstr
Regex: mysql(?:2)?://[^:]+:[^@]+@[^\s'"]+
Description: MySQL or MariaDB connection URL with credentials.
Severity: critical
False Positive Notes: Same exclusions as PostgreSQL.

Redis Connection String with Password

ID: redis-connstr
Regex: redis://:[^@]+@[^\s'"]+
Description: Redis connection URL with password in the format redis://:password@host.
Severity: high
False Positive Notes: Passwordless Redis (redis://host:6379) does not match this pattern.

Generic JDBC Connection String with Password

ID: jdbc-connstr
Regex: (?i)jdbc:[a-z]+://[^\s"']+;[Pp]assword=[^;\s"']+
Description: Java JDBC connection string with a Password= parameter.
Severity: critical
False Positive Notes: None if Password= is present with a non-empty value.

9. Passwords in Configuration

`password` Assignment

ID: config-password
Regex: (?i)(?:^|[\s,;{(])\bpass(?:word|wd)?\s*[:=]\s*["']([^"'$<>{}\s]{6,})["']
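Description: Password assignment in config files (YAML, TOML, JSON, .env, INI). Matches quoted values such as password: "secret" or passwd='hunter2'; unquoted values such as passwd=hunter2 are not matched because the regex requires surrounding quotes.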
Severity: high
False Positive Notes: High false positive rate in documentation and test fixtures. Skip if value matches common placeholders: your-password, changeme, example, test, placeholder, <...>, ***, xxx.

`secret` Key Assignment

ID: config-secret
Regex: (?i)(?:^|[\s,;{(])\bsecret\b\s*[:=]\s*["']([^"'$<>{}\s]{8,})["']
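Description: Generic secret key assignment in config or environment files. The key must be the bare word secret: because _ is a word character, \bsecret\b does not match inside names such as SECRET_KEY or client_secret, so a Django SECRET_KEY with a real value, although a valid finding, has to be caught by another pattern.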
Severity: high
False Positive Notes: Same exclusions as config-password.

Sensitive Environment Variable Assignment

ID: dotenv-secret
Regex: (?i)^(?:export\s+)?[A-Z][A-Z0-9_]*(?:SECRET|KEY|TOKEN|PASSWORD|PASSWD|CREDENTIAL|AUTH)[A-Z0-9_]*\s*=\s*["']?([A-Za-z0-9+/=\-_.@!#%^&*]{8,})
Description: Environment variable with a security-sensitive name (contains SECRET, KEY, TOKEN, PASSWORD, etc.) assigned a non-empty literal value, optionally quoted. Matches .env file lines; apply with the m flag so that ^ anchors at the start of each line.
Severity: high
False Positive Notes: Variables pointing to file paths (e.g., KEY_FILE=/etc/ssl/key.pem) or URLs without credentials. Skip values that are all-uppercase (likely a variable reference like ${DATABASE_URL}).
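
The value-level exclusions for dotenv-secret can be expressed as a small post-filter on the captured value. A rough sketch: shouldSkipEnvValue is a hypothetical helper, and the path and URL heuristics are assumptions chosen for illustration.

```javascript
// Hypothetical post-filter for values captured by dotenv-secret.
function shouldSkipEnvValue(value) {
  // File paths: POSIX absolute, relative, home-relative, or Windows drive paths.
  if (/^(?:\/|\.{1,2}\/|~\/|[A-Za-z]:\\)/.test(value)) return true;
  // All-uppercase identifier: likely a reference to another variable.
  if (/^[A-Z][A-Z0-9_]*$/.test(value)) return true;
  // URL with no "@", so no embedded user:password credentials.
  if (/^[a-z][a-z0-9+.\-]*:\/\/[^@\s]*$/i.test(value)) return true;
  return false;
}

// Made-up values for illustration.
const skipPath = shouldSkipEnvValue('/etc/ssl/key.pem');
const skipReference = shouldSkipEnvValue('DATABASE_URL');
const skipUrl = shouldSkipEnvValue('https://example.com/hook');
const keepLiteral = shouldSkipEnvValue('sk_live_4f9a8b7c6d5e');
```

A URL that does contain user:password@ is deliberately not skipped here, since the connection-string patterns treat that shape as a finding.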

10. JWT Tokens

JWT Pattern

ID: jwt-token
Regex: \beyJ[A-Za-z0-9\-_]+\.[A-Za-z0-9\-_]+\.[A-Za-z0-9\-_]+\b
Description: JSON Web Token in its three-part base64url format (header.payload.signature). The header always starts with eyJ (base64url encoding of {").
Severity: medium
False Positive Notes: High false positive rate. JWTs are frequently used in tests, documentation, and mock data. Many JWTs are intentionally short-lived or scope-limited. Flag for human review rather than hard-blocking. Skip matches in files under tests/, fixtures/, __mocks__/, *.test.*, *.spec.*. Escalate to critical only if the payload segment decodes to contain an exp claim more than one year in the future.
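
The escalation rule depends on decoding the payload segment, which can be sketched as follows. The jwtSeverity helper and its nowSeconds parameter are hypothetical, and the tokens are unsigned and built only for the example; nothing here verifies a signature.

```javascript
// Hypothetical severity helper for jwt-token matches.
const ONE_YEAR_SECONDS = 365 * 24 * 60 * 60;

function jwtSeverity(token, nowSeconds = Math.floor(Date.now() / 1000)) {
  const payloadSegment = token.split('.')[1];
  try {
    // Node's Buffer understands the base64url alphabet used by JWTs.
    const payload = JSON.parse(Buffer.from(payloadSegment, 'base64url').toString('utf8'));
    if (typeof payload.exp === 'number' && payload.exp - nowSeconds > ONE_YEAR_SECONDS) {
      return 'critical';
    }
  } catch {
    // Missing segment or payload that is not valid JSON: keep the default.
  }
  return 'medium';
}

// Build two throwaway tokens relative to a fixed clock value.
const encode = (obj) => Buffer.from(JSON.stringify(obj)).toString('base64url');
const now = 1700000000;
const longLived = `${encode({ alg: 'none' })}.${encode({ exp: now + 2 * ONE_YEAR_SECONDS })}.sig`;
const shortLived = `${encode({ alg: 'none' })}.${encode({ exp: now + 3600 })}.sig`;

const longLivedSeverity = jwtSeverity(longLived, now);   // 'critical'
const shortLivedSeverity = jwtSeverity(shortLived, now); // 'medium'
```

Passing the clock in as a parameter keeps the function deterministic under test; a caller in the hook would rely on the default.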

False Positive Suppression Rules

Apply these globally before reporting any match:

Placeholder values — Skip if the matched value contains: your-, <, >, example, placeholder, replace, changeme, xxx, ***, TODO, FIXME
Variable references — Skip if the matched value contains: ${, $(, %{, ENV[, os.environ
Test files — Lower severity by one level for matches in: *.test.ts, *.spec.js, fixtures/, __mocks__/, testdata/
Documentation — Lower severity for matches in: *.md, *.txt, docs/, README* — but never suppress critical patterns (PEM key headers, real AWS Access Key IDs)
All-same-character values — Skip if the value is a repetition of a single character (e.g., xxxxxxxx, 00000000)
Short values — Skip generic patterns if the matched secret value is fewer than 8 characters
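
Taken together, the value-level rules and the test-file downgrade might look like the sketch below. The function names are hypothetical, placeholder markers are compared case-insensitively as an assumption, the test-file check is generalized to any *.test.* or *.spec.* name, and the documentation rule is left out for brevity.

```javascript
const PLACEHOLDER_MARKERS = ['your-', '<', '>', 'example', 'placeholder',
  'replace', 'changeme', 'xxx', '***', 'todo', 'fixme'];
const VARIABLE_MARKERS = ['${', '$(', '%{', 'ENV[', 'os.environ'];
const SEVERITIES = ['low', 'medium', 'high', 'critical'];

// Hypothetical helper: true when a matched value should not be reported.
function shouldSkipValue(value, { generic = false } = {}) {
  const lower = value.toLowerCase();
  if (PLACEHOLDER_MARKERS.some((marker) => lower.includes(marker))) return true;
  if (VARIABLE_MARKERS.some((marker) => value.includes(marker))) return true;
  if (/^(.)\1*$/.test(value)) return true;      // one character repeated
  if (generic && value.length < 8) return true; // short value, generic pattern
  return false;
}

// Hypothetical helper: lower severity by one level for test-related paths.
function adjustSeverity(severity, filePath) {
  const isTestPath =
    /\.(?:test|spec)\.[^/\\]+$|(?:^|[/\\])(?:fixtures|__mocks__|testdata)[/\\]/.test(filePath);
  if (!isTestPath) return severity;
  const index = SEVERITIES.indexOf(severity);
  return SEVERITIES[Math.max(index - 1, 0)];
}

const skipped = shouldSkipValue('your-api-key-here');           // placeholder
const reported = shouldSkipValue('k3J9sLq82mZp');               // made-up literal
const downgraded = adjustSeverity('high', 'src/auth.test.ts');  // 'medium'
```

Both path separators are accepted so the same check works on Windows paths.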

Implementation Notes for `pre-edit-secrets.mjs`

// Build PEM patterns at runtime to avoid triggering hook self-detection:
const PEM_RSA = new RegExp('-{5}BEGIN RSA PRIVATE KEY-{5}');
const PEM_GENERIC = new RegExp('-{5}BEGIN (?:EC|DSA|OPENSSH|ENCRYPTED) PRIVATE KEY-{5}');
const PEM_PKCS8 = new RegExp('-{5}BEGIN PRIVATE KEY-{5}');

// No g flag here: regex.test() on a global regex advances lastIndex between
// calls, which can cause missed matches if a pattern object is reused.
const CRITICAL_PATTERNS = [
  { id: 'aws-access-key-id',       regex: /\bAKIA[0-9A-Z]{16}\b/ },
  { id: 'github-pat-classic',      regex: /\bghp_[A-Za-z0-9]{36}\b/ },
  { id: 'github-pat-fine-grained', regex: /\bgithub_pat_[A-Za-z0-9_]{82}\b/ },
  { id: 'npm-token',               regex: /\bnpm_[A-Za-z0-9]{36}\b/ },
  { id: 'openai-project-key',      regex: /\bsk-proj-[A-Za-z0-9\-_]{40,}\b/ },
  { id: 'anthropic-api-key',       regex: /\bsk-ant-api03-[A-Za-z0-9\-_]{93}\b/ },
  { id: 'rsa-private-key',         regex: PEM_RSA },
  { id: 'ec-private-key',          regex: PEM_GENERIC },
  { id: 'pkcs8-private-key',       regex: PEM_PKCS8 },
];

// Hard-block on any critical match.
// fileContent is the text the Write/Edit operation is about to put on disk.
for (const { id, regex } of CRITICAL_PATTERNS) {
  if (regex.test(fileContent)) {
    console.error(`BLOCKED: ${id} detected. Remove secret before editing.`);
    process.exit(2); // Exit code 2 blocks the Write/Edit tool use
  }
}

For high/medium severity patterns, emit a warning via console.error but exit with 0 (allow the operation to proceed with a visible warning).
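
The block-versus-warn split can be sketched as a function that returns the exit code rather than exiting, which makes it easy to test. This is an illustration under assumptions: decideExitCode, its parameters, and the WARNING message wording are hypothetical, and the pattern lists are supplied by the caller.

```javascript
// Hypothetical decision function: 2 means block, 0 means allow.
function decideExitCode(fileContent, criticalPatterns, warningPatterns, log = console.error) {
  for (const { id, regex } of criticalPatterns) {
    // String.prototype.search ignores the g flag and lastIndex, so it stays stateless.
    if (fileContent.search(regex) !== -1) {
      log(`BLOCKED: ${id} detected. Remove secret before editing.`);
      return 2;
    }
  }
  for (const { id, regex } of warningPatterns) {
    if (fileContent.search(regex) !== -1) {
      log(`WARNING: ${id} detected. Review before committing.`);
    }
  }
  return 0; // warnings are visible on stderr but do not block
}

const critical = [{ id: 'aws-access-key-id', regex: /\bAKIA[0-9A-Z]{16}\b/ }];
const warnings = [{ id: 'jwt-token', regex: /\beyJ[A-Za-z0-9\-_]+\.[A-Za-z0-9\-_]+\.[A-Za-z0-9\-_]+\b/ }];

// Made-up inputs; the key-shaped string is assembled so it is not a literal in this file.
const messages = [];
const record = (line) => messages.push(line);
const blockedCode = decideExitCode('key=' + 'AKIA' + 'A'.repeat(16), critical, warnings, record);
const warnedCode = decideExitCode('t=eyJhbGciOiJub25lIn0.eyJzdWIiOiIxIn0.c2ln', critical, warnings, record);
```

The hook's entry point would then pass the returned value to process.exit().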
