diff --git a/plugins/llm-security/CLAUDE.md b/plugins/llm-security/CLAUDE.md index d80691a..74b96df 100644 --- a/plugins/llm-security/CLAUDE.md +++ b/plugins/llm-security/CLAUDE.md @@ -1,11 +1,11 @@ # LLM Security Plugin (v7.0.0) -Security scanning, auditing, and threat modeling for Claude Code projects. 5 frameworks: OWASP LLM Top 10, Agentic AI Top 10 (ASI), Skills Top 10 (AST), MCP Top 10, AI Agent Traps (DeepMind). 1487 tests. +Security scanning, auditing, and threat modeling for Claude Code projects. 5 frameworks: OWASP LLM Top 10, Agentic AI Top 10 (ASI), Skills Top 10 (AST), MCP Top 10, AI Agent Traps (DeepMind). 1511 unit and integration tests; mutation-testing coverage not published. -**v7.0.0 — Trustworthy scoring (BREAKING).** Three changes target the false-positive cascade on real codebases (hyperframes.com gave `BLOCK / Extreme / 100`, ~70% noise): +**v7.0.0 — Severity-dominated risk scoring (v2 model, BREAKING).** Three changes target the false-positive cascade on real codebases (hyperframes.com gave `BLOCK / Extreme / 100`, ~70% noise): 1. **Risk-score v2 formula** (`scanners/lib/severity.mjs`) — severity-dominated, log-scaled within tier. Replaces v1 sum-and-cap that collapsed every non-trivial scan to 100/Extreme. Tiers: critical → 70–95, high only → 40–65, medium only → 15–35, low only → 1–11. Verdict cutoffs realigned to new bands (BLOCK ≥65, WARNING ≥15). -2. **Context-aware entropy scanner** — file-extension skip (`.glsl/.frag/.vert/.shader/.wgsl/.css/.scss/.sass/.less/.svg/.min.*/.map`) + 8 new line-suppression rules (GLSL keywords, CSS-in-JS, inline SVG, ffmpeg `filter_complex`, User-Agent strings, SQL DDL, `throw new Error(\`...\`)`, markdown image URLs). Configurable via `.llm-security/policy.json` `entropy` section (thresholds, `suppress_extensions`, `suppress_line_patterns`, `suppress_paths`). Envelope `calibration` block reports skip counters + effective thresholds + policy source. +2. **Rule-based entropy scanner with file-extension skip, 8 line-level suppression rules, and configurable policy** — extensions skipped (`.glsl/.frag/.vert/.shader/.wgsl/.css/.scss/.sass/.less/.svg/.min.*/.map`); line-suppression rules (GLSL keywords, CSS-in-JS, inline SVG, ffmpeg `filter_complex`, User-Agent strings, SQL DDL, `throw new Error(\`...\`)`, markdown image URLs). Configurable via `.llm-security/policy.json` `entropy` section (thresholds, `suppress_extensions`, `suppress_line_patterns`, `suppress_paths`). Envelope `calibration` block reports skip counters + effective thresholds + policy source. 3. **DEP typosquat allowlist expansion** — 22 npm + 5 PyPI entries for short-name tools that tripped Levenshtein detection on every modern codebase (`knip`, `oxlint`, `tsx`, `nx`, `rimraf`, `uv`, `ruff`, etc.). See `docs/security-hardening-guide.md` §6 for the calibration story. @@ -21,7 +21,7 @@ See `docs/security-hardening-guide.md` §6 for the calibration story. | `/security plugin-audit [path\|url]` | Plugin trust assessment (local or GitHub URL) | | `/security mcp-audit [--live]` | MCP server config audit (add `--live` for runtime inspection) | | `/security mcp-inspect` | Live MCP server inspection — connect via JSON-RPC 2.0, scan tool descriptions | -| `/security ide-scan [target\|url]` | Scan installed VS Code + JetBrains extensions/plugins — OR fetch a remote VSIX from Marketplace, OpenVSX, or direct URL (v6.4.0), OR a JetBrains plugin from `plugins.jetbrains.com` (v6.6.0). 7 VS Code checks + 7 JetBrains-specific checks (theme-with-code, broad activation, Premain-Class instrumentation, native binaries, depends-chain, typosquat, shaded jars). Hardened ZIP extractor (zip-slip, symlink, bomb, ratio caps). Orchestrates reused scanners (UNI/ENT/NET/TNT/MEM/SCR) per extension. Offline by default, `--online` opt-in | +| `/security ide-scan [target\|url]` | Scan installed VS Code + JetBrains extensions/plugins — OR fetch a remote VSIX from Marketplace, OpenVSX, or direct URL (v6.4.0), OR a JetBrains plugin from `plugins.jetbrains.com` (v6.6.0). 7 VS Code checks + 7 JetBrains-specific checks (theme-with-code, broad activation, Premain-Class instrumentation, native binaries, depends-chain, typosquat, shaded jars). Hardened ZIP extractor (zip-slip, symlink, bomb, ratio caps — no fuzz-testing results published to date). Orchestrates reused scanners (UNI/ENT/NET/TNT/MEM/SCR) per extension. Offline by default, `--online` opt-in | | `/security posture` | Quick scorecard (13 categories) | | `/security threat-model` | Interactive STRIDE/MAESTRO session | | `/security diff [path]` | Compare scan against baseline — shows new/resolved/unchanged/moved | @@ -133,7 +133,7 @@ Subcommands: `scan`, `deep-scan`, `posture`, `audit-bom`, `benchmark`. Dispatche Pipeline templates in `ci/`: `github-action.yml`, `azure-pipelines.yml`, `gitlab-ci.yml`. Documentation: `docs/ci-cd-guide.md`. All templates use `--fail-on high --format sarif --output-file results.sarif` with SARIF upload per platform. -Standalone CLI makes zero network calls (except opt-in OSV.dev in supply-chain-recheck). Fully Schrems II compatible. +Standalone CLI makes zero network calls in default mode. Schrems II compatible in default offline mode. Optional OSV.dev enrichment (`supply-chain-recheck --online`) transmits package identifiers to a Google-operated API and is a separate compliance consideration. ## Knowledge Files (20) @@ -179,7 +179,7 @@ No persistent state except `post-session-guard.mjs` which maintains a per-sessio Prompt injection is **structurally unsolvable** with current architectures (joint paper, 14 researchers, 95-100% ASR against all 12 tested defenses). v5.0 does not claim to "prevent" injection. Instead, it implements **defense-in-depth**: - **Broader detection** — MEDIUM advisory for obfuscation signals (leetspeak, homoglyphs, zero-width, multi-language), Unicode Tag steganography, bash expansion evasion -- **Increased attack cost** — Rule of Two enforcement (configurable block/warn/off for lethal trifecta), bash normalization before gate matching +- **Increased attack cost** — Rule of Two detection (configurable block/warn/off for lethal trifecta; default `warn`, blocks on high-confidence trifectas in opt-in `block` mode; distributed trifectas across MCP servers are detected but not blocked by default), bash normalization before gate matching - **Longer monitoring windows** — 100-call long-horizon alongside 20-call sliding window, slow-burn trifecta detection, behavioral drift via Jensen-Shannon divergence - **Architectural constraints** — opportunistic byte-matching of truncated output fingerprints (first 200 bytes, SHA-256/16-hex tag; not semantic lineage; trivially bypassed by mutation or summarisation of tool output), sub-agent delegation tracking, HITL trap detection. Inspired by CaMeL (DeepMind, 2025), but this is a lightweight byte-fingerprint, not semantic capability tracking - **Honest documentation** — Known Limitations section acknowledges what deterministic hooks cannot detect diff --git a/plugins/llm-security/commands/ide-scan.md b/plugins/llm-security/commands/ide-scan.md index 6187ddd..4794892 100644 --- a/plugins/llm-security/commands/ide-scan.md +++ b/plugins/llm-security/commands/ide-scan.md @@ -39,7 +39,7 @@ Arguments (pass through as provided by the user): - `--fail-on ` — exit 1 if findings at/above severity URL mode notes: -- Hardened ZIP extractor with caps: 50MB compressed, 500MB uncompressed, 100x expansion ratio, 10 000 entries, depth 20. +- Hardened ZIP extractor with caps: 50MB compressed, 500MB uncompressed, 100x expansion ratio, 10 000 entries, depth 20. No fuzz-testing results published to date. - Rejects: zip-slip paths, symlink entries, absolute paths, drive letters, encrypted entries, ZIP64. - TLS verified, HTTPS only, 30s timeout. Cross-host redirects rejected. - Temp directory always cleaned up (success, error, abort). diff --git a/plugins/llm-security/docs/security-hardening-guide.md b/plugins/llm-security/docs/security-hardening-guide.md index 11c8c5d..eb7f698 100644 --- a/plugins/llm-security/docs/security-hardening-guide.md +++ b/plugins/llm-security/docs/security-hardening-guide.md @@ -98,6 +98,15 @@ recorded in the audit trail. techniques before the denylist gate runs. These are **defense-in-depth** layers that complement the Claude Code 2.1.98+ harness-level fixes, not a replacement. +The plugin's "defense-in-depth" claim resolves to **three independent detection +layers with documented bypass classes**: (1) the Claude Code harness denylist +(out of plugin scope, evolves with platform); (2) `bash-normalize.mjs` T1-T6 +collapse rules; (3) `pre-bash-destructive.mjs` post-normalization pattern match ++ `post-session-guard.mjs` runtime trifecta correlation. Each layer has known +bypasses (see Defense Philosophy in `CLAUDE.md` and `docs/critical-review-2026-04-20.md` +§4 for the evasion arsenal). Stacking layers raises attacker cost; it does not +provide formal worst-case guarantees. + | Layer | Technique | Example | Normalization | |-------|-----------|---------|---------------| | T1 | Empty quotes | `rm''-rf /` | strip `''` / `""` between tokens | diff --git a/plugins/llm-security/knowledge/mitigation-matrix.md b/plugins/llm-security/knowledge/mitigation-matrix.md index 753d934..203eee3 100644 --- a/plugins/llm-security/knowledge/mitigation-matrix.md +++ b/plugins/llm-security/knowledge/mitigation-matrix.md @@ -32,7 +32,7 @@ Attacker injects instructions via external content (files, web pages, tool outpu | Prompt injection input scanning | Automated | `pre-prompt-inject-scan.mjs` detects CRITICAL/HIGH/MEDIUM injection patterns in user prompts | Hook file exists; MEDIUM advisory enabled | | Unicode Tag steganography detection | Automated | `string-utils.mjs` decodes U+E0000-E007F tags; `injection-patterns.mjs` escalates to CRITICAL/HIGH | `decodeUnicodeTags()` in normalization pipeline | | Bash evasion normalization | Automated | `bash-normalize.mjs` strips parameter expansion before pattern matching | `normalizeBashExpansion()` called by both bash hooks | -| Rule of Two enforcement | Automated | `post-session-guard.mjs` detects trifecta (untrusted input + sensitive data + exfil) | `LLM_SECURITY_TRIFECTA_MODE` env var respected; block mode available | +| Rule of Two detection (block-mode opt-in) | Automated | `post-session-guard.mjs` detects trifecta (untrusted input + sensitive data + exfil); blocks only when `LLM_SECURITY_TRIFECTA_MODE=block` AND high-confidence trifecta is observed; default `warn` | `LLM_SECURITY_TRIFECTA_MODE` env var respected; block mode opt-in | | Long-horizon monitoring | Automated | `post-session-guard.mjs` 100-call window + behavioral drift detection | Long-horizon window active alongside 20-call window | | HITL trap detection | Automated | `injection-patterns.mjs` HIGH patterns for approval urgency, summary suppression, scope minimization | HITL patterns present in HIGH_PATTERNS array | | Hybrid attack detection | Automated | `injection-patterns.mjs` HYBRID_PATTERNS for P2SQL, recursive injection, XSS | Hybrid patterns checked in tool output scanning |