From a46308b1e9396e5a97a20ff1315b6ca38827b9e1 Mon Sep 17 00:00:00 2001
From: Kjell Tore Guttormsen
Date: Wed, 29 Apr 2026 11:52:55 +0200
Subject: [PATCH] docs(llm-security): A3 honesty-sweep - 7 quotes toned down (critical-review §9)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Closes A3 of the v7.1.0 critical-review patch. Each rewrite preserves the
underlying claim where it is accurate but removes hype/overreach language.
Historical CHANGELOG/README version-table rows are intentionally left as-is
(they document what was claimed at the time of release, not what is true
today).

Changes (CLAUDE.md, commands/ide-scan.md, knowledge/mitigation-matrix.md,
docs/security-hardening-guide.md):

- "Trustworthy scoring (BREAKING)" → "Severity-dominated risk scoring (v2
  model, BREAKING)". Removes hype framing; describes the actual mechanism.
- "Context-aware entropy scanner" → "Rule-based entropy scanner with
  file-extension skip, 8 line-level suppression rules, and configurable
  policy". No ML/context inference; just rules.
- "1487 tests" → "1511 unit and integration tests; mutation-testing coverage
  not published". Updates the count after A1+A2 (+24) and adds a qualifier.
- "Fully Schrems II compatible" → "Schrems II compatible in default offline
  mode. Optional OSV.dev enrichment (`supply-chain-recheck --online`)
  transmits package identifiers to a Google-operated API and is a separate
  compliance consideration." Acknowledges the OSV.dev opt-in caveat.
- "Rule of Two enforcement" → "Rule of Two detection (configurable; default
  warn; blocks on high-confidence trifectas in opt-in `block` mode;
  distributed trifectas detected but not blocked by default)". "Enforcement"
  implied block; default is warn.
- "Hardened ZIP extractor" → suffix " — no fuzz-testing results published to
  date". The caps and the rejected attack classes are accurate; the absence
  of formal fuzz coverage is now stated.
- "defense-in-depth" — preserved as framing, but quantified in
  security-hardening-guide §4: "three independent detection layers with
  documented bypass classes". Each layer named, each layer's known bypasses
  pointed to (critical-review §4 evasion arsenal).

Tests: 1511/1511 green (no behavioural change).
---
 plugins/llm-security/CLAUDE.md                        | 12 ++++++------
 plugins/llm-security/commands/ide-scan.md             |  2 +-
 .../llm-security/docs/security-hardening-guide.md     |  9 +++++++++
 plugins/llm-security/knowledge/mitigation-matrix.md   |  2 +-
 4 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/plugins/llm-security/CLAUDE.md b/plugins/llm-security/CLAUDE.md
index d80691a..74b96df 100644
--- a/plugins/llm-security/CLAUDE.md
+++ b/plugins/llm-security/CLAUDE.md
@@ -1,11 +1,11 @@
 # LLM Security Plugin (v7.0.0)
 
-Security scanning, auditing, and threat modeling for Claude Code projects. 5 frameworks: OWASP LLM Top 10, Agentic AI Top 10 (ASI), Skills Top 10 (AST), MCP Top 10, AI Agent Traps (DeepMind). 1487 tests.
+Security scanning, auditing, and threat modeling for Claude Code projects. 5 frameworks: OWASP LLM Top 10, Agentic AI Top 10 (ASI), Skills Top 10 (AST), MCP Top 10, AI Agent Traps (DeepMind). 1511 unit and integration tests; mutation-testing coverage not published.
 
-**v7.0.0 — Trustworthy scoring (BREAKING).** Three changes target the false-positive cascade on real codebases (hyperframes.com gave `BLOCK / Extreme / 100`, ~70% noise):
+**v7.0.0 — Severity-dominated risk scoring (v2 model, BREAKING).** Three changes target the false-positive cascade on real codebases (hyperframes.com gave `BLOCK / Extreme / 100`, ~70% noise):
 
 1. **Risk-score v2 formula** (`scanners/lib/severity.mjs`) — severity-dominated, log-scaled within tier. Replaces v1 sum-and-cap that collapsed every non-trivial scan to 100/Extreme. Tiers: critical → 70–95, high only → 40–65, medium only → 15–35, low only → 1–11. Verdict cutoffs realigned to new bands (BLOCK ≥65, WARNING ≥15).
-2. **Context-aware entropy scanner** — file-extension skip (`.glsl/.frag/.vert/.shader/.wgsl/.css/.scss/.sass/.less/.svg/.min.*/.map`) + 8 new line-suppression rules (GLSL keywords, CSS-in-JS, inline SVG, ffmpeg `filter_complex`, User-Agent strings, SQL DDL, `throw new Error(\`...\`)`, markdown image URLs). Configurable via `.llm-security/policy.json` `entropy` section (thresholds, `suppress_extensions`, `suppress_line_patterns`, `suppress_paths`). Envelope `calibration` block reports skip counters + effective thresholds + policy source.
+2. **Rule-based entropy scanner with file-extension skip, 8 line-level suppression rules, and configurable policy** — extensions skipped (`.glsl/.frag/.vert/.shader/.wgsl/.css/.scss/.sass/.less/.svg/.min.*/.map`); line-suppression rules (GLSL keywords, CSS-in-JS, inline SVG, ffmpeg `filter_complex`, User-Agent strings, SQL DDL, `throw new Error(\`...\`)`, markdown image URLs). Configurable via `.llm-security/policy.json` `entropy` section (thresholds, `suppress_extensions`, `suppress_line_patterns`, `suppress_paths`). Envelope `calibration` block reports skip counters + effective thresholds + policy source.
 3. **DEP typosquat allowlist expansion** — 22 npm + 5 PyPI entries for short-name tools that tripped Levenshtein detection on every modern codebase (`knip`, `oxlint`, `tsx`, `nx`, `rimraf`, `uv`, `ruff`, etc.).
 
 See `docs/security-hardening-guide.md` §6 for the calibration story.
@@ -21,7 +21,7 @@ See `docs/security-hardening-guide.md` §6 for the calibration story.
 | `/security plugin-audit [path\|url]` | Plugin trust assessment (local or GitHub URL) |
 | `/security mcp-audit [--live]` | MCP server config audit (add `--live` for runtime inspection) |
 | `/security mcp-inspect` | Live MCP server inspection — connect via JSON-RPC 2.0, scan tool descriptions |
-| `/security ide-scan [target\|url]` | Scan installed VS Code + JetBrains extensions/plugins — OR fetch a remote VSIX from Marketplace, OpenVSX, or direct URL (v6.4.0), OR a JetBrains plugin from `plugins.jetbrains.com` (v6.6.0). 7 VS Code checks + 7 JetBrains-specific checks (theme-with-code, broad activation, Premain-Class instrumentation, native binaries, depends-chain, typosquat, shaded jars). Hardened ZIP extractor (zip-slip, symlink, bomb, ratio caps). Orchestrates reused scanners (UNI/ENT/NET/TNT/MEM/SCR) per extension. Offline by default, `--online` opt-in |
+| `/security ide-scan [target\|url]` | Scan installed VS Code + JetBrains extensions/plugins — OR fetch a remote VSIX from Marketplace, OpenVSX, or direct URL (v6.4.0), OR a JetBrains plugin from `plugins.jetbrains.com` (v6.6.0). 7 VS Code checks + 7 JetBrains-specific checks (theme-with-code, broad activation, Premain-Class instrumentation, native binaries, depends-chain, typosquat, shaded jars). Hardened ZIP extractor (zip-slip, symlink, bomb, ratio caps — no fuzz-testing results published to date). Orchestrates reused scanners (UNI/ENT/NET/TNT/MEM/SCR) per extension. Offline by default, `--online` opt-in |
 | `/security posture` | Quick scorecard (13 categories) |
 | `/security threat-model` | Interactive STRIDE/MAESTRO session |
 | `/security diff [path]` | Compare scan against baseline — shows new/resolved/unchanged/moved |
@@ -133,7 +133,7 @@ Subcommands: `scan`, `deep-scan`, `posture`, `audit-bom`, `benchmark`. Dispatche
 Pipeline templates in `ci/`: `github-action.yml`, `azure-pipelines.yml`, `gitlab-ci.yml`. Documentation: `docs/ci-cd-guide.md`.
 All templates use `--fail-on high --format sarif --output-file results.sarif` with SARIF upload per platform.
 
-Standalone CLI makes zero network calls (except opt-in OSV.dev in supply-chain-recheck). Fully Schrems II compatible.
+Standalone CLI makes zero network calls in default mode. Schrems II compatible in default offline mode. Optional OSV.dev enrichment (`supply-chain-recheck --online`) transmits package identifiers to a Google-operated API and is a separate compliance consideration.
 
 ## Knowledge Files (20)
 
@@ -179,7 +179,7 @@ No persistent state except `post-session-guard.mjs` which maintains a per-sessio
 Prompt injection is **structurally unsolvable** with current architectures (joint paper, 14 researchers, 95-100% ASR against all 12 tested defenses). v5.0 does not claim to "prevent" injection. Instead, it implements **defense-in-depth**:
 
 - **Broader detection** — MEDIUM advisory for obfuscation signals (leetspeak, homoglyphs, zero-width, multi-language), Unicode Tag steganography, bash expansion evasion
-- **Increased attack cost** — Rule of Two enforcement (configurable block/warn/off for lethal trifecta), bash normalization before gate matching
+- **Increased attack cost** — Rule of Two detection (configurable block/warn/off for lethal trifecta; default `warn`, blocks on high-confidence trifectas in opt-in `block` mode; distributed trifectas across MCP servers are detected but not blocked by default), bash normalization before gate matching
 - **Longer monitoring windows** — 100-call long-horizon alongside 20-call sliding window, slow-burn trifecta detection, behavioral drift via Jensen-Shannon divergence
 - **Architectural constraints** — opportunistic byte-matching of truncated output fingerprints (first 200 bytes, SHA-256/16-hex tag; not semantic lineage; trivially bypassed by mutation or summarisation of tool output), sub-agent delegation tracking, HITL trap detection. Inspired by CaMeL (DeepMind, 2025), but this is a lightweight byte-fingerprint, not semantic capability tracking
 - **Honest documentation** — Known Limitations section acknowledges what deterministic hooks cannot detect
diff --git a/plugins/llm-security/commands/ide-scan.md b/plugins/llm-security/commands/ide-scan.md
index 6187ddd..4794892 100644
--- a/plugins/llm-security/commands/ide-scan.md
+++ b/plugins/llm-security/commands/ide-scan.md
@@ -39,7 +39,7 @@ Arguments (pass through as provided by the user):
 - `--fail-on ` — exit 1 if findings at/above severity
 
 URL mode notes:
-- Hardened ZIP extractor with caps: 50MB compressed, 500MB uncompressed, 100x expansion ratio, 10 000 entries, depth 20.
+- Hardened ZIP extractor with caps: 50MB compressed, 500MB uncompressed, 100x expansion ratio, 10 000 entries, depth 20. No fuzz-testing results published to date.
 - Rejects: zip-slip paths, symlink entries, absolute paths, drive letters, encrypted entries, ZIP64.
 - TLS verified, HTTPS only, 30s timeout. Cross-host redirects rejected.
 - Temp directory always cleaned up (success, error, abort).
diff --git a/plugins/llm-security/docs/security-hardening-guide.md b/plugins/llm-security/docs/security-hardening-guide.md
index 11c8c5d..eb7f698 100644
--- a/plugins/llm-security/docs/security-hardening-guide.md
+++ b/plugins/llm-security/docs/security-hardening-guide.md
@@ -98,6 +98,15 @@ recorded in the audit trail.
 techniques before the denylist gate runs. These are **defense-in-depth** layers that
 complement the Claude Code 2.1.98+ harness-level fixes, not a replacement.
 
+The plugin's "defense-in-depth" claim resolves to **three independent detection
+layers with documented bypass classes**: (1) the Claude Code harness denylist
+(out of plugin scope, evolves with platform); (2) `bash-normalize.mjs` T1-T6
+collapse rules; (3) `pre-bash-destructive.mjs` post-normalization pattern match
++ `post-session-guard.mjs` runtime trifecta correlation. Each layer has known
+bypasses (see Defense Philosophy in `CLAUDE.md` and `docs/critical-review-2026-04-20.md`
+§4 for the evasion arsenal). Stacking layers raises attacker cost; it does not
+provide formal worst-case guarantees.
+
 | Layer | Technique | Example | Normalization |
 |-------|-----------|---------|---------------|
 | T1 | Empty quotes | `rm''-rf /` | strip `''` / `""` between tokens |
diff --git a/plugins/llm-security/knowledge/mitigation-matrix.md b/plugins/llm-security/knowledge/mitigation-matrix.md
index 753d934..203eee3 100644
--- a/plugins/llm-security/knowledge/mitigation-matrix.md
+++ b/plugins/llm-security/knowledge/mitigation-matrix.md
@@ -32,7 +32,7 @@ Attacker injects instructions via external content (files, web pages, tool outpu
 | Prompt injection input scanning | Automated | `pre-prompt-inject-scan.mjs` detects CRITICAL/HIGH/MEDIUM injection patterns in user prompts | Hook file exists; MEDIUM advisory enabled |
 | Unicode Tag steganography detection | Automated | `string-utils.mjs` decodes U+E0000-E007F tags; `injection-patterns.mjs` escalates to CRITICAL/HIGH | `decodeUnicodeTags()` in normalization pipeline |
 | Bash evasion normalization | Automated | `bash-normalize.mjs` strips parameter expansion before pattern matching | `normalizeBashExpansion()` called by both bash hooks |
-| Rule of Two enforcement | Automated | `post-session-guard.mjs` detects trifecta (untrusted input + sensitive data + exfil) | `LLM_SECURITY_TRIFECTA_MODE` env var respected; block mode available |
+| Rule of Two detection (block-mode opt-in) | Automated | `post-session-guard.mjs` detects trifecta (untrusted input + sensitive data + exfil); blocks only when `LLM_SECURITY_TRIFECTA_MODE=block` AND high-confidence trifecta is observed; default `warn` | `LLM_SECURITY_TRIFECTA_MODE` env var respected; block mode opt-in |
 | Long-horizon monitoring | Automated | `post-session-guard.mjs` 100-call window + behavioral drift detection | Long-horizon window active alongside 20-call window |
 | HITL trap detection | Automated | `injection-patterns.mjs` HIGH patterns for approval urgency, summary suppression, scope minimization | HITL patterns present in HIGH_PATTERNS array |
 | Hybrid attack detection | Automated | `injection-patterns.mjs` HYBRID_PATTERNS for P2SQL, recursive injection, XSS | Hybrid patterns checked in tool output scanning |
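
Reviewer note on the "severity-dominated, log-scaled within tier" wording in the CLAUDE.md hunk: the behaviour it describes can be sketched roughly as below. This is an illustration only, not the contents of `scanners/lib/severity.mjs`. The band edges (70–95, 40–65, 15–35, 1–11) and the verdict cutoffs (BLOCK ≥65, WARNING ≥15) are taken from the patch text. The function names `riskScore` and `verdict`, the `SATURATION` constant, the exact log curve, the `ALLOW` label, and the choice to ignore lower-severity findings once a higher tier is present are all assumptions made for the example.

```javascript
// Hypothetical sketch: band edges and cutoffs come from the patch text,
// everything else (names, curve, SATURATION) is assumed for illustration.
const BANDS = {
  critical: { min: 70, max: 95 },
  high: { min: 40, max: 65 },
  medium: { min: 15, max: 35 },
  low: { min: 1, max: 11 },
};
const ORDER = ["critical", "high", "medium", "low"];
const SATURATION = 50; // assumed finding count at which a band tops out

function riskScore(counts) {
  // Severity-dominated: the highest severity present selects the band.
  const tier = ORDER.find((s) => (counts[s] ?? 0) > 0);
  if (!tier) return 0;
  const { min, max } = BANDS[tier];
  // Log-scaled within the band: 1 finding maps to the floor,
  // SATURATION or more findings map to the ceiling.
  const n = Math.min(counts[tier], SATURATION);
  const t = Math.log(n) / Math.log(SATURATION);
  return Math.round(min + (max - min) * t);
}

function verdict(score) {
  if (score >= 65) return "BLOCK";
  if (score >= 15) return "WARNING";
  return "ALLOW"; // assumed label; the patch names only BLOCK and WARNING
}

console.log(riskScore({ critical: 1, low: 100 }), verdict(70)); // 70 BLOCK
console.log(riskScore({ low: 500 }), verdict(11)); // 11 ALLOW
```

Under these assumptions a scan with hundreds of low-severity findings stays at 11, while a single critical finding starts at 70. That is the property the v1 sum-and-cap model lacked, according to the patch.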