docs(llm-security): A3 honesty-sweep, 7 quotes toned down (critical-review §9)

Closes A3 of v7.1.0 critical-review patch. Each rewrite preserves the underlying claim where it is accurate but removes hype/overreach language. Historical CHANGELOG/README version-table rows are intentionally left as-is (they document what was claimed at the time of release, not what is true today).

Changes (CLAUDE.md, commands/ide-scan.md, knowledge/mitigation-matrix.md, docs/security-hardening-guide.md):

- "Trustworthy scoring (BREAKING)" → "Severity-dominated risk scoring (v2 model, BREAKING)". Removes hype framing; describes the actual mechanism.
- "Context-aware entropy scanner" → "Rule-based entropy scanner with file-extension skip, 8 line-level suppression rules, and configurable policy". No ML/context inference; just rules.
- "1487 tests" → "1511 unit and integration tests; mutation-testing coverage not published". Updated count after A1+A2 (+24) and added qualifier.
- "Fully Schrems II compatible" → "Schrems II compatible in default offline mode. Optional OSV.dev enrichment (`supply-chain-recheck --online`) transmits package identifiers to a Google-operated API and is a separate compliance consideration." Acknowledges the OSV.dev opt-in caveat.
- "Rule of Two enforcement" → "Rule of Two detection (configurable; default warn; blocks on high-confidence trifectas in opt-in `block` mode; distributed trifectas detected but not blocked by default)". "Enforcement" implied block; default is warn.
- "Hardened ZIP extractor" → suffix " — no fuzz-testing results published to date". Caps and class-of-attacks rejected are accurate; absence of formal fuzz coverage now stated.
- "defense-in-depth": preserved as framing, but quantified in security-hardening-guide §4: "three independent detection layers with documented bypass classes". Each layer named, each layer's known bypasses pointed to (critical-review §4 evasion arsenal).

Tests: 1511/1511 green (no behavioural change).
2026-04-29 11:52:55 +02:00
commit a46308b1e9
parent 4aa5318bcb
4 changed files with 17 additions and 8 deletions
--- a/plugins/llm-security/CLAUDE.md
+++ b/plugins/llm-security/CLAUDE.md
@@ -1,11 +1,11 @@
 # LLM Security Plugin (v7.0.0)

-Security scanning, auditing, and threat modeling for Claude Code projects. 5 frameworks: OWASP LLM Top 10, Agentic AI Top 10 (ASI), Skills Top 10 (AST), MCP Top 10, AI Agent Traps (DeepMind). 1487 tests.
+Security scanning, auditing, and threat modeling for Claude Code projects. 5 frameworks: OWASP LLM Top 10, Agentic AI Top 10 (ASI), Skills Top 10 (AST), MCP Top 10, AI Agent Traps (DeepMind). 1511 unit and integration tests; mutation-testing coverage not published.

-**v7.0.0 — Trustworthy scoring (BREAKING).** Three changes target the false-positive cascade on real codebases (hyperframes.com gave `BLOCK / Extreme / 100`, ~70% noise):
+**v7.0.0 — Severity-dominated risk scoring (v2 model, BREAKING).** Three changes target the false-positive cascade on real codebases (hyperframes.com gave `BLOCK / Extreme / 100`, ~70% noise):

 1. **Risk-score v2 formula** (`scanners/lib/severity.mjs`) — severity-dominated, log-scaled within tier. Replaces v1 sum-and-cap that collapsed every non-trivial scan to 100/Extreme. Tiers: critical → 70–95, high only → 40–65, medium only → 15–35, low only → 1–11. Verdict cutoffs realigned to new bands (BLOCK ≥65, WARNING ≥15).
-2. **Context-aware entropy scanner** — file-extension skip (`.glsl/.frag/.vert/.shader/.wgsl/.css/.scss/.sass/.less/.svg/.min.*/.map`) + 8 new line-suppression rules (GLSL keywords, CSS-in-JS, inline SVG, ffmpeg `filter_complex`, User-Agent strings, SQL DDL, `throw new Error(\`...\`)`, markdown image URLs). Configurable via `.llm-security/policy.json` `entropy` section (thresholds, `suppress_extensions`, `suppress_line_patterns`, `suppress_paths`). Envelope `calibration` block reports skip counters + effective thresholds + policy source.
+2. **Rule-based entropy scanner with file-extension skip, 8 line-level suppression rules, and configurable policy** — extensions skipped (`.glsl/.frag/.vert/.shader/.wgsl/.css/.scss/.sass/.less/.svg/.min.*/.map`); line-suppression rules (GLSL keywords, CSS-in-JS, inline SVG, ffmpeg `filter_complex`, User-Agent strings, SQL DDL, `throw new Error(\`...\`)`, markdown image URLs). Configurable via `.llm-security/policy.json` `entropy` section (thresholds, `suppress_extensions`, `suppress_line_patterns`, `suppress_paths`). Envelope `calibration` block reports skip counters + effective thresholds + policy source.
 3. **DEP typosquat allowlist expansion** — 22 npm + 5 PyPI entries for short-name tools that tripped Levenshtein detection on every modern codebase (`knip`, `oxlint`, `tsx`, `nx`, `rimraf`, `uv`, `ruff`, etc.).

 See `docs/security-hardening-guide.md` §6 for the calibration story.
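
The severity-dominated, log-scaled scoring described in item 1 can be sketched as follows. The tier bands and verdict cutoffs are taken from the text; the exact log curve, the `SATURATION` constant, the `ALLOW` label, and the function names are assumptions for illustration and are not the contents of `scanners/lib/severity.mjs`.

```javascript
// Hypothetical sketch of a severity-dominated, log-scaled risk score.
// Tier bands and verdict cutoffs come from the text above; the log curve
// and the SATURATION constant are assumptions made for illustration.
const TIERS = [
  { severity: "critical", floor: 70, ceil: 95 },
  { severity: "high", floor: 40, ceil: 65 },
  { severity: "medium", floor: 15, ceil: 35 },
  { severity: "low", floor: 1, ceil: 11 },
];
const SATURATION = 50; // assumed: finding count at which a tier reaches its ceiling

function riskScore(counts) {
  // The highest severity present selects the band; lower tiers never lift it.
  const tier = TIERS.find((t) => (counts[t.severity] || 0) > 0);
  if (!tier) return 0;
  const n = counts[tier.severity];
  // Log scaling: 1 finding sits at the floor, SATURATION or more at the ceiling.
  const fraction = Math.min(1, Math.log(n) / Math.log(SATURATION));
  return Math.round(tier.floor + (tier.ceil - tier.floor) * fraction);
}

function verdict(score) {
  if (score >= 65) return "BLOCK";
  if (score >= 15) return "WARNING";
  return "ALLOW"; // assumed label for scores below the WARNING cutoff
}
```

Because the band is chosen by the worst severity present, a scan with hundreds of low findings stays at or below 11 in this sketch, which is the property that avoids the v1 collapse to 100/Extreme.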
@@ -21,7 +21,7 @@ See `docs/security-hardening-guide.md` §6 for the calibration story.
 | `/security plugin-audit [path\|url]` | Plugin trust assessment (local or GitHub URL) |
 | `/security mcp-audit [--live]` | MCP server config audit (add `--live` for runtime inspection) |
 | `/security mcp-inspect` | Live MCP server inspection — connect via JSON-RPC 2.0, scan tool descriptions |
-| `/security ide-scan [target\|url]` | Scan installed VS Code + JetBrains extensions/plugins — OR fetch a remote VSIX from Marketplace, OpenVSX, or direct URL (v6.4.0), OR a JetBrains plugin from `plugins.jetbrains.com` (v6.6.0). 7 VS Code checks + 7 JetBrains-specific checks (theme-with-code, broad activation, Premain-Class instrumentation, native binaries, depends-chain, typosquat, shaded jars). Hardened ZIP extractor (zip-slip, symlink, bomb, ratio caps). Orchestrates reused scanners (UNI/ENT/NET/TNT/MEM/SCR) per extension. Offline by default, `--online` opt-in |
+| `/security ide-scan [target\|url]` | Scan installed VS Code + JetBrains extensions/plugins — OR fetch a remote VSIX from Marketplace, OpenVSX, or direct URL (v6.4.0), OR a JetBrains plugin from `plugins.jetbrains.com` (v6.6.0). 7 VS Code checks + 7 JetBrains-specific checks (theme-with-code, broad activation, Premain-Class instrumentation, native binaries, depends-chain, typosquat, shaded jars). Hardened ZIP extractor (zip-slip, symlink, bomb, ratio caps — no fuzz-testing results published to date). Orchestrates reused scanners (UNI/ENT/NET/TNT/MEM/SCR) per extension. Offline by default, `--online` opt-in |
 | `/security posture` | Quick scorecard (13 categories) |
 | `/security threat-model` | Interactive STRIDE/MAESTRO session |
 | `/security diff [path]` | Compare scan against baseline — shows new/resolved/unchanged/moved |
@@ -133,7 +133,7 @@ Subcommands: `scan`, `deep-scan`, `posture`, `audit-bom`, `benchmark`. Dispatche

 Pipeline templates in `ci/`: `github-action.yml`, `azure-pipelines.yml`, `gitlab-ci.yml`. Documentation: `docs/ci-cd-guide.md`.
 All templates use `--fail-on high --format sarif --output-file results.sarif` with SARIF upload per platform.
-Standalone CLI makes zero network calls (except opt-in OSV.dev in supply-chain-recheck). Fully Schrems II compatible.
+Standalone CLI makes zero network calls in default mode. Schrems II compatible in default offline mode. Optional OSV.dev enrichment (`supply-chain-recheck --online`) transmits package identifiers to a Google-operated API and is a separate compliance consideration.

 ## Knowledge Files (20)

@@ -179,7 +179,7 @@ No persistent state except `post-session-guard.mjs` which maintains a per-sessio
 Prompt injection is **structurally unsolvable** with current architectures (joint paper, 14 researchers, 95-100% ASR against all 12 tested defenses). v5.0 does not claim to "prevent" injection. Instead, it implements **defense-in-depth**:

 - **Broader detection** — MEDIUM advisory for obfuscation signals (leetspeak, homoglyphs, zero-width, multi-language), Unicode Tag steganography, bash expansion evasion
-- **Increased attack cost** — Rule of Two enforcement (configurable block/warn/off for lethal trifecta), bash normalization before gate matching
+- **Increased attack cost** — Rule of Two detection (configurable block/warn/off for lethal trifecta; default `warn`, blocks on high-confidence trifectas in opt-in `block` mode; distributed trifectas across MCP servers are detected but not blocked by default), bash normalization before gate matching
 - **Longer monitoring windows** — 100-call long-horizon alongside 20-call sliding window, slow-burn trifecta detection, behavioral drift via Jensen-Shannon divergence
 - **Architectural constraints** — opportunistic byte-matching of truncated output fingerprints (first 200 bytes, SHA-256/16-hex tag; not semantic lineage; trivially bypassed by mutation or summarisation of tool output), sub-agent delegation tracking, HITL trap detection. Inspired by CaMeL (DeepMind, 2025), but this is a lightweight byte-fingerprint, not semantic capability tracking
 - **Honest documentation** — Known Limitations section acknowledges what deterministic hooks cannot detect
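
The Rule of Two behaviour described in the list above (default `warn`, blocking only in opt-in `block` mode on a high-confidence trifecta, distributed trifectas detected but not blocked) might be expressed as a small decision helper. This is a sketch only: the `LLM_SECURITY_TRIFECTA_MODE` variable and the three modes come from the document, while the function name, the `trifecta` object shape, and the returned action strings are hypothetical.

```javascript
// Hypothetical decision helper for the Rule of Two behaviour described above.
// The env var name and the block/warn/off modes come from the text; the shape
// of `trifecta` ({ confidence, distributed }) is an assumption for illustration.
function trifectaAction(trifecta, env = {}) {
  const raw = String(env.LLM_SECURITY_TRIFECTA_MODE || "warn").toLowerCase();
  // Unknown values fall back to the documented default, "warn".
  const mode = ["block", "warn", "off"].includes(raw) ? raw : "warn";
  if (mode === "off") return "ignore";
  // Blocking needs opt-in block mode AND a high-confidence, non-distributed trifecta.
  if (mode === "block" && trifecta.confidence === "high" && !trifecta.distributed) {
    return "block";
  }
  return "warn";
}
```

In a hook this could be called as `trifectaAction(detected, process.env)`; passing the environment in explicitly keeps the helper easy to test.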
--- a/plugins/llm-security/commands/ide-scan.md
+++ b/plugins/llm-security/commands/ide-scan.md
@@ -39,7 +39,7 @@ Arguments (pass through as provided by the user):
 - `--fail-on <severity>` — exit 1 if findings at/above severity

 URL mode notes:
-- Hardened ZIP extractor with caps: 50MB compressed, 500MB uncompressed, 100x expansion ratio, 10 000 entries, depth 20.
+- Hardened ZIP extractor with caps: 50MB compressed, 500MB uncompressed, 100x expansion ratio, 10 000 entries, depth 20. No fuzz-testing results published to date.
 - Rejects: zip-slip paths, symlink entries, absolute paths, drive letters, encrypted entries, ZIP64.
 - TLS verified, HTTPS only, 30s timeout. Cross-host redirects rejected.
 - Temp directory always cleaned up (success, error, abort).
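
The caps and reject rules in the URL mode notes above lend themselves to a pre-extraction check over archive metadata. The sketch below assumes the ZIP central directory has already been parsed into plain objects. The entry shape, the function names, treating "MB" as 1024 × 1024 bytes, and checking the expansion ratio on totals rather than per entry are all assumptions. ZIP64 detection is omitted because it depends on the parser.

```javascript
// Hypothetical pre-extraction check over already-parsed ZIP metadata.
// Limits mirror the caps listed above; the entry shape
// ({ name, uncompressedSize, isSymlink, isEncrypted }) is assumed.
const MB = 1024 * 1024;
const CAPS = { compressed: 50 * MB, uncompressed: 500 * MB, ratio: 100, entries: 10000, depth: 20 };

function rejectReason(entry) {
  const name = entry.name.replace(/\\/g, "/"); // treat backslashes as separators
  const parts = name.split("/").filter((p) => p !== "");
  if (name.startsWith("/")) return "absolute path";
  if (/^[A-Za-z]:/.test(name)) return "drive letter";
  if (parts.includes("..")) return "zip-slip path";
  if (entry.isSymlink) return "symlink entry";
  if (entry.isEncrypted) return "encrypted entry";
  if (parts.length > CAPS.depth) return "path too deep";
  return null;
}

function validateArchive(entries, compressedBytes) {
  if (compressedBytes > CAPS.compressed) return "archive exceeds compressed cap";
  if (entries.length > CAPS.entries) return "too many entries";
  let total = 0;
  for (const entry of entries) {
    const reason = rejectReason(entry);
    if (reason) return `${reason}: ${entry.name}`;
    total += entry.uncompressedSize;
    if (total > CAPS.uncompressed) return "archive exceeds uncompressed cap";
  }
  // Ratio is checked on totals here; a per-entry check is equally plausible.
  if (compressedBytes > 0 && total / compressedBytes > CAPS.ratio) return "expansion ratio too high";
  return null; // null means no rule in this sketch objected
}
```

Declared sizes in ZIP headers can lie, so a real extractor would also need to count bytes while inflating; this sketch covers only the metadata pass.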
--- a/plugins/llm-security/docs/security-hardening-guide.md
+++ b/plugins/llm-security/docs/security-hardening-guide.md
@@ -98,6 +98,15 @@ recorded in the audit trail.
 techniques before the denylist gate runs. These are **defense-in-depth** layers
 that complement the Claude Code 2.1.98+ harness-level fixes, not a replacement.

+The plugin's "defense-in-depth" claim resolves to **three independent detection
+layers with documented bypass classes**: (1) the Claude Code harness denylist
+(out of plugin scope, evolves with platform); (2) `bash-normalize.mjs` T1-T6
+collapse rules; (3) `pre-bash-destructive.mjs` post-normalization pattern match
++ `post-session-guard.mjs` runtime trifecta correlation. Each layer has known
+bypasses (see Defense Philosophy in `CLAUDE.md` and `docs/critical-review-2026-04-20.md`
+§4 for the evasion arsenal). Stacking layers raises attacker cost; it does not
+provide formal worst-case guarantees.
+
 | Layer | Technique | Example | Normalization |
 |-------|-----------|---------|---------------|
 | T1 | Empty quotes | `rm''-rf /` | strip `''` / `""` between tokens |
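
The T1 row above (strip `''` / `""` between tokens) can be illustrated with a deliberately naive sketch. It is not the plugin's `normalizeBashExpansion()`; the function name `stripEmptyQuotes` and the regular expression are assumptions chosen to show the idea on a single string.

```javascript
// Naive sketch of the T1 rule: drop empty quote pairs ('' or "") so that a
// command split by them matches the same patterns as the plain command.
// Not the plugin's normalizeBashExpansion(); the function name is hypothetical.
function stripEmptyQuotes(command) {
  // Only remove pairs glued to a non-space character on at least one side,
  // so a standalone empty argument such as `echo ""` is left alone.
  return command.replace(/(?<=\S)(?:''|"")+|(?:''|"")+(?=\S)/g, "");
}

console.log(stripEmptyQuotes("r''m -rf /"));
console.log(stripEmptyQuotes('c""u\'\'rl http://example.test'));
```

A regex pass like this does not understand shell quoting context, so it should be read as a way to canonicalise text before pattern matching, not as a shell parser.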
--- a/plugins/llm-security/knowledge/mitigation-matrix.md
+++ b/plugins/llm-security/knowledge/mitigation-matrix.md
@@ -32,7 +32,7 @@ Attacker injects instructions via external content (files, web pages, tool outpu
 | Prompt injection input scanning | Automated | `pre-prompt-inject-scan.mjs` detects CRITICAL/HIGH/MEDIUM injection patterns in user prompts | Hook file exists; MEDIUM advisory enabled |
 | Unicode Tag steganography detection | Automated | `string-utils.mjs` decodes U+E0000-E007F tags; `injection-patterns.mjs` escalates to CRITICAL/HIGH | `decodeUnicodeTags()` in normalization pipeline |
 | Bash evasion normalization | Automated | `bash-normalize.mjs` strips parameter expansion before pattern matching | `normalizeBashExpansion()` called by both bash hooks |
-| Rule of Two enforcement | Automated | `post-session-guard.mjs` detects trifecta (untrusted input + sensitive data + exfil) | `LLM_SECURITY_TRIFECTA_MODE` env var respected; block mode available |
+| Rule of Two detection (block-mode opt-in) | Automated | `post-session-guard.mjs` detects trifecta (untrusted input + sensitive data + exfil); blocks only when `LLM_SECURITY_TRIFECTA_MODE=block` AND high-confidence trifecta is observed; default `warn` | `LLM_SECURITY_TRIFECTA_MODE` env var respected; block mode opt-in |
 | Long-horizon monitoring | Automated | `post-session-guard.mjs` 100-call window + behavioral drift detection | Long-horizon window active alongside 20-call window |
 | HITL trap detection | Automated | `injection-patterns.mjs` HIGH patterns for approval urgency, summary suppression, scope minimization | HITL patterns present in HIGH_PATTERNS array |
 | Hybrid attack detection | Automated | `injection-patterns.mjs` HYBRID_PATTERNS for P2SQL, recursive injection, XSS | Hybrid patterns checked in tool output scanning |
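
The Unicode Tag row in the table above relies on the fact that code points U+E0020 to U+E007E mirror printable ASCII at a fixed offset of 0xE0000. A minimal sketch of that decoding follows. It is not the plugin's `decodeUnicodeTags()`; the function name `revealTagText` and the `{ visible, hidden }` return shape are hypothetical.

```javascript
// Sketch of Unicode Tag decoding: code points U+E0020..U+E007E mirror
// printable ASCII at an offset of 0xE0000. This is not the plugin's
// decodeUnicodeTags(); the function name and return shape are hypothetical.
function revealTagText(input) {
  let hidden = "";
  let visible = "";
  for (const ch of input) { // for...of iterates by code point, not UTF-16 unit
    const cp = ch.codePointAt(0);
    if (cp >= 0xe0020 && cp <= 0xe007e) {
      hidden += String.fromCodePoint(cp - 0xe0000); // map back to ASCII
    } else if (cp >= 0xe0000 && cp <= 0xe007f) {
      continue; // remaining tag-block code points carry no ASCII text: drop
    } else {
      visible += ch;
    }
  }
  return { visible, hidden };
}

// Build a test payload by shifting ASCII into the tag block.
const smuggled = [..."ignore"].map((c) => String.fromCodePoint(c.codePointAt(0) + 0xe0000)).join("");
console.log(revealTagText("hi" + smuggled));
```

Returning the hidden text separately lets a caller run its injection patterns over the decoded payload while still showing the user what was visibly present.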