From a46308b1e9396e5a97a20ff1315b6ca38827b9e1 Mon Sep 17 00:00:00 2001
From: Kjell Tore Guttormsen <ktg@humanize.no>
Date: Wed, 29 Apr 2026 11:52:55 +0200
Subject: [PATCH] docs(llm-security): A3 honesty-sweep, 7 quoted claims toned down (critical-review §9)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Closes item A3 of the v7.1.0 critical-review patch. Each rewrite preserves the underlying
claim where it is accurate but removes hype/overreach language. Historical
CHANGELOG/README version-table rows are intentionally left as-is (they document
what was claimed at the time of release, not what is true today).

Changes (CLAUDE.md, commands/ide-scan.md, knowledge/mitigation-matrix.md,
docs/security-hardening-guide.md):

- "Trustworthy scoring (BREAKING)" → "Severity-dominated risk scoring
  (v2 model, BREAKING)". Removes hype framing; describes the actual mechanism.
- "Context-aware entropy scanner" → "Rule-based entropy scanner with
  file-extension skip, 8 line-level suppression rules, and configurable policy".
  The scanner does no ML or context inference; it only applies rules.
- "1487 tests" → "1511 unit and integration tests; mutation-testing coverage
  not published". Updated count after A1+A2 (+24) and added qualifier.
- "Fully Schrems II compatible" → "Schrems II compatible in default offline
  mode. Optional OSV.dev enrichment (`supply-chain-recheck --online`)
  transmits package identifiers to a Google-operated API and is a separate
  compliance consideration." Acknowledges the OSV.dev opt-in caveat.
- "Rule of Two enforcement" → "Rule of Two detection (configurable; default
  warn; blocks on high-confidence trifectas in opt-in `block` mode; distributed
  trifectas detected but not blocked by default)". "Enforcement" implied
  block; default is warn.
- "Hardened ZIP extractor" → suffix " — no fuzz-testing results published
  to date". Caps and class-of-attacks rejected are accurate; absence of
  formal fuzz coverage now stated.
- "defense-in-depth" is preserved as framing, but quantified in
  security-hardening-guide §4 as "three independent detection layers with
  documented bypass classes". Each layer is named, and each layer's known
  bypasses are referenced (critical-review §4 evasion arsenal).
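
For reviewers, the Rule of Two wording above reduces to a small decision
table. The JavaScript below is a minimal sketch under stated assumptions,
not the code in post-session-guard.mjs: the function names are
hypothetical, and treating `off` as "no check", matching the env value
case-insensitively, and falling back to `warn` for unset or unknown values
are assumptions. Only the `block`/`warn`/`off` modes, the `warn` default,
and "block requires opt-in mode plus a high-confidence trifecta" come from
this patch.

```javascript
// Hypothetical sketch of the Rule of Two decision described above.
// resolveTrifectaMode / decideTrifectaAction are illustrative names,
// not the actual post-session-guard.mjs API.

const VALID_MODES = new Set(['block', 'warn', 'off']);

// Read LLM_SECURITY_TRIFECTA_MODE. Unset or unrecognised values fall back
// to the documented default 'warn' (fallback and case-folding are assumptions).
function resolveTrifectaMode(env = process.env) {
  const raw = String(env.LLM_SECURITY_TRIFECTA_MODE || '').trim().toLowerCase();
  return VALID_MODES.has(raw) ? raw : 'warn';
}

// trifecta: { detected: boolean, highConfidence: boolean }
// Returns 'allow', 'warn' or 'block'.
function decideTrifectaAction(mode, trifecta) {
  // Assumption: 'off' disables the check entirely.
  if (mode === 'off' || !trifecta.detected) return 'allow';
  // Blocking needs BOTH the opt-in block mode and a high-confidence trifecta.
  if (mode === 'block' && trifecta.highConfidence) return 'block';
  // Any other detected trifecta is surfaced as an advisory only.
  return 'warn';
}
```

With the default environment, even a high-confidence trifecta yields
'warn'; setting LLM_SECURITY_TRIFECTA_MODE=block turns only high-confidence
trifectas into 'block', which is why "enforcement" overstated the default.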

Tests: 1511/1511 green (no behavioural change).
---
 plugins/llm-security/CLAUDE.md                       | 12 ++++++------
 plugins/llm-security/commands/ide-scan.md            |  2 +-
 .../llm-security/docs/security-hardening-guide.md    |  9 +++++++++
 plugins/llm-security/knowledge/mitigation-matrix.md  |  2 +-
 4 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/plugins/llm-security/CLAUDE.md b/plugins/llm-security/CLAUDE.md
index d80691a..74b96df 100644
--- a/plugins/llm-security/CLAUDE.md
+++ b/plugins/llm-security/CLAUDE.md
@@ -1,11 +1,11 @@
 # LLM Security Plugin (v7.0.0)
 
-Security scanning, auditing, and threat modeling for Claude Code projects. 5 frameworks: OWASP LLM Top 10, Agentic AI Top 10 (ASI), Skills Top 10 (AST), MCP Top 10, AI Agent Traps (DeepMind). 1487 tests.
+Security scanning, auditing, and threat modeling for Claude Code projects. 5 frameworks: OWASP LLM Top 10, Agentic AI Top 10 (ASI), Skills Top 10 (AST), MCP Top 10, AI Agent Traps (DeepMind). 1511 unit and integration tests; mutation-testing coverage not published.
 
-**v7.0.0 — Trustworthy scoring (BREAKING).** Three changes target the false-positive cascade on real codebases (hyperframes.com gave `BLOCK / Extreme / 100`, ~70% noise):
+**v7.0.0 — Severity-dominated risk scoring (v2 model, BREAKING).** Three changes target the false-positive cascade on real codebases (hyperframes.com gave `BLOCK / Extreme / 100`, ~70% noise):
 
 1. **Risk-score v2 formula** (`scanners/lib/severity.mjs`) — severity-dominated, log-scaled within tier. Replaces v1 sum-and-cap that collapsed every non-trivial scan to 100/Extreme. Tiers: critical → 70–95, high only → 40–65, medium only → 15–35, low only → 1–11. Verdict cutoffs realigned to new bands (BLOCK ≥65, WARNING ≥15).
-2. **Context-aware entropy scanner** — file-extension skip (`.glsl/.frag/.vert/.shader/.wgsl/.css/.scss/.sass/.less/.svg/.min.*/.map`) + 8 new line-suppression rules (GLSL keywords, CSS-in-JS, inline SVG, ffmpeg `filter_complex`, User-Agent strings, SQL DDL, `throw new Error(\`...\`)`, markdown image URLs). Configurable via `.llm-security/policy.json` `entropy` section (thresholds, `suppress_extensions`, `suppress_line_patterns`, `suppress_paths`). Envelope `calibration` block reports skip counters + effective thresholds + policy source.
+2. **Rule-based entropy scanner with file-extension skip, 8 line-level suppression rules, and configurable policy** — extensions skipped (`.glsl/.frag/.vert/.shader/.wgsl/.css/.scss/.sass/.less/.svg/.min.*/.map`); line-suppression rules (GLSL keywords, CSS-in-JS, inline SVG, ffmpeg `filter_complex`, User-Agent strings, SQL DDL, `throw new Error(\`...\`)`, markdown image URLs). Configurable via `.llm-security/policy.json` `entropy` section (thresholds, `suppress_extensions`, `suppress_line_patterns`, `suppress_paths`). Envelope `calibration` block reports skip counters + effective thresholds + policy source.
 3. **DEP typosquat allowlist expansion** — 22 npm + 5 PyPI entries for short-name tools that tripped Levenshtein detection on every modern codebase (`knip`, `oxlint`, `tsx`, `nx`, `rimraf`, `uv`, `ruff`, etc.).
 
 See `docs/security-hardening-guide.md` §6 for the calibration story.
@@ -21,7 +21,7 @@ See `docs/security-hardening-guide.md` §6 for the calibration story.
 | `/security plugin-audit [path\|url]` | Plugin trust assessment (local or GitHub URL) |
 | `/security mcp-audit [--live]` | MCP server config audit (add `--live` for runtime inspection) |
 | `/security mcp-inspect` | Live MCP server inspection — connect via JSON-RPC 2.0, scan tool descriptions |
-| `/security ide-scan [target\|url]` | Scan installed VS Code + JetBrains extensions/plugins — OR fetch a remote VSIX from Marketplace, OpenVSX, or direct URL (v6.4.0), OR a JetBrains plugin from `plugins.jetbrains.com` (v6.6.0). 7 VS Code checks + 7 JetBrains-specific checks (theme-with-code, broad activation, Premain-Class instrumentation, native binaries, depends-chain, typosquat, shaded jars). Hardened ZIP extractor (zip-slip, symlink, bomb, ratio caps). Orchestrates reused scanners (UNI/ENT/NET/TNT/MEM/SCR) per extension. Offline by default, `--online` opt-in |
+| `/security ide-scan [target\|url]` | Scan installed VS Code + JetBrains extensions/plugins — OR fetch a remote VSIX from Marketplace, OpenVSX, or direct URL (v6.4.0), OR a JetBrains plugin from `plugins.jetbrains.com` (v6.6.0). 7 VS Code checks + 7 JetBrains-specific checks (theme-with-code, broad activation, Premain-Class instrumentation, native binaries, depends-chain, typosquat, shaded jars). Hardened ZIP extractor (zip-slip, symlink, bomb, ratio caps — no fuzz-testing results published to date). Orchestrates reused scanners (UNI/ENT/NET/TNT/MEM/SCR) per extension. Offline by default, `--online` opt-in |
 | `/security posture` | Quick scorecard (13 categories) |
 | `/security threat-model` | Interactive STRIDE/MAESTRO session |
 | `/security diff [path]` | Compare scan against baseline — shows new/resolved/unchanged/moved |
@@ -133,7 +133,7 @@ Subcommands: `scan`, `deep-scan`, `posture`, `audit-bom`, `benchmark`. Dispatche
 
 Pipeline templates in `ci/`: `github-action.yml`, `azure-pipelines.yml`, `gitlab-ci.yml`. Documentation: `docs/ci-cd-guide.md`.
 All templates use `--fail-on high --format sarif --output-file results.sarif` with SARIF upload per platform.
-Standalone CLI makes zero network calls (except opt-in OSV.dev in supply-chain-recheck). Fully Schrems II compatible.
+Standalone CLI makes zero network calls in default mode. Schrems II compatible in default offline mode. Optional OSV.dev enrichment (`supply-chain-recheck --online`) transmits package identifiers to a Google-operated API and is a separate compliance consideration.
 
 ## Knowledge Files (20)
 
@@ -179,7 +179,7 @@ No persistent state except `post-session-guard.mjs` which maintains a per-sessio
 Prompt injection is **structurally unsolvable** with current architectures (joint paper, 14 researchers, 95-100% ASR against all 12 tested defenses). v5.0 does not claim to "prevent" injection. Instead, it implements **defense-in-depth**:
 
 - **Broader detection** — MEDIUM advisory for obfuscation signals (leetspeak, homoglyphs, zero-width, multi-language), Unicode Tag steganography, bash expansion evasion
-- **Increased attack cost** — Rule of Two enforcement (configurable block/warn/off for lethal trifecta), bash normalization before gate matching
+- **Increased attack cost** — Rule of Two detection (configurable block/warn/off for lethal trifecta; default `warn`, blocks on high-confidence trifectas in opt-in `block` mode; distributed trifectas across MCP servers are detected but not blocked by default), bash normalization before gate matching
 - **Longer monitoring windows** — 100-call long-horizon alongside 20-call sliding window, slow-burn trifecta detection, behavioral drift via Jensen-Shannon divergence
 - **Architectural constraints** — opportunistic byte-matching of truncated output fingerprints (first 200 bytes, SHA-256/16-hex tag; not semantic lineage; trivially bypassed by mutation or summarisation of tool output), sub-agent delegation tracking, HITL trap detection. Inspired by CaMeL (DeepMind, 2025), but this is a lightweight byte-fingerprint, not semantic capability tracking
 - **Honest documentation** — Known Limitations section acknowledges what deterministic hooks cannot detect
diff --git a/plugins/llm-security/commands/ide-scan.md b/plugins/llm-security/commands/ide-scan.md
index 6187ddd..4794892 100644
--- a/plugins/llm-security/commands/ide-scan.md
+++ b/plugins/llm-security/commands/ide-scan.md
@@ -39,7 +39,7 @@ Arguments (pass through as provided by the user):
 - `--fail-on <severity>` — exit 1 if findings at/above severity
 
 URL mode notes:
-- Hardened ZIP extractor with caps: 50MB compressed, 500MB uncompressed, 100x expansion ratio, 10 000 entries, depth 20.
+- Hardened ZIP extractor with caps: 50MB compressed, 500MB uncompressed, 100x expansion ratio, 10 000 entries, depth 20. No fuzz-testing results published to date.
 - Rejects: zip-slip paths, symlink entries, absolute paths, drive letters, encrypted entries, ZIP64.
 - TLS verified, HTTPS only, 30s timeout. Cross-host redirects rejected.
 - Temp directory always cleaned up (success, error, abort).
diff --git a/plugins/llm-security/docs/security-hardening-guide.md b/plugins/llm-security/docs/security-hardening-guide.md
index 11c8c5d..eb7f698 100644
--- a/plugins/llm-security/docs/security-hardening-guide.md
+++ b/plugins/llm-security/docs/security-hardening-guide.md
@@ -98,6 +98,15 @@ recorded in the audit trail.
 techniques before the denylist gate runs. These are **defense-in-depth** layers
 that complement the Claude Code 2.1.98+ harness-level fixes, not a replacement.
 
+The plugin's "defense-in-depth" claim resolves to **three independent detection
+layers with documented bypass classes**: (1) the Claude Code harness denylist
+(out of plugin scope, evolves with platform); (2) `bash-normalize.mjs` T1-T6
+collapse rules; (3) `pre-bash-destructive.mjs` post-normalization pattern match
+plus `post-session-guard.mjs` runtime trifecta correlation. Each layer has known
+bypasses (see Defense Philosophy in `CLAUDE.md` and `docs/critical-review-2026-04-20.md`
+§4 for the evasion arsenal). Stacking layers raises attacker cost; it does not
+provide formal worst-case guarantees.
+
 | Layer | Technique | Example | Normalization |
 |-------|-----------|---------|---------------|
 | T1 | Empty quotes | `rm''-rf /` | strip `''` / `""` between tokens |
diff --git a/plugins/llm-security/knowledge/mitigation-matrix.md b/plugins/llm-security/knowledge/mitigation-matrix.md
index 753d934..203eee3 100644
--- a/plugins/llm-security/knowledge/mitigation-matrix.md
+++ b/plugins/llm-security/knowledge/mitigation-matrix.md
@@ -32,7 +32,7 @@ Attacker injects instructions via external content (files, web pages, tool outpu
 | Prompt injection input scanning | Automated | `pre-prompt-inject-scan.mjs` detects CRITICAL/HIGH/MEDIUM injection patterns in user prompts | Hook file exists; MEDIUM advisory enabled |
 | Unicode Tag steganography detection | Automated | `string-utils.mjs` decodes U+E0000-E007F tags; `injection-patterns.mjs` escalates to CRITICAL/HIGH | `decodeUnicodeTags()` in normalization pipeline |
 | Bash evasion normalization | Automated | `bash-normalize.mjs` strips parameter expansion before pattern matching | `normalizeBashExpansion()` called by both bash hooks |
-| Rule of Two enforcement | Automated | `post-session-guard.mjs` detects trifecta (untrusted input + sensitive data + exfil) | `LLM_SECURITY_TRIFECTA_MODE` env var respected; block mode available |
+| Rule of Two detection (block-mode opt-in) | Automated | `post-session-guard.mjs` detects trifecta (untrusted input + sensitive data + exfil); blocks only when `LLM_SECURITY_TRIFECTA_MODE=block` AND high-confidence trifecta is observed; default `warn` | `LLM_SECURITY_TRIFECTA_MODE` env var respected; block mode opt-in |
 | Long-horizon monitoring | Automated | `post-session-guard.mjs` 100-call window + behavioral drift detection | Long-horizon window active alongside 20-call window |
 | HITL trap detection | Automated | `injection-patterns.mjs` HIGH patterns for approval urgency, summary suppression, scope minimization | HITL patterns present in HIGH_PATTERNS array |
 | Hybrid attack detection | Automated | `injection-patterns.mjs` HYBRID_PATTERNS for P2SQL, recursive injection, XSS | Hybrid patterns checked in tool output scanning |