docs(llm-security): A3 honesty-sweep — 7 sitater nedtonet (critical-review §9)

Closes A3 of v7.1.0 critical-review patch. Each rewrite preserves the underlying claim where it is accurate but removes hype/overreach language. Historical CHANGELOG/README version-table rows are intentionally left as-is (they document what was claimed at the time of release, not what is true today). Changes (CLAUDE.md, commands/ide-scan.md, knowledge/mitigation-matrix.md, docs/security-hardening-guide.md): - "Trustworthy scoring (BREAKING)" → "Severity-dominated risk scoring (v2 model, BREAKING)". Removes hype framing; describes the actual mechanism. - "Context-aware entropy scanner" → "Rule-based entropy scanner with file-extension skip, 8 line-level suppression rules, and configurable policy". No ML/context inference; just rules. - "1487 tests" → "1511 unit and integration tests; mutation-testing coverage not published". Updated count after A1+A2 (+24) and added qualifier. - "Fully Schrems II compatible" → "Schrems II compatible in default offline mode. Optional OSV.dev enrichment (`supply-chain-recheck --online`) transmits package identifiers to a Google-operated API and is a separate compliance consideration." Acknowledges the OSV.dev opt-in caveat. - "Rule of Two enforcement" → "Rule of Two detection (configurable; default warn; blocks on high-confidence trifectas in opt-in `block` mode; distributed trifectas detected but not blocked by default)". "Enforcement" implied block; default is warn. - "Hardened ZIP extractor" → suffix " — no fuzz-testing results published to date". Caps and class-of-attacks rejected are accurate; absence of formal fuzz coverage now stated. - "defense-in-depth" — preserved as framing, but quantified in security-hardening-guide §4: "three independent detection layers with documented bypass classes". Each layer named, each layer's known bypasses pointed to (critical-review §4 evasion arsenal). Tests: 1511/1511 green (no behavioural change).
2026-04-29 11:52:55 +02:00 · 2026-04-29 11:52:55 +02:00 · a46308b1e9
commit a46308b1e9
parent 4aa5318bcb
4 changed files with 17 additions and 8 deletions
--- a/plugins/llm-security/knowledge/mitigation-matrix.md
+++ b/plugins/llm-security/knowledge/mitigation-matrix.md
@ -32,7 +32,7 @@ Attacker injects instructions via external content (files, web pages, tool outpu
 | Prompt injection input scanning | Automated | `pre-prompt-inject-scan.mjs` detects CRITICAL/HIGH/MEDIUM injection patterns in user prompts | Hook file exists; MEDIUM advisory enabled |
 | Unicode Tag steganography detection | Automated | `string-utils.mjs` decodes U+E0000-E007F tags; `injection-patterns.mjs` escalates to CRITICAL/HIGH | `decodeUnicodeTags()` in normalization pipeline |
 | Bash evasion normalization | Automated | `bash-normalize.mjs` strips parameter expansion before pattern matching | `normalizeBashExpansion()` called by both bash hooks |
-| Rule of Two enforcement | Automated | `post-session-guard.mjs` detects trifecta (untrusted input + sensitive data + exfil) | `LLM_SECURITY_TRIFECTA_MODE` env var respected; block mode available |
+| Rule of Two detection (block-mode opt-in) | Automated | `post-session-guard.mjs` detects trifecta (untrusted input + sensitive data + exfil); blocks only when `LLM_SECURITY_TRIFECTA_MODE=block` AND high-confidence trifecta is observed; default `warn` | `LLM_SECURITY_TRIFECTA_MODE` env var respected; block mode opt-in |
 | Long-horizon monitoring | Automated | `post-session-guard.mjs` 100-call window + behavioral drift detection | Long-horizon window active alongside 20-call window |
 | HITL trap detection | Automated | `injection-patterns.mjs` HIGH patterns for approval urgency, summary suppression, scope minimization | HITL patterns present in HIGH_PATTERNS array |
 | Hybrid attack detection | Automated | `injection-patterns.mjs` HYBRID_PATTERNS for P2SQL, recursive injection, XSS | Hybrid patterns checked in tool output scanning |