fix(llm-security): A2 batch — JSDoc arithmetic + co-monotonicity test + CaMeL nedton

Closes A2 of v7.1.0 critical-review patch (docs/critical-review-2026-04-20.md): - B4 (severity JSDoc): 4 critical = 93, not 90. Fixed in scanners/lib/severity.mjs:23 and CHANGELOG.md v7.0.0 tier description. The actual computation has always been 93 (70 + log2(5)*10 = 93.22 → round); only the docs were wrong. - §5.4 co-monotonicity: new sweep test in tests/lib/severity.test.mjs over 15 representative count vectors. Asserts that (verdict, riskBand) agree under the v7.0.0 contract for every case — catches future drift between riskScore tiers, verdict cutoffs, and riskBand cutoffs. Includes a B4 anchor test (riskScore {critical: 4} === 93) so doc/code drift fails loudly. - B8 (CaMeL claims toned down): post-session-guard.mjs:646 comment block and CLAUDE.md:184 Defense Philosophy bullet now describe the implementation honestly — opportunistic byte-matching of truncated output fingerprints (first 200 bytes, SHA-256/16-hex), not semantic data-flow tracking. Trivially bypassed by mutation, summarisation, or re-encoding. Inspired by CaMeL (DeepMind 2025), but not a CaMeL capability-tracking implementation. Tests: 1495 → 1511 (+16: 15 sweep cases + 1 B4 anchor). All green.
2026-04-29 11:49:08 +02:00 · 2026-04-29 11:49:08 +02:00 · 4aa5318bcb
commit 4aa5318bcb
parent 36be963d4d
5 changed files with 84 additions and 6 deletions
--- a/plugins/llm-security/CLAUDE.md
+++ b/plugins/llm-security/CLAUDE.md
@ -181,7 +181,7 @@ Prompt injection is **structurally unsolvable** with current architectures (join
 - **Broader detection** — MEDIUM advisory for obfuscation signals (leetspeak, homoglyphs, zero-width, multi-language), Unicode Tag steganography, bash expansion evasion
 - **Increased attack cost** — Rule of Two enforcement (configurable block/warn/off for lethal trifecta), bash normalization before gate matching
 - **Longer monitoring windows** — 100-call long-horizon alongside 20-call sliding window, slow-burn trifecta detection, behavioral drift via Jensen-Shannon divergence
- **Architectural constraints** — CaMeL-inspired data flow tagging, sub-agent delegation tracking, HITL trap detection
+- **Architectural constraints** — opportunistic byte-matching of truncated output fingerprints (first 200 bytes, SHA-256/16-hex tag; not semantic lineage; trivially bypassed by mutation or summarisation of tool output), sub-agent delegation tracking, HITL trap detection. Inspired by CaMeL (DeepMind, 2025), but this is a lightweight byte-fingerprint, not semantic capability tracking
 - **Honest documentation** — Known Limitations section acknowledges what deterministic hooks cannot detect

 **Bash evasion layers (T1-T6):** `bash-normalize.mjs` collapses six known obfuscation techniques before gate matching as a defense-in-depth layer. T1 empty quotes (`rm''-rf`), T2 `${}` parameter expansion, T3 backslash continuation, T4 tab/whitespace splitting, T5 `${IFS}` word-splitting, T6 ANSI-C hex quoting (`$'\x72\x6d'`). These layers complement — not replace — Claude Code 2.1.98+ harness-level protections. Full reference: `docs/security-hardening-guide.md`.