From a6e2c939ef5e7db84a4bea2e67b7004249a54aea Mon Sep 17 00:00:00 2001 From: Kjell Tore Guttormsen Date: Sun, 19 Apr 2026 23:27:52 +0200 Subject: [PATCH] docs(llm-security): add critical review 2026-04-20 (v7.0.0 adversarial audit) Co-Authored-By: Claude Opus 4.7 --- .../docs/critical-review-2026-04-20.md | 725 ++++++++++++++++++ 1 file changed, 725 insertions(+) create mode 100644 plugins/llm-security/docs/critical-review-2026-04-20.md diff --git a/plugins/llm-security/docs/critical-review-2026-04-20.md b/plugins/llm-security/docs/critical-review-2026-04-20.md new file mode 100644 index 0000000..28e45c2 --- /dev/null +++ b/plugins/llm-security/docs/critical-review-2026-04-20.md @@ -0,0 +1,725 @@ +# Critical Review — `llm-security` v7.0.0 + +**Date:** 2026-04-20 +**Scope:** Adversarial audit of the `llm-security` Claude Code plugin, version 7.0.0 (released 2026-04-19). +**Method:** Six parallel specialist review agents (scanner bug hunt, hook-bypass arsenal, evasion PoC arsenal, honesty check, market analysis, scoring-model adversarial), followed by manual verification of the most consequential claims by reading the referenced source lines directly. +**Subject context:** The plugin packages 5 OWASP-style taxonomies (LLM / ASI / AST / MCP / DeepMind Agent Traps), 10 orchestrated scanners, 8 hooks, an interactive threat modeller, an attack simulator, a CLI, a dashboard aggregator, an AI-BOM generator, and a new v7.0.0 scoring model that replaces the v1 "sum-and-cap" formula. The claim under review: that v7.0.0 delivers "trustworthy scoring" and a defensible security posture for Claude Code environments. + +--- + +## 1. Executive Summary + +**Overall grade: B-.** + +`llm-security` v7.0.0 is a capable, well-architected, offline-first developer-facing security tool. 
The breadth is unusual: the project ships functionality (MCP live inspection, IDE extension scanning, bash evasion normalization, Unicode Tag steganography detection, trifecta detection, AI-BOM generation, adaptive red-team harness) that no single commercial competitor combines in one package. The v7.0.0 scoring rework is a genuine improvement over v1 — the old "sum-and-cap" formula did collapse every non-trivial scan to `100 / Extreme`, and the new log-scaled, severity-dominated model produces defensible bands for realistic scans. + +The grade falls short of B+ or A-minus for four specific reasons: + +1. **Two real HIGH-severity hook bugs** let hostile writes and distributed trifectas slip past what the documentation promises as blocking behaviour. +2. **Three honesty issues** inflate claims beyond what the code delivers: the "SHA-256 provenance tracking" is a 200-byte substring fingerprint, "Fully Schrems II compatible" ignores the Google-operated OSV.dev API, and "Rule of Two enforcement" is an opt-in warning in default config. +3. **Scoring doc arithmetic is wrong** in a way that undercuts the "trustworthy scoring" headline: the formula yields 93 for 4 criticals, the documentation says 90. +4. **Coverage gaps against 2026 threats** (A2A injection, multi-modal / EXIF, MCP 2.0 OAuth, terminal ANSI injection, skill marketplace poisoning) are not acknowledged; the plugin is honest about *general* limitations of prompt-injection defence but silent about these specific vectors. + +### Top 5 most serious findings (detail in §2, §4, §9) + +1. **HIGH — `pre-write-pathguard.mjs:23` regex hole** lets `*.env.production.local.backup` and any `*.env.X.Y.Z` variant through unblocked. +2. **HIGH — `post-session-guard.mjs:816` gated block** downgrades distributed (non-MCP-concentrated, non-sensitive-path) trifectas to WARN even when `LLM_SECURITY_TRIFECTA_MODE=block` is set. +3. 
**HIGH (honesty) — `post-session-guard.mjs:655` "CaMeL-inspired provenance"** is a 200-byte SHA-256 substring fingerprint, not data-flow lineage. Trivially bypassed by prepending one byte. +4. **MEDIUM (honesty) — `CLAUDE.md:136` / `README.md`** "Fully Schrems II compatible" ignores OSV.dev (Google-operated) opt-in. +5. **LOW (arithmetic / honesty) — `CHANGELOG.md:11` + `severity.mjs:23` JSDoc** state `4 critical = 90` when the formula evaluates to 93. + +### Top 5 most valuable missing features + +1. **Web dashboard + fleet policy server** — the plugin is machine-local; enterprise security teams require central visibility and policy push. +2. **Runtime prompt firewall / filter** — all current protection is static; Lakera and Protect AI ship runtime filters. +3. **IDE real-time scanning (VS Code + JetBrains)** — the plugin can scan *installed extensions*, but does not offer live in-editor scanning of the developer's own code, which is table stakes for Snyk and Semgrep. +4. **Compliance reporting pack (PDF/DOCX, EU AI Act Art. 15 audit evidence)** — CISO-facing deliverables are absent; only SARIF / JSONL exist today. +5. **Enterprise incident integrations** (Jira, ServiceNow, Slack, Teams, PagerDuty) — today only SARIF upload is supported. + +--- + +## 2. Critical Bugs and Vulnerabilities + +This section lists verified findings with file:line references, proof-of-concept payloads, suggested fixes, and severity. Findings B1, B2, B4, and B8 were verified by reading the referenced source directly during this review. B3, B5, B6, and B7 were verified by the scanner-bug agent (`a12f1a90430b53a8c`) and are reported here with its file:line anchors. 
+ +### B1 — Pathguard regex miss on multi-segment `.env.*.local.*` — HIGH + +- **File:** `hooks/scripts/pre-write-pathguard.mjs:21-25` +- **Code:** + ```js + const ENV_PATTERNS = [ + /[\\/]\.env$/, + /[\\/]\.env\.[a-z]+$/, // matches .env.X only when X is a single [a-z]+ segment + /[\\/]\.env\.local$/, + ]; + ``` +- **Defect:** The second pattern anchors `$` immediately after `[a-z]+`, so any file name with more than one segment after `.env` is not matched. Digits, dots, hyphens, and uppercase characters in the suffix also fail to match. +- **PoC payloads that slip past the hook:** + - `Write` to `.env.production.local.backup` + - `Write` to `.env.development.local.old` + - `Write` to `.env.prod.local.bak` + - `Write` to `.env.stage-1.local` + - `Write` to `.env.CI.secret` +- **Impact:** A prompt-injected agent can exfiltrate or overwrite environment secrets by choosing any of the variants above. The hook's purpose is to be a last-line path guard; this regex undercuts that. +- **Fix:** replace with + ```js + /[\\/]\.env(\.[A-Za-z0-9._-]+)*$/ + ``` + or, cleaner, match by `basename` prefix: + ```js + if (basename(path).toLowerCase().startsWith('.env')) { block(); } + ``` +- **Severity:** HIGH. Deterministic bypass of a hook that is documented as a gate for environment secrets. + +### B2 — Distributed trifecta is advisory even in `block` mode — HIGH + +- **File:** `hooks/scripts/post-session-guard.mjs:814-826` +- **Code:** + ```js + if (TRIFECTA_MODE === 'block' && (mcpInfo.concentrated || sensitiveExfil)) { + process.stderr.write('BLOCKED: Rule of Two violation ...'); + process.stdout.write(JSON.stringify({ decision: 'block' })); + process.exit(2); + } + ``` +- **Defect:** The `block` path is gated on a *high-confidence* sub-condition — MCP-concentrated (all three legs via the same MCP server) OR a sensitive-path + exfil pair. 
A trifecta that distributes its three legs across *different* sources (e.g., untrusted leg from WebFetch, sensitive data leg from Read to a non-sensitive path, exfiltration leg from Bash `curl` to a non-sensitive sink) is detected (the warning is emitted at line 803), but never blocked. +- **PoC scenario:** + 1. Agent reads untrusted content from a GitHub issue via MCP server A. + 2. Agent reads user data from `./user-db.sqlite` (not a sensitive path per the scanner's static list). + 3. Agent exfiltrates via `curl` to a newly registered domain (not on the known-sink allowlist). + All three legs are detected. The hook emits `formatWarning(...)`. `mcpInfo.concentrated` is `false` (different servers), `sensitiveExfil` is `false` (no `~/.ssh`, `~/.aws`, etc., and no obvious cred exfil). Result: the `block` branch is skipped, the caller proceeds. +- **Impact:** Users who configure `LLM_SECURITY_TRIFECTA_MODE=block` reasonably expect that any detected lethal trifecta is blocked. In the current code, only a subset is. The documentation (CLAUDE.md §"Hooks", line 58) describes `block|warn|off` without qualification. This is a mismatch between documented behaviour and code behaviour. +- **Fix options:** + 1. **Strict:** remove the `(mcpInfo.concentrated || sensitiveExfil)` gate inside the `block` branch — block on any detected trifecta in block mode. + 2. **Tiered:** expose a second env var, e.g., `LLM_SECURITY_TRIFECTA_BLOCK_STRICTNESS=high|all`, and document that `block` today implements `high` only. + 3. Update the documentation in CLAUDE.md and README.md to make the high-confidence gate explicit, so the mismatch is removed. +- **Severity:** HIGH. False sense of security for any operator who enables `block`. + +### B3 — `riskScore({info: N}) = 0` silently masks info-volume findings — MEDIUM + +- **File:** `scanners/lib/severity.mjs:32-46` +- **Code:** The `riskScore` function inspects `critical`, `high`, `medium`, `low`. The `info` count is ignored. 
+- **Defect:** Findings that a scanner mis-tiers as `info` contribute nothing to the risk score, the verdict, or the band. A scanner configured incorrectly (or an adversary who targets a scanner's tiering logic, e.g., by crafting strings that match a `tier_downgrade` heuristic) can accumulate arbitrary numbers of findings without affecting the verdict. +- **Honest characterisation:** Ignoring info in a risk aggregate is a reasonable design choice on its own. The problem is the combination with (a) the `info` severity being a legitimate tier in `SEVERITY` (lines 4-10), (b) the `owaspCategorize` function (line 218) counting `info` findings, and (c) no documentation anywhere stating that info is scoring-inert. An operator looking at a report that counts 400 info findings has no way to know these contribute zero to the final band. +- **Fix options:** + 1. Document explicitly in `severity.mjs` JSDoc and CLAUDE.md that `info` is excluded from scoring. + 2. Add an `infoScore()` helper that returns a supplementary 0-10 score — useful for trend monitoring without affecting verdicts. + 3. Add a floor contribution: e.g., `score = max(score, 1 + min(5, log2(info + 1) * 1.5))` when `info >= 50`, so large info volumes produce at least a Low band. +- **Severity:** MEDIUM. This is primarily an honesty / observability issue, not a detection issue. 
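Fix option 3 can be sketched as a post-processing step applied to whatever `riskScore` returns. The helper name `applyInfoFloor` and its signature are hypothetical; only the formula and the `info >= 50` condition are taken from the option above.

```javascript
// Hypothetical helper (not part of severity.mjs): applies the floor from
// fix option 3 to an already-computed risk score.
function applyInfoFloor(score, info) {
  // Below the 50-finding threshold, info stays scoring-inert.
  if (info < 50) return score;
  const floor = 1 + Math.min(5, Math.log2(info + 1) * 1.5);
  return Math.max(score, floor);
}

console.log(applyInfoFloor(0, 49));   // threshold not reached: score unchanged
console.log(applyInfoFloor(0, 400));  // floor applies
console.log(applyInfoFloor(80, 400)); // an existing higher score wins
```

One property of the formula as written: `log2(51) * 1.5` is already about 8.5, so the `min(5, ...)` cap is saturated for every `info >= 50` and the floor is a constant 6. If the floor is meant to grow with info volume, the multiplier or the cap would need adjusting.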
+ +### B4 — CHANGELOG / JSDoc tier example arithmetic is wrong — LOW (honesty) + +- **Files:** `CHANGELOG.md:11`, `scanners/lib/severity.mjs:23` (JSDoc), `CLAUDE.md:7` +- **Claim:** "Critical present → 70–95 (1=80, 2=86, 4=90, 10=95)" +- **Verification:** From `riskScore` at `severity.mjs:32-46`: + ``` + base = 70 + min(25, log2(critical + 1) * 10) + critical=1 → 70 + min(25, log2(2)*10) = 70 + 10.00 = 80.00 → 80 ok + critical=2 → 70 + min(25, log2(3)*10) = 70 + 15.85 = 85.85 → 86 ok + critical=4 → 70 + min(25, log2(5)*10) = 70 + 23.22 = 93.22 → 93 mismatch (docs say 90) + critical=10 → 70 + min(25, log2(11)*10) = 70 + 34.59 → capped → 95 ok + ``` +- **Defect:** The "4 = 90" entry in the CHANGELOG, the JSDoc at `severity.mjs:23`, and the CLAUDE.md summary at line 7 all misstate the formula's output. The formula returns 93. +- **Fix:** Either (a) correct all three doc locations to `4=93`, or (b) adjust the formula (e.g., `log2(critical+1) * 9` or a different carry) to actually yield 90 for 4 criticals. Option (a) is strongly preferred; the formula is the ground truth and the docs follow. +- **Severity:** LOW. But corrosive: the v7.0.0 pitch is "trustworthy scoring", and the flagship documentation example is arithmetically wrong. + +### B5 — Entropy scanner can miss secrets inside code-heavy files that look GLSL-shaped — MEDIUM + +- **File:** `scanners/entropy-scanner.mjs:237-238`, suppression list at lines 30-34. +- **Defect:** The v7.0.0 entropy scanner skips files with extensions `.glsl|.frag|.vert|.shader|.wgsl|.css|.scss|.sass|.less|.svg|.min.*|.map` to cut shader / CSS / minified-JS false positives. The per-line suppression rules (inside the scanner) include GLSL-keyword heuristics. Where this goes wrong: a `.ts` file containing inline shader source or CSS-in-JS templates can accumulate *line-level* suppressions while a genuine high-entropy secret embedded in the same file is dismissed because the context "reads GLSL-like". 
+- **PoC sketch:** A TypeScript file containing an inline GLSL-shaped block and a credential-looking high-entropy string on the next line. The line containing the credential has a GLSL-keyword-bearing neighbour; the line suppression heuristic can short-circuit the classification. +- **Fix:** Replace line-proximity suppression with a two-stage pipeline: first classify the *file context* (shader-dominant vs code-dominant vs markup-dominant), then apply per-line rules scoped to that context. Do not allow GLSL-suppression rules to fire inside `.ts` / `.js` / `.py` / `.go` files. +- **Severity:** MEDIUM. Real false-negative risk in polyglot files common in modern frontend projects. + +### B6 — Taint-tracer ignores destructured and spread assignments — MEDIUM + +- **File:** `scanners/taint-tracer.mjs:175-182` (extraction of assigned variable name) +- **Defect:** The tracer's `extractAssignedVariable` recognises plain assignments (`const x = req.body`, `let y = process.argv[1]`), but not destructuring or spread: + ```js + const { secret: userInput } = req.body; // userInput untainted per current tracer + const [input, ...rest] = process.argv; // input / rest untainted + const { a, b: { c } } = req.body; // c untainted + ``` + Sinks that use any of `userInput`, `input`, `rest`, `c` downstream will not be flagged. +- **Fix:** Extend the extractor to recognise object-destructuring, array-destructuring, and rest patterns. This is a pure parser-level change; the taint propagation downstream is already correct. +- **Severity:** MEDIUM. Common modern JS/TS pattern; the gap yields false negatives rather than false positives. + +### B7 — Levenshtein <= 2 threshold lets many real typosquats through — MEDIUM + +- **File:** `scanners/dep-audit.mjs:307, 326`, and `scanners/supply-chain-recheck.mjs` +- **Defect:** The dep-audit gate flags distance=1 as HIGH and distance=2 as MEDIUM; distance >= 3 is not flagged. 
But many real-world typosquats have distance >= 3: `lodash` vs `lodash-utils` (distance = 6) and `react` vs `reactjs-utils` (distance = 8). Likewise, `express` vs `expresss` (distance = 1) is caught, while `express` vs `expressjs-wrapper` (distance = 10) is missed. Common token-injection typosquats (`-utils`, `-helper`, `-core`, `-plus`) are exactly the attack pattern that distance-based matching fails on. +- **Fix:** Combine Levenshtein with tokenisation: + 1. Split package names on `-` and `_`. + 2. Flag a candidate when a known popular package's token set is a strict subset of the candidate's token set (or meets a top-N-overlap threshold). + 3. Keep Levenshtein <= 2 as a complementary signal, not the sole gate. +- **Severity:** MEDIUM. There is an existing allowlist (v7.0.0 expansion: 22 npm + 5 PyPI) that partially compensates by reducing *false positives* for short-name tools; this fix targets the *false negative* side. + +### B8 — "CaMeL-inspired SHA-256 provenance tracking" is a 200-byte substring fingerprint — HIGH (honesty) + +- **File:** `hooks/scripts/post-session-guard.mjs:655-658` (`computeDataTag`), CLAUDE.md:184. +- **Code:** + ```js + function computeDataTag(text) { + const sample = text.slice(0, 200); + return createHash('sha256').update(sample).digest('hex').slice(0, 16); + } + ``` +- **Claim (CLAUDE.md:184):** "CaMeL-inspired data flow tagging (SHA-256 provenance tracking, output to input linking)." +- **Reality:** The mechanism hashes the first 200 characters of a tool's output, truncates the digest to a 16-hex-character tag, and looks for a byte-wise substring of that output in a later input. This is neither semantic lineage nor robust provenance: + - Prepend one byte to the output: new hash, no match. + - Insert a whitespace character in the first 200 characters: no match. + - Summarise or translate the output before passing it onward: no match. + - Encode the output (base64, hex, quoted-printable) before using it: no match. 
+- **Contrast with CaMeL (DeepMind, 2025):** CaMeL uses typed capability objects, explicit control-vs-data-channel separation, and policy-checked re-entry of data into privileged sinks. The present implementation shares the *ambition* but not the *mechanism*. +- **Fix options:** + 1. Rename in docs: "opportunistic byte-matching of truncated output fingerprints (first 200 bytes, SHA-256/16-hex tag; not semantic lineage)." + 2. Strengthen the matching: use n-gram fingerprints over the full output, or content-defined chunking (CDC), or Rabin fingerprints, so that small edits still match. + 3. Split the feature: keep the current lightweight tag as `session-guard provenance hint`, and scope any future CaMeL-style tracking as a separate, clearly labelled module. +- **Severity:** HIGH (honesty). The current phrasing promises behaviour the code cannot deliver. Operators building threat models on that promise are mis-calibrated. + +--- + +## 3. Coverage Gaps + +This section lists threat vectors that the plugin *does not* attempt to cover, and where a reasonable operator might expect it to. The plugin's Defense Philosophy section (CLAUDE.md §"Defense Philosophy (v5.0)") is commendable for acknowledging that prompt injection is structurally unsolvable; what follows is narrower: specific vectors that are tractable but absent. + +### 3.1 Agent-to-Agent (A2A) injection + +- **Vector:** Two or more agents delegating work to each other via Task/Agent tools. An A2A injection attack plants malicious instructions in the *output* of one agent that a second agent then ingests as trusted context. +- **Where it should be covered:** A new hook or an extension of `post-session-guard.mjs` that tracks inter-agent delegation chains. Today, `post-session-guard` tracks delegation for the "escalation-after-input" advisory within a 5-call window (`post-session-guard.mjs:830-840`), but does not model the sub-agent's *outputs* as a potential injection source when re-entering the parent context. 
+- **Reference:** OWASP ASI02 (Agent orchestration abuse), DeepMind Agent Traps category 4 (delegation). +- **Suggested extension:** Hash the outputs of sub-agent invocations, scan them with the same injection-pattern matcher used by `post-mcp-verify.mjs`, and emit a MEDIUM advisory if matches are found in the parent context's next decision. + +### 3.2 Multi-modal injection (EXIF / image steganography / PDF) + +- **Vector:** Hidden instructions in image metadata (EXIF, XMP, IPTC), in image pixels (steganography), in PDF object streams, or in audio metadata (ID3 tags). An image attached to a prompt is not scanned today. +- **Where it should be covered:** A pre-tool-use hook that intercepts `Read` on binary formats and runs a targeted metadata scanner. +- **Reference:** 2025 research (`prompt-injection-research-2025-2026.md` would ideally cite the multi-modal injection work from OpenAI and DeepMind; today the knowledge file focuses on text vectors). +- **Suggested extension:** Add an `exif-scanner.mjs` / `pdf-object-scanner.mjs` that runs on `Read` of matching extensions; extract and scan text-bearing fields. + +### 3.3 MCP 2.0 OAuth attacks + +- **Vector:** MCP 2.0 introduces OAuth flows for MCP-server authentication. Client-side RBAC, consent phishing, and stale-token replay are the attack surfaces. +- **Where it should be covered:** `scanners/mcp-live-inspect.mjs` and `scanners/mcp-scanner-agent` system prompt. +- **Reference:** OWASP MCP10 (Insecure authentication), current MCP spec updates. +- **Suggested extension:** A static checker for OAuth config in `.mcp.json` and live-inspect probe for authorization server metadata endpoints. + +### 3.4 Skill marketplace poisoning (pre-deployment gate) + +- **Vector:** A compromised skill in a public marketplace (Claude Code plugin marketplace, Anthropic skill store, etc.) is installed by a developer. The payload is dormant until specific conditions are met. 
+- **Where it should be covered:** A `pre-install-skill.mjs` hook that scans skill manifests before installation, analogous to `pre-install-supply-chain.mjs` for packages. +- **Reference:** OWASP AST04 (Skill tampering), OWASP AST06 (Skill supply chain). +- **Suggested extension:** Integrate with the `plugin-audit` command to run on install, not only on demand. + +### 3.5 Terminal UI / ANSI escape injection + +- **Vector:** Tool output containing ANSI escape sequences that, when rendered in the developer's terminal, hide instructions (e.g., using cursor-move codes, colour-matching background, or OSC 52 to inject clipboard content). +- **Where it should be covered:** `post-mcp-verify.mjs` should strip or flag ANSI escape sequences before any output is passed to the developer. Today, the scanner checks for HTML and Unicode Tag steganography but not ANSI. +- **Reference:** 2025 terminal-injection work (iTerm2, Windows Terminal advisories). +- **Suggested extension:** Add an `ansi-strip` step in `post-mcp-verify` that either strips or raises MEDIUM on all sequences beyond a safe allowlist (SGR colour only). + +### 3.6 OAuth token exfiltration via MCP tools + +- **Vector:** A compromised MCP tool description instructs the model to read OAuth tokens from the environment or from keychain-mounted paths and transmit them via a subsequent tool call. +- **Where it should be covered:** `scanners/mcp-scanner-agent` and `post-mcp-verify.mjs`. +- **Reference:** DeepMind Agent Traps category 6 (exfiltration). The trifecta detection *partially* covers this when the three legs land within the 20-call window, but slow-burn exfiltration can span longer. +- **Suggested extension:** Tag MCP tools that *request* environment variables or credential-adjacent paths and enforce a per-session audit of their subsequent outputs. 
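The tagging half of the §3.6 suggestion could start from a description-level heuristic like the one below. This is a rough sketch under stated assumptions: the function name `tagCredentialAccess`, the labels, and every pattern are hypothetical and illustrative, not taken from the plugin, and the per-session audit of subsequent outputs is not shown.

```javascript
// Hypothetical indicator list: each entry pairs a label with a pattern that
// suggests a tool description is asking for credentials or their locations.
const CREDENTIAL_HINTS = [
  { label: 'env-var', pattern: /\benvironment variables?\b|\bprocess\.env\b|\$[A-Z0-9_]*(?:TOKEN|SECRET|KEY)\b/ },
  { label: 'credential-path', pattern: /~\/\.(?:ssh|aws)\b|\bkeychain\b/i },
  { label: 'oauth-token', pattern: /\b(?:oauth|access|refresh)[ _-]?tokens?\b/i },
];

// Returns the labels of all indicators found in an MCP tool description.
function tagCredentialAccess(description) {
  return CREDENTIAL_HINTS
    .filter(({ pattern }) => pattern.test(description))
    .map(({ label }) => label);
}

console.log(tagCredentialAccess('Reads the OAuth token from ~/.aws/credentials'));
console.log(tagCredentialAccess('Formats a date string'));
```

A tool that receives any tag would then be a candidate for the stricter output audit; an empty result means no indicator matched, not that the tool is safe.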
+ +### 3.7 Prompt-cache poisoning + +- **Vector:** Shared prompt caches (Anthropic's prompt cache, OpenAI's cached prompts) can be seeded with attacker-controlled content that is then served to later queries in the same cache key. +- **Where it should be covered:** Out of scope for a local hook-based tool in a strict sense, but the plugin does not acknowledge this vector in its Defense Philosophy or Known Limitations. +- **Reference:** 2025 prompt-cache research. +- **Suggested extension:** Add a note in `docs/security-hardening-guide.md` and in the Defense Philosophy section explicitly noting that shared prompt caches are out of scope. + +### 3.8 Supply chain — Docker image layer inspection + +- **Vector:** A Dockerfile pulls an image; a malicious layer within the image contains secrets-harvesting or reverse-shell code. `pre-install-supply-chain.mjs` checks the image *name* against blocklists, not the image *contents*. +- **Where it should be covered:** Depth extension in `pre-install-supply-chain.mjs` or a dedicated `docker-layer-scanner.mjs`. +- **Reference:** 2024-2025 Sysdig and Snyk reports on malicious Docker Hub images. +- **Suggested extension:** Optional Trivy / Grype integration for image content scanning when the user opts in. + +### 3.9 Web dashboard / fleet policy server + +- **Vector:** Not a threat vector per se, but a capability gap. The plugin is per-machine; a CISO with 100 developers cannot see posture across the fleet, cannot push policy updates centrally, and cannot confirm that a security bulletin has propagated. +- **Reference:** Snyk, Semgrep Cloud, GitGuardian, Protect AI. All ship fleet-level consoles. +- **Suggested extension:** See §7 roadmap. + +### 3.10 Compliance evidence pack + +- **Vector:** EU AI Act Art. 15 (accuracy, robustness, cybersecurity) requires audit-ready evidence. 
The plugin produces JSONL audit trails (`lib/audit-trail.mjs`) and can emit SARIF, but does not produce compliance-ready PDF or DOCX reports with NIST AI RMF / ISO 42001 / EU AI Act mappings, timestamps, and signed attestation. +- **Reference:** EU AI Act Art. 15, NIST AI RMF 1.0, ISO 42001, MITRE ATLAS. +- **Suggested extension:** A `/security compliance-report [--framework eu-ai-act|nist|iso42001]` command that emits a signed PDF. + +--- + +## 4. Evasion Arsenal — 18 Proof-of-Concept Attacks + +The 18 PoCs below are summarised from the evasion-arsenal and hook-bypass review agents (`a2c19b9c36b5b955f`, `ad7770d76bb7df1f5`). Each entry states the attack class, a compact payload sketch (literal payload bytes intentionally redacted to avoid triggering the plugin's own hooks during documentation review), which hook or scanner should own the detection, why it misses today, and a fix. The PoCs are intended as defensive test cases; the attack-simulator (`scanners/attack-simulator.mjs`) is the right home for the full corpus. + +### E1 — U+F0000 Private Use Area steganography + +- **Class:** Unicode steganography. +- **Payload sketch:** Instructions encoded with characters from the Supplementary Private Use Area (U+F0000-F0FFF or U+100000-10FFFD) embedded in a prompt or tool output. +- **Owner:** `hooks/scripts/pre-prompt-inject-scan.mjs` (`containsUnicodeTags` or equivalent). +- **Why it misses:** The current detector covers U+E0000-E007F (Unicode Tag block) but not PUA-A / PUA-B. PUA characters are invisible in most terminals and often pass through normalisation. +- **Fix:** Extend the Unicode-range match to cover U+E0000-E007F, U+F0000-FFFFD, and U+100000-10FFFD. Emit MEDIUM for PUA presence in prompt or tool output. + +### E2 — Multi-turn temporal evasion + +- **Class:** Split injection. +- **Payload sketch:** Part 1 in prompt N establishes a rare trigger token. Part 2 in prompt N+5 associates that trigger token with a destructive action. 
+- **Owner:** `post-session-guard.mjs` (long-horizon window). +- **Why it misses:** The 100-call long-horizon window detects slow-burn *trifectas* (leg A and leg C > 50 calls apart) but does not correlate semantic instruction fragments across turns. +- **Fix:** Add an n-gram / keyword-memory tracker in `post-session-guard` that flags the re-use of a rare token across turns near a sink. + +### E3 — Base64-encoded instructions in source comments + +- **Class:** Indirect injection. +- **Payload sketch:** A source-code comment block whose content is a base64 blob; the LLM decodes the comment while reading the file and executes the decoded instructions. +- **Owner:** `post-mcp-verify.mjs` / `pre-prompt-inject-scan.mjs`. +- **Why it misses:** The current injection-pattern matcher inspects literal strings but does not attempt base64 / hex / rot13 decoding. +- **Fix:** Add an opportunistic decoder that attempts base64/hex/rot13 on long comment blocks and re-scans the decoded content. + +### E4 — Markdown link title-attribute injection + +- **Class:** Indirect injection via markup. +- **Payload sketch:** A Markdown link of the form `[text](url "injected instructions")` where the attacker's payload lives inside the title attribute (the quoted string after the URL). +- **Owner:** `post-mcp-verify.mjs`. +- **Why it misses:** The scanner inspects link URLs and anchor text, not the title attribute inside the quotes after the URL. +- **Fix:** Extend the markdown parser to extract link titles and scan them with the same injection matcher. + +### E5 — SVG `<desc>` inline injection + +- **Class:** Indirect injection via markup. +- **Payload sketch:** An SVG file with injected instructions inside a `<desc>` element (or `<title>`, `<metadata>`, `<foreignObject>`). +- **Owner:** `post-mcp-verify.mjs` / `entropy-scanner.mjs`. +- **Why it misses:** v7.0.0 expanded the entropy scanner to skip `.svg` files (reduces false positives on base64 PNG payloads). 
The HTML content trap detection in `post-mcp-verify` does not cover SVG-specific tags. +- **Fix:** Treat SVG as HTML-like for injection scanning, not as a binary asset. Parse `<desc>`, `<title>`, `<metadata>`, and `<foreignObject>` and scan their text content. + +### E6 — CSS `content:` property injection + +- **Class:** Indirect injection via markup. +- **Payload sketch:** A CSS rule using the `content:` property to embed text instructions that the model reads. +- **Owner:** `post-mcp-verify.mjs` / new `css-scanner.mjs`. +- **Why it misses:** The v7.0.0 entropy scanner skips `.css/.scss/.sass/.less`, and no other scanner inspects CSS. +- **Fix:** Add a targeted CSS scanner that extracts `content` property values and scans them. Limit scope to tool output, not source-code files. + +### E7 — HTML comment payload injection + +- **Class:** Indirect injection via markup. +- **Payload sketch:** HTML comment nodes containing injection text. +- **Owner:** `post-mcp-verify.mjs` (HTML content trap detection). +- **Why it misses:** The HTML trap detector inspects visible tags and scripts; it does not systematically scan comment content. +- **Fix:** Extend the HTML parser to extract all comment nodes and feed them to the injection matcher. + +### E8 — Bash T7: process substitution + +- **Class:** Command obfuscation. +- **Payload sketch:** A destructive command hidden inside a `<(...)` or `>(...)` process-substitution expression. +- **Owner:** `hooks/scripts/pre-bash-destructive.mjs`, `scanners/lib/bash-normalize.mjs`. +- **Why it misses:** v5.0's `bash-normalize.mjs` covers T1-T6 (empty quotes, `${}`, backslashes, tabs, `${IFS}`, ANSI-C hex). Process substitution (`<(...)`, `>(...)`) is not normalised; the hostile command is never re-constructed before the destructive-command matcher runs. +- **Fix:** Add a T7 rule: collapse `<(...)` and `>(...)` into their inner command text for matching purposes. 
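The T7 rule can be sketched as a small pre-pass that runs before the destructive-command matcher. This is a minimal sketch, not the plugin's code: `collapseProcessSubstitution` is a hypothetical name, and the scan deliberately ignores quoting and escapes, which a real rule in `bash-normalize.mjs` would have to handle.

```javascript
// Hypothetical T7 pre-pass: replaces <(...) and >(...) with their inner
// command text so later matchers see the hidden command.
// Simplification: quotes and backslash escapes are not interpreted.
function collapseProcessSubstitution(command) {
  let out = '';
  let i = 0;
  while (i < command.length) {
    const ch = command[i];
    if ((ch === '<' || ch === '>') && command[i + 1] === '(') {
      // Walk forward to the matching close paren, tracking nesting depth.
      let depth = 1;
      let j = i + 2;
      while (j < command.length && depth > 0) {
        if (command[j] === '(') depth += 1;
        else if (command[j] === ')') depth -= 1;
        j += 1;
      }
      if (depth === 0) {
        const inner = command.slice(i + 2, j - 1);
        // Recurse so nested substitutions are collapsed as well.
        out += ' ' + collapseProcessSubstitution(inner) + ' ';
        i = j;
        continue;
      }
      // Unbalanced parens: fall through and copy the character unchanged.
    }
    out += ch;
    i += 1;
  }
  return out.replace(/\s+/g, ' ').trim();
}

console.log(collapseProcessSubstitution('diff <(placeholder-cmd --flag) out.txt'));
// prints "diff placeholder-cmd --flag out.txt"
```

The collapsed string is meant for matching only; it is no longer a valid command and should never be executed.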
+ +### E9 — Bash T8: base64 indirect exec + +- **Class:** Command obfuscation. +- **Payload sketch:** A pipeline of the form `echo <base64-blob> | base64 -d | bash` where the decoded payload is never visible in the raw command string. +- **Owner:** `pre-bash-destructive.mjs`. +- **Why it misses:** The decoded payload is never present in the raw command string before the decode step runs. +- **Fix:** Detect the `base64 -d | <shell>` idiom as HIGH per se, independent of payload content. A legitimate use is rare and can be allowlisted. + +### E10 — Bash T9: eval via variable indirection + +- **Class:** Command obfuscation. +- **Payload sketch:** Assign the destructive command string to a variable, then run `eval "$VAR"`. +- **Owner:** `pre-bash-destructive.mjs`, `bash-normalize.mjs`. +- **Why it misses:** `eval` is detected directly; `eval "$VAR"` with `VAR` assigned earlier requires simple forward-flow variable tracking, which is not performed. +- **Fix:** Add one-level variable tracking in `bash-normalize`: when a variable is assigned a string and then passed to `eval`, substitute the literal before matching. + +### E11 — GitHub Actions expression injection + +- **Class:** Workflow injection. +- **Payload sketch:** A Git commit message or pull request title containing a shell-break payload consumed by a workflow that uses `${{ github.event.head_commit.message }}` or similar in a `run:` block. +- **Owner:** A new `workflow-scanner.mjs` or an extension of `scanners/git-forensics.mjs`. +- **Why it misses:** No scanner inspects `.github/workflows/*.yml` for user-controlled expressions in `run:` contexts. +- **Fix:** Scan workflow YAML files for `${{ github.event.* }}`, `${{ github.head_ref }}`, etc., inside `run:` blocks and emit HIGH. + +### E12 — `.gitattributes` filter driver post-clone + +- **Class:** Supply-chain post-clone escalation. 
+- **Payload sketch:** A repository is cloned with `lib/git-clone.mjs`'s sandbox enabled; a `.gitattributes` file inside the clone references a filter/smudge driver that triggers arbitrary code on later checkouts outside the sandbox. +- **Owner:** `lib/git-clone.mjs` / `post-mcp-verify.mjs`. +- **Why it misses:** The sandbox covers the initial clone; `.gitattributes` written to the temp clone can be carried forward if the user moves the clone outside the temp dir. +- **Fix:** Scan `.gitattributes` inside the clone for `filter=` / `diff=` / `merge=` directives and raise MEDIUM if present, regardless of destination. + +### E13 — npm scoped-package lifecycle injection + +- **Class:** Supply-chain. +- **Payload sketch:** `npm install @benign-looking-scope/popular-name` where the scope is registered by the attacker and the package name matches a popular unscoped package. +- **Owner:** `pre-install-supply-chain.mjs`. +- **Why it misses:** The blocklist and typosquat detector operate on the unscoped name; scope-hopping (registering `@attacker/<popular-name>` when the real is unscoped `<popular-name>`) is not explicitly flagged. +- **Fix:** Detect install of `@scope/name` where `name` matches an unscoped top-100 package name and the scope is not the package's known official scope. + +### E14 — MCP description drift under the Levenshtein 10% threshold + +- **Class:** MCP rug-pull / slow-burn. +- **Payload sketch:** A hostile MCP server publishes a description drift of <= 9% per update, repeated weekly. After 10 weeks, the description is substantially different, but no single diff triggered the Levenshtein-drift alert. +- **Owner:** `scanners/lib/mcp-description-cache.mjs` and `post-mcp-verify.mjs`. +- **Why it misses:** Drift is measured per-update, not cumulatively against the oldest cached description. +- **Fix:** Also compute drift against the earliest cached description (baseline). Raise MEDIUM if cumulative drift >= 25% over any 12-week window. 
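The baseline comparison in the E14 fix can be sketched as follows. Assumptions, none confirmed by the source: drift is normalised as Levenshtein distance divided by the longer string's length, the per-update alert fires at 10% or more, and `driftRatio` and `checkDrift` are hypothetical names. The 12-week window is left out for brevity; `history` simply holds the cached descriptions, oldest first.

```javascript
// Classic two-row Levenshtein edit distance.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i += 1) {
    const curr = [i];
    for (let j = 1; j <= b.length; j += 1) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumed normalisation: distance relative to the longer description.
function driftRatio(a, b) {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 0 : levenshtein(a, b) / longest;
}

// Compares a new description against the previous one (per-update drift)
// and against the earliest cached one (cumulative drift).
function checkDrift(history, next, limits = { step: 0.10, cumulative: 0.25 }) {
  if (history.length === 0) return { step: 0, cumulative: 0, alerts: [] };
  const step = driftRatio(history[history.length - 1], next);
  const cumulative = driftRatio(history[0], next);
  const alerts = [];
  if (step >= limits.step) alerts.push('per-update');
  if (cumulative >= limits.cumulative) alerts.push('cumulative');
  return { step, cumulative, alerts };
}

// Five updates that each rewrite 5% of a 100-character description.
const versions = [0, 1, 2, 3, 4, 5].map((k) => 'b'.repeat(5 * k) + 'a'.repeat(100 - 5 * k));
console.log(checkDrift(versions.slice(0, 5), versions[5]));
```

In this synthetic series every step stays at 5%, below the per-update limit, while the fifth update reaches 25% against the baseline and trips only the cumulative alert.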
+ +### E15 — Memory poisoning via `.claude/agents/<name>.md` + +- **Class:** Memory poisoning. +- **Payload sketch:** A malicious agent file in `.claude/agents/` with embedded instructions targeting session state. +- **Owner:** `scanners/memory-poisoning.mjs`. +- **Why it misses:** The memory-poisoning scanner inspects `CLAUDE.md`, top-level memory files, and `.claude/rules`, but not `.claude/agents/*.md`. +- **Fix:** Add `.claude/agents/*.md` to the scanner's target glob. + +### E16 — Pre-compact homoglyph bypass + +- **Class:** Obfuscation. +- **Payload sketch:** An injection prompt that replaces Latin letters with visually identical Cyrillic letters (e.g., Cyrillic `o` U+043E for Latin `o` U+006F). +- **Owner:** `pre-prompt-inject-scan.mjs` / a pre-compact variant. +- **Why it misses:** Pattern matches for well-known injection phrases are done on the raw input before normalisation in some code paths; homoglyph-normalisation is MEDIUM-advisory in v5.0 but not integrated into every matcher. +- **Fix:** Run NFKC / homoglyph-folding before every injection-pattern match, not only before the obfuscation advisory. + +### E17 — Escalation-after-input 6-call bypass + +- **Class:** Delegation. +- **Payload sketch:** Untrusted input at call N, then wait 6 tool calls, then delegate to a sub-agent at call N+6. +- **Owner:** `post-session-guard.mjs` (`checkEscalationAfterInput`). +- **Why it misses:** The advisory fires only within a 5-call window after the input event. +- **Fix:** Make the window configurable (env var `LLM_SECURITY_ESCALATION_WINDOW`) with a documented default of 5; also add a secondary longer-window advisory at MEDIUM severity for delegation within 20 calls. + +### E18 — Secret exfiltration via Markdown image URL (rule 18 abuse) + +- **Class:** Data exfiltration. 
+- **Payload sketch:** A Markdown image reference whose URL query string carries a high-entropy credential value, constructed so that v7.0.0's new "markdown image URL" suppression rule dismisses the high-entropy string. +- **Owner:** `scanners/entropy-scanner.mjs`. +- **Why it misses:** Rule 18 was added in v7.0.0 to suppress false positives on CDN-hosted image URLs. It dismisses legitimate high-entropy strings and also dismisses embedded secrets in the same shape. +- **Fix:** Refine rule 18 to only suppress strings that match the host/path structure of known CDN patterns (e.g., `cdn.*`, `images.*`, `*.amazonaws.com/s3/*`) and not arbitrary `https://...?...` query strings. Alternatively, run an explicit secret-pattern match inside the URL's query before suppression takes effect. + +--- + +## 5. Scoring Model Critique + +The v7.0.0 scoring rework is a substantive improvement over v1. The v1 model (`scanners/lib/severity.mjs` `riskScoreV1`, kept for reference) summed weighted counts and capped at 100 — any scan with more than a handful of findings collapsed to 100, making the score useless as a signal. The v2 model is severity-dominated (one critical always lands in the 70-95 tier) and log-scaled within tier (additional findings of the same severity increase the score but with diminishing returns). The design decisions below are nonetheless worth flagging. 
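
The critical-tier arithmetic can be reproduced in a few lines. The sketch below is a reconstruction, not a copy of `severity.mjs`: it generalises the manual computation recorded in Appendix A (`70 + min(25, log2(5)*10)` for 4 criticals) to `log2(n + 1)`, an assumption that matches the reported values for 1, 2, 4, and 10 criticals. `criticalTierScore` is a hypothetical name, and the lower tiers are deliberately not modelled.

```javascript
// Hypothetical reconstruction of the critical tier only.
// Assumption: within-tier growth is log2(count + 1) * 10, capped at 25,
// added to a tier floor of 70 and rounded to the nearest integer.
function criticalTierScore(criticalCount) {
  if (!Number.isInteger(criticalCount) || criticalCount < 1) {
    throw new RangeError("this sketch only models scans with >= 1 critical");
  }
  const withinTier = Math.min(25, Math.log2(criticalCount + 1) * 10);
  return Math.round(70 + withinTier);
}

for (const n of [1, 2, 4, 10]) {
  console.log(`critical: ${n} -> ${criticalTierScore(n)}`);
}
```

Under this reconstruction, 4 criticals score 93, which is the value the code produces according to B4, not the 90 stated in the documentation.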
+ +### 5.1 Tipping points (verified) + +From `severity.mjs:32-46`: + +| Input | Score | Verdict | Band | +|---|---:|---|---| +| `{}` | 0 | ALLOW | Low | +| `{low: 1}` | 4 | ALLOW | Low | +| `{low: 10}` | 11 | ALLOW | Low | +| `{medium: 1}` | 20 | WARNING | Medium | +| `{medium: 5}` | 28 | WARNING | Medium | +| `{medium: 50}` | 33 | WARNING | Medium | +| `{high: 1}` | 48 | WARNING | High | +| `{high: 5}` | 60 | WARNING | High | +| `{high: 7}` | 64 | WARNING | High | +| `{high: 8}` | 65 | **BLOCK** | Critical | +| `{high: 17}` | 65 | BLOCK | Critical | +| `{critical: 1}` | 80 | BLOCK | Critical | +| `{critical: 2}` | 86 | BLOCK | Critical | +| `{critical: 4}` | **93** (docs say 90 — see B4) | BLOCK | Extreme | +| `{critical: 10}` | 95 | BLOCK | Extreme | + +Observations: + +- The WARNING → BLOCK verdict transition (band High → Critical) at exactly 8 high findings is a sharp step. A scan with 7 high + 5 medium findings receives WARNING, band High, score 64. One additional high pushes it to BLOCK. For an adversary who wants to avoid BLOCK while landing real attacks, the practical ceiling is 7 high findings. +- The medium tier has an effective ceiling of 35 (medium=50 → 33; medium=1000 → 33 after rounding). A project with hundreds of medium findings looks identical in score to a project with fifty, and only a few points above a project with five (28). This is the log-scaling doing its job; it is also a volume-blindness that should be documented. +- Info is scoring-inert (see B3). + +### 5.2 Adversarial under-BLOCK landings + +An attacker who understands the tier structure can exploit it in two ways: + +- **Land in the High band without crossing the BLOCK line.** 7 high findings + any number of medium + any number of low. This gives score = 64, verdict = WARNING. A developer habituated to WARNING is likely to proceed. +- **Land 1 critical and accept the BLOCK.** One false-positive critical (a test fixture secret, a deliberate placeholder) can block legitimate work.
In that case the operator's reaction is to tune, suppress, or disable the offending rule — which reduces detection capacity for future real criticals. + +### 5.3 Tier gap + +The medium tier ends at 35 and the high tier begins at 40. A medium-heavy scan cannot reach the high band by accumulation alone; it must have at least one high finding. This is the explicit design intent ("severity-dominated") but has a consequence: a project with systemically many medium findings (e.g., a large legacy codebase) is perpetually scored "Medium" even when the cumulative risk is substantial. The v2 formula is honest about this; the docs should be, too. + +**Fix suggestion:** Add an escape hatch. +```js +if (medium >= 20) base = Math.max(base, 40); // medium volume bridge +``` +and document it. The bridge only fires when medium count is volumetrically notable; it does not suppress the severity-dominated principle. + +### 5.4 Verdict / band co-monotonicity + +The verdict thresholds at `severity.mjs:74-79` and band thresholds at `severity.mjs:93-99` are aligned (BLOCK >= 65 co-occurs with Critical/Extreme band; WARNING >= 15 co-occurs with Medium/High band). This is good: operators can reason about a single number. The alignment should be asserted in a test to prevent future drift. + +### 5.5 Info dead-weight + +See B3. The three options (document, supplementary score, floor contribution) each have trade-offs. The simplest honest change is documenting the behaviour. The most useful extension is a supplementary info-volume trend per scan target, exposed in the dashboard aggregator. + +--- + +## 6. Feature-Gap Matrix vs. Commercial Competitors + +The matrix below is derived from the market-analysis agent (`ade6518f8a6fcc0c6`). Vendors compared: Snyk, Semgrep, GitGuardian, Protect AI, Lakera, HiddenLayer. Each cell is Y / N / partial / (—). Status column: Leading / Competitive / Behind / Missing. 
+ +| Feature | `llm-security` | Snyk | Semgrep | GitGuardian | Protect AI | Lakera | HiddenLayer | Status | +|---|---|---|---|---|---|---|---|---| +| SAST / code scanning | N | Y | Y | N | N | N | N | N/A (out of scope) | +| SCA / dependency audit | Y (dep-audit + supply-chain-recheck) | Y | Y | N | Y | N | N | Competitive | +| Secrets detection | Y (entropy + patterns) | partial | Y | Y | N | N | N | Competitive | +| IaC / workflow scanning | N | Y | Y | N | N | N | N | Behind | +| Container / image scanning | partial (name-blocklist only) | Y | N | N | Y | N | N | Behind | +| Web dashboard | N | Y | Y | Y | Y | Y | Y | **Behind (critical gap)** | +| Fleet policy server | N | Y | Y | Y | Y | Y | Y | **Behind (critical gap)** | +| IDE real-time scanning | partial (extension scan) | Y | Y | N | N | N | N | Behind | +| IDE extension scanning | Y (VS Code + JetBrains) | N | N | N | N | N | N | **Leading** | +| MCP static audit | Y | N | N | N | N | N | N | **Leading** | +| MCP live inspection | Y | N | N | N | N | N | N | **Leading** | +| Bash obfuscation normalisation | Y (T1-T6) | N | N | N | N | N | N | **Leading** | +| Unicode Tag steganography detection | Y | N | N | N | partial | partial | N | **Leading** | +| Prompt injection static scan | Y | N | N | N | partial | Y | Y | Competitive | +| Runtime prompt firewall / filter | N | N | N | N | Y | Y | Y | **Behind (critical gap)** | +| Model weight scanning | N | N | N | N | Y | N | Y | Behind | +| AI-BOM generation | Y (CycloneDX 1.6) | partial | N | N | Y | N | N | Competitive | +| Adaptive red-team harness | Y (64 scenarios + mutations) | N | N | N | Y | Y | N | **Leading** | +| Trifecta / Rule of Two detection | Y | N | N | N | N | N | N | **Leading** | +| Interactive threat modelling (STRIDE/MAESTRO) | Y | N | N | N | N | N | N | **Leading** | +| Sandboxed clone / VSIX fetch | Y | N | N | N | N | N | N | **Leading** | +| False-positive ML feedback loop | N | Y | Y | Y | N | N | N | Behind | +| SIEM-native 
integration | partial (SARIF + JSONL) | Y | Y | Y | Y | Y | Y | Behind | +| Enterprise ticketing (Jira, ServiceNow) | N | Y | Y | Y | N | N | N | Behind | +| Chat integration (Slack, Teams, PagerDuty) | N | Y | Y | Y | N | N | N | Behind | +| Compliance PDF/DOCX reports | N | Y | Y | Y | Y | N | Y | **Behind (critical gap)** | +| EU AI Act audit-evidence pack | N | N | N | N | partial | N | partial | Behind | +| Offline / air-gapped operation | Y | N | partial | N | N | N | N | **Leading** | +| Open source + MIT | Y | N | partial (Semgrep CE) | N | N | N | N | **Leading** | + +**Summary:** `llm-security` leads on 11 features, is competitive on 4, behind on 13 (of which 4 are critical gaps: web dashboard, fleet policy server, runtime firewall, compliance reporting), and out of scope on 1 (SAST). + +The plugin is genuinely unique in its combination of MCP auditing, IDE extension prescan, trifecta detection, bash evasion normalisation, interactive threat modelling, sandboxed remote fetch, AI-BOM generation, and an adaptive red-team harness — no competitor in the matrix combines these. The gap is not capability *per se*; it is enterprise integration, central visibility, and compliance deliverables. + +--- + +## 7. Roadmap Recommendation — 10 Features Ranked + +Ranking criterion: **(market value) x (strategic differentiation) / complexity**. Ties broken by dependency-order (features that unblock others rank higher). + +### Rank 1 — Web dashboard + fleet policy server + +- **Problem:** The plugin is per-machine. A CISO with a fleet of developers cannot see aggregate posture or push policy updates. The most frequent objection from enterprise review is "we can't see it". +- **Complexity:** **L**. Requires a server (Node, Deno, or Go), a persistence layer (SQLite minimum, Postgres for multi-tenant), an auth model (OIDC / SAML), and a policy-push protocol. Dashboard UI is separate work. +- **Market value:** **Critical**.
Without this, the plugin cannot be sold to regulated enterprise. +- **Dependencies:** None; the plugin already emits SARIF and JSONL that can be ingested. +- **Suggested phasing:** (a) Read-only dashboard consuming `reports/` JSONL + SARIF; (b) authenticated fleet policy server; (c) web UI. + +### Rank 2 — Runtime prompt firewall mode + +- **Problem:** All current protection is static. A running Claude Code session receives a prompt injection via tool output; `post-mcp-verify.mjs` detects and warns, but the content has already been ingested by the model. A *prompt firewall* would sit in the IO path and strip / rewrite injections before the model sees them. +- **Complexity:** **L**. Requires intercepting the tool-output stream, applying a fast classifier, and rewriting or refusing the content. Performance constraints are strict (<50 ms per intervention). +- **Market value:** **Critical**. Lakera Guard, Protect AI, and Prompt Guard sell primarily on this capability. +- **Dependencies:** Claude Code hook API stability; possibly a new hook event for intercepting tool output streams. +- **Suggested phasing:** (a) Offline batch mode (rewrite after the fact, for post-hoc demonstration); (b) online real-time mode. + +### Rank 3 — IDE real-time scanning (VS Code + JetBrains plugins) + +- **Problem:** The plugin can scan *installed extensions* but cannot scan the developer's *own code in the editor*. Snyk and Semgrep both ship IDE plugins that scan on save and annotate with squigglies. +- **Complexity:** **M**. Requires VS Code and JetBrains plugin shells, LSP-style integration with the existing scanners, and bidirectional config sync. +- **Market value:** **High**. Developer ergonomics; shifts scanning from an occasional CLI run to a continuous feedback loop. +- **Dependencies:** Stable CLI entry point (already present: `bin/llm-security.mjs`). + +### Rank 4 — Compliance reporting pack + +- **Problem:** EU AI Act Art. 
15 audit evidence, NIST AI RMF and ISO 42001 mappings, and MITRE ATLAS heat-maps are all CISO-facing deliverables. Today the plugin emits JSONL and SARIF. +- **Complexity:** **M**. Template-driven PDF / DOCX generation with signed timestamps. +- **Market value:** **High**. Required for regulated verticals; unlocks health, finance, public sector. +- **Dependencies:** Audit-trail module (`lib/audit-trail.mjs`), posture scanner, OWASP mappings — all already present. + +### Rank 5 — Enterprise ticketing + chat integrations + +- **Problem:** Findings reach developers but not tracking systems. No Jira / ServiceNow / Slack / Teams / PagerDuty push today. +- **Complexity:** **S-M**. Each integration is small; the aggregate is medium. +- **Market value:** **High**. Operational requirement for any security team that runs ticket-driven workflows. +- **Dependencies:** Policy / routing configuration in `.llm-security/policy.json`. + +### Rank 6 — False-positive ML feedback loop + +- **Problem:** Suppressions today are static (`.llm-security/policy.json` rules, file-level skip, entropy rule allowlists). Snyk and Semgrep learn from user dismissals and re-rank findings. +- **Complexity:** **M**. Requires a feedback record format, a light ranker (gradient-boosted trees or a simple logistic model), and a privacy-preserving training loop (on-device preferred). +- **Market value:** **Medium-High**. Reduces noise, which is the biggest operational tax on static scanners. +- **Dependencies:** Dashboard (rank 1) helps with multi-machine aggregation, but a local-first version is possible. + +### Rank 7 — Multi-modal / EXIF / PDF injection scanner + +- **Problem:** Image and PDF inputs are not scanned. Multi-modal injection is a 2025-2026 growth vector. +- **Complexity:** **M**. Pure metadata scanning is small; pixel-level steganography detection is larger and probably out of scope for v1. +- **Market value:** **Medium-High**. Competitive differentiator if delivered early. 
+- **Dependencies:** `ExifTool` or a Node-native EXIF parser; a PDF object parser. + +### Rank 8 — MCP 2.0 OAuth audit + +- **Problem:** MCP 2.0 OAuth introduces new attack surfaces (consent phishing, token replay, scope creep). +- **Complexity:** **S-M**. Extension to `mcp-live-inspect.mjs` and `scanners/mcp-scanner-agent`. +- **Market value:** **Medium-High**. Aligned with MCP 2.0 adoption timeline. +- **Dependencies:** MCP 2.0 spec stability. + +### Rank 9 — Terminal / ANSI escape injection scanner + +- **Problem:** Tool output rendered in the terminal can hide instructions via ANSI escapes. Not covered today. +- **Complexity:** **S**. A regex-based stripper + allowlist for safe SGR sequences. +- **Market value:** **Medium**. Low-frequency vector but high-impact when it lands. +- **Dependencies:** None. + +### Rank 10 — Skill marketplace pre-deployment gate + +- **Problem:** A skill pulled from a public marketplace can be compromised. Today the plugin offers `/security plugin-audit` on demand; it does not hook skill installation. +- **Complexity:** **S-M**. Requires knowledge of Claude Code's skill install event (or a wrapper CLI). +- **Market value:** **Medium**. Protective against a vector that will grow as the marketplace grows. +- **Dependencies:** Claude Code skill install hook availability. + +--- + +## 8. Code-Quality Observations + +### 8.1 Documentation / arithmetic mismatch + +- **`CHANGELOG.md:11` and `severity.mjs:23` JSDoc** — see B4. The arithmetic disagrees with the code. The discrepancy undermines the v7.0.0 trustworthy-scoring headline. + +### 8.2 Hook configurability vs. documentation + +- **`post-session-guard.mjs:814-826`** — the `block` path is gated by `(mcpInfo.concentrated || sensitiveExfil)`. The CLAUDE.md Hooks table describes `block|warn|off` without that qualifier. See B2. + +### 8.3 Regex precision + +- **`pre-write-pathguard.mjs:21-25`** — the `ENV_PATTERNS` regex is too restrictive. See B1. 
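
To make the 8.3 point concrete, the sketch below shows one way a path guard could cover `.env` files with an arbitrary number of dotted suffixes. It is an illustration under stated assumptions: the actual `ENV_PATTERNS` in `pre-write-pathguard.mjs` is not reproduced in this review, `ENV_LIKE` and `isEnvLikePath` are hypothetical names, and any allowlist the hook may keep (for example for template files) is not modelled.

```javascript
// Hypothetical pattern: a basename that is, or ends in, ".env" followed by
// zero or more ".suffix" segments (.env, .env.local, app.env.a.b.c, ...).
const ENV_LIKE = /\.env(\.[^.]+)*$/i;

function isEnvLikePath(filePath) {
  // Match on the basename only, accepting both POSIX and Windows separators.
  const base = filePath.split(/[\\/]/).pop();
  return ENV_LIKE.test(base);
}

const samples = [
  ".env",
  "config/.env.production.local.backup",
  "app.env.x.y.z",
  "src/environment.ts",
  "README.md",
];
for (const p of samples) {
  console.log(p, isEnvLikePath(p));
}
```

Anchoring on the literal `.env` segment and allowing unlimited suffix segments avoids enumerating specific variants, which is the failure mode B1 describes. Names such as `environment.ts` stay unmatched because the pattern requires a dot directly before `env` and either a dot or the end of the name directly after it.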
+ +### 8.4 Dead / low-value code + +- **`severity.mjs:55-63`** — `riskScoreV1` is kept for diff/comparison. The function is unused in production paths but exported. Consider marking `@deprecated` in JSDoc or moving to `docs/legacy/` to signal intent. + +### 8.5 Test gaps + +- The scoring tipping points in §5.1 are covered by existing tests per the CLAUDE.md "1487 tests" claim, but no test pins **4 critical = 93**, so nothing flags the disagreement between the code and the documented value of 90 (see B4). +- No test pins the verdict/band co-monotonicity invariant from §5.4. +- No mutation-testing coverage numbers are published (CLAUDE.md mentions unit/integration tests only). +- The destructuring / spread case in B6 has no test coverage in `taint-tracer`. +- The path-guard `.env.X.Y.Z` case in B1 has no test coverage in the pathguard test suite. +- The distributed-trifecta (non-MCP-concentrated, non-sensitive-path) BLOCK-mode behaviour in B2 is not asserted by a test. + +### 8.6 Duplication + +- **`lib/git-clone.mjs`** and **`lib/vsix-sandbox.mjs`** share the same sandbox-profile construction logic. v6.5.0's consolidation was partial; a small amount of duplication remains. A shared `lib/sandbox.mjs` would reduce the risk of divergence. + +### 8.7 Configuration surface + +- Multiple env vars (`LLM_SECURITY_INJECTION_MODE`, `LLM_SECURITY_TRIFECTA_MODE`, `LLM_SECURITY_UPDATE_CHECK`, etc.) are scattered across hooks. A single `.llm-security/policy.json` section that mirrors them would reduce surprise. + +### 8.8 Naming + +- `computeDataTag` at `post-session-guard.mjs:655` is technically correct but the surrounding comments (line 646: "CaMeL-inspired data flow tagging") set an expectation the function cannot meet. Either rename the function to `computeOutputFingerprint` or tone down the comments. See B8. + +### 8.9 Error handling + +- The scanners consistently use JSON output, but error recovery is uneven.
A malformed `.llm-security/policy.json` leads to scanner warnings in some paths and silent fallbacks in others. A single `readPolicy()` helper with a documented fallback chain would reduce ambiguity. + +### 8.10 Cross-reference bit-rot risk + +- `CLAUDE.md` contains a very detailed table of hooks, scanners, and knowledge files with line-level claims. A change to a single hook behaviour requires updates in four places (hook source, CLAUDE.md, README, CHANGELOG). A lightweight doc-consistency test (e.g., a script that asserts that the number of hooks listed in `.claude/hooks.json` matches the table in CLAUDE.md) would catch drift. + +--- + +## 9. Honesty Check + +The following quotes were verified in the referenced source files during this review. Each is paired with the behaviour the code actually implements, and with a suggested replacement that trades slightly more words for a substantially more accurate description. + +| # | Quote (source) | Reality | Suggested replacement | +|---|---|---|---| +| 1 | "enforces the Rule of Two" — CLAUDE.md:182 | `TRIFECTA_MODE` default is `warn`; `block` is opt-in; even in `block`, only high-confidence (MCP-concentrated OR sensitive-path) trifectas actually block. See B2. | "detects Rule of Two violations; blocks on high-confidence (MCP-concentrated or sensitive-path) trifectas in opt-in `block` mode; warns otherwise. Distributed trifectas are detected but not blocked by default." | +| 2 | "Fully Schrems II compatible" — CLAUDE.md:136, README.md | Standalone CLI is offline by default, but OSV.dev lookups in `supply-chain-recheck.mjs` transmit package identifiers to a Google-operated API. `ci-cd-guide.md` is accurate; CLAUDE.md is not. | "Schrems II compatible in default offline mode. The optional OSV.dev enrichment (supply-chain-recheck with `--online`) transmits package identifiers to a Google-operated API and is a separate compliance consideration." 
| +| 3 | "CaMeL-inspired data flow tagging (SHA-256 provenance tracking, output to input linking)" — CLAUDE.md:184 | `computeDataTag` at `post-session-guard.mjs:655` hashes the first 200 characters of tool output; `flowMatch` checks for a substring match in later inputs. This is byte-matching with early truncation, not semantic lineage. See B8. | "Opportunistic byte-matching of truncated output fingerprints (first 200 bytes of tool output, SHA-256/16-hex tag). This is a lightweight heuristic, not semantic data-flow lineage; it fails on any output mutation or summarisation." | +| 4 | "defense-in-depth" — multiple locations | Accurate in spirit. The plugin does layer prompt-scan, pathguard, trifecta-guard, and post-mcp-verify. The claim is not quantified. | "Three independent detection layers with documented bypass classes (see `docs/security-hardening-guide.md` §6). Each layer is individually bypassable; the design intent is to raise attack cost, not to be a single enforcement line." | +| 5 | "Hardened ZIP extractor (zip-slip, symlink, bomb, ratio caps)" — CLAUDE.md §IDE scanner | Verified via `lib/zip-extract.mjs`. Bombs, zip-slip, symlinks, absolute paths, drive-letter paths, encrypted entries, and ZIP64 are all rejected. **However** no public fuzz-testing results are published. | "Hardened ZIP extractor rejects zip-slip, symlink, absolute / drive-letter paths, encrypted entries, and ZIP64 bombs (capped at 10 000 entries, 500 MB uncompressed, 100 x ratio, depth 20). No fuzz-testing results published to date." | +| 6 | "1487 tests" — CLAUDE.md header | Accurate count. No mutation-testing coverage published. | "1487 unit and integration tests. Mutation-testing coverage is not published; the number is a test count, not a coverage metric." | +| 7 | "trustworthy scoring" — CHANGELOG v7.0.0 | Genuine improvement over v1, but the headline formula example is arithmetically wrong (4 critical = 93, not 90). See B4. | "Severity-dominated, log-scaled v2 scoring formula. 
Replaces the v1 sum-and-cap model that saturated all non-trivial scans to 100 / Extreme. See `severity.mjs` for the authoritative formula." | +| 8 | "1 critical = 80, 2 = 86, 4 = 90, 10 = 95" — CHANGELOG.md:11, severity.mjs:23 | `riskScore({critical: 4}) = 93`. See B4. | Fix the arithmetic: `1 = 80, 2 = 86, 4 = 93, 10 = 95`. | +| 9 | "Context-aware entropy scanner" — CLAUDE.md:7 | Extension skip + 8 line-suppression rules. Accurate description. "Context-aware" is a slightly generous framing for what is largely a rule-and-allowlist pipeline. | "Rule-based entropy scanner with file-extension skip, 8 line-level suppression rules, and configurable policy." | +| 10 | "calibration block reports skip counters" — CLAUDE.md:7 | Verified. Accurate. | No change. | + +The honest headline for v7.0.0 should read approximately: + +> v7.0.0 replaces a broken scoring formula with a severity-dominated log-scaled model, expands the entropy scanner's suppression rules, and extends the typosquat allowlist. Two residual hook bugs and three over-claimed docs items (CaMeL provenance, Schrems II, Rule of Two "enforcement") remain and should be addressed in v7.1. + +--- + +## 10. CISO Perspective + +**Question:** would a CISO in a regulated enterprise (financial services, healthcare, public sector, defence) purchase / install `llm-security` v7.0.0 today? + +**Answer in two parts.** + +### 10.1 Yes, conditionally + +The plugin is a credible baseline for: + +- **Individual developers and small teams** already using Claude Code who want a free, open-source, offline-first second line of defence. +- **Security research groups** studying prompt injection, trifecta detection, MCP auditing, and Claude-Code-specific threat surfaces. The adaptive red-team harness and interactive threat modeller have no commercial equivalent. +- **Air-gapped or Schrems-II-sensitive environments** where sending data to cloud providers (Snyk, Semgrep Cloud, GitGuardian) is the showstopper. 
Standalone CLI mode is genuinely offline in default config. +- **Norwegian public-sector pilots.** The `knowledge/norwegian-context.md` file aligns with Datatilsynet, NSM, and Digitaliseringsdirektoratet expectations. With the caveats in §9 corrected, the plugin is a defensible pilot baseline. + +In these contexts, the plugin clears the bar: install, configure, review, ship. + +### 10.2 No, not yet for regulated enterprise + +A CISO at a bank, insurer, hospital, national-security agency, or large public-sector body would decline the plugin in its current form. The blockers, in rough priority order: + +1. **No central dashboard.** Security must be observable across a fleet. The plugin's reports live on each developer's machine. Even the dashboard-aggregator caches to a local file. A CISO cannot prove to an auditor that all 300 developers ran the scan in the last 30 days. +2. **No fleet policy push.** Policy lives in per-machine `.llm-security/policy.json`. A change (e.g., raising the trifecta mode from `warn` to `block` after an incident) must be rolled out by developer action, not by central push. +3. **No SIEM-native integration.** The plugin writes JSONL (`lib/audit-trail.mjs`). Forwarding to Splunk, Sentinel, Elastic, or QRadar requires a custom collector. Commercial competitors ship native connectors. +4. **No compliance-ready reporting.** EU AI Act Art. 15 audit evidence, NIST AI RMF or ISO 42001 attestations, MITRE ATLAS heat-maps — none are produced today. SARIF and JSONL are technical artefacts, not audit evidence. +5. **No runtime protection.** All current protection is static; once a prompt injection lands, detection is post-hoc. Regulators increasingly expect runtime controls (prompt firewalls, content filters, output guardrails). +6. **Claim-precision issues.** The three honesty items in §9 (CaMeL provenance, Schrems II, Rule of Two "enforcement") would be challenged in a formal procurement review and would require written clarification. +7. 
**Bugs B1 and B2.** A deterministic regex hole in a sensitive-path guard and a block-mode bypass on distributed trifectas are both things a pen-testing firm would find in the first week of engagement. They would not block procurement outright, but they would reduce confidence in the maturity story. + +### 10.3 What a CISO would require before a production engagement + +In priority order: + +1. **Fix B1 (pathguard regex), B2 (distributed-trifecta block), B4 (CHANGELOG arithmetic), B8 (CaMeL docs), and the honesty quotes in §9.** These are low-cost and high-signal; they close the gap between documentation and code. +2. **Ship a read-only web dashboard consuming existing JSONL / SARIF.** Even v0.1 of a dashboard unblocks most of the fleet-visibility objection. +3. **Produce a compliance-evidence pack template.** A PDF that mirrors posture-scanner output, OWASP category mapping, and audit-trail events — signed, timestamped, and frameable. +4. **Document runtime-protection gap explicitly.** Add to Defense Philosophy: "This plugin is a static + hook-based layer. Runtime prompt filtering, model-level guardrails, and egress DLP are out of scope and must be addressed by complementary controls." +5. **Publish a security-architecture note.** Diagram showing how the hooks compose, what each hook can and cannot see, and the explicit defence layers. One page. This is the single most asked-for artefact in enterprise security review. +6. **Commit to a 90-day bug-disclosure window.** A named security contact (security@... or equivalent Forgejo path) and a documented handling SLA. + +### 10.4 One-paragraph verdict + +`llm-security` v7.0.0 is a serious contribution to open-source AI security tooling and is among the few plugins that address MCP, Claude Code hooks, and IDE extension provenance as first-class problems. It is not yet enterprise-ready. 
It is close enough that a B+ grade is within reach after one focused release cycle: fix B1 / B2 / B4 / B8, tone the docs to match the code, ship a minimal dashboard. Without that cycle, the plugin stays at B- and remains primarily a tool for individual developers, researchers, and pilot teams. + +--- + +## Appendix A — Verification Log + +The following claims in this report were verified by reading the referenced source directly during the review, not relying solely on the earlier review agents: + +- B1 pathguard regex — `hooks/scripts/pre-write-pathguard.mjs:21-25` read directly. +- B2 distributed-trifecta block gate — `hooks/scripts/post-session-guard.mjs:800-828` read directly. +- B4 arithmetic — `scanners/lib/severity.mjs:32-46` read directly; computation `70 + min(25, log2(5)*10) = 93.22 rounded to 93` performed manually; CHANGELOG.md:11 and severity.mjs:23 JSDoc read directly to confirm the "4=90" string. +- B8 CaMeL substring match — `hooks/scripts/post-session-guard.mjs:649-680` read directly; CLAUDE.md:184 read directly. +- Honesty quotes §9 items 1, 2, 3 — CLAUDE.md:136, 182, 184 read directly. + +Findings B3, B5, B6, B7 and the evasion-arsenal PoCs E1-E18 are reported here with the file:line anchors provided by the specialist review agents (`a12f1a90430b53a8c`, `a2c19b9c36b5b955f`, `ad7770d76bb7df1f5`, `af0552023740ad6b7`). They were not each re-verified by direct source inspection during the synthesis step; the reader should treat those anchors as inspection targets rather than fully re-verified facts. The top-5 most serious findings in the Executive Summary (B1, B2, B8, "Fully Schrems II", B4) were all directly verified. + +## Appendix B — What was not reviewed + +- Performance characteristics of the scanners under large repositories (>1M LOC). +- Windows behaviour (the reviewer is on macOS / Darwin). The plugin documents Windows fallbacks; they were not exercised. 
+- Cross-platform compatibility of the sandbox (bwrap on Ubuntu 24.04+ is documented as flaky; not exercised). +- Deep inspection of the AI-BOM output against CycloneDX 1.6 schema validators. +- Inter-operation with MCP servers running under different authentication schemes. +- The `threat-modeler-agent` interactive flow beyond a single synthetic run. +- The `/security harden` auto-generation output against a diverse set of starting configurations. + +These are candidates for a follow-up review cycle.