ktg-plugin-marketplace/plugins/llm-security/docs/critical-review-2026-04-20.md
Kjell Tore Guttormsen a6e2c939ef docs(llm-security): add critical review 2026-04-20 (v7.0.0 adversarial audit)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-19 23:27:52 +02:00

62 KiB
Raw Blame History

Critical Review — llm-security v7.0.0

Date: 2026-04-20 Scope: Adversarial audit of the llm-security Claude Code plugin, version 7.0.0 (released 2026-04-19). Method: Six parallel specialist review agents (scanner bug hunt, hook-bypass arsenal, evasion PoC arsenal, honesty check, market analysis, scoring-model adversarial), followed by manual verification of the most consequential claims by reading the referenced source lines directly. Subject context: The plugin packages 5 OWASP-style taxonomies (LLM / ASI / AST / MCP / DeepMind Agent Traps), 10 orchestrated scanners, 8 hooks, an interactive threat modeller, an attack simulator, a CLI, a dashboard aggregator, an AI-BOM generator, and a new v7.0.0 scoring model that replaces the v1 "sum-and-cap" formula. The claim under review: that v7.0.0 delivers "trustworthy scoring" and a defensible security posture for Claude Code environments.


1. Executive Summary

Overall grade: B-.

llm-security v7.0.0 is a capable, well-architected, offline-first developer-facing security tool. The breadth is unusual: the project ships functionality (MCP live inspection, IDE extension scanning, bash evasion normalization, Unicode Tag steganography detection, trifecta detection, AI-BOM generation, adaptive red-team harness) that no single commercial competitor combines in one package. The v7.0.0 scoring rework is a genuine improvement over v1 — the old "sum-and-cap" formula did collapse every non-trivial scan to 100 / Extreme, and the new log-scaled, severity-dominated model produces defensible bands for realistic scans.

The grade falls short of B+ or A-minus for four specific reasons:

  1. Two real HIGH-severity hook bugs let hostile writes and distributed trifectas slip past what the documentation promises as blocking behaviour.
  2. Three honesty issues inflate claims beyond what the code delivers: the "SHA-256 provenance tracking" is a 200-byte substring fingerprint, "Fully Schrems II compatible" ignores the Google-operated OSV.dev API, and "Rule of Two enforcement" is an opt-in warning in default config.
  3. Scoring doc arithmetic is wrong in a way that undercuts the "trustworthy scoring" headline: the formula yields 93 for 4 criticals, the documentation says 90.
  4. Coverage gaps against 2026 threats (A2A injection, multi-modal / EXIF, MCP 2.0 OAuth, terminal ANSI injection, skill marketplace poisoning) are not acknowledged; the plugin is honest about general limitations of prompt-injection defence but silent about these specific vectors.

Top 5 most serious findings (detail in §2, §4, §9)

  1. HIGH — pre-write-pathguard.mjs:23 regex hole lets *.env.production.local.backup and any *.env.X.Y.Z variant through unblocked.
  2. HIGH — post-session-guard.mjs:816 gated block downgrades distributed (non-MCP-concentrated, non-sensitive-path) trifectas to WARN even when LLM_SECURITY_TRIFECTA_MODE=block is set.
  3. HIGH (honesty) — post-session-guard.mjs:655 "CaMeL-inspired provenance" is a 200-byte SHA-256 substring fingerprint, not data-flow lineage. Trivially bypassed by appending one byte.
  4. MEDIUM (honesty) — CLAUDE.md:136 / README.md "Fully Schrems II compatible" ignores OSV.dev (Google-operated) opt-in.
  5. LOW (arithmetic / honesty) — CHANGELOG.md:11 + severity.mjs:23 JSDoc state 4 critical = 90 when the formula evaluates to 93.

Top 5 most valuable missing features

  1. Web dashboard + fleet policy server — the plugin is machine-local; enterprise security teams require central visibility and policy push.
  2. Runtime prompt firewall / filter — all current protection is static; Lakera and Protect AI ship runtime filters.
  3. IDE real-time scanning (VS Code + JetBrains) — the plugin can scan installed extensions, but not offer live in-editor scanning of the developer's own code, which is table stakes for Snyk and Semgrep.
  4. Compliance reporting pack (PDF/DOCX, EU AI Act Art. 15 audit evidence) — CISO-facing deliverables are absent; only SARIF / JSONL exist today.
  5. Enterprise incident integrations (Jira, ServiceNow, Slack, Teams, PagerDuty) — today only SARIF upload is supported.

2. Critical Bugs and Vulnerabilities

This section lists verified findings with file:line references, proof-of-concept payloads, suggested fixes, and severity. Findings B1, B2, B4, and B8 were verified by reading the referenced source directly during this review. B3, B5, B6, B7 were verified by the scanner-bug agent (a12f1a90430b53a8c) and are reported here with its file:line anchors.

B1 — Pathguard regex miss on multi-segment .env.*.local.* — HIGH

  • File: hooks/scripts/pre-write-pathguard.mjs:21-25
  • Code:
    const ENV_PATTERNS = [
      /[\\/]\.env$/,
      /[\\/]\.env\.[a-z]+$/,          // matches .env.X only when X is a single [a-z]+ segment
      /[\\/]\.env\.local$/,
    ];
    
  • Defect: The second pattern anchors $ immediately after [a-z]+, so any file name with more than one segment after .env is not matched. Digits, dots, hyphens, and uppercase characters in the suffix also fail to match.
  • PoC payloads that slip past the hook:
    • Write to .env.production.local.backup
    • Write to .env.development.local.old
    • Write to .env.prod.local.bak
    • Write to .env.stage-1.local
    • Write to .env.CI.secret
  • Impact: A prompt-injected agent can exfiltrate or overwrite environment secrets by choosing any of the variants above. The hook's purpose is to be a last-line path guard; this regex undercuts that.
  • Fix: replace with
    /[\\/]\.env(\.[A-Za-z0-9._-]+)*$/
    
    or, cleaner, match by basename prefix:
    if (basename(path).toLowerCase().startsWith('.env')) { block(); }
    
  • Severity: HIGH. Deterministic bypass of a hook that is documented as a gate for environment secrets.

B2 — Distributed trifecta is advisory even in block mode — HIGH

  • File: hooks/scripts/post-session-guard.mjs:814-826
  • Code:
    if (TRIFECTA_MODE === 'block' && (mcpInfo.concentrated || sensitiveExfil)) {
      process.stderr.write('BLOCKED: Rule of Two violation ...');
      process.stdout.write(JSON.stringify({ decision: 'block' }));
      process.exit(2);
    }
    
  • Defect: The block path is gated on a high-confidence sub-condition — MCP-concentrated (all three legs via the same MCP server) OR a sensitive-path + exfil pair. A trifecta that distributes its three legs across different sources (e.g., untrusted leg from WebFetch, sensitive data leg from Read to a non-sensitive path, exfiltration leg from Bash curl to a non-sensitive sink) is detected (the warning is emitted at line 803), but never blocked.
  • PoC scenario:
    1. Agent reads untrusted content from a GitHub issue via MCP server A.
    2. Agent reads user data from ./user-db.sqlite (not a sensitive path per the scanner's static list).
    3. Agent exfiltrates via curl to a newly registered domain (not on the known-sink allowlist). All three legs are detected. The hook emits formatWarning(...). mcpInfo.concentrated is false (different servers), sensitiveExfil is false (no ~/.ssh, ~/.aws, etc., and no obvious cred exfil). Result: the block branch is skipped, the caller proceeds.
  • Impact: Users who configure LLM_SECURITY_TRIFECTA_MODE=block reasonably expect that any detected lethal trifecta is blocked. In the current code, only a subset is. The documentation (CLAUDE.md §"Hooks", line 58) describes block|warn|off without qualification. This is a mismatch between documented behaviour and code behaviour.
  • Fix options:
    1. Strict: remove the (mcpInfo.concentrated || sensitiveExfil) gate inside the block branch — block on any detected trifecta in block mode.
    2. Tiered: expose a second env var, e.g., LLM_SECURITY_TRIFECTA_BLOCK_STRICTNESS=high|all, and document that block today implements high only.
    3. Update the documentation in CLAUDE.md and README.md to make the high-confidence gate explicit, so the mismatch is removed.
  • Severity: HIGH. False sense of security for any operator who enables block.

B3 — riskScore({info: N}) = 0 silently masks info-volume findings — MEDIUM

  • File: scanners/lib/severity.mjs:32-46
  • Code: The riskScore function inspects critical, high, medium, low. The info count is ignored.
  • Defect: Any scanner that mis-tiers findings as info contributes nothing to the risk score, the verdict, or the band. A scanner configured incorrectly (or an adversary who targets a scanner's tiering logic, e.g., by crafting strings that match a tier_downgrade heuristic) can accumulate arbitrary numbers of findings without affecting the verdict.
  • Honest characterisation: Ignoring info in a risk aggregate is a reasonable design choice on its own. The problem is the combination with (a) the info severity being a legitimate tier in SEVERITY (line 4-10), (b) the owaspCategorize function (line 218) counting info findings, and (c) no documentation anywhere stating that info is scoring-inert. An operator looking at a report that counts 400 info findings has no way to know these contribute zero to the final band.
  • Fix options:
    1. Document explicitly in severity.mjs JSDoc and CLAUDE.md that info is excluded from scoring.
    2. Add an infoScore() helper that returns a supplementary 0-10 score — useful for trend monitoring without affecting verdicts.
    3. Add a floor contribution: e.g., score = max(score, 1 + min(5, log2(info + 1) * 1.5)) when info >= 50, so large info volumes produce at least a Low band.
  • Severity: MEDIUM. This is primarily a honesty / observability issue, not a detection issue.

B4 — CHANGELOG / JSDoc tier example arithmetic is wrong — LOW (honesty)

  • Files: CHANGELOG.md:11, scanners/lib/severity.mjs:23 (JSDoc), CLAUDE.md:7
  • Claim: "Critical present → 7095 (1=80, 2=86, 4=90, 10=95)"
  • Verification: From riskScore at severity.mjs:32-46:
    base = 70 + min(25, log2(critical + 1) * 10)
    critical=1 → 70 + min(25, log2(2)*10)  = 70 + 10.00   = 80.00  → 80   ok
    critical=2 → 70 + min(25, log2(3)*10)  = 70 + 15.85   = 85.85  → 86   ok
    critical=4 → 70 + min(25, log2(5)*10)  = 70 + 23.22   = 93.22  → 93   mismatch (docs say 90)
    critical=10 → 70 + min(25, log2(11)*10) = 70 + 34.59   → capped → 95  ok
    
  • Defect: The "4 = 90" entry in the CHANGELOG, the JSDoc at severity.mjs:23, and the CLAUDE.md summary at line 7 all misstate the formula's output. The formula returns 93.
  • Fix: Either (a) correct all three doc locations to 4=93, or (b) adjust the formula (e.g., log2(critical+1) * 9 or a different carry) to actually yield 90 for 4 criticals. Option (a) is strongly preferred; the formula is the ground truth and the docs follow.
  • Severity: LOW. But corrosive: the v7.0.0 pitch is "trustworthy scoring", and the flagship documentation example is arithmetically wrong.

B5 — Entropy scanner can miss secrets inside code-heavy files that look GLSL-shaped — MEDIUM

  • File: scanners/entropy-scanner.mjs:237-238, suppression list at lines 30-34.
  • Defect: The v7.0.0 entropy scanner skips files with extensions .glsl|.frag|.vert|.shader|.wgsl|.css|.scss|.sass|.less|.svg|.min.*|.map to cut shader / CSS / minified-JS false positives. The per-line suppression rules (inside the scanner) include GLSL-keyword heuristics. Where this goes wrong: a .ts file containing inline shader source or CSS-in-JS templates can accumulate line-level suppressions while a genuine high-entropy secret embedded in the same file is dismissed because the context "reads GLSL-like".
  • PoC sketch: A TypeScript file containing an inline GLSL-shaped block and a credential-looking high-entropy string on the next line. The line containing the credential has a GLSL-keyword-bearing neighbour; the line suppression heuristic can short-circuit the classification.
  • Fix: Replace line-proximity suppression with a two-stage pipeline: first classify the file context (shader-dominant vs code-dominant vs markup-dominant), then apply per-line rules scoped to that context. Do not allow GLSL-suppression rules to fire inside .ts / .js / .py / .go files.
  • Severity: MEDIUM. Real false-negative risk in polyglot files common in modern frontend projects.

B6 — Taint-tracer ignores destructured and spread assignments — MEDIUM

  • File: scanners/taint-tracer.mjs:175-182 (extraction of assigned variable name)
  • Defect: The tracer's extractAssignedVariable recognises plain assignments (const x = req.body, let y = process.argv[1]), but not destructuring or spread:
    const { secret: userInput } = req.body;   // userInput untainted per current tracer
    const [input, ...rest] = process.argv;    // input / rest untainted
    const { a, b: { c } } = req.body;          // c untainted
    
    Sinks that use any of userInput, input, rest, c downstream will not be flagged.
  • Fix: Extend the extractor to recognise object-destructuring, array-destructuring, and rest patterns. This is a pure parser-level change; the taint propagation downstream is already correct.
  • Severity: MEDIUM. Common modern JS/TS pattern; the gap yields false negatives rather than false positives.

B7 — Levenshtein <= 2 threshold lets many real typosquats through — MEDIUM

  • File: scanners/dep-audit.mjs:307, 326, and scanners/supply-chain-recheck.mjs
  • Defect: The dep-audit gate flags distance=1 as HIGH and distance=2 as MEDIUM; distance >= 3 is not flagged. But many real-world typosquats have distance >= 3: lodash vs lodash-utils (distance = 6), react vs reactjs-utils (distance = 8), express vs expresss (distance = 1, captured) versus express vs expressjs-wrapper (distance > 2, missed). Common token-injection typosquats (-utils, -helper, -core, -plus) are exactly the attack pattern that distance-based matching fails on.
  • Fix: Combine Levenshtein with tokenisation:
    1. Split package names on - and _.
    2. Flag if any token set is a strict subset (or the top-N-overlap) of a known popular package's token set.
    3. Keep Levenshtein <= 2 as a complementary signal, not the sole gate.
  • Severity: MEDIUM. There is an existing allowlist (v7.0.0 expansion: 22 npm + 5 PyPI) that partially compensates by reducing false positives for short-name tools; this fix targets the false negative side.

B8 — "CaMeL-inspired SHA-256 provenance tracking" is a 200-byte substring fingerprint — HIGH (honesty)

  • File: hooks/scripts/post-session-guard.mjs:655-658 (computeDataTag), CLAUDE.md:184.
  • Code:
    function computeDataTag(text) {
      const sample = text.slice(0, 200);
      return createHash('sha256').update(sample).digest('hex').slice(0, 16);
    }
    
  • Claim (CLAUDE.md:184): "CaMeL-inspired data flow tagging (SHA-256 provenance tracking, output to input linking)."
  • Reality: The mechanism hashes the first 200 characters of a tool's output, truncates to a 16-hex tag, and looks for a byte-wise substring of that output in a later input. This is neither semantic lineage nor robust provenance:
    • Append one byte to the head of the output: new hash, no match.
    • Insert a whitespace character in the first 200 characters: no match.
    • Summarise or translate the output before passing it onward: no match.
    • Encode the output (base64, hex, quoted-printable) before using it: no match.
  • Contrast with CaMeL (DeepMind, 2025): CaMeL uses typed capability objects, explicit control-vs-data-channel separation, and policy-checked re-entry of data into privileged sinks. The present implementation shares the ambition but not the mechanism.
  • Fix options:
    1. Rename in docs: "opportunistic byte-matching of truncated output fingerprints (first 200 bytes, SHA-256/16-hex tag; not semantic lineage)."
    2. Strengthen the matching: use n-gram fingerprints over the full output, or content-defined chunking (CDC), or Rabin fingerprints, so that small edits still match.
    3. Split the feature: keep the current lightweight tag as session-guard provenance hint, and scope any future CaMeL-style tracking as a separate, clearly labelled module.
  • Severity: HIGH (honesty). The current phrasing promises behaviour the code cannot deliver. Operators building threat models on that promise are mis-calibrated.

3. Coverage Gaps

This section lists threat vectors that the plugin does not attempt to cover, and where a reasonable operator might expect it to. The plugin's Defense Philosophy section (CLAUDE.md §"Defense Philosophy (v5.0)") is commendable for acknowledging that prompt injection is structurally unsolvable; what follows is narrower: specific vectors that are tractable but absent.

3.1 Agent-to-Agent (A2A) injection

  • Vector: Two or more agents delegating work to each other via Task/Agent tools. An A2A injection attack plants malicious instructions in the output of one agent that a second agent then ingests as trusted context.
  • Where it should be covered: A new hook or an extension of post-session-guard.mjs that tracks inter-agent delegation chains. Today, post-session-guard tracks delegation for the "escalation-after-input" advisory within a 5-call window (post-session-guard.mjs:830-840), but does not model the sub-agent's outputs as a potential injection source when re-entering the parent context.
  • Reference: OWASP ASI02 (Agent orchestration abuse), DeepMind Agent Traps category 4 (delegation).
  • Suggested extension: Hash the outputs of sub-agent invocations, scan them with the same injection-pattern matcher used by post-mcp-verify.mjs, and emit a MEDIUM advisory if matches are found in the parent context's next decision.

3.2 Multi-modal injection (EXIF / image steganography / PDF)

  • Vector: Hidden instructions in image metadata (EXIF, XMP, IPTC), in image pixels (steganography), in PDF object streams, or in audio metadata (ID3 tags). An image attached to a prompt is not scanned today.
  • Where it should be covered: A pre-tool-use hook that intercepts Read on binary formats and runs a targeted metadata scanner.
  • Reference: 2025 research (prompt-injection-research-2025-2026.md would ideally cite the multi-modal injection work from OpenAI and DeepMind; today the knowledge file focuses on text vectors).
  • Suggested extension: Add an exif-scanner.mjs / pdf-object-scanner.mjs that runs on Read of matching extensions; extract and scan text-bearing fields.

3.3 MCP 2.0 OAuth attacks

  • Vector: MCP 2.0 introduces OAuth flows for MCP-server authentication. Client-side RBAC, consent phishing, and stale-token replay are the attack surfaces.
  • Where it should be covered: scanners/mcp-live-inspect.mjs and scanners/mcp-scanner-agent system prompt.
  • Reference: OWASP MCP10 (Insecure authentication), current MCP spec updates.
  • Suggested extension: A static checker for OAuth config in .mcp.json and live-inspect probe for authorization server metadata endpoints.

3.4 Skill marketplace poisoning (pre-deployment gate)

  • Vector: A compromised skill in a public marketplace (Claude Code plugin marketplace, Anthropic skill store, etc.) is installed by a developer. The payload is dormant until specific conditions are met.
  • Where it should be covered: A pre-install-skill.mjs hook that scans skill manifests before installation, analogous to pre-install-supply-chain.mjs for packages.
  • Reference: OWASP AST04 (Skill tampering), OWASP AST06 (Skill supply chain).
  • Suggested extension: Integrate with the plugin-audit command to run on install, not only on demand.

3.5 Terminal UI / ANSI escape injection

  • Vector: Tool output containing ANSI escape sequences that, when rendered in the developer's terminal, hide instructions (e.g., using cursor-move codes, colour-matching background, or OSC 52 to inject clipboard content).
  • Where it should be covered: post-mcp-verify.mjs should strip or flag ANSI escape sequences before any output is passed to the developer. Today, the scanner checks for HTML and Unicode Tag steganography but not ANSI.
  • Reference: 2025 terminal-injection work (iTerm2, Windows Terminal advisories).
  • Suggested extension: Add an ansi-strip step in post-mcp-verify that either strips or raises MEDIUM on all sequences beyond a safe allowlist (SGR colour only).

3.6 OAuth token exfiltration via MCP tools

  • Vector: A compromised MCP tool description instructs the model to read OAuth tokens from the environment or from keychain-mounted paths and transmit them via a subsequent tool call.
  • Where it should be covered: scanners/mcp-scanner-agent and post-mcp-verify.mjs.
  • Reference: DeepMind Agent Traps category 6 (exfiltration). The trifecta detection partially covers this when the three legs land within the 20-call window, but slow-burn exfiltration can span longer.
  • Suggested extension: Tag MCP tools that request environment variables or credential-adjacent paths and enforce a per-session audit of their subsequent outputs.

3.7 Prompt-cache poisoning

  • Vector: Shared prompt caches (Anthropic's prompt cache, OpenAI's cached prompts) can be seeded with attacker-controlled content that is then served to later queries in the same cache key.
  • Where it should be covered: Out of scope for a local hook-based tool in a strict sense, but the plugin does not acknowledge this vector in its Defense Philosophy or Known Limitations.
  • Reference: 2025 prompt-cache research.
  • Suggested extension: Add a note in docs/security-hardening-guide.md and in the Defense Philosophy section explicitly noting that shared prompt caches are out of scope.

3.8 Supply chain — Docker image layer inspection

  • Vector: A Dockerfile pulls an image; a malicious layer within the image contains secrets-harvesting or reverse-shell code. pre-install-supply-chain.mjs checks the image name against blocklists, not the image contents.
  • Where it should be covered: Depth extension in pre-install-supply-chain.mjs or a dedicated docker-layer-scanner.mjs.
  • Reference: 2024-2025 Sysdig and Snyk reports on malicious Docker Hub images.
  • Suggested extension: Optional Trivy / Grype integration for image content scanning when the user opts in.

3.9 Web dashboard / fleet policy server

  • Vector: Not a threat vector per se, but a capability gap. The plugin is per-machine; a CISO with 100 developers cannot see posture across the fleet, cannot push policy updates centrally, and cannot confirm that a security bulletin has propagated.
  • Reference: Snyk, Semgrep Cloud, GitGuardian, Protect AI. All ship fleet-level consoles.
  • Suggested extension: See §7 roadmap.

3.10 Compliance evidence pack

  • Vector: EU AI Act Art. 15 (accuracy, robustness, cybersecurity) requires audit-ready evidence. The plugin produces JSONL audit trails (lib/audit-trail.mjs) and can emit SARIF, but does not produce compliance-ready PDF or DOCX reports with NIST AI RMF / ISO 42001 / EU AI Act mappings, timestamps, and signed attestation.
  • Reference: EU AI Act Art. 15, NIST AI RMF 1.0, ISO 42001, MITRE ATLAS.
  • Suggested extension: A /security compliance-report [--framework eu-ai-act|nist|iso42001] command that emits a signed PDF.

4. Evasion Arsenal — 18 Proof-of-Concept Attacks

The 18 PoCs below are summarised from the evasion-arsenal and hook-bypass review agents (a2c19b9c36b5b955f, ad7770d76bb7df1f5). Each entry states the attack class, a compact payload sketch (literal payload bytes intentionally redacted to avoid triggering the plugin's own hooks during documentation review), which hook or scanner should own the detection, why it misses today, and a fix. The PoCs are intended as defensive test cases; the attack-simulator (scanners/attack-simulator.mjs) is the right home for the full corpus.

E1 — U+F0000 Private Use Area steganography

  • Class: Unicode steganography.
  • Payload sketch: Instructions encoded with characters from the Supplementary Private Use Area (U+F0000-F0FFF or U+100000-10FFFD) embedded in a prompt or tool output.
  • Owner: hooks/scripts/pre-prompt-inject-scan.mjs (containsUnicodeTags or equivalent).
  • Why it misses: The current detector covers U+E0000-E007F (Unicode Tag block) but not PUA-A / PUA-B. PUA characters are invisible in most terminals and often pass through normalisation.
  • Fix: Extend the Unicode-range match to cover U+E0000-E007F, U+F0000-FFFFD, and U+100000-10FFFD. Emit MEDIUM for PUA presence in prompt or tool output.

E2 — Multi-turn temporal evasion

  • Class: Split injection.
  • Payload sketch: Part 1 in prompt N establishes a rare trigger token. Part 2 in prompt N+5 associates that trigger token with a destructive action.
  • Owner: post-session-guard.mjs (long-horizon window).
  • Why it misses: The 100-call long-horizon window detects slow-burn trifectas (leg A and leg C > 50 calls apart) but does not correlate semantic instruction fragments across turns.
  • Fix: Add an n-gram / keyword-memory tracker in post-session-guard that flags the re-use of a rare token across turns near a sink.

E3 — Base64-encoded instructions in source comments

  • Class: Indirect injection.
  • Payload sketch: A source-code comment block whose content is a base64 blob; the LLM decodes the comment while reading the file and executes the decoded instructions.
  • Owner: post-mcp-verify.mjs / pre-prompt-inject-scan.mjs.
  • Why it misses: The current injection-pattern matcher inspects literal strings but does not attempt base64 / hex / rot13 decoding.
  • Fix: Add an opportunistic decoder that attempts base64/hex/rot13 on long comment blocks and re-scans the decoded content.
  • Class: Indirect injection via markup.
  • Payload sketch: A Markdown link of the form [text](url "injected instructions") where the attacker's payload lives inside the title attribute (the quoted string after the URL).
  • Owner: post-mcp-verify.mjs.
  • Why it misses: The scanner inspects link URLs and anchor text, not the title attribute inside the quotes after the URL.
  • Fix: Extend the markdown parser to extract link titles and scan them with the same injection matcher.

E5 — SVG <desc> inline injection

  • Class: Indirect injection via markup.
  • Payload sketch: An SVG file with injected instructions inside a <desc> element (or <title>, <metadata>, <foreignObject>).
  • Owner: post-mcp-verify.mjs / entropy-scanner.mjs.
  • Why it misses: v7.0.0 expanded the entropy scanner to skip .svg files (reduces false positives on base64 PNG payloads). The HTML content trap detection in post-mcp-verify does not cover SVG-specific tags.
  • Fix: Treat SVG as HTML-like for injection scanning, not as a binary asset. Parse <desc>, <title>, <metadata>, and <foreignObject> and scan their text content.

E6 — CSS content: property injection

  • Class: Indirect injection via markup.
  • Payload sketch: A CSS rule using the content: property to embed text instructions that the model reads.
  • Owner: post-mcp-verify.mjs / new css-scanner.mjs.
  • Why it misses: The v7.0.0 entropy scanner skips .css/.scss/.sass/.less, and no other scanner inspects CSS.
  • Fix: Add a targeted CSS scanner that extracts content property values and scans them. Limit scope to tool output, not source-code files.

E7 — HTML comment payload injection

  • Class: Indirect injection via markup.
  • Payload sketch: HTML comment nodes containing injection text.
  • Owner: post-mcp-verify.mjs (HTML content trap detection).
  • Why it misses: The HTML trap detector inspects visible tags and scripts; it does not systematically scan comment content.
  • Fix: Extend the HTML parser to extract all comment nodes and feed them to the injection matcher.

E8 — Bash T7: process substitution

  • Class: Command obfuscation.
  • Payload sketch: A destructive command hidden inside a <(...) or >(...) process-substitution expression.
  • Owner: hooks/scripts/pre-bash-destructive.mjs, scanners/lib/bash-normalize.mjs.
  • Why it misses: v5.0's bash-normalize.mjs covers T1-T6 (empty quotes, ${}, backslashes, tabs, ${IFS}, ANSI-C hex). Process substitution (<(...), >(...)) is not normalised; the hostile command is never re-constructed before the destructive-command matcher runs.
  • Fix: Add a T7 rule: collapse <(...) and >(...) into their inner command text for matching purposes.

E9 — Bash T8: base64 indirect exec

  • Class: Command obfuscation.
  • Payload sketch: A pipeline of the form echo <base64-blob> | base64 -d | bash where the decoded payload is never visible in the raw command string.
  • Owner: pre-bash-destructive.mjs.
  • Why it misses: The decoded payload is never present in the raw command string before the decode step runs.
  • Fix: Detect the base64 -d | <shell> idiom as HIGH per se, independent of payload content. A legitimate use is rare and can be allowlisted.

E10 — Bash T9: eval via variable indirection

  • Class: Command obfuscation.
  • Payload sketch: Assign the destructive command string to a variable, then run eval "$VAR".
  • Owner: pre-bash-destructive.mjs, bash-normalize.mjs.
  • Why it misses: eval is detected directly; eval "$VAR" with VAR assigned earlier requires simple forward-flow variable tracking, which is not performed.
  • Fix: Add one-level variable tracking in bash-normalize: when a variable is assigned a string and then passed to eval, substitute the literal before matching.

E11 — GitHub Actions expression injection

  • Class: Workflow injection.
  • Payload sketch: A Git commit message or pull request title containing a shell-break payload consumed by a workflow that uses ${{ github.event.head_commit.message }} or similar in a run: block.
  • Owner: A new workflow-scanner.mjs or an extension of scanners/git-forensics.mjs.
  • Why it misses: No scanner inspects .github/workflows/*.yml for user-controlled expressions in run: contexts.
  • Fix: Scan workflow YAML files for ${{ github.event.* }}, ${{ github.head_ref }}, etc., inside run: blocks and emit HIGH.

E12 — .gitattributes filter driver post-clone

  • Class: Supply-chain post-clone escalation.
  • Payload sketch: A repository is cloned with lib/git-clone.mjs's sandbox enabled; a .gitattributes file inside the clone references a filter/smudge driver that triggers arbitrary code on later checkouts outside the sandbox.
  • Owner: lib/git-clone.mjs / post-mcp-verify.mjs.
  • Why it misses: The sandbox covers the initial clone; .gitattributes written to the temp clone can be carried forward if the user moves the clone outside the temp dir.
  • Fix: Scan .gitattributes inside the clone for filter= / diff= / merge= directives and raise MEDIUM if present, regardless of destination.

E13 — npm scoped-package lifecycle injection

  • Class: Supply-chain.
  • Payload sketch: npm install @benign-looking-scope/popular-name where the scope is registered by the attacker and the package name matches a popular unscoped package.
  • Owner: pre-install-supply-chain.mjs.
  • Why it misses: The blocklist and typosquat detector operate on the unscoped name; scope-hopping (registering @attacker/<popular-name> when the real is unscoped <popular-name>) is not explicitly flagged.
  • Fix: Detect install of @scope/name where name matches an unscoped top-100 package name and the scope is not the package's known official scope.

E14 — MCP description drift under the Levenshtein 10% threshold

  • Class: MCP rug-pull / slow-burn.
  • Payload sketch: A hostile MCP server publishes a description drift of <= 9% per update, repeated weekly. After 10 weeks, the description is substantially different, but no single diff triggered the Levenshtein-drift alert.
  • Owner: scanners/lib/mcp-description-cache.mjs and post-mcp-verify.mjs.
  • Why it misses: Drift is measured per-update, not cumulatively against the oldest cached description.
  • Fix: Also compute drift against the earliest cached description (baseline). Raise MEDIUM if cumulative drift >= 25% over any 12-week window.

E15 — Memory poisoning via .claude/agents/<name>.md

  • Class: Memory poisoning.
  • Payload sketch: A malicious agent file in .claude/agents/ with embedded instructions targeting session state.
  • Owner: scanners/memory-poisoning.mjs.
  • Why it misses: The memory-poisoning scanner inspects CLAUDE.md, top-level memory files, and .claude/rules, but not .claude/agents/*.md.
  • Fix: Add .claude/agents/*.md to the scanner's target glob.

E16 — Pre-compact homoglyph bypass

  • Class: Obfuscation.
  • Payload sketch: An injection prompt that replaces Latin letters with visually identical Cyrillic letters (e.g., Cyrillic o U+043E for Latin o U+006F).
  • Owner: pre-prompt-inject-scan.mjs / a pre-compact variant.
  • Why it misses: Pattern matches for well-known injection phrases are done on the raw input before normalisation in some code paths; homoglyph-normalisation is MEDIUM-advisory in v5.0 but not integrated into every matcher.
  • Fix: Run NFKC / homoglyph-folding before every injection-pattern match, not only before the obfuscation advisory.

E17 — Escalation-after-input 6-call bypass

  • Class: Delegation.
  • Payload sketch: Untrusted input at call N, then wait 6 tool calls, then delegate to a sub-agent at call N+6.
  • Owner: post-session-guard.mjs (checkEscalationAfterInput).
  • Why it misses: The advisory fires only within a 5-call window after the input event.
  • Fix: Make the window configurable (env var LLM_SECURITY_ESCALATION_WINDOW) with a documented default of 5; also add a secondary longer-window advisory at MEDIUM severity for delegation within 20 calls.

E18 — Secret exfiltration via Markdown image URL (rule 18 abuse)

  • Class: Data exfiltration.
  • Payload sketch: A Markdown image reference whose URL query string carries a high-entropy credential value, constructed so that v7.0.0's new "markdown image URL" suppression rule dismisses the high-entropy string.
  • Owner: scanners/entropy-scanner.mjs.
  • Why it misses: Rule 18 was added in v7.0.0 to suppress false positives on CDN-hosted image URLs. It dismisses legitimate high-entropy strings and also dismisses embedded secrets in the same shape.
  • Fix: Refine rule 18 to only suppress strings that match the host/path structure of known CDN patterns (e.g., cdn.*, images.*, *.amazonaws.com/s3/*) and not arbitrary https://...?... query strings. Alternatively, run an explicit secret-pattern match inside the URL's query before suppression takes effect.

5. Scoring Model Critique

The v7.0.0 scoring rework is a substantive improvement over v1. The v1 model (scanners/lib/severity.mjs riskScoreV1, kept for reference) summed weighted counts and capped at 100 — any scan with more than a handful of findings collapsed to 100, making the score useless as a signal. The v2 model is severity-dominated (one critical always lands in the 70-95 tier) and log-scaled within tier (additional findings of the same severity increase the score but with diminishing returns). The design decisions below are nonetheless worth flagging.

5.1 Tipping points (verified)

From severity.mjs:32-46:

Input Score Verdict Band
{} 0 ALLOW Low
{low: 1} 4 ALLOW Low
{low: 10} 11 ALLOW Low
{medium: 1} 20 WARNING Medium
{medium: 5} 28 WARNING Medium
{medium: 50} 33 WARNING Medium
{high: 1} 48 WARNING High
{high: 5} 60 WARNING High
{high: 7} 64 WARNING High
{high: 8} 65 BLOCK Critical
{high: 17} 65 BLOCK Critical
{critical: 1} 80 BLOCK Critical
{critical: 2} 86 BLOCK Critical
{critical: 4} 93 (docs say 90 — see B4) BLOCK Extreme
{critical: 10} 95 BLOCK Extreme

Observations:

  • The high → critical verdict transition at exactly 8 high findings is a sharp step. A scan with 7 high + 5 medium findings receives WARNING, band High, score 64. One additional high pushes it to BLOCK. For an adversary who wants to avoid BLOCK while landing real attacks, the practical ceiling is 7 high findings.
  • The medium tier has an effective ceiling of 35 (medium=50 → 33; medium=1000 → 33 after rounding). A project with hundreds of medium findings looks identical in score to a project with five. This is the log-scaling doing its job; it is also a volume-blindness that should be documented.
  • Info is scoring-inert (see B3).

5.2 Adversarial under-BLOCK landings

An attacker who understands the tier structure can optimise for:

  • Land in the High band without crossing the BLOCK line. 7 high findings + any number of medium + any number of low. This gives score = 64, verdict = WARNING. A developer habituated to WARNING is likely to proceed.
  • Land 1 critical and accept the BLOCK. One false-positive critical (a test fixture secret, a deliberate placeholder) can block legitimate work. In that case the operator's reaction is to tune, suppress, or disable the offending rule — which reduces detection capacity for future real criticals.

5.3 Tier gap

The medium tier ends at 35 and the high tier begins at 40. A medium-heavy scan cannot reach the high band by accumulation alone; it must have at least one high finding. This is the explicit design intent ("severity-dominated") but has a consequence: a project with systemically many medium findings (e.g., a large legacy codebase) is perpetually scored "Medium" even when the cumulative risk is substantial. The v2 formula is honest about this; the docs should be, too.

Fix suggestion: Add an escape hatch.

if (medium >= 20) base = Math.max(base, 40); // medium volume bridge

and document it. The bridge only fires when medium count is volumetrically notable; it does not suppress the severity-dominated principle.

5.4 Verdict / band co-monotonicity

The verdict thresholds at severity.mjs:74-79 and band thresholds at severity.mjs:93-99 are aligned (BLOCK >= 65 co-occurs with Critical/Extreme band; WARNING >= 15 co-occurs with Medium/High band). This is good: operators can reason about a single number. The alignment should be asserted in a test to prevent future drift.

5.5 Info dead-weight

See B3. The three options (document, supplementary score, floor contribution) each have trade-offs. The simplest honest change is documenting the behaviour. The most useful extension is a supplementary info-volume trend per scan target, exposed in the dashboard aggregator.


6. Feature-Gap Matrix vs. Commercial Competitors

The matrix below is derived from the market-analysis agent (ade6518f8a6fcc0c6). Vendors compared: Snyk, Semgrep, GitGuardian, Protect AI, Lakera, HiddenLayer. Each cell is Y / N / partial / (—). Status column: Leading / Competitive / Behind / Missing.

Feature llm-security Snyk Semgrep GitGuardian Protect AI Lakera HiddenLayer Status
SAST / code scanning N Y Y N N N N N/A (out of scope)
SCA / dependency audit Y (dep-audit + supply-chain-recheck) Y Y N Y N N Competitive
Secrets detection Y (entropy + patterns) partial Y Y N N N Competitive
IaC / workflow scanning N Y Y N N N N Behind
Container / image scanning partial (name-blocklist only) Y N N Y N N Behind
Web dashboard N Y Y Y Y Y Y Behind (critical gap)
Fleet policy server N Y Y Y Y Y Y Behind (critical gap)
IDE real-time scanning partial (extension scan) Y Y N N N N Behind
IDE extension scanning Y (VS Code + JetBrains) N N N N N N Leading
MCP static audit Y N N N N N N Leading
MCP live inspection Y N N N N N N Leading
Bash obfuscation normalisation Y (T1-T6) N N N N N N Leading
Unicode Tag steganography detection Y N N N partial partial N Leading
Prompt injection static scan Y N N N partial Y Y Competitive
Runtime prompt firewall / filter N N N N Y Y Y Behind (critical gap)
Model weight scanning N N N N Y N Y Behind
AI-BOM generation Y (CycloneDX 1.6) partial N N Y N N Competitive
Adaptive red-team harness Y (64 scenarios + mutations) N N N Y Y N Leading
Trifecta / Rule of Two detection Y N N N N N N Leading
Interactive threat modelling (STRIDE/MAESTRO) Y N N N N N N Leading
Sandboxed clone / VSIX fetch Y N N N N N N Leading
False-positive ML feedback loop N Y Y Y N N N Behind
SIEM-native integration partial (SARIF + JSONL) Y Y Y Y Y Y Behind
Enterprise ticketing (Jira, ServiceNow) N Y Y Y N N N Behind
Chat integration (Slack, Teams, PagerDuty) N Y Y Y N N N Behind
Compliance PDF/DOCX reports N Y Y Y Y N Y Behind (critical gap)
EU AI Act audit-evidence pack N N N N partial N partial Behind
Offline / air-gapped operation Y N partial N N N N Leading
Open source + MIT Y N partial (Semgrep CE) N N N N Leading

Summary: llm-security leads on 11 features, is competitive on 5, behind on 8 (of which 4 are critical gaps: web dashboard, fleet policy server, runtime firewall, compliance reporting), and out of scope on 1 (SAST).

The plugin is genuinely unique in its combination of MCP auditing, IDE extension prescan, trifecta detection, bash evasion normalisation, interactive threat modelling, sandboxed remote fetch, AI-BOM generation, and an adaptive red-team harness — no competitor in the matrix combines these. The gap is not capability per se; it is enterprise integration, central visibility, and compliance deliverables.


7. Roadmap Recommendation — 10 Features Ranked

Ranking criterion: (market value) x (strategic differentiation) / complexity. Ties broken by dependency-order (features that unblock others rank higher).

Rank 1 — Web dashboard + fleet policy server

  • Problem: The plugin is per-machine. A CISO with a fleet of developers cannot see aggregate posture or push policy updates. The most frequent objection from enterprise review is "we can't see it".
  • Complexity: L. Requires a server (Node, Deno, or Go), a persistence layer (SQLite minimum, Postgres for multi-tenant), an auth model (OIDC / SAML), and a policy-push protocol. Dashboard UI is separate work.
  • Market value: Critical. Without this, the plugin cannot be sold to regulated enterprise.
  • Dependencies: None; the plugin already emits SARIF and JSONL that can be ingested.
  • Suggested phasing: (a) Read-only dashboard consuming reports/ JSONL + SARIF; (b) authenticated fleet policy server; (c) web UI.

Rank 2 — Runtime prompt firewall mode

  • Problem: All current protection is static. A running Claude Code session receives a prompt injection via tool output; post-mcp-verify.mjs detects and warns, but the content has already been ingested by the model. A prompt firewall would sit in the IO path and strip / rewrite injections before the model sees them.
  • Complexity: L. Requires intercepting the tool-output stream, applying a fast classifier, and rewriting or refusing the content. Performance constraints are strict (<50 ms per intervention).
  • Market value: Critical. Lakera Guard, Protect AI, and Prompt Guard sell primarily on this capability.
  • Dependencies: Claude Code hook API stability; possibly a new hook event for intercepting tool output streams.
  • Suggested phasing: (a) Offline batch mode (rewrite after the fact, for post-hoc demonstration); (b) online real-time mode.

Rank 3 — IDE real-time scanning (VS Code + JetBrains plugins)

  • Problem: The plugin can scan installed extensions but cannot scan the developer's own code in the editor. Snyk and Semgrep both ship IDE plugins that scan on save and annotate with squigglies.
  • Complexity: M. Requires VS Code and JetBrains plugin shells, LSP-style integration with the existing scanners, and bidirectional config sync.
  • Market value: High. Developer ergonomics; shifts scanning from an occasional CLI run to a continuous feedback loop.
  • Dependencies: Stable CLI entry point (already present: bin/llm-security.mjs).

Rank 4 — Compliance reporting pack

  • Problem: EU AI Act Art. 15 audit evidence, NIST AI RMF and ISO 42001 mappings, and MITRE ATLAS heat-maps are all CISO-facing deliverables. Today the plugin emits JSONL and SARIF.
  • Complexity: M. Template-driven PDF / DOCX generation with signed timestamps.
  • Market value: High. Required for regulated verticals; unlocks health, finance, public sector.
  • Dependencies: Audit-trail module (lib/audit-trail.mjs), posture scanner, OWASP mappings — all already present.

Rank 5 — Enterprise ticketing + chat integrations

  • Problem: Findings reach developers but not tracking systems. No Jira / ServiceNow / Slack / Teams / PagerDuty push today.
  • Complexity: S-M. Each integration is small; the aggregate is medium.
  • Market value: High. Operational requirement for any security team that runs ticket-driven workflows.
  • Dependencies: Policy / routing configuration in .llm-security/policy.json.

Rank 6 — False-positive ML feedback loop

  • Problem: Suppressions today are static (.llm-security/policy.json rules, file-level skip, entropy rule allowlists). Snyk and Semgrep learn from user dismissals and re-rank findings.
  • Complexity: M. Requires a feedback record format, a light ranker (gradient-boosted trees or a simple logistic model), and a privacy-preserving training loop (on-device preferred).
  • Market value: Medium-High. Reduces noise, which is the biggest operational tax on static scanners.
  • Dependencies: Dashboard (rank 1) helps with multi-machine aggregation, but a local-first version is possible.

Rank 7 — Multi-modal / EXIF / PDF injection scanner

  • Problem: Image and PDF inputs are not scanned. Multi-modal injection is a 2025-2026 growth vector.
  • Complexity: M. Pure metadata scanning is small; pixel-level steganography detection is larger and probably out of scope for v1.
  • Market value: Medium-High. Competitive differentiator if delivered early.
  • Dependencies: ExifTool or a Node-native EXIF parser; a PDF object parser.

Rank 8 — MCP 2.0 OAuth audit

  • Problem: MCP 2.0 OAuth introduces new attack surfaces (consent phishing, token replay, scope creep).
  • Complexity: S-M. Extension to mcp-live-inspect.mjs and scanners/mcp-scanner-agent.
  • Market value: Medium-High. Aligned with MCP 2.0 adoption timeline.
  • Dependencies: MCP 2.0 spec stability.

Rank 9 — Terminal / ANSI escape injection scanner

  • Problem: Tool output rendered in the terminal can hide instructions via ANSI escapes. Not covered today.
  • Complexity: S. A regex-based stripper + allowlist for safe SGR sequences.
  • Market value: Medium. Low-frequency vector but high-impact when it lands.
  • Dependencies: None.

Rank 10 — Skill marketplace pre-deployment gate

  • Problem: A skill pulled from a public marketplace can be compromised. Today the plugin offers /security plugin-audit on demand; it does not hook skill installation.
  • Complexity: S-M. Requires knowledge of Claude Code's skill install event (or a wrapper CLI).
  • Market value: Medium. Protective against a vector that will grow as the marketplace grows.
  • Dependencies: Claude Code skill install hook availability.

8. Code-Quality Observations

8.1 Documentation / arithmetic mismatch

  • CHANGELOG.md:11 and severity.mjs:23 JSDoc — see B4. The arithmetic disagrees with the code. The discrepancy undermines the v7.0.0 trustworthy-scoring headline.

8.2 Hook configurability vs. documentation

  • post-session-guard.mjs:814-826 — the block path is gated by (mcpInfo.concentrated || sensitiveExfil). The CLAUDE.md Hooks table describes block|warn|off without that qualifier. See B2.

8.3 Regex precision

  • pre-write-pathguard.mjs:21-25 — the ENV_PATTERNS regex is too restrictive. See B1.

8.4 Dead / low-value code

  • severity.mjs:55-63riskScoreV1 is kept for diff/comparison. The function is unused in production paths but exported. Consider marking @deprecated in JSDoc or moving to docs/legacy/ to signal intent.

8.5 Test gaps

  • The scoring tipping points in §5.1 are covered by existing tests per the CLAUDE.md "1487 tests" claim, but there is no test that pins 4 critical = 93 (the documented wrong value would not be caught by a test that asserts the documented value).
  • No test that pins the verdict/band co-monotonicity invariant from §5.4.
  • No mutation-testing coverage numbers are published (CLAUDE.md mentions unit/integration tests only).
  • The destructuring / spread case in B6 has no test coverage in taint-tracer.
  • The path-guard .env.X.Y.Z case in B1 has no test coverage in the pathguard test suite.
  • The distributed-trifecta (non-MCP-concentrated, non-sensitive-path) BLOCK-mode behaviour in B2 is not asserted by a test.

8.6 Duplication

  • lib/git-clone.mjs and lib/vsix-sandbox.mjs share the same sandbox-profile construction logic. v6.5.0's consolidation was partial; a small amount of duplication remains. A shared lib/sandbox.mjs would reduce the risk of divergence.

8.7 Configuration surface

  • Multiple env vars (LLM_SECURITY_INJECTION_MODE, LLM_SECURITY_TRIFECTA_MODE, LLM_SECURITY_UPDATE_CHECK, etc.) are scattered across hooks. A single .llm-security/policy.json section that mirrors them would reduce surprise.

8.8 Naming

  • computeDataTag at post-session-guard.mjs:655 is technically correct but the surrounding comments (line 646: "CaMeL-inspired data flow tagging") set an expectation the function cannot meet. Either the function is renamed to computeOutputFingerprint or the comments are toned down. See B8.

8.9 Error handling

  • The scanners consistently use JSON output, but error recovery is uneven. A malformed .llm-security/policy.json leads to scanner warnings in some paths and silent fallbacks in others. A single readPolicy() helper with a documented fallback chain would reduce ambiguity.

8.10 Cross-reference bit-rot risk

  • CLAUDE.md contains a very detailed table of hooks, scanners, and knowledge files with line-level claims. A change to a single hook behaviour requires updates in four places (hook source, CLAUDE.md, README, CHANGELOG). A lightweight doc-consistency test (e.g., a script that asserts that the number of hooks listed in .claude/hooks.json matches the table in CLAUDE.md) would catch drift.

9. Honesty Check

The following quotes were verified in the referenced source files during this review. Each is paired with the behaviour the code actually implements, and with a suggested replacement that trades slightly more words for a substantially more accurate description.

# Quote (source) Reality Suggested replacement
1 "enforces the Rule of Two" — CLAUDE.md:182 TRIFECTA_MODE default is warn; block is opt-in; even in block, only high-confidence (MCP-concentrated OR sensitive-path) trifectas actually block. See B2. "detects Rule of Two violations; blocks on high-confidence (MCP-concentrated or sensitive-path) trifectas in opt-in block mode; warns otherwise. Distributed trifectas are detected but not blocked by default."
2 "Fully Schrems II compatible" — CLAUDE.md:136, README.md Standalone CLI is offline by default, but OSV.dev lookups in supply-chain-recheck.mjs transmit package identifiers to a Google-operated API. ci-cd-guide.md is accurate; CLAUDE.md is not. "Schrems II compatible in default offline mode. The optional OSV.dev enrichment (supply-chain-recheck with --online) transmits package identifiers to a Google-operated API and is a separate compliance consideration."
3 "CaMeL-inspired data flow tagging (SHA-256 provenance tracking, output to input linking)" — CLAUDE.md:184 computeDataTag at post-session-guard.mjs:655 hashes the first 200 characters of tool output; flowMatch checks for a substring match in later inputs. This is byte-matching with early truncation, not semantic lineage. See B8. "Opportunistic byte-matching of truncated output fingerprints (first 200 bytes of tool output, SHA-256/16-hex tag). This is a lightweight heuristic, not semantic data-flow lineage; it fails on any output mutation or summarisation."
4 "defense-in-depth" — multiple locations Accurate in spirit. The plugin does layer prompt-scan, pathguard, trifecta-guard, and post-mcp-verify. The claim is not quantified. "Three independent detection layers with documented bypass classes (see docs/security-hardening-guide.md §6). Each layer is individually bypassable; the design intent is to raise attack cost, not to be a single enforcement line."
5 "Hardened ZIP extractor (zip-slip, symlink, bomb, ratio caps)" — CLAUDE.md §IDE scanner Verified via lib/zip-extract.mjs. Bombs, zip-slip, symlinks, absolute paths, drive-letter paths, encrypted entries, and ZIP64 are all rejected. However no public fuzz-testing results are published. "Hardened ZIP extractor rejects zip-slip, symlink, absolute / drive-letter paths, encrypted entries, and ZIP64 bombs (capped at 10 000 entries, 500 MB uncompressed, 100 x ratio, depth 20). No fuzz-testing results published to date."
6 "1487 tests" — CLAUDE.md header Accurate count. No mutation-testing coverage published. "1487 unit and integration tests. Mutation-testing coverage is not published; the number is a test count, not a coverage metric."
7 "trustworthy scoring" — CHANGELOG v7.0.0 Genuine improvement over v1, but the headline formula example is arithmetically wrong (4 critical = 93, not 90). See B4. "Severity-dominated, log-scaled v2 scoring formula. Replaces the v1 sum-and-cap model that saturated all non-trivial scans to 100 / Extreme. See severity.mjs for the authoritative formula."
8 "1 critical = 80, 2 = 86, 4 = 90, 10 = 95" — CHANGELOG.md:11, severity.mjs:23 riskScore({critical: 4}) = 93. See B4. Fix the arithmetic: 1 = 80, 2 = 86, 4 = 93, 10 = 95.
9 "Context-aware entropy scanner" — CLAUDE.md:7 Extension skip + 8 line-suppression rules. Accurate description. "Context-aware" is a slightly generous framing for what is largely a rule-and-allowlist pipeline. "Rule-based entropy scanner with file-extension skip, 8 line-level suppression rules, and configurable policy."
10 "calibration block reports skip counters" — CLAUDE.md:7 Verified. Accurate. No change.

The honest headline for v7.0.0 should read approximately:

v7.0.0 replaces a broken scoring formula with a severity-dominated log-scaled model, expands the entropy scanner's suppression rules, and extends the typosquat allowlist. Two residual hook bugs and three over-claimed docs items (CaMeL provenance, Schrems II, Rule of Two "enforcement") remain and should be addressed in v7.1.


10. CISO Perspective

Question: would a CISO in a regulated enterprise (financial services, healthcare, public sector, defence) purchase / install llm-security v7.0.0 today?

Answer in two parts.

10.1 Yes, conditionally

The plugin is a credible baseline for:

  • Individual developers and small teams already using Claude Code who want a free, open-source, offline-first second line of defence.
  • Security research groups studying prompt injection, trifecta detection, MCP auditing, and Claude-Code-specific threat surfaces. The adaptive red-team harness and interactive threat modeller have no commercial equivalent.
  • Air-gapped or Schrems-II-sensitive environments where sending data to cloud providers (Snyk, Semgrep Cloud, GitGuardian) is the showstopper. Standalone CLI mode is genuinely offline in default config.
  • Norwegian public-sector pilots. The knowledge/norwegian-context.md file aligns with Datatilsynet, NSM, and Digitaliseringsdirektoratet expectations. With the caveats in §9 corrected, the plugin is a defensible pilot baseline.

In these contexts, the plugin clears the bar: install, configure, review, ship.

10.2 No, not yet for regulated enterprise

A CISO at a bank, insurer, hospital, national-security agency, or large public-sector body would decline the plugin in its current form. The blockers, in rough priority order:

  1. No central dashboard. Security must be observable across a fleet. The plugin's reports live on each developer's machine. Even the dashboard-aggregator caches to a local file. A CISO cannot prove to an auditor that all 300 developers ran the scan in the last 30 days.
  2. No fleet policy push. Policy lives in per-machine .llm-security/policy.json. A change (e.g., raising the trifecta mode from warn to block after an incident) must be rolled out by developer action, not by central push.
  3. No SIEM-native integration. The plugin writes JSONL (lib/audit-trail.mjs). Forwarding to Splunk, Sentinel, Elastic, or QRadar requires a custom collector. Commercial competitors ship native connectors.
  4. No compliance-ready reporting. EU AI Act Art. 15 audit evidence, NIST AI RMF or ISO 42001 attestations, MITRE ATLAS heat-maps — none are produced today. SARIF and JSONL are technical artefacts, not audit evidence.
  5. No runtime protection. All current protection is static; once a prompt injection lands, detection is post-hoc. Regulators increasingly expect runtime controls (prompt firewalls, content filters, output guardrails).
  6. Claim-precision issues. The three honesty items in §9 (CaMeL provenance, Schrems II, Rule of Two "enforcement") would be challenged in a formal procurement review and would require written clarification.
  7. Bugs B1 and B2. A deterministic regex hole in a sensitive-path guard and a block-mode bypass on distributed trifectas are both things a pen-testing firm would find in the first week of engagement. They would not block procurement outright, but they would reduce confidence in the maturity story.

10.3 What a CISO would require before a production engagement

In priority order:

  1. Fix B1 (pathguard regex), B2 (distributed-trifecta block), B4 (CHANGELOG arithmetic), B8 (CaMeL docs), and the honesty quotes in §9. These are low-cost and high-signal; they close the gap between documentation and code.
  2. Ship a read-only web dashboard consuming existing JSONL / SARIF. Even v0.1 of a dashboard unblocks most of the fleet-visibility objection.
  3. Produce a compliance-evidence pack template. A PDF that mirrors posture-scanner output, OWASP category mapping, and audit-trail events — signed, timestamped, and frameable.
  4. Document runtime-protection gap explicitly. Add to Defense Philosophy: "This plugin is a static + hook-based layer. Runtime prompt filtering, model-level guardrails, and egress DLP are out of scope and must be addressed by complementary controls."
  5. Publish a security-architecture note. Diagram showing how the hooks compose, what each hook can and cannot see, and the explicit defence layers. One page. This is the single most asked-for artefact in enterprise security review.
  6. Commit to a 90-day bug-disclosure window. A named security contact (security@... or equivalent Forgejo path) and a documented handling SLA.

10.4 One-paragraph verdict

llm-security v7.0.0 is a serious contribution to open-source AI security tooling and is among the few plugins that address MCP, Claude Code hooks, and IDE extension provenance as first-class problems. It is not yet enterprise-ready. It is close enough that a B+ grade is within reach after one focused release cycle: fix B1 / B2 / B4 / B8, tone the docs to match the code, ship a minimal dashboard. Without that cycle, the plugin stays at B- and remains primarily a tool for individual developers, researchers, and pilot teams.


Appendix A — Verification Log

The following claims in this report were verified by reading the referenced source directly during the review, not relying solely on the earlier review agents:

  • B1 pathguard regex — hooks/scripts/pre-write-pathguard.mjs:21-25 read directly.
  • B2 distributed-trifecta block gate — hooks/scripts/post-session-guard.mjs:800-828 read directly.
  • B4 arithmetic — scanners/lib/severity.mjs:32-46 read directly; computation 70 + min(25, log2(5)*10) = 93.22 rounded to 93 performed manually; CHANGELOG.md:11 and severity.mjs:23 JSDoc read directly to confirm the "4=90" string.
  • B8 CaMeL substring match — hooks/scripts/post-session-guard.mjs:649-680 read directly; CLAUDE.md:184 read directly.
  • Honesty quotes §9 items 1, 2, 3 — CLAUDE.md:136, 182, 184 read directly.

Findings B3, B5, B6, B7 and the evasion-arsenal PoCs E1-E18 are reported here with the file:line anchors provided by the specialist review agents (a12f1a90430b53a8c, a2c19b9c36b5b955f, ad7770d76bb7df1f5, af0552023740ad6b7). They were not each re-verified by direct source inspection during the synthesis step; the reader should treat those anchors as inspection targets rather than fully re-verified facts. The top-5 most serious findings in the Executive Summary (B1, B2, B8, "Fully Schrems II", B4) were all directly verified.

Appendix B — What was not reviewed

  • Performance characteristics of the scanners under large repositories (>1M LOC).
  • Windows behaviour (the reviewer is on macOS / Darwin). The plugin documents Windows fallbacks; they were not exercised.
  • Cross-platform compatibility of the sandbox (bwrap on Ubuntu 24.04+ is documented as flaky; not exercised).
  • Deep inspection of the AI-BOM output against CycloneDX 1.6 schema validators.
  • Inter-operation with MCP servers running under different authentication schemes.
  • The threat-modeler-agent interactive flow beyond a single synthetic run.
  • The /security harden auto-generation output against a diverse set of starting configurations.

These are candidates for a follow-up review cycle.