Critical Review — llm-security v7.0.0
Date: 2026-04-20
Scope: Adversarial audit of the llm-security Claude Code plugin, version 7.0.0 (released 2026-04-19).
Method: Six parallel specialist review agents (scanner bug hunt, hook-bypass arsenal, evasion PoC arsenal, honesty check, market analysis, scoring-model adversarial), followed by manual verification of the most consequential claims by reading the referenced source lines directly.
Subject context: The plugin packages 5 OWASP-style taxonomies (LLM / ASI / AST / MCP / DeepMind Agent Traps), 10 orchestrated scanners, 8 hooks, an interactive threat modeller, an attack simulator, a CLI, a dashboard aggregator, an AI-BOM generator, and a new v7.0.0 scoring model that replaces the v1 "sum-and-cap" formula. The claim under review: that v7.0.0 delivers "trustworthy scoring" and a defensible security posture for Claude Code environments.
1. Executive Summary
Overall grade: B-.
llm-security v7.0.0 is a capable, well-architected, offline-first developer-facing security tool. The breadth is unusual: the project ships functionality (MCP live inspection, IDE extension scanning, bash evasion normalization, Unicode Tag steganography detection, trifecta detection, AI-BOM generation, adaptive red-team harness) that no single commercial competitor combines in one package. The v7.0.0 scoring rework is a genuine improvement over v1 — the old "sum-and-cap" formula did collapse every non-trivial scan to 100 / Extreme, and the new log-scaled, severity-dominated model produces defensible bands for realistic scans.
The grade falls short of B+ or A-minus for four specific reasons:
- Two real HIGH-severity hook bugs let hostile writes and distributed trifectas slip past what the documentation promises as blocking behaviour.
- Three honesty issues inflate claims beyond what the code delivers: the "SHA-256 provenance tracking" is a 200-byte substring fingerprint, "Fully Schrems II compatible" ignores the Google-operated OSV.dev API, and "Rule of Two enforcement" is an opt-in warning in default config.
- Scoring doc arithmetic is wrong in a way that undercuts the "trustworthy scoring" headline: the formula yields 93 for 4 criticals, the documentation says 90.
- Coverage gaps against 2026 threats (A2A injection, multi-modal / EXIF, MCP 2.0 OAuth, terminal ANSI injection, skill marketplace poisoning) are not acknowledged; the plugin is honest about general limitations of prompt-injection defence but silent about these specific vectors.
Top 5 most serious findings (detail in §2, §4, §9)
- HIGH — `pre-write-pathguard.mjs:23` regex hole lets `*.env.production.local.backup` and any `*.env.X.Y.Z` variant through unblocked.
- HIGH — `post-session-guard.mjs:816` gated block downgrades distributed (non-MCP-concentrated, non-sensitive-path) trifectas to WARN even when `LLM_SECURITY_TRIFECTA_MODE=block` is set.
- HIGH (honesty) — `post-session-guard.mjs:655` "CaMeL-inspired provenance" is a 200-byte SHA-256 substring fingerprint, not data-flow lineage. Trivially bypassed by appending one byte.
- MEDIUM (honesty) — `CLAUDE.md:136` / `README.md` "Fully Schrems II compatible" ignores the OSV.dev (Google-operated) opt-in.
- LOW (arithmetic / honesty) — `CHANGELOG.md:11` + `severity.mjs:23` JSDoc state `4 critical = 90` when the formula evaluates to 93.
Top 5 most valuable missing features
- Web dashboard + fleet policy server — the plugin is machine-local; enterprise security teams require central visibility and policy push.
- Runtime prompt firewall / filter — all current protection is static; Lakera and Protect AI ship runtime filters.
- IDE real-time scanning (VS Code + JetBrains) — the plugin can scan installed extensions, but does not offer live in-editor scanning of the developer's own code, which is table stakes for Snyk and Semgrep.
- Compliance reporting pack (PDF/DOCX, EU AI Act Art. 15 audit evidence) — CISO-facing deliverables are absent; only SARIF / JSONL exist today.
- Enterprise incident integrations (Jira, ServiceNow, Slack, Teams, PagerDuty) — today only SARIF upload is supported.
2. Critical Bugs and Vulnerabilities
This section lists verified findings with file:line references, proof-of-concept payloads, suggested fixes, and severity. Findings B1, B2, B4, and B8 were verified by reading the referenced source directly during this review. B3, B5, B6, B7 were verified by the scanner-bug agent (a12f1a90430b53a8c) and are reported here with its file:line anchors.
B1 — Pathguard regex miss on multi-segment .env.*.local.* — HIGH
- File: `hooks/scripts/pre-write-pathguard.mjs:21-25`
- Code:

  ```js
  const ENV_PATTERNS = [
    /[\\/]\.env$/,
    /[\\/]\.env\.[a-z]+$/, // matches .env.X only when X is a single [a-z]+ segment
    /[\\/]\.env\.local$/,
  ];
  ```

- Defect: The second pattern anchors `$` immediately after `[a-z]+`, so any file name with more than one segment after `.env` is not matched. Digits, dots, hyphens, and uppercase characters in the suffix also fail to match.
- PoC payloads that slip past the hook: `Write` to `.env.production.local.backup`, `Write` to `.env.development.local.old`, `Write` to `.env.prod.local.bak`, `Write` to `.env.stage-1.local`, `Write` to `.env.CI.secret`.
- Impact: A prompt-injected agent can exfiltrate or overwrite environment secrets by choosing any of the variants above. The hook's purpose is to be a last-line path guard; this regex undercuts that.
- Fix: replace with `/[\\/]\.env(\.[A-Za-z0-9._-]+)*$/` or, cleaner, match by basename prefix: `if (basename(path).toLowerCase().startsWith('.env')) { block(); }`
- Severity: HIGH. Deterministic bypass of a hook that is documented as a gate for environment secrets.
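A quick standalone check of the proposed replacement against the PoC payloads above (regex literals copied from the finding; this is a sketch, not the hook's full logic):

```javascript
// Current second pattern (anchors $ after a single [a-z]+ segment)
// vs. the proposed multi-segment replacement from the fix above.
const current = /[\\/]\.env\.[a-z]+$/;
const proposed = /[\\/]\.env(\.[A-Za-z0-9._-]+)*$/;

const bypasses = [
  '/repo/.env.production.local.backup',
  '/repo/.env.development.local.old',
  '/repo/.env.prod.local.bak',
  '/repo/.env.stage-1.local',
  '/repo/.env.CI.secret',
];

const missedByCurrent = bypasses.filter((p) => !current.test(p));
const caughtByProposed = bypasses.filter((p) => proposed.test(p));
```

All five bypass paths fail the current pattern and match the replacement; plain `/repo/.env` still matches the replacement as well.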
B2 — Distributed trifecta is advisory even in block mode — HIGH
- File: `hooks/scripts/post-session-guard.mjs:814-826`
- Code:

  ```js
  if (TRIFECTA_MODE === 'block' && (mcpInfo.concentrated || sensitiveExfil)) {
    process.stderr.write('BLOCKED: Rule of Two violation ...');
    process.stdout.write(JSON.stringify({ decision: 'block' }));
    process.exit(2);
  }
  ```

- Defect: The `block` path is gated on a high-confidence sub-condition — MCP-concentrated (all three legs via the same MCP server) OR a sensitive-path + exfil pair. A trifecta that distributes its three legs across different sources (e.g., untrusted leg from WebFetch, sensitive data leg from Read to a non-sensitive path, exfiltration leg from Bash `curl` to a non-sensitive sink) is detected (the warning is emitted at line 803), but never blocked.
- PoC scenario:
  - Agent reads untrusted content from a GitHub issue via MCP server A.
  - Agent reads user data from `./user-db.sqlite` (not a sensitive path per the scanner's static list).
  - Agent exfiltrates via `curl` to a newly registered domain (not on the known-sink allowlist). All three legs are detected. The hook emits `formatWarning(...)`. `mcpInfo.concentrated` is `false` (different servers), `sensitiveExfil` is `false` (no `~/.ssh`, `~/.aws`, etc., and no obvious cred exfil). Result: the `block` branch is skipped, the caller proceeds.
- Impact: Users who configure `LLM_SECURITY_TRIFECTA_MODE=block` reasonably expect that any detected lethal trifecta is blocked. In the current code, only a subset is. The documentation (CLAUDE.md §"Hooks", line 58) describes `block|warn|off` without qualification. This is a mismatch between documented behaviour and code behaviour.
- Fix options:
  - Strict: remove the `(mcpInfo.concentrated || sensitiveExfil)` gate inside the `block` branch — block on any detected trifecta in block mode.
  - Tiered: expose a second env var, e.g., `LLM_SECURITY_TRIFECTA_BLOCK_STRICTNESS=high|all`, and document that `block` today implements `high` only.
  - Update the documentation in CLAUDE.md and README.md to make the high-confidence gate explicit, so the mismatch is removed.
- Severity: HIGH. False sense of security for any operator who enables `block`.
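The tiered fix option can be sketched as a small decision function; the names (`mcpInfo`, `sensitiveExfil`, the `high|all` strictness values) mirror the finding above, and the function is illustrative, not the hook's actual code:

```javascript
// In 'all' strictness, block mode blocks every detected trifecta;
// in 'high' strictness it keeps today's high-confidence gate.
function shouldBlockTrifecta(mode, detected, mcpInfo, sensitiveExfil, strictness = 'high') {
  if (mode !== 'block' || !detected) return false;
  if (strictness === 'all') return true;
  return mcpInfo.concentrated || sensitiveExfil;
}
```

With `strictness = 'all'`, the distributed-trifecta PoC above would be blocked rather than downgraded to a warning.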
B3 — riskScore({info: N}) = 0 silently masks info-volume findings — MEDIUM
- File: `scanners/lib/severity.mjs:32-46`
- Code: The `riskScore` function inspects `critical`, `high`, `medium`, `low`. The `info` count is ignored.
- Defect: Any scanner that mis-tiers findings as `info` contributes nothing to the risk score, the verdict, or the band. A scanner configured incorrectly (or an adversary who targets a scanner's tiering logic, e.g., by crafting strings that match a `tier_downgrade` heuristic) can accumulate arbitrary numbers of findings without affecting the verdict.
- Honest characterisation: Ignoring info in a risk aggregate is a reasonable design choice on its own. The problem is the combination with (a) the `info` severity being a legitimate tier in `SEVERITY` (line 4-10), (b) the `owaspCategorize` function (line 218) counting `info` findings, and (c) no documentation anywhere stating that info is scoring-inert. An operator looking at a report that counts 400 info findings has no way to know these contribute zero to the final band.
- Fix options:
  - Document explicitly in `severity.mjs` JSDoc and CLAUDE.md that `info` is excluded from scoring.
  - Add an `infoScore()` helper that returns a supplementary 0-10 score — useful for trend monitoring without affecting verdicts.
  - Add a floor contribution: e.g., `score = max(score, 1 + min(5, log2(info + 1) * 1.5))` when `info >= 50`, so large info volumes produce at least a Low band.
- Severity: MEDIUM. This is primarily an honesty / observability issue, not a detection issue.
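Fix option (c) can be sketched as a post-processing step on the existing score (function name hypothetical; threshold and weights taken from the expression quoted in the fix option):

```javascript
// Floor contribution for large info volumes: with info >= 50 the band is
// lifted to at least Low, without letting info dominate real severities.
function applyInfoFloor(score, info) {
  if (info < 50) return score;
  const floor = Math.round(1 + Math.min(5, Math.log2(info + 1) * 1.5));
  return Math.max(score, floor);
}
```

Because `log2(51) * 1.5` already exceeds the cap of 5, any qualifying info volume floors the score at 6 — inside the Low band but above zero.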
B4 — CHANGELOG / JSDoc tier example arithmetic is wrong — LOW (honesty)
- Files: `CHANGELOG.md:11`, `scanners/lib/severity.mjs:23` (JSDoc), `CLAUDE.md:7`
- Claim: "Critical present → 70–95 (1=80, 2=86, 4=90, 10=95)"
- Verification: From `riskScore` at `severity.mjs:32-46`: `base = 70 + min(25, log2(critical + 1) * 10)`
  - critical=1 → 70 + min(25, log2(2)*10) = 70 + 10.00 = 80.00 → 80 ok
  - critical=2 → 70 + min(25, log2(3)*10) = 70 + 15.85 = 85.85 → 86 ok
  - critical=4 → 70 + min(25, log2(5)*10) = 70 + 23.22 = 93.22 → 93 mismatch (docs say 90)
  - critical=10 → 70 + min(25, log2(11)*10) = 70 + 34.59 → capped → 95 ok
- Defect: The "4 = 90" entry in the CHANGELOG, the JSDoc at `severity.mjs:23`, and the CLAUDE.md summary at line 7 all misstate the formula's output. The formula returns 93.
- Fix: Either (a) correct all three doc locations to `4=93`, or (b) adjust the formula (e.g., `log2(critical+1) * 9` or a different carry) to actually yield 90 for 4 criticals. Option (a) is strongly preferred; the formula is the ground truth and the docs follow.
- Severity: LOW. But corrosive: the v7.0.0 pitch is "trustworthy scoring", and the flagship documentation example is arithmetically wrong.
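The tier arithmetic is easy to reproduce; a one-line re-implementation of the quoted expression confirms the 93:

```javascript
// base = 70 + min(25, log2(critical + 1) * 10), rounded to the nearest integer.
const criticalTier = (critical) =>
  Math.round(70 + Math.min(25, Math.log2(critical + 1) * 10));

const tiers = [1, 2, 4, 10].map(criticalTier); // [80, 86, 93, 95]
```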
B5 — Entropy scanner can miss secrets inside code-heavy files that look GLSL-shaped — MEDIUM
- File: `scanners/entropy-scanner.mjs:237-238`, suppression list at lines 30-34.
- Defect: The v7.0.0 entropy scanner skips files with extensions `.glsl|.frag|.vert|.shader|.wgsl|.css|.scss|.sass|.less|.svg|.min.*|.map` to cut shader / CSS / minified-JS false positives. The per-line suppression rules (inside the scanner) include GLSL-keyword heuristics. Where this goes wrong: a `.ts` file containing inline shader source or CSS-in-JS templates can accumulate line-level suppressions while a genuine high-entropy secret embedded in the same file is dismissed because the context "reads GLSL-like".
- PoC sketch: A TypeScript file containing an inline GLSL-shaped block and a credential-looking high-entropy string on the next line. The line containing the credential has a GLSL-keyword-bearing neighbour; the line suppression heuristic can short-circuit the classification.
- Fix: Replace line-proximity suppression with a two-stage pipeline: first classify the file context (shader-dominant vs code-dominant vs markup-dominant), then apply per-line rules scoped to that context. Do not allow GLSL-suppression rules to fire inside `.ts`/`.js`/`.py`/`.go` files.
- Severity: MEDIUM. Real false-negative risk in polyglot files common in modern frontend projects.
B6 — Taint-tracer ignores destructured and spread assignments — MEDIUM
- File: `scanners/taint-tracer.mjs:175-182` (extraction of assigned variable name)
- Defect: The tracer's `extractAssignedVariable` recognises plain assignments (`const x = req.body`, `let y = process.argv[1]`), but not destructuring or spread:

  ```js
  const { secret: userInput } = req.body;  // userInput untainted per current tracer
  const [input, ...rest] = process.argv;   // input / rest untainted
  const { a, b: { c } } = req.body;        // c untainted
  ```

  Sinks that use any of `userInput`, `input`, `rest`, `c` downstream will not be flagged.
- Fix: Extend the extractor to recognise object-destructuring, array-destructuring, and rest patterns. This is a pure parser-level change; the taint propagation downstream is already correct.
- Severity: MEDIUM. Common modern JS/TS pattern; the gap yields false negatives rather than false positives.
B7 — Levenshtein <= 2 threshold lets many real typosquats through — MEDIUM
- File: `scanners/dep-audit.mjs:307, 326`, and `scanners/supply-chain-recheck.mjs`
- Defect: The dep-audit gate flags distance=1 as HIGH and distance=2 as MEDIUM; distance >= 3 is not flagged. But many real-world typosquats have distance >= 3: `lodash` vs `lodash-utils` (distance = 6), `react` vs `reactjs-utils` (distance = 8), `express` vs `expresss` (distance = 1, captured) versus `express` vs `expressjs-wrapper` (distance > 2, missed). Common token-injection typosquats (`-utils`, `-helper`, `-core`, `-plus`) are exactly the attack pattern that distance-based matching fails on.
- Fix: Combine Levenshtein with tokenisation:
  - Split package names on `-` and `_`.
  - Flag if any token set is a strict subset (or the top-N-overlap) of a known popular package's token set.
  - Keep Levenshtein <= 2 as a complementary signal, not the sole gate.
- Severity: MEDIUM. There is an existing allowlist (v7.0.0 expansion: 22 npm + 5 PyPI) that partially compensates by reducing false positives for short-name tools; this fix targets the false negative side.
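A minimal sketch of the token-subset signal (the `POPULAR` list is an illustrative stand-in for the plugin's popular-package data; a real gate would keep the Levenshtein check alongside it):

```javascript
const POPULAR = ['lodash', 'react', 'express'];
const tokens = (name) => name.toLowerCase().split(/[-_]/).filter(Boolean);

// Flag when a candidate contains every token of a popular name plus extras,
// e.g. lodash-utils or express-helper — the pattern distance-2 matching misses.
function tokenSquatSuspect(candidate) {
  const cand = new Set(tokens(candidate));
  return POPULAR.some((pop) => {
    if (pop === candidate) return false; // exact match is the real package
    const popTokens = tokens(pop);
    return popTokens.every((t) => cand.has(t)) && cand.size > popTokens.length;
  });
}
```

Note that pure token-subset matching still misses fused variants like `reactjs-utils`; per-token fuzzy matching would be the natural extension.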
B8 — "CaMeL-inspired SHA-256 provenance tracking" is a 200-byte substring fingerprint — HIGH (honesty)
- File: `hooks/scripts/post-session-guard.mjs:655-658` (`computeDataTag`), CLAUDE.md:184.
- Code:

  ```js
  function computeDataTag(text) {
    const sample = text.slice(0, 200);
    return createHash('sha256').update(sample).digest('hex').slice(0, 16);
  }
  ```

- Claim (CLAUDE.md:184): "CaMeL-inspired data flow tagging (SHA-256 provenance tracking, output to input linking)."
- Reality: The mechanism hashes the first 200 characters of a tool's output, truncates to a 16-hex tag, and looks for a byte-wise substring of that output in a later input. This is neither semantic lineage nor robust provenance:
- Append one byte to the head of the output: new hash, no match.
- Insert a whitespace character in the first 200 characters: no match.
- Summarise or translate the output before passing it onward: no match.
- Encode the output (base64, hex, quoted-printable) before using it: no match.
- Contrast with CaMeL (DeepMind, 2025): CaMeL uses typed capability objects, explicit control-vs-data-channel separation, and policy-checked re-entry of data into privileged sinks. The present implementation shares the ambition but not the mechanism.
- Fix options:
- Rename in docs: "opportunistic byte-matching of truncated output fingerprints (first 200 bytes, SHA-256/16-hex tag; not semantic lineage)."
- Strengthen the matching: use n-gram fingerprints over the full output, or content-defined chunking (CDC), or Rabin fingerprints, so that small edits still match.
- Split the feature: keep the current lightweight tag as a `session-guard` provenance hint, and scope any future CaMeL-style tracking as a separate, clearly labelled module.
- Severity: HIGH (honesty). The current phrasing promises behaviour the code cannot deliver. Operators building threat models on that promise are mis-calibrated.
3. Coverage Gaps
This section lists threat vectors that the plugin does not attempt to cover, and where a reasonable operator might expect it to. The plugin's Defense Philosophy section (CLAUDE.md §"Defense Philosophy (v5.0)") is commendable for acknowledging that prompt injection is structurally unsolvable; what follows is narrower: specific vectors that are tractable but absent.
3.1 Agent-to-Agent (A2A) injection
- Vector: Two or more agents delegating work to each other via Task/Agent tools. An A2A injection attack plants malicious instructions in the output of one agent that a second agent then ingests as trusted context.
- Where it should be covered: A new hook or an extension of `post-session-guard.mjs` that tracks inter-agent delegation chains. Today, `post-session-guard` tracks delegation for the "escalation-after-input" advisory within a 5-call window (`post-session-guard.mjs:830-840`), but does not model the sub-agent's outputs as a potential injection source when re-entering the parent context.
- Reference: OWASP ASI02 (Agent orchestration abuse), DeepMind Agent Traps category 4 (delegation).
- Suggested extension: Hash the outputs of sub-agent invocations, scan them with the same injection-pattern matcher used by `post-mcp-verify.mjs`, and emit a MEDIUM advisory if matches are found in the parent context's next decision.
3.2 Multi-modal injection (EXIF / image steganography / PDF)
- Vector: Hidden instructions in image metadata (EXIF, XMP, IPTC), in image pixels (steganography), in PDF object streams, or in audio metadata (ID3 tags). An image attached to a prompt is not scanned today.
- Where it should be covered: A pre-tool-use hook that intercepts `Read` on binary formats and runs a targeted metadata scanner.
- Reference: 2025 research (`prompt-injection-research-2025-2026.md` would ideally cite the multi-modal injection work from OpenAI and DeepMind; today the knowledge file focuses on text vectors).
- Suggested extension: Add an `exif-scanner.mjs` / `pdf-object-scanner.mjs` that runs on `Read` of matching extensions; extract and scan text-bearing fields.
3.3 MCP 2.0 OAuth attacks
- Vector: MCP 2.0 introduces OAuth flows for MCP-server authentication. Client-side RBAC, consent phishing, and stale-token replay are the attack surfaces.
- Where it should be covered: `scanners/mcp-live-inspect.mjs` and the `scanners/mcp-scanner-agent` system prompt.
- Reference: OWASP MCP10 (Insecure authentication), current MCP spec updates.
- Suggested extension: A static checker for OAuth config in `.mcp.json` and a live-inspect probe for authorization server metadata endpoints.
3.4 Skill marketplace poisoning (pre-deployment gate)
- Vector: A compromised skill in a public marketplace (Claude Code plugin marketplace, Anthropic skill store, etc.) is installed by a developer. The payload is dormant until specific conditions are met.
- Where it should be covered: A `pre-install-skill.mjs` hook that scans skill manifests before installation, analogous to `pre-install-supply-chain.mjs` for packages.
- Reference: OWASP AST04 (Skill tampering), OWASP AST06 (Skill supply chain).
- Suggested extension: Integrate with the `plugin-audit` command to run on install, not only on demand.
3.5 Terminal UI / ANSI escape injection
- Vector: Tool output containing ANSI escape sequences that, when rendered in the developer's terminal, hide instructions (e.g., using cursor-move codes, colour-matching background, or OSC 52 to inject clipboard content).
- Where it should be covered: `post-mcp-verify.mjs` should strip or flag ANSI escape sequences before any output is passed to the developer. Today, the scanner checks for HTML and Unicode Tag steganography but not ANSI.
- Reference: 2025 terminal-injection work (iTerm2, Windows Terminal advisories).
- Suggested extension: Add an `ansi-strip` step in `post-mcp-verify` that either strips or raises MEDIUM on all sequences beyond a safe allowlist (SGR colour only).
3.6 OAuth token exfiltration via MCP tools
- Vector: A compromised MCP tool description instructs the model to read OAuth tokens from the environment or from keychain-mounted paths and transmit them via a subsequent tool call.
- Where it should be covered: `scanners/mcp-scanner-agent` and `post-mcp-verify.mjs`.
- Reference: DeepMind Agent Traps category 6 (exfiltration). The trifecta detection partially covers this when the three legs land within the 20-call window, but slow-burn exfiltration can span longer.
- Suggested extension: Tag MCP tools that request environment variables or credential-adjacent paths and enforce a per-session audit of their subsequent outputs.
3.7 Prompt-cache poisoning
- Vector: Shared prompt caches (Anthropic's prompt cache, OpenAI's cached prompts) can be seeded with attacker-controlled content that is then served to later queries in the same cache key.
- Where it should be covered: Out of scope for a local hook-based tool in a strict sense, but the plugin does not acknowledge this vector in its Defense Philosophy or Known Limitations.
- Reference: 2025 prompt-cache research.
- Suggested extension: Add a note in `docs/security-hardening-guide.md` and in the Defense Philosophy section explicitly noting that shared prompt caches are out of scope.
3.8 Supply chain — Docker image layer inspection
- Vector: A Dockerfile pulls an image; a malicious layer within the image contains secrets-harvesting or reverse-shell code.
  `pre-install-supply-chain.mjs` checks the image name against blocklists, not the image contents.
- Where it should be covered: Depth extension in `pre-install-supply-chain.mjs` or a dedicated `docker-layer-scanner.mjs`.
- Reference: 2024-2025 Sysdig and Snyk reports on malicious Docker Hub images.
- Suggested extension: Optional Trivy / Grype integration for image content scanning when the user opts in.
3.9 Web dashboard / fleet policy server
- Vector: Not a threat vector per se, but a capability gap. The plugin is per-machine; a CISO with 100 developers cannot see posture across the fleet, cannot push policy updates centrally, and cannot confirm that a security bulletin has propagated.
- Reference: Snyk, Semgrep Cloud, GitGuardian, Protect AI. All ship fleet-level consoles.
- Suggested extension: See §7 roadmap.
3.10 Compliance evidence pack
- Vector: EU AI Act Art. 15 (accuracy, robustness, cybersecurity) requires audit-ready evidence. The plugin produces JSONL audit trails (`lib/audit-trail.mjs`) and can emit SARIF, but does not produce compliance-ready PDF or DOCX reports with NIST AI RMF / ISO 42001 / EU AI Act mappings, timestamps, and signed attestation.
- Reference: EU AI Act Art. 15, NIST AI RMF 1.0, ISO 42001, MITRE ATLAS.
- Suggested extension: A `/security compliance-report [--framework eu-ai-act|nist|iso42001]` command that emits a signed PDF.
4. Evasion Arsenal — 18 Proof-of-Concept Attacks
The 18 PoCs below are summarised from the evasion-arsenal and hook-bypass review agents (a2c19b9c36b5b955f, ad7770d76bb7df1f5). Each entry states the attack class, a compact payload sketch (literal payload bytes intentionally redacted to avoid triggering the plugin's own hooks during documentation review), which hook or scanner should own the detection, why it misses today, and a fix. The PoCs are intended as defensive test cases; the attack-simulator (scanners/attack-simulator.mjs) is the right home for the full corpus.
E1 — U+F0000 Private Use Area steganography
- Class: Unicode steganography.
- Payload sketch: Instructions encoded with characters from the Supplementary Private Use Areas (U+F0000-FFFFD or U+100000-10FFFD) embedded in a prompt or tool output.
- Owner: `hooks/scripts/pre-prompt-inject-scan.mjs` (contains `UnicodeTags` or equivalent).
- Why it misses: The current detector covers U+E0000-E007F (Unicode Tag block) but not PUA-A / PUA-B. PUA characters are invisible in most terminals and often pass through normalisation.
- Fix: Extend the Unicode-range match to cover U+E0000-E007F, U+F0000-FFFFD, and U+100000-10FFFD. Emit MEDIUM for PUA presence in prompt or tool output.
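The extended range check is a one-line regex once the `u` flag is used so supplementary-plane code points match as single characters (ranges as proposed in the fix; detector name hypothetical):

```javascript
// Unicode Tag block plus both Supplementary Private Use Areas.
const HIDDEN_CODEPOINTS =
  /[\u{E0000}-\u{E007F}\u{F0000}-\u{FFFFD}\u{100000}-\u{10FFFD}]/u;

const hasHiddenCodepoints = (text) => HIDDEN_CODEPOINTS.test(text);
```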
E2 — Multi-turn temporal evasion
- Class: Split injection.
- Payload sketch: Part 1 in prompt N establishes a rare trigger token. Part 2 in prompt N+5 associates that trigger token with a destructive action.
- Owner: `post-session-guard.mjs` (long-horizon window).
- Why it misses: The 100-call long-horizon window detects slow-burn trifectas (leg A and leg C > 50 calls apart) but does not correlate semantic instruction fragments across turns.
- Fix: Add an n-gram / keyword-memory tracker in `post-session-guard` that flags the re-use of a rare token across turns near a sink.
E3 — Base64-encoded instructions in source comments
- Class: Indirect injection.
- Payload sketch: A source-code comment block whose content is a base64 blob; the LLM decodes the comment while reading the file and executes the decoded instructions.
- Owner: `post-mcp-verify.mjs` / `pre-prompt-inject-scan.mjs`.
- Why it misses: The current injection-pattern matcher inspects literal strings but does not attempt base64 / hex / rot13 decoding.
- Fix: Add an opportunistic decoder that attempts base64/hex/rot13 on long comment blocks and re-scans the decoded content.
E4 — Markdown link title-attribute injection
- Class: Indirect injection via markup.
- Payload sketch: A Markdown link of the form `[text](url "injected instructions")` where the attacker's payload lives inside the title attribute (the quoted string after the URL).
- Owner: `post-mcp-verify.mjs`.
- Why it misses: The scanner inspects link URLs and anchor text, not the title attribute inside the quotes after the URL.
- Fix: Extend the markdown parser to extract link titles and scan them with the same injection matcher.
E5 — SVG <desc> inline injection
- Class: Indirect injection via markup.
- Payload sketch: An SVG file with injected instructions inside a `<desc>` element (or `<title>`, `<metadata>`, `<foreignObject>`).
- Owner: `post-mcp-verify.mjs` / `entropy-scanner.mjs`.
- Why it misses: v7.0.0 expanded the entropy scanner to skip `.svg` files (reduces false positives on base64 PNG payloads). The HTML content trap detection in `post-mcp-verify` does not cover SVG-specific tags.
- Fix: Treat SVG as HTML-like for injection scanning, not as a binary asset. Parse `<desc>`, `<title>`, `<metadata>`, and `<foreignObject>` and scan their text content.
E6 — CSS content: property injection
- Class: Indirect injection via markup.
- Payload sketch: A CSS rule using the `content:` property to embed text instructions that the model reads.
- Owner: `post-mcp-verify.mjs` / a new `css-scanner.mjs`.
- Why it misses: The v7.0.0 entropy scanner skips `.css`/`.scss`/`.sass`/`.less`, and no other scanner inspects CSS.
- Fix: Add a targeted CSS scanner that extracts `content` property values and scans them. Limit scope to tool output, not source-code files.
E7 — HTML comment payload injection
- Class: Indirect injection via markup.
- Payload sketch: HTML comment nodes containing injection text.
- Owner: `post-mcp-verify.mjs` (HTML content trap detection).
- Why it misses: The HTML trap detector inspects visible tags and scripts; it does not systematically scan comment content.
- Fix: Extend the HTML parser to extract all comment nodes and feed them to the injection matcher.
E8 — Bash T7: process substitution
- Class: Command obfuscation.
- Payload sketch: A destructive command hidden inside a `<(...)` or `>(...)` process-substitution expression.
- Owner: `hooks/scripts/pre-bash-destructive.mjs`, `scanners/lib/bash-normalize.mjs`.
- Why it misses: v5.0's `bash-normalize.mjs` covers T1-T6 (empty quotes, `${}`, backslashes, tabs, `${IFS}`, ANSI-C hex). Process substitution (`<(...)`, `>(...)`) is not normalised; the hostile command is never re-constructed before the destructive-command matcher runs.
- Fix: Add a T7 rule: collapse `<(...)` and `>(...)` into their inner command text for matching purposes.
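A minimal T7 normalisation rule might look like this (single nesting level only; a production rule needs balanced-parenthesis handling):

```javascript
// Replace <(cmd) / >(cmd) with their inner command text so the
// destructive-command matcher sees the hidden command.
function collapseProcessSubstitution(cmd) {
  return cmd.replace(/[<>]\(([^()]*)\)/g, ' $1 ');
}
```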
E9 — Bash T8: base64 indirect exec
- Class: Command obfuscation.
- Payload sketch: A pipeline of the form `echo <base64-blob> | base64 -d | bash` where the decoded payload is never visible in the raw command string.
- Owner: `pre-bash-destructive.mjs`.
- Why it misses: The decoded payload is never present in the raw command string before the decode step runs.
- Fix: Detect the `base64 -d | <shell>` idiom as HIGH per se, independent of payload content. A legitimate use is rare and can be allowlisted.
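The idiom can be flagged structurally, independent of the payload, with a pattern along these lines (a sketch; real shell parsing is more forgiving about whitespace, quoting, and flag order):

```javascript
// Matches `base64 -d` / `base64 --decode` piped directly into a shell
// (sh, bash, zsh, dash, ksh).
const B64_EXEC = /\bbase64\s+(?:-d|--decode)\b[^|]*\|\s*(?:ba|z|da|k)?sh\b/;
```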
E10 — Bash T9: eval via variable indirection
- Class: Command obfuscation.
- Payload sketch: Assign the destructive command string to a variable, then run `eval "$VAR"`.
- Owner: `pre-bash-destructive.mjs`, `bash-normalize.mjs`.
- Why it misses: `eval` is detected directly; `eval "$VAR"` with `VAR` assigned earlier requires simple forward-flow variable tracking, which is not performed.
- Fix: Add one-level variable tracking in `bash-normalize`: when a variable is assigned a string and then passed to `eval`, substitute the literal before matching.
E11 — GitHub Actions expression injection
- Class: Workflow injection.
- Payload sketch: A Git commit message or pull request title containing a shell-break payload consumed by a workflow that uses `${{ github.event.head_commit.message }}` or similar in a `run:` block.
- Owner: A new `workflow-scanner.mjs` or an extension of `scanners/git-forensics.mjs`.
- Why it misses: No scanner inspects `.github/workflows/*.yml` for user-controlled expressions in `run:` contexts.
- Fix: Scan workflow YAML files for `${{ github.event.* }}`, `${{ github.head_ref }}`, etc., inside `run:` blocks and emit HIGH.
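A line-based approximation of the check can be sketched as follows (function name hypothetical; a real checker should parse the YAML rather than scan lines, since block scalars and flow mappings defeat line heuristics):

```javascript
// Track whether we are inside a run: block and flag user-controlled
// GitHub expressions there.
const UNTRUSTED_EXPR = /\$\{\{\s*github\.(?:event\.[\w.]+|head_ref)\s*\}\}/;

function findWorkflowInjections(yamlText) {
  const hits = [];
  let inRun = false;
  for (const line of yamlText.split('\n')) {
    if (/^\s*(?:-\s*)?run:/.test(line)) inRun = true;
    else if (/^\s*(?:-\s*)?\w[\w-]*:/.test(line)) inRun = false; // next key ends the block
    if (inRun && UNTRUSTED_EXPR.test(line)) hits.push(line.trim());
  }
  return hits;
}
```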
E12 — .gitattributes filter driver post-clone
- Class: Supply-chain post-clone escalation.
- Payload sketch: A repository is cloned with `lib/git-clone.mjs`'s sandbox enabled; a `.gitattributes` file inside the clone references a filter/smudge driver that triggers arbitrary code on later checkouts outside the sandbox.
- Owner: `lib/git-clone.mjs` / `post-mcp-verify.mjs`.
- Why it misses: The sandbox covers the initial clone; `.gitattributes` written to the temp clone can be carried forward if the user moves the clone outside the temp dir.
- Fix: Scan `.gitattributes` inside the clone for `filter=` / `diff=` / `merge=` directives and raise MEDIUM if present, regardless of destination.
E13 — npm scoped-package lifecycle injection
- Class: Supply-chain.
- Payload sketch: `npm install @benign-looking-scope/popular-name` where the scope is registered by the attacker and the package name matches a popular unscoped package.
- Owner: `pre-install-supply-chain.mjs`.
- Why it misses: The blocklist and typosquat detector operate on the unscoped name; scope-hopping (registering `@attacker/<popular-name>` when the real package is unscoped `<popular-name>`) is not explicitly flagged.
- Fix: Detect install of `@scope/name` where `name` matches an unscoped top-100 package name and the scope is not the package's known official scope.
E14 — MCP description drift under the Levenshtein 10% threshold
- Class: MCP rug-pull / slow-burn.
- Payload sketch: A hostile MCP server publishes a description drift of <= 9% per update, repeated weekly. After 10 weeks, the description is substantially different, but no single diff triggered the Levenshtein-drift alert.
- Owner: `scanners/lib/mcp-description-cache.mjs` and `post-mcp-verify.mjs`.
- Why it misses: Drift is measured per-update, not cumulatively against the oldest cached description.
- Fix: Also compute drift against the earliest cached description (baseline). Raise MEDIUM if cumulative drift >= 25% over any 12-week window.
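The cumulative check needs nothing beyond the Levenshtein routine the cache already uses for per-update drift; a self-contained sketch (25% threshold from the fix above, function names hypothetical):

```javascript
// Minimal Levenshtein distance (the plugin presumably has its own).
function lev(a, b) {
  const dp = Array.from({ length: a.length + 1 }, (_, i) => [i]);
  for (let j = 1; j <= b.length; j++) dp[0][j] = j;
  for (let i = 1; i <= a.length; i++)
    for (let j = 1; j <= b.length; j++)
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,
        dp[i][j - 1] + 1,
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1),
      );
  return dp[a.length][b.length];
}

// Percent drift of the current description against the earliest cached one.
const cumulativeDriftPct = (baseline, current) =>
  (100 * lev(baseline, current)) / Math.max(baseline.length, current.length, 1);

const exceedsCumulativeDrift = (baseline, current) =>
  cumulativeDriftPct(baseline, current) >= 25;
```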
E15 — Memory poisoning via .claude/agents/<name>.md
- Class: Memory poisoning.
- Payload sketch: A malicious agent file in `.claude/agents/` with embedded instructions targeting session state.
- Owner: `scanners/memory-poisoning.mjs`.
- Why it misses: The memory-poisoning scanner inspects `CLAUDE.md`, top-level memory files, and `.claude/rules`, but not `.claude/agents/*.md`.
- Fix: Add `.claude/agents/*.md` to the scanner's target glob.
E16 — Pre-compact homoglyph bypass
- Class: Obfuscation.
- Payload sketch: An injection prompt that replaces Latin letters with visually identical Cyrillic letters (e.g., Cyrillic `о` U+043E for Latin `o` U+006F).
- Owner: `pre-prompt-inject-scan.mjs` / a pre-compact variant.
- Why it misses: Pattern matches for well-known injection phrases are done on the raw input before normalisation in some code paths; homoglyph-normalisation is MEDIUM-advisory in v5.0 but not integrated into every matcher.
- Fix: Run NFKC / homoglyph-folding before every injection-pattern match, not only before the obfuscation advisory.
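A folding step before pattern matching can be sketched with NFKC plus a small confusables map (the map here covers only the Cyrillic letters used in the example; the plugin's v5.0 advisory presumably has a fuller table):

```javascript
// Fold a handful of Cyrillic homoglyphs to Latin, then NFKC-normalise,
// so injection patterns match the folded text.
const CONFUSABLES = { '\u043E': 'o', '\u0430': 'a', '\u0435': 'e', '\u0440': 'p', '\u0441': 'c' };
const fold = (s) =>
  s.normalize('NFKC').replace(/[\u043E\u0430\u0435\u0440\u0441]/g, (ch) => CONFUSABLES[ch]);

const INJECTION = /ignore previous instructions/i;
```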
E17 — Escalation-after-input 6-call bypass
- Class: Delegation.
- Payload sketch: Untrusted input at call N, then wait 6 tool calls, then delegate to a sub-agent at call N+6.
- Owner: `post-session-guard.mjs` (`checkEscalationAfterInput`).
- Why it misses: The advisory fires only within a 5-call window after the input event.
- Fix: Make the window configurable (env var `LLM_SECURITY_ESCALATION_WINDOW`) with a documented default of 5; also add a secondary longer-window advisory at MEDIUM severity for delegation within 20 calls.
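A minimal sketch of the proposed env-var plumbing. The variable name comes from the fix above; the parsing and validation logic is an assumption, not plugin code:

```javascript
const DEFAULT_ESCALATION_WINDOW = 5; // documented default from the fix above

// Read the window from the environment; fall back to the default on
// anything that is not a positive integer.
function escalationWindow(env = process.env) {
  const raw = Number.parseInt(env.LLM_SECURITY_ESCALATION_WINDOW ?? "", 10);
  return Number.isInteger(raw) && raw > 0 ? raw : DEFAULT_ESCALATION_WINDOW;
}
```

Falling back silently on malformed values keeps the hook fail-safe: a typo in the env var tightens nothing but also breaks nothing.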
E18 — Secret exfiltration via Markdown image URL (rule 18 abuse)
- Class: Data exfiltration.
- Payload sketch: A Markdown image reference whose URL query string carries a high-entropy credential value, constructed so that v7.0.0's new "markdown image URL" suppression rule dismisses the high-entropy string.
- Owner: `scanners/entropy-scanner.mjs`.
- Why it misses: Rule 18 was added in v7.0.0 to suppress false positives on CDN-hosted image URLs. It dismisses legitimate high-entropy strings and also dismisses embedded secrets in the same shape.
- Fix: Refine rule 18 to suppress only strings that match the host/path structure of known CDN patterns (e.g., `cdn.*`, `images.*`, `*.amazonaws.com/s3/*`), not arbitrary `https://...?...` query strings. Alternatively, run an explicit secret-pattern match inside the URL's query before suppression takes effect.
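The alternative fix (an explicit secret-pattern check on the URL query before suppression) can be sketched as follows. The secret shapes are illustrative examples, not the scanner's actual rule set:

```javascript
// Illustrative secret shapes; a production list would mirror the
// scanner's existing pattern catalogue.
const SECRET_SHAPES = [
  /AKIA[0-9A-Z]{16}/,         // AWS access key id shape
  /ghp_[A-Za-z0-9]{36}/,      // GitHub personal access token shape
  /[A-Za-z0-9+/]{40,}={0,2}/, // long base64-ish blob
];

// Before rule 18 suppresses a high-entropy string inside a markdown
// image URL, check whether its query string carries a secret-shaped
// value. If it does, suppression must not apply.
function querySmellsLikeSecret(url) {
  let query;
  try {
    query = new URL(url).search;
  } catch {
    return false; // unparseable input; let the normal entropy path handle it
  }
  return SECRET_SHAPES.some((re) => re.test(query));
}
```

Gating rule 18 on `!querySmellsLikeSecret(url)` preserves the CDN false-positive win while closing the E18 exfiltration shape.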
5. Scoring Model Critique
The v7.0.0 scoring rework is a substantive improvement over v1. The v1 model (`scanners/lib/severity.mjs` `riskScoreV1`, kept for reference) summed weighted counts and capped at 100; any scan with more than a handful of findings collapsed to 100, making the score useless as a signal. The v2 model is severity-dominated (one critical always lands in the 70-95 tier) and log-scaled within tier (additional findings of the same severity increase the score but with diminishing returns). The design decisions below are nonetheless worth flagging.
5.1 Tipping points (verified)
From `severity.mjs:32-46`:
| Input | Score | Verdict | Band |
|---|---|---|---|
| `{}` | 0 | ALLOW | Low |
| `{low: 1}` | 4 | ALLOW | Low |
| `{low: 10}` | 11 | ALLOW | Low |
| `{medium: 1}` | 20 | WARNING | Medium |
| `{medium: 5}` | 28 | WARNING | Medium |
| `{medium: 50}` | 33 | WARNING | Medium |
| `{high: 1}` | 48 | WARNING | High |
| `{high: 5}` | 60 | WARNING | High |
| `{high: 7}` | 64 | WARNING | High |
| `{high: 8}` | 65 | BLOCK | Critical |
| `{high: 17}` | 65 | BLOCK | Critical |
| `{critical: 1}` | 80 | BLOCK | Critical |
| `{critical: 2}` | 86 | BLOCK | Critical |
| `{critical: 4}` | 93 (docs say 90; see B4) | BLOCK | Extreme |
| `{critical: 10}` | 95 | BLOCK | Extreme |
Observations:
- The high → critical verdict transition at exactly 8 high findings is a sharp step. A scan with 7 high + 5 medium findings receives WARNING, band High, score 64. One additional high pushes it to BLOCK. For an adversary who wants to avoid BLOCK while landing real attacks, the practical ceiling is 7 high findings.
- The medium tier has an effective ceiling of 35 (medium=50 → 33; medium=1000 → 33 after rounding). A project with hundreds of medium findings looks identical in score to a project with five. This is the log-scaling doing its job; it is also a volume-blindness that should be documented.
- Info is scoring-inert (see B3).
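The critical-tier values in the table are reproducible from the formula recorded in Appendix A. A minimal sketch, covering the critical tier only; the other tiers' bases, multipliers, and caps are not reproduced here and would be assumptions:

```javascript
// Reconstruction of the v2 critical tier from the Appendix A
// verification log: base 70 plus a log2-scaled bonus capped at 25,
// rounded. Only this tier's constants are attested in the report.
function criticalTierScore(criticalCount) {
  if (criticalCount === 0) return 0;
  return Math.round(70 + Math.min(25, Math.log2(criticalCount + 1) * 10));
}
```

The cap at 25 is what produces the Extreme-band saturation at 95 for large critical counts.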
5.2 Adversarial under-BLOCK landings
An attacker who understands the tier structure can optimise for:
- Land in the High band without crossing the BLOCK line. 7 high findings + any number of medium + any number of low. This gives score = 64, verdict = WARNING. A developer habituated to WARNING is likely to proceed.
- Land 1 critical and accept the BLOCK. One false-positive critical (a test fixture secret, a deliberate placeholder) can block legitimate work. In that case the operator's reaction is to tune, suppress, or disable the offending rule — which reduces detection capacity for future real criticals.
5.3 Tier gap
The medium tier ends at 35 and the high tier begins at 40. A medium-heavy scan cannot reach the high band by accumulation alone; it must have at least one high finding. This is the explicit design intent ("severity-dominated") but has a consequence: a project with systemically many medium findings (e.g., a large legacy codebase) is perpetually scored "Medium" even when the cumulative risk is substantial. The v2 formula is honest about this; the docs should be, too.
Fix suggestion: Add an escape hatch:

```javascript
if (medium >= 20) base = Math.max(base, 40); // medium volume bridge
```

and document it. The bridge only fires when the medium count is volumetrically notable; it does not override the severity-dominated principle.
5.4 Verdict / band co-monotonicity
The verdict thresholds at `severity.mjs:74-79` and band thresholds at `severity.mjs:93-99` are aligned (BLOCK >= 65 co-occurs with Critical/Extreme band; WARNING >= 15 co-occurs with Medium/High band). This is good: operators can reason about a single number. The alignment should be asserted in a test to prevent future drift.
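A standalone sketch of that assertion, using the verdict thresholds quoted above and band cut-offs consistent with the §5.1 table. The Extreme cut-off at 90 is an assumption, not a value read from severity.mjs:

```javascript
// Threshold mirrors of the review's quoted values. Extreme >= 90 is
// assumed from the {critical: 2} = 86 (Critical) and {critical: 4} = 93
// (Extreme) rows; the real test should import the plugin's functions.
function verdictFor(score) {
  if (score >= 65) return "BLOCK";
  if (score >= 15) return "WARNING";
  return "ALLOW";
}
function bandFor(score) {
  if (score >= 90) return "Extreme";
  if (score >= 65) return "Critical";
  if (score >= 40) return "High";
  if (score >= 15) return "Medium";
  return "Low";
}

// Co-monotonicity invariant: BLOCK must co-occur with Critical/Extreme
// for every reachable score, and only then.
function coMonotone() {
  for (let s = 0; s <= 100; s++) {
    const blocks = verdictFor(s) === "BLOCK";
    const criticalOrWorse = bandFor(s) === "Critical" || bandFor(s) === "Extreme";
    if (blocks !== criticalOrWorse) return false;
  }
  return true;
}
```

Pinning this loop in the plugin's own test suite (against the real `severity.mjs` exports rather than these mirrors) would catch any future threshold drift.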
5.5 Info dead-weight
See B3. The three options (document, supplementary score, floor contribution) each have trade-offs. The simplest honest change is documenting the behaviour. The most useful extension is a supplementary info-volume trend per scan target, exposed in the dashboard aggregator.
6. Feature-Gap Matrix vs. Commercial Competitors
The matrix below is derived from the market-analysis agent (ade6518f8a6fcc0c6). Vendors compared: Snyk, Semgrep, GitGuardian, Protect AI, Lakera, HiddenLayer. Each cell is Y / N / partial / (—). Status column: Leading / Competitive / Behind / Missing.
| Feature | llm-security | Snyk | Semgrep | GitGuardian | Protect AI | Lakera | HiddenLayer | Status |
|---|---|---|---|---|---|---|---|---|
| SAST / code scanning | N | Y | Y | N | N | N | N | N/A (out of scope) |
| SCA / dependency audit | Y (dep-audit + supply-chain-recheck) | Y | Y | N | Y | N | N | Competitive |
| Secrets detection | Y (entropy + patterns) | partial | Y | Y | N | N | N | Competitive |
| IaC / workflow scanning | N | Y | Y | N | N | N | N | Behind |
| Container / image scanning | partial (name-blocklist only) | Y | N | N | Y | N | N | Behind |
| Web dashboard | N | Y | Y | Y | Y | Y | Y | Behind (critical gap) |
| Fleet policy server | N | Y | Y | Y | Y | Y | Y | Behind (critical gap) |
| IDE real-time scanning | partial (extension scan) | Y | Y | N | N | N | N | Behind |
| IDE extension scanning | Y (VS Code + JetBrains) | N | N | N | N | N | N | Leading |
| MCP static audit | Y | N | N | N | N | N | N | Leading |
| MCP live inspection | Y | N | N | N | N | N | N | Leading |
| Bash obfuscation normalisation | Y (T1-T6) | N | N | N | N | N | N | Leading |
| Unicode Tag steganography detection | Y | N | N | N | partial | partial | N | Leading |
| Prompt injection static scan | Y | N | N | N | partial | Y | Y | Competitive |
| Runtime prompt firewall / filter | N | N | N | N | Y | Y | Y | Behind (critical gap) |
| Model weight scanning | N | N | N | N | Y | N | Y | Behind |
| AI-BOM generation | Y (CycloneDX 1.6) | partial | N | N | Y | N | N | Competitive |
| Adaptive red-team harness | Y (64 scenarios + mutations) | N | N | N | Y | Y | N | Leading |
| Trifecta / Rule of Two detection | Y | N | N | N | N | N | N | Leading |
| Interactive threat modelling (STRIDE/MAESTRO) | Y | N | N | N | N | N | N | Leading |
| Sandboxed clone / VSIX fetch | Y | N | N | N | N | N | N | Leading |
| False-positive ML feedback loop | N | Y | Y | Y | N | N | N | Behind |
| SIEM-native integration | partial (SARIF + JSONL) | Y | Y | Y | Y | Y | Y | Behind |
| Enterprise ticketing (Jira, ServiceNow) | N | Y | Y | Y | N | N | N | Behind |
| Chat integration (Slack, Teams, PagerDuty) | N | Y | Y | Y | N | N | N | Behind |
| Compliance PDF/DOCX reports | N | Y | Y | Y | Y | N | Y | Behind (critical gap) |
| EU AI Act audit-evidence pack | N | N | N | N | partial | N | partial | Behind |
| Offline / air-gapped operation | Y | N | partial | N | N | N | N | Leading |
| Open source + MIT | Y | N | partial (Semgrep CE) | N | N | N | N | Leading |
Summary: llm-security leads on 11 features, is competitive on 4, and is behind on 13 (of which 4 are critical gaps: web dashboard, fleet policy server, runtime firewall, compliance reporting); SAST is out of scope.
The plugin is genuinely unique in its combination of MCP auditing, IDE extension prescan, trifecta detection, bash evasion normalisation, interactive threat modelling, sandboxed remote fetch, AI-BOM generation, and an adaptive red-team harness — no competitor in the matrix combines these. The gap is not capability per se; it is enterprise integration, central visibility, and compliance deliverables.
7. Roadmap Recommendation — 10 Features Ranked
Ranking criterion: (market value) x (strategic differentiation) / complexity. Ties broken by dependency-order (features that unblock others rank higher).
Rank 1 — Web dashboard + fleet policy server
- Problem: The plugin is per-machine. A CISO with a fleet of developers cannot see aggregate posture or push policy updates. The most frequent objection from enterprise review is "we can't see it".
- Complexity: L. Requires a server (Node, Deno, or Go), a persistence layer (SQLite minimum, Postgres for multi-tenant), an auth model (OIDC / SAML), and a policy-push protocol. Dashboard UI is separate work.
- Market value: Critical. Without this, the plugin cannot be sold to regulated enterprise.
- Dependencies: None; the plugin already emits SARIF and JSONL that can be ingested.
- Suggested phasing: (a) read-only dashboard consuming `reports/` JSONL + SARIF; (b) authenticated fleet policy server; (c) web UI.
Rank 2 — Runtime prompt firewall mode
- Problem: All current protection is static. A running Claude Code session receives a prompt injection via tool output; `post-mcp-verify.mjs` detects and warns, but the content has already been ingested by the model. A prompt firewall would sit in the IO path and strip / rewrite injections before the model sees them.
- Complexity: L. Requires intercepting the tool-output stream, applying a fast classifier, and rewriting or refusing the content. Performance constraints are strict (<50 ms per intervention).
- Market value: Critical. Lakera Guard, Protect AI, and Prompt Guard sell primarily on this capability.
- Dependencies: Claude Code hook API stability; possibly a new hook event for intercepting tool output streams.
- Suggested phasing: (a) Offline batch mode (rewrite after the fact, for post-hoc demonstration); (b) online real-time mode.
Rank 3 — IDE real-time scanning (VS Code + JetBrains plugins)
- Problem: The plugin can scan installed extensions but cannot scan the developer's own code in the editor. Snyk and Semgrep both ship IDE plugins that scan on save and annotate with squigglies.
- Complexity: M. Requires VS Code and JetBrains plugin shells, LSP-style integration with the existing scanners, and bidirectional config sync.
- Market value: High. Developer ergonomics; shifts scanning from an occasional CLI run to a continuous feedback loop.
- Dependencies: Stable CLI entry point (already present: `bin/llm-security.mjs`).
Rank 4 — Compliance reporting pack
- Problem: EU AI Act Art. 15 audit evidence, NIST AI RMF and ISO 42001 mappings, and MITRE ATLAS heat-maps are all CISO-facing deliverables. Today the plugin emits JSONL and SARIF.
- Complexity: M. Template-driven PDF / DOCX generation with signed timestamps.
- Market value: High. Required for regulated verticals; unlocks health, finance, public sector.
- Dependencies: Audit-trail module (`lib/audit-trail.mjs`), posture scanner, OWASP mappings; all already present.
Rank 5 — Enterprise ticketing + chat integrations
- Problem: Findings reach developers but not tracking systems. No Jira / ServiceNow / Slack / Teams / PagerDuty push today.
- Complexity: S-M. Each integration is small; the aggregate is medium.
- Market value: High. Operational requirement for any security team that runs ticket-driven workflows.
- Dependencies: Policy / routing configuration in `.llm-security/policy.json`.
Rank 6 — False-positive ML feedback loop
- Problem: Suppressions today are static (`.llm-security/policy.json` rules, file-level skip, entropy rule allowlists). Snyk and Semgrep learn from user dismissals and re-rank findings.
- Complexity: M. Requires a feedback record format, a light ranker (gradient-boosted trees or a simple logistic model), and a privacy-preserving training loop (on-device preferred).
- Market value: Medium-High. Reduces noise, which is the biggest operational tax on static scanners.
- Dependencies: Dashboard (rank 1) helps with multi-machine aggregation, but a local-first version is possible.
Rank 7 — Multi-modal / EXIF / PDF injection scanner
- Problem: Image and PDF inputs are not scanned. Multi-modal injection is a 2025-2026 growth vector.
- Complexity: M. Pure metadata scanning is small; pixel-level steganography detection is larger and probably out of scope for v1.
- Market value: Medium-High. Competitive differentiator if delivered early.
- Dependencies: `ExifTool` or a Node-native EXIF parser; a PDF object parser.
Rank 8 — MCP 2.0 OAuth audit
- Problem: MCP 2.0 OAuth introduces new attack surfaces (consent phishing, token replay, scope creep).
- Complexity: S-M. Extension to `mcp-live-inspect.mjs` and `scanners/mcp-scanner-agent`.
- Market value: Medium-High. Aligned with the MCP 2.0 adoption timeline.
- Dependencies: MCP 2.0 spec stability.
Rank 9 — Terminal / ANSI escape injection scanner
- Problem: Tool output rendered in the terminal can hide instructions via ANSI escapes. Not covered today.
- Complexity: S. A regex-based stripper + allowlist for safe SGR sequences.
- Market value: Medium. Low-frequency vector but high-impact when it lands.
- Dependencies: None.
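The rank-9 stripper can be sketched as follows. The allowlist keeps only plain SGR colour/style sequences; the regexes are illustrative, not a complete ANSI grammar:

```javascript
// Strip every ANSI escape except plain SGR (colour/style) sequences.
// Cursor moves, screen clears, and OSC payloads (titles, hyperlinks)
// are where terminal-hidden instructions live, so they are removed.
function stripUnsafeAnsi(s) {
  return s
    .replace(/\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)/g, "")   // OSC sequences (title, hyperlink)
    .replace(/\x1b\[(?![0-9;]*m)[0-9;?]*[A-Za-z]/g, ""); // CSI sequences that are not SGR
}
```

Applying this to tool output before it is rendered (or shown to the model) removes the hiding places while preserving ordinary coloured output.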
Rank 10 — Skill marketplace pre-deployment gate
- Problem: A skill pulled from a public marketplace can be compromised. Today the plugin offers `/security plugin-audit` on demand; it does not hook skill installation.
- Complexity: S-M. Requires knowledge of Claude Code's skill install event (or a wrapper CLI).
- Market value: Medium. Protective against a vector that will grow as the marketplace grows.
- Dependencies: Claude Code skill install hook availability.
8. Code-Quality Observations
8.1 Documentation / arithmetic mismatch
`CHANGELOG.md:11` and the `severity.mjs:23` JSDoc disagree with the code; see B4. The discrepancy undermines the v7.0.0 trustworthy-scoring headline.
8.2 Hook configurability vs. documentation
`post-session-guard.mjs:814-826`: the `block` path is gated by `(mcpInfo.concentrated || sensitiveExfil)`. The CLAUDE.md Hooks table describes `block|warn|off` without that qualifier. See B2.
8.3 Regex precision
`pre-write-pathguard.mjs:21-25`: the `ENV_PATTERNS` regex is too restrictive. See B1.
8.4 Dead / low-value code
`severity.mjs:55-63`: `riskScoreV1` is kept for diff/comparison. The function is unused in production paths but exported. Consider marking it `@deprecated` in JSDoc or moving it to `docs/legacy/` to signal intent.
8.5 Test gaps
- The scoring tipping points in §5.1 are covered by existing tests per the CLAUDE.md "1487 tests" claim, but no test pins `{critical: 4}` = 93; a test that merely asserted the documented value of 90 would enshrine the B4 error instead of catching it.
- No test that pins the verdict/band co-monotonicity invariant from §5.4.
- No mutation-testing coverage numbers are published (CLAUDE.md mentions unit/integration tests only).
- The destructuring / spread case in B6 has no test coverage in `taint-tracer`.
- The path-guard `.env.X.Y.Z` case in B1 has no test coverage in the pathguard test suite.
- The distributed-trifecta (non-MCP-concentrated, non-sensitive-path) BLOCK-mode behaviour in B2 is not asserted by a test.
8.6 Duplication
`lib/git-clone.mjs` and `lib/vsix-sandbox.mjs` share the same sandbox-profile construction logic. v6.5.0's consolidation was partial; a small amount of duplication remains. A shared `lib/sandbox.mjs` would reduce the risk of divergence.
8.7 Configuration surface
- Multiple env vars (`LLM_SECURITY_INJECTION_MODE`, `LLM_SECURITY_TRIFECTA_MODE`, `LLM_SECURITY_UPDATE_CHECK`, etc.) are scattered across hooks. A single `.llm-security/policy.json` section that mirrors them would reduce surprise.
8.8 Naming
`computeDataTag` at `post-session-guard.mjs:655` is technically correct, but the surrounding comments (line 646: "CaMeL-inspired data flow tagging") set an expectation the function cannot meet. Either rename the function to `computeOutputFingerprint` or tone the comments down. See B8.
8.9 Error handling
- The scanners consistently use JSON output, but error recovery is uneven. A malformed `.llm-security/policy.json` leads to scanner warnings in some paths and silent fallbacks in others. A single `readPolicy()` helper with a documented fallback chain would reduce ambiguity.
8.10 Cross-reference bit-rot risk
`CLAUDE.md` contains a very detailed table of hooks, scanners, and knowledge files with line-level claims. A change to a single hook behaviour requires updates in four places (hook source, CLAUDE.md, README, CHANGELOG). A lightweight doc-consistency test (e.g., a script that asserts that the number of hooks listed in `.claude/hooks.json` matches the table in CLAUDE.md) would catch drift.
9. Honesty Check
The following quotes were verified in the referenced source files during this review. Each is paired with the behaviour the code actually implements, and with a suggested replacement that trades slightly more words for a substantially more accurate description.
| # | Quote (source) | Reality | Suggested replacement |
|---|---|---|---|
| 1 | "enforces the Rule of Two" — CLAUDE.md:182 | `TRIFECTA_MODE` default is `warn`; `block` is opt-in; even in `block`, only high-confidence (MCP-concentrated OR sensitive-path) trifectas actually block. See B2. | "detects Rule of Two violations; blocks on high-confidence (MCP-concentrated or sensitive-path) trifectas in opt-in block mode; warns otherwise. Distributed trifectas are detected but not blocked by default." |
| 2 | "Fully Schrems II compatible" — CLAUDE.md:136, README.md | Standalone CLI is offline by default, but OSV.dev lookups in `supply-chain-recheck.mjs` transmit package identifiers to a Google-operated API. ci-cd-guide.md is accurate; CLAUDE.md is not. | "Schrems II compatible in default offline mode. The optional OSV.dev enrichment (supply-chain-recheck with `--online`) transmits package identifiers to a Google-operated API and is a separate compliance consideration." |
| 3 | "CaMeL-inspired data flow tagging (SHA-256 provenance tracking, output to input linking)" — CLAUDE.md:184 | `computeDataTag` at `post-session-guard.mjs:655` hashes the first 200 characters of tool output; `flowMatch` checks for a substring match in later inputs. This is byte-matching with early truncation, not semantic lineage. See B8. | "Opportunistic byte-matching of truncated output fingerprints (first 200 bytes of tool output, SHA-256/16-hex tag). This is a lightweight heuristic, not semantic data-flow lineage; it fails on any output mutation or summarisation." |
| 4 | "defense-in-depth" — multiple locations | Accurate in spirit. The plugin does layer prompt-scan, pathguard, trifecta-guard, and post-mcp-verify. The claim is not quantified. | "Three independent detection layers with documented bypass classes (see docs/security-hardening-guide.md §6). Each layer is individually bypassable; the design intent is to raise attack cost, not to be a single enforcement line." |
| 5 | "Hardened ZIP extractor (zip-slip, symlink, bomb, ratio caps)" — CLAUDE.md §IDE scanner | Verified via `lib/zip-extract.mjs`. Bombs, zip-slip, symlinks, absolute paths, drive-letter paths, encrypted entries, and ZIP64 are all rejected. However, no public fuzz-testing results are published. | "Hardened ZIP extractor rejects zip-slip, symlink, absolute / drive-letter paths, encrypted entries, and ZIP64 bombs (capped at 10 000 entries, 500 MB uncompressed, 100 x ratio, depth 20). No fuzz-testing results published to date." |
| 6 | "1487 tests" — CLAUDE.md header | Accurate count. No mutation-testing coverage published. | "1487 unit and integration tests. Mutation-testing coverage is not published; the number is a test count, not a coverage metric." |
| 7 | "trustworthy scoring" — CHANGELOG v7.0.0 | Genuine improvement over v1, but the headline formula example is arithmetically wrong (4 critical = 93, not 90). See B4. | "Severity-dominated, log-scaled v2 scoring formula. Replaces the v1 sum-and-cap model that saturated all non-trivial scans to 100 / Extreme. See severity.mjs for the authoritative formula." |
| 8 | "1 critical = 80, 2 = 86, 4 = 90, 10 = 95" — CHANGELOG.md:11, severity.mjs:23 | `riskScore({critical: 4})` = 93. See B4. | Fix the arithmetic: 1 = 80, 2 = 86, 4 = 93, 10 = 95. |
| 9 | "Context-aware entropy scanner" — CLAUDE.md:7 | Extension skip + 8 line-suppression rules. Accurate description. "Context-aware" is a slightly generous framing for what is largely a rule-and-allowlist pipeline. | "Rule-based entropy scanner with file-extension skip, 8 line-level suppression rules, and configurable policy." |
| 10 | "calibration block reports skip counters" — CLAUDE.md:7 | Verified. Accurate. | No change. |
The honest headline for v7.0.0 should read approximately:
v7.0.0 replaces a broken scoring formula with a severity-dominated log-scaled model, expands the entropy scanner's suppression rules, and extends the typosquat allowlist. Two residual hook bugs and three over-claimed docs items (CaMeL provenance, Schrems II, Rule of Two "enforcement") remain and should be addressed in v7.1.
10. CISO Perspective
Question: would a CISO in a regulated enterprise (financial services, healthcare, public sector, defence) purchase / install llm-security v7.0.0 today?
Answer in two parts.
10.1 Yes, conditionally
The plugin is a credible baseline for:
- Individual developers and small teams already using Claude Code who want a free, open-source, offline-first second line of defence.
- Security research groups studying prompt injection, trifecta detection, MCP auditing, and Claude-Code-specific threat surfaces. The adaptive red-team harness and interactive threat modeller have no commercial equivalent.
- Air-gapped or Schrems-II-sensitive environments where sending data to cloud providers (Snyk, Semgrep Cloud, GitGuardian) is the showstopper. Standalone CLI mode is genuinely offline in default config.
- Norwegian public-sector pilots. The `knowledge/norwegian-context.md` file aligns with Datatilsynet, NSM, and Digitaliseringsdirektoratet expectations. With the caveats in §9 corrected, the plugin is a defensible pilot baseline.
In these contexts, the plugin clears the bar: install, configure, review, ship.
10.2 No, not yet for regulated enterprise
A CISO at a bank, insurer, hospital, national-security agency, or large public-sector body would decline the plugin in its current form. The blockers, in rough priority order:
- No central dashboard. Security must be observable across a fleet. The plugin's reports live on each developer's machine. Even the dashboard-aggregator caches to a local file. A CISO cannot prove to an auditor that all 300 developers ran the scan in the last 30 days.
- No fleet policy push. Policy lives in per-machine `.llm-security/policy.json`. A change (e.g., raising the trifecta mode from `warn` to `block` after an incident) must be rolled out by developer action, not by central push.
- No SIEM-native integration. The plugin writes JSONL (`lib/audit-trail.mjs`). Forwarding to Splunk, Sentinel, Elastic, or QRadar requires a custom collector. Commercial competitors ship native connectors.
- No compliance-ready reporting. EU AI Act Art. 15 audit evidence, NIST AI RMF or ISO 42001 attestations, MITRE ATLAS heat-maps: none are produced today. SARIF and JSONL are technical artefacts, not audit evidence.
- No runtime protection. All current protection is static; once a prompt injection lands, detection is post-hoc. Regulators increasingly expect runtime controls (prompt firewalls, content filters, output guardrails).
- Claim-precision issues. The three honesty items in §9 (CaMeL provenance, Schrems II, Rule of Two "enforcement") would be challenged in a formal procurement review and would require written clarification.
- Bugs B1 and B2. A deterministic regex hole in a sensitive-path guard and a block-mode bypass on distributed trifectas are both things a pen-testing firm would find in the first week of engagement. They would not block procurement outright, but they would reduce confidence in the maturity story.
10.3 What a CISO would require before a production engagement
In priority order:
- Fix B1 (pathguard regex), B2 (distributed-trifecta block), B4 (CHANGELOG arithmetic), B8 (CaMeL docs), and the honesty quotes in §9. These are low-cost and high-signal; they close the gap between documentation and code.
- Ship a read-only web dashboard consuming existing JSONL / SARIF. Even v0.1 of a dashboard unblocks most of the fleet-visibility objection.
- Produce a compliance-evidence pack template. A PDF that mirrors posture-scanner output, OWASP category mapping, and audit-trail events — signed, timestamped, and frameable.
- Document runtime-protection gap explicitly. Add to Defense Philosophy: "This plugin is a static + hook-based layer. Runtime prompt filtering, model-level guardrails, and egress DLP are out of scope and must be addressed by complementary controls."
- Publish a security-architecture note. Diagram showing how the hooks compose, what each hook can and cannot see, and the explicit defence layers. One page. This is the single most asked-for artefact in enterprise security review.
- Commit to a 90-day bug-disclosure window. A named security contact (security@... or equivalent Forgejo path) and a documented handling SLA.
10.4 One-paragraph verdict
llm-security v7.0.0 is a serious contribution to open-source AI security tooling and is among the few plugins that address MCP, Claude Code hooks, and IDE extension provenance as first-class problems. It is not yet enterprise-ready. It is close enough that a B+ grade is within reach after one focused release cycle: fix B1 / B2 / B4 / B8, tone the docs to match the code, ship a minimal dashboard. Without that cycle, the plugin stays at B- and remains primarily a tool for individual developers, researchers, and pilot teams.
Appendix A — Verification Log
The following claims in this report were verified by reading the referenced source directly during the review, not relying solely on the earlier review agents:
- B1 pathguard regex: `hooks/scripts/pre-write-pathguard.mjs:21-25` read directly.
- B2 distributed-trifecta block gate: `hooks/scripts/post-session-guard.mjs:800-828` read directly.
- B4 arithmetic: `scanners/lib/severity.mjs:32-46` read directly; the computation `70 + min(25, log2(5)*10) = 93.22`, rounded to 93, performed manually; CHANGELOG.md:11 and the severity.mjs:23 JSDoc read directly to confirm the "4=90" string.
- B8 CaMeL substring match: `hooks/scripts/post-session-guard.mjs:649-680` read directly; CLAUDE.md:184 read directly.
- Honesty quotes §9 items 1, 2, 3: CLAUDE.md:136, 182, 184 read directly.
Findings B3, B5, B6, B7 and the evasion-arsenal PoCs E1-E18 are reported here with the file:line anchors provided by the specialist review agents (a12f1a90430b53a8c, a2c19b9c36b5b955f, ad7770d76bb7df1f5, af0552023740ad6b7). They were not each re-verified by direct source inspection during the synthesis step; the reader should treat those anchors as inspection targets rather than fully re-verified facts. The top-5 most serious findings in the Executive Summary (B1, B2, B8, "Fully Schrems II", B4) were all directly verified.
Appendix B — What was not reviewed
- Performance characteristics of the scanners under large repositories (>1M LOC).
- Windows behaviour (the reviewer is on macOS / Darwin). The plugin documents Windows fallbacks; they were not exercised.
- Cross-platform compatibility of the sandbox (bwrap on Ubuntu 24.04+ is documented as flaky; not exercised).
- Deep inspection of the AI-BOM output against CycloneDX 1.6 schema validators.
- Inter-operation with MCP servers running under different authentication schemes.
- The `threat-modeler-agent` interactive flow beyond a single synthetic run.
- The `/security harden` auto-generation output against a diverse set of starting configurations.
These are candidates for a follow-up review cycle.