Kjell Tore Guttormsen 8df5d5c70e feat(llm-security): add lethal-trifecta + mcp-rug-pull examples [skip-docs]

Two new self-contained, runnable threat demonstrations under examples/:

- lethal-trifecta-walkthrough/ — feeds 5 hook calls (WebFetch, Read .env,
  Bash curl POST + suppression follow-ups) into post-session-guard and
  verifies the Rule-of-Two advisory fires exactly on leg 3. State
  isolated via run-script PID so /tmp/llm-security-session-*.jsonl is
  not polluted. Hits post-session-guard, ASI01/ASI02, LLM01/LLM02.

- mcp-rug-pull/ — mutates an MCP tool description across 8 stages.
  Each per-update <10% Levenshtein, cumulative reaches 32.2% by stage
  7 — proves the v7.3.0 (E14) mcp-cumulative-drift MEDIUM advisory
  catches slow-burn rug-pulls that the per-update detection would
  miss. Uses LLM_SECURITY_MCP_CACHE_FILE to isolate cache. Hits
  post-mcp-verify, mcp-description-cache.mjs, OWASP MCP05/LLM03/ASI04.

Each example: README.md + run-*.mjs + expected-findings.md.
Plugin README "Other runnable examples" section + CHANGELOG
[Unreleased] Added bullets + plugin CLAUDE.md "Examples" section
all updated in this commit. Marketplace root README unchanged
since plugin's outward coverage is unchanged ([skip-docs]
covers the marketplace-level gate).

2026-05-05 14:45:15 +02:00


Changelog

All notable changes to the LLM Security Plugin are documented in this file.

The format is based on Keep a Changelog.

[Unreleased]

Added

tests/e2e/ — three dedicated end-to-end suites that prove the framework works as a coordinated system, not just as isolated units:
- attack-chain.test.mjs (17 tests) — full hook stack against attack payloads in sequence: prompt injection at the gate; T1/T5/T8 bash evasion; pathguard on .env/.ssh; secrets hook on AWS-shaped keys and PEM headers; markdown link-title and HTML-comment poisoning in tool output; trifecta accumulation over a single session.
- multi-session.test.mjs (9 tests) — state persistence across simulated session boundaries: slow-burn trifecta with legs spread over 50+ calls; MCP cumulative description drift across small per-update changes that each fall under the 10% threshold but cumulatively cross 25% from baseline; pre-compact-scan blocking poisoned transcripts in block mode.
- scan-pipeline.test.mjs (19 tests) — orchestrator + all 10 scanners + toxic-flow correlator against the poisoned-project and grade-a-project fixtures: verdict, risk_score, risk_band, severity counts, OWASP coverage, scanner enumeration, and a narrative-coherence cross-check that BLOCK is genuinely worse than WARNING along every axis.
Test count: 1777 → 1822 (+45). All payloads matching credential regexes are assembled at runtime via concatenation, so test files contain no literal credential-shaped strings (compatible with pre-edit-secrets).
examples/lethal-trifecta-walkthrough/ — runnable demonstration of post-session-guard's Rule-of-Two advisory firing when a 5-call sequence (WebFetch → Read .env → Bash curl POST + suppressed follow-ups) closes the trifecta in a single 20-call window. State isolated via the run script's PID; the user's real /tmp/llm-security-session-* files are never touched. README explains the Rule of Two, the configurable mode (block/warn/off), and the OWASP mapping (LLM01/LLM02, ASI01/ASI02). expected-findings.md documents the testable contract.
examples/mcp-rug-pull/ — runnable demonstration of the v7.3.0 cumulative-drift advisory (E14, OWASP MCP05) on post-mcp-verify. Mutates an MCP tool description across 8 stages — each step under the 10% per-update Levenshtein threshold, but cumulatively crossing 25% from baseline at stage 7. Uses LLM_SECURITY_MCP_CACHE_FILE env override to isolate the cache to a per-run tempdir; the user's real ~/.cache/llm-security/mcp-descriptions.json is never touched. README enumerates the drift profile, points to /security mcp-baseline-reset for legitimate upgrades, and maps to MCP05 / LLM03 / ASI04.

[7.3.1] - 2026-05-01

Stabilization patch. No behavior changes. Sets the public stance, tightens documentation, and removes coherence drift so forkers and downstream organizations get a consistent starting point.

Added

CONTRIBUTING.md — public fork-and-own guide. Explains why PRs are not accepted on the upstream repo, how to fork well (rename plugin, change security contact, preserve LICENSE, re-establish trust), what is welcome via issues, and the bar for inline-diff suggestions the maintainer may apply directly.
README.md "Project scope" section — public statement of stabilization mode (effective 2026-05-01) plus an out-of-scope table naming what is fork-and-own territory: web dashboard, fleet policy server, runtime prompt firewall, IDE LSP, compliance PDF/DOCX pack, enterprise ticketing connectors, multi-tenancy, ML-based detectors, marketplace UI, SSO/SCIM/RBAC. Each row points at the commercial alternative (Snyk, Lakera, Vanta, Splunk SOAR, parry-guard, etc.).
package.json: bugs.url field, CONTRIBUTING.md / SECURITY.md / CHANGELOG.md added to the files whitelist so npm-published artifacts ship with full project documentation.

Changed

SECURITY.md rewritten. Supported-versions table moves from 5.1.x (stale since v6.0.0) to current reality: 7.3.x active, 7.0–7.2 best-effort, < 7.0 EOL. Adds explicit best-effort solo-project response timeline (7 days ack, 14 days triage, 30 days fix for High/Critical), expands scope list to cover bin/llm-security.mjs, and notes that out-of-scope vulnerabilities (e.g., adaptive ML-based bypass) get an explanatory response rather than being silently ignored.
README.md "Feedback & contributing" section now links to CONTRIBUTING.md and the new "Project scope" section.
package.json URL fields corrected to point at the ktg-plugin-marketplace monorepo (the canonical home for this plugin). homepage now deep-links to plugins/llm-security/, repository.url uses the marketplace repo with a directory: "plugins/llm-security" field (npm convention for monorepo plugins), and bugs.url routes to the marketplace issue tracker. Earlier values referenced a standalone claude-code-llm-security repo that was never published — the plugin is distributed via the marketplace mechanism, not as an independent package.
CLAUDE.md "Public Repository" section replaced with a "Distribution" section that documents the marketplace install path and removes the stale standalone-repo references.
Scanner VERSION constants synced to plugin version. Previously dashboard-aggregator.mjs and posture-scanner.mjs reported 6.0.0 in scan output and SARIF, mismatching the actual plugin version. All three standalone scanners (dashboard-aggregator, posture-scanner, ide-extension-scanner) now report 7.3.1.

Fixed

tests/hooks/pre-compact-scan.test.mjs size-cap timing test ceiling raised from 500 ms to 1000 ms. The 500 ms hard cap was a flake source on Intel Mac and CI runners under load, while the design target (documented in CLAUDE.md) remains <500 ms. The test now catches order-of-magnitude regressions without breaking on hardware/CI noise.

Notes

This is the first patch on the stabilization line. Future 7.3.x releases will be limited to bug + security fixes and small knowledge-base refreshes that fit the existing deterministic architecture. v8.0.0 remains scheduled as the deprecation cleanup for the env vars and riskScoreV1 constant deprecated in v7.3.0; see "Project scope" in README.md for the longer-term direction.
Wave E (additional attack-simulator scenarios mentioned in the v7.3.0 changelog as "deferred to v7.3.1") is now deferred indefinitely. Coverage remains at 72 scenarios. Forkers who want broader red-team coverage are encouraged to extend knowledge/attack-scenarios.json.

[7.3.0] - 2026-05-01

Batch C release. Closes 12 implementation tasks (E3, E8-E14, 8.4, 8.6, 8.7, 8.10) across four execution waves: Wave A (bash evasion + decoder), Wave B (supply chain + workflow scanner), Wave C (MCP cumulative drift), Wave D (code quality). Wave E (9 new attack-simulator scenarios for the new defenses) deferred to v7.3.1 — the defenses themselves are unit-tested per wave; the deferred work adds attack-simulator regression coverage on top.

Added

E8 — T7 process-substitution normalization in scanners/lib/bash-normalize.mjs. Collapses <(cmd) and >(cmd) process-substitution wrappers so the inner command name is surfaced to downstream destructive-command name matchers in pre-bash-destructive.mjs. Defends against split-command evasion. Nested wrappers handled up to depth 3. Single-quoted literals masked before T7 runs to avoid corrupting string content.
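The collapsing step described in E8 can be illustrated with a minimal sketch. It is not the code in scanners/lib/bash-normalize.mjs: the function name `collapseProcessSubstitution` is hypothetical, and the single-quote masking mentioned above is left out for brevity.

```javascript
// Hypothetical helper for illustration; not the plugin's implementation.
const MAX_DEPTH = 3; // nesting limit stated in the E8 entry

function collapseProcessSubstitution(command) {
  // Matches an innermost <(...) or >(...) wrapper (no parentheses inside it).
  const innermost = /[<>]\(([^()]*)\)/g;
  let result = command;
  for (let depth = 0; depth < MAX_DEPTH; depth++) {
    const next = result.replace(innermost, " $1 ");
    if (next === result) break; // nothing left to unwrap
    result = next;
  }
  // Squeeze the padding spaces introduced by the replacement.
  return result.replace(/\s+/g, " ").trim();
}

console.log(collapseProcessSubstitution("cat <(echo <(rm -rf /tmp/demo))"));
```

After collapsing, the inner command name (`rm` here) sits in plain view for a name-based matcher, which is the property the entry describes.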
E10 — T9 eval-via-variable normalization in scanners/lib/bash-normalize.mjs. Substitutes one-level variable assignments before destructive-name matching. One-level forward-flow only: chained-var attacks intentionally not followed (documented limit). Bare-form, curly-form, and double-quoted forms supported; single-quoted literals preserved.
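A rough sketch of one-level substitution, under simplifying assumptions: only simple `NAME=value` assignments are collected, values that themselves contain `$` are ignored (so chained variables are not followed), and text inside single quotes is left untouched. The name `substituteOneLevelVars` and both regexes are illustrative, not taken from bash-normalize.mjs.

```javascript
// Hypothetical helper for illustration; not the plugin's implementation.
function substituteOneLevelVars(command) {
  const vars = new Map();
  // NAME=value where value has no '$', so assignments built from other
  // variables are never recorded (one-level only).
  const assignment =
    /(?:^|[;&\s])([A-Za-z_][A-Za-z0-9_]*)=([A-Za-z0-9_\/.-]+)(?=\s*(?:;|&&|$))/g;
  for (const m of command.matchAll(assignment)) {
    if (!vars.has(m[1])) vars.set(m[1], m[2]);
  }
  const reference = /\$(?:\{([A-Za-z_][A-Za-z0-9_]*)\}|([A-Za-z_][A-Za-z0-9_]*))/g;
  return command
    .split(/('[^']*')/) // keep single-quoted literals as separate parts
    .map((part) => {
      if (part.startsWith("'")) return part; // literal: preserve as-is
      return part.replace(reference, (match, curly, bare) => {
        const name = curly ?? bare;
        return vars.has(name) ? vars.get(name) : match;
      });
    })
    .join("");
}

console.log(substituteOneLevelVars("x=rm; $x -rf /tmp/demo"));
console.log(substituteOneLevelVars("x=rm; y=$x; $y -rf /tmp/demo"));
```

In the second call `$y` stays unresolved, because `y` was assigned from another variable. That mirrors the documented limit on chained-variable attacks.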
E9 — T8 base64-pipe-shell BLOCK rule in hooks/scripts/pre-bash-destructive.mjs. Direct match on the base64-decode-pipe-into-shell loader idiom — blocks the encoded-payload runner pattern that bypasses static name-matching by delivering the destructive command as encoded text.
E3 — rot13 layer for hidden-imperative comment-block detection in scanners/lib/injection-patterns.mjs. The decoder is bounded in length to keep accidental rot13-look-alike short strings out of scope. Base64/hex/URL/HTML decoding is already done by normalizeForScan; the rot13 pass is the only genuinely new layer.
E12 — .gitattributes filter/diff/merge driver advisory in scanners/lib/git-clone.mjs. New scanGitAttributes(repoDir) exported helper plus post-clone integration in the clone CLI branch — surfaces filter, diff, and merge driver directives as MEDIUM advisories so downstream consumers see the supply-chain surface that survives even a sandboxed clone.
E13 — npm scope-hopping typosquat detection in hooks/scripts/pre-install-supply-chain.mjs. New shared NPM_OFFICIAL_SCOPES export from scanners/lib/supply-chain-data.mjs. When an install targets @<scope>/<name> where <scope> is unknown but <name> matches a popular unscoped package, the hook emits a MEDIUM advisory. Allowlist of legitimate scopes drives suppression. Configurable via policy.json supply_chain.allowed_scopes.
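The scope-hopping check can be sketched as follows. The two sets are tiny illustrative stand-ins; the real NPM_OFFICIAL_SCOPES list and popular-package data live in supply-chain-data.mjs and are not reproduced here. `checkScopeHop` and its return shape are assumptions.

```javascript
// Illustrative data only; the plugin's real lists are not shown in this changelog.
const KNOWN_SCOPES = new Set(["@types", "@babel", "@angular"]);
const POPULAR_UNSCOPED = new Set(["lodash", "express", "react"]);

// Hypothetical helper: returns an advisory object or null.
function checkScopeHop(spec, allowedScopes = []) {
  const m = /^(@[^/]+)\/([^@/]+)/.exec(spec);
  if (!m) return null; // unscoped install: out of scope for this check
  const [, scope, name] = m;
  if (KNOWN_SCOPES.has(scope) || allowedScopes.includes(scope)) return null;
  if (!POPULAR_UNSCOPED.has(name)) return null;
  return {
    severity: "MEDIUM",
    message: `${scope}/${name} reuses the popular unscoped name "${name}" under an unknown scope`,
  };
}

console.log(checkScopeHop("@evil-corp/lodash@4.17.21"));
console.log(checkScopeHop("@types/lodash"));
```

The second argument plays the role of the supply_chain.allowed_scopes policy key: a scope listed there suppresses the advisory.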
E11 — workflow-injection scanner (scanners/workflow-scanner.mjs). Scans .github/workflows/*.{yml,yaml} and .forgejo/workflows/*.{yml,yaml} for dangerous expression interpolations inside run: step blocks. 23-field canonical blacklist (GHSL Security Lab 17 + GlueStack-class 6) targeting attacker-controlled fields. Sink-restricted: only run: steps are shell sinks; if:, with:, env:, name:, runs-on: are evaluated by the runner's expression engine, not the shell, and are suppressed. Severity matrix: privileged triggers → HIGH; semi-privileged → MEDIUM; safe fields (numeric / hex / fixed-string) → INFO. State machine extracted to scanners/lib/workflow-yaml-state.mjs for unit-level testability. Re-interpolation tracking — env-block bindings sourced from blacklisted fields, then read back inside run:, are flagged at MEDIUM as the Appsmith GHSL-2024-277 stealth pattern. Auth-bypass detection — (github|forgejo).actor compared against bot identities in if: conditions flagged at MEDIUM (Synacktiv 2023 Dependabot spoofing class). New WFL prefix in scanners/lib/severity.mjs OWASP map. Registered in scanners/scan-orchestrator.mjs.
E14 — MCP cumulative-drift baseline in scanners/lib/mcp-description-cache.mjs. Sticky baseline slot per tool plus a 10-event rolling history array (FIFO). Cumulative drift = levenshtein(current, baseline.description) / max(|current|, |baseline|); when ratio ≥ mcp.cumulative_drift_threshold (default 0.25), post-mcp-verify.mjs emits a MEDIUM mcp-cumulative-drift advisory independent of the existing per-update >10% drift signal — both fire independently. Slow-burn rug-pulls that keep each update under the per-update threshold but cumulatively diverge from baseline are now caught. Baseline survives the 7-day TTL purge so detection persists across the full window. New /security mcp-baseline-reset slash command (plus scanners/mcp-baseline-reset.mjs CLI: --list, --target <tool>, or no-args clear-all) lets the user acknowledge a legitimate MCP server upgrade. New LLM_SECURITY_MCP_CACHE_FILE env var overrides the cache path for end-to-end testing without polluting the user's real ~/.cache/llm-security/mcp-descriptions.json. Migration logic in loadCache() seeds baseline from existing entries on first read post-upgrade.
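To make the drift formula concrete, here is a small sketch that computes the ratio levenshtein(current, baseline) / max(|current|, |baseline|) and compares it against the 0.25 default. The helper names and the synthetic 40-character descriptions are assumptions for illustration; the cache layout and advisory plumbing are not shown.

```javascript
// Classic two-row dynamic-programming edit distance.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper mirroring the formula in the E14 entry.
function driftRatio(current, baseline) {
  const longest = Math.max(current.length, baseline.length);
  return longest === 0 ? 0 : levenshtein(current, baseline) / longest;
}

const CUMULATIVE_THRESHOLD = 0.25; // default of mcp.cumulative_drift_threshold

// Synthetic descriptions: each stage rewrites 3 more of 40 characters.
const baseline = "a".repeat(40);
const stage = (k) => "b".repeat(3 * k) + "a".repeat(40 - 3 * k);

for (let k = 1; k <= 4; k++) {
  const perUpdate = driftRatio(stage(k), stage(k - 1));
  const cumulative = driftRatio(stage(k), baseline);
  console.log(k, perUpdate.toFixed(3), cumulative.toFixed(3), cumulative >= CUMULATIVE_THRESHOLD);
}
```

Every step changes 7.5% of the text, which stays under the 10% per-update signal, while the distance from the sticky baseline keeps growing and crosses 25% at the fourth stage of this toy sequence.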
8.7 — env-var deprecation warnings in scanners/lib/policy-loader.mjs. New getPolicyValueWithEnvWarn(section, key, envVarName, defaultValue) helper. Env-var still wins per existing Preferences, but when BOTH the env-var AND the policy.json key are explicitly set, the helper emits a single per-process stderr deprecation line pointing to v8.0.0 removal. Module-scoped Set dedupes per env-var name across call-sites. DEFAULT_POLICY gains trifecta.escalation_window: 5 (closes the gap where LLM_SECURITY_ESCALATION_WINDOW had no policy.json equivalent). Wired through 4 hook call-sites: pre-prompt-inject-scan, post-session-guard (×2), and audit-trail. Env-only vars without policy.json equivalents are unchanged.
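The precedence and once-per-process warning behavior described in 8.7 might look roughly like this. It is a sketch, not the helper in policy-loader.mjs: the function name, the explicit `policy` and `env` parameters, and the warning wording are assumptions made so the example is self-contained.

```javascript
// Module-scoped: at most one warning per env-var name per process.
const warnedEnvVars = new Set();

// Hypothetical stand-in for getPolicyValueWithEnvWarn; policy and env are
// passed in explicitly here, which the real helper may not do.
function resolveWithEnvWarn(policy, env, section, key, envVarName, defaultValue, warn = console.error) {
  const envValue = env[envVarName];
  const policyValue = policy?.[section]?.[key];
  if (envValue !== undefined && policyValue !== undefined && !warnedEnvVars.has(envVarName)) {
    warnedEnvVars.add(envVarName);
    warn(`[deprecation] ${envVarName} overrides policy.json ${section}.${key}; env-var support is scheduled for removal in v8.0.0`);
  }
  if (envValue !== undefined) return envValue; // env-var still wins
  if (policyValue !== undefined) return policyValue;
  return defaultValue;
}

const policy = { trifecta: { escalation_window: 5 } };
const env = { LLM_SECURITY_ESCALATION_WINDOW: "8" };
console.log(resolveWithEnvWarn(policy, env, "trifecta", "escalation_window", "LLM_SECURITY_ESCALATION_WINDOW", 5));
console.log(resolveWithEnvWarn(policy, env, "trifecta", "escalation_window", "LLM_SECURITY_ESCALATION_WINDOW", 5));
```

Both calls return the env value, but only the first writes the deprecation line. Note that the env value arrives as a string; parsing it to a number is left out of the sketch.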

Changed

8.10 — CLAUDE.md hooks count corrected from ## Hooks (8) to ## Hooks (9). Adds pre-compact-scan.mjs row to the hooks table (PreCompact — transcript scan before context compaction). The hook itself shipped in v6.2.0 but the count and table row drifted. New Hooks count consistency describe block in tests/lib/doc-consistency.test.mjs parses hooks/hooks.json, reads the CLAUDE.md ## Hooks (\d+) header and the table rows, and asserts all three counts agree — locks in the fix and prevents future drift.

Documentation

8.4 — riskScoreV1 annotated @deprecated in scanners/lib/severity.mjs. JSDoc explicitly tags v7.0.0 as the introduction of the v2 model and v8.0.0 as the removal target for v1, so library consumers see the deprecation in IDE tooling and not just in release notes. The function remains exported and functional for users who relied on it.
8.6 — sandbox-architecture rationale in docs/security-hardening-guide.md §7. Documents why lib/git-clone.mjs and lib/vsix-sandbox.mjs remain separate rather than being collapsed into a single shared sandbox helper. Brief Preferences explicitly rejected the consolidation as premature abstraction over safety-critical code; the rationale is recorded so future maintainers see the deliberate decision.

Tests

1665+ → 1777 (Wave A-D cumulative; ~+112 tests). Includes new files (tests/scanners/bash-normalize-t7-t9.test.mjs, tests/lib/git-clone-gitattributes.test.mjs, tests/scanners/workflow-scanner.test.mjs, tests/lib/workflow-yaml-state.test.mjs, tests/scanners/mcp-baseline-reset.test.mjs) plus extensions to tests/lib/injection-patterns.test.mjs, tests/hooks/pre-bash-destructive.test.mjs, tests/hooks/pre-install-supply-chain.test.mjs, tests/scanners/scan-orchestrator.test.mjs, tests/lib/mcp-description-cache.test.mjs, tests/hooks/post-mcp-verify.test.mjs, tests/lib/severity.test.mjs, tests/lib/policy-loader.test.mjs, tests/lib/doc-consistency.test.mjs. One pre-existing size-cap timing flake at tests/hooks/pre-compact-scan.test.mjs passes in isolation, fails sporadically under full-suite load — unchanged across Wave A-D, not a Batch C blocker.

Notes

Wave E deferred (red-team coverage). The plan called for 9 new attack-simulator scenarios covering every Wave A-D defense. The work was deferred from v7.3.0 because two of the scenarios test scanners (workflow-scanner, git-clone scanGitAttributes) that don't fit the existing hook-spawn model used by attack-simulator and would have required a new scanner_test execution mode. Tracked for v7.3.1. Defenses are unit-tested per wave; this is regression coverage on top of unit coverage, not the primary safety net.
Hooks runtime behavior unchanged for existing setups. Every Wave A-D addition is either purely additive (new advisories at MEDIUM) or layered before existing detection (T7/T9 normalize before existing destructive-name matching; rot13 inside the existing decoder loop; cumulative-drift independent of per-update drift). Users who set neither the new policy.json keys nor the new env-vars see identical behavior.

[7.2.0] - 2026-04-29

Batch B release. Closes the remaining critical-review B-tier scanner defects (B3, B5, B6, B7), lands the v7.2.0 evasion-arsenal hardening patches (E1, E4, E5, E7, E15, E16, E17, E18), unifies the v1→v2 risk-score formula across documentation surfaces, and ships 8 new red-team scenarios (64 → 72) plus a polyglot fixture for the entropy two-stage pipeline.

Added

B6 destructuring/spread taint propagation (scanners/taint-tracer.mjs). extractAssignedVariable now recognises const { secret: userInput } = req.body and const [input, ...rest] = process.argv — destructured and spread bindings carry their tainted source into downstream usage. extractAssignedVariable exported for direct unit testing. +19 tests.
B7 token-overlap typosquat fallback (scanners/lib/string-utils.mjs, scanners/dep-auditor.mjs, scanners/supply-chain-recheck.mjs). New tokenize / tokenOverlap helpers + TYPOSQUAT_SUSPICIOUS_TOKENS list catch typosquats that Levenshtein distance misses (e.g. chalk-color-utility vs chalk). +21 tests.
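A minimal sketch of the token-overlap idea, using the chalk-color-utility example from the entry. The function names, the overlap definition (share of the popular package's tokens found in the candidate), and the three-entry suspicious-token list are assumptions; the actual tokenize / tokenOverlap signatures and TYPOSQUAT_SUSPICIOUS_TOKENS contents are not given in this changelog.

```javascript
// Illustrative list only; the real TYPOSQUAT_SUSPICIOUS_TOKENS is not shown here.
const SUSPICIOUS_TOKENS = new Set(["utility", "utils", "helper"]);

// Hypothetical stand-in for the tokenize helper.
function tokenizeName(name) {
  return name
    .toLowerCase()
    .replace(/^@[^/]+\//, "") // drop an npm scope prefix if present
    .split(/[-_.]+/)
    .filter(Boolean);
}

// Share of the popular package's tokens that also appear in the candidate.
function overlapRatio(candidate, popular) {
  const popularTokens = tokenizeName(popular);
  if (popularTokens.length === 0) return 0;
  const candidateTokens = new Set(tokenizeName(candidate));
  const shared = popularTokens.filter((t) => candidateTokens.has(t)).length;
  return shared / popularTokens.length;
}

function looksLikeTokenSquat(candidate, popular) {
  if (candidate === popular) return false;
  const extras = tokenizeName(candidate).filter((t) => !tokenizeName(popular).includes(t));
  return overlapRatio(candidate, popular) === 1 && extras.some((t) => SUSPICIOUS_TOKENS.has(t));
}

console.log(overlapRatio("chalk-color-utility", "chalk"));
console.log(looksLikeTokenSquat("chalk-color-utility", "chalk"));
```

Edit distance between these two names is large, so a Levenshtein-only check passes them. The token view shows that the candidate contains the whole popular name plus filler words.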
E15 .claude/agents/*.md memory-poisoning glob (scanners/memory-poisoning-scanner.mjs). Agent definitions are now scanned alongside CLAUDE.md and rules. New fixture + +3 tests.
E1 hidden-Unicode coverage extended to PUA-A and PUA-B (scanners/lib/string-utils.mjs). containsUnicodeTags now flags U+F0000–U+FFFFD (Supplementary Private Use Area-A) and U+100000–U+10FFFD (Supplementary Private Use Area-B) in addition to the U+E0000 Tag block. PUA characters do not decode to ASCII (they have no standard mapping) but their presence is suspicious enough to emit a HIGH advisory. +21 tests.
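The three code-point ranges named in E1 can be checked with a single Unicode-aware regular expression. This is a sketch of the range test only; `hasHiddenCodePoints` is a hypothetical name and the actual containsUnicodeTags implementation may differ. The Tag block is taken as U+E0000 to U+E007F.

```javascript
// Tag block plus Supplementary Private Use Area-A and Area-B.
const HIDDEN_RANGES = /[\u{E0000}-\u{E007F}\u{F0000}-\u{FFFFD}\u{100000}-\u{10FFFD}]/u;

// Hypothetical helper; only demonstrates the range check.
function hasHiddenCodePoints(text) {
  return HIDDEN_RANGES.test(text);
}

console.log(hasHiddenCodePoints("plain ascii text"));
console.log(hasHiddenCodePoints("looks empty" + String.fromCodePoint(0xf0041)));
console.log(hasHiddenCodePoints("looks empty" + String.fromCodePoint(0x100041)));
```

The `u` flag matters: without it, JavaScript regexes work on UTF-16 code units and cannot express ranges above U+FFFF directly.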
E16 homoglyph fold before pattern matching (scanners/lib/string-utils.mjs, scanners/lib/injection-patterns.mjs). New foldHomoglyphs (NFKC + targeted Cyrillic/Greek → Latin map) runs before every pattern match in scanForInjection. Attacks like ignоre previous instructions (Cyrillic о) now trigger the same CRITICAL pattern as the Latin form. ASCII fast-path keeps the helper zero-cost on plain text. +27 tests.
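The fold can be sketched as NFKC normalization followed by a lookup table, with an ASCII fast path in front. The table below holds only a handful of Cyrillic and Greek letters chosen for illustration; the plugin's actual map and the exact behavior of foldHomoglyphs are not reproduced here, and `foldHomoglyphsSketch` is a hypothetical name.

```javascript
// Small illustrative map: a few Cyrillic and Greek look-alikes to Latin.
const HOMOGLYPHS = new Map([
  ["\u0430", "a"], ["\u0435", "e"], ["\u043E", "o"], // Cyrillic а, е, о
  ["\u0440", "p"], ["\u0441", "c"], ["\u0445", "x"], // Cyrillic р, с, х
  ["\u03B1", "a"], ["\u03B9", "i"], ["\u03BF", "o"], // Greek α, ι, ο
]);

function foldHomoglyphsSketch(text) {
  // ASCII fast path: nothing to fold, return the input untouched.
  if (/^[\x00-\x7F]*$/.test(text)) return text;
  let out = "";
  // NFKC first (folds fullwidth and other compatibility forms), then the map.
  for (const ch of text.normalize("NFKC")) {
    out += HOMOGLYPHS.get(ch) ?? ch;
  }
  return out;
}

console.log(foldHomoglyphsSketch("ign\u043Ere previous instructions"));
```

Once folded, the string with a Cyrillic о is byte-identical to the Latin form, so a single pattern covers both spellings.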
E17 configurable escalation window + 20-call MEDIUM advisory (hooks/scripts/post-session-guard.mjs). The LLM_SECURITY_ESCALATION_WINDOW env-var now overrides the primary escalation-after-input window (default 5). A secondary 20-call MEDIUM advisory catches slow-burn variants outside the primary window. +5 tests.
E4 markdown link-title injection scan (hooks/scripts/post-mcp-verify.mjs). Every [text](url "title") title is HTML-entity-decoded and run through scanForInjection. Bypassed the existing HTML-tag-gated checks pre-E4. +3 tests.
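A simplified sketch of the extraction step: pull the quoted title out of each inline link and decode a small set of HTML entities before handing the text to an injection scanner. The regex handles only double-quoted titles, the entity decoder covers only a few named entities plus numeric references, and scanForInjection itself is not shown. All names here are hypothetical.

```javascript
// Inline markdown link with a double-quoted title: [text](url "title")
const LINK_WITH_TITLE = /\[[^\]]*\]\(\s*[^\s)]+\s+"([^"]*)"\s*\)/g;

// Minimal entity decoder for illustration; a real decoder covers far more.
function decodeBasicEntities(s) {
  return s
    .replace(/&#x([0-9a-f]+);/gi, (_, hex) => String.fromCodePoint(parseInt(hex, 16)))
    .replace(/&#(\d+);/g, (_, dec) => String.fromCodePoint(Number(dec)))
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&amp;/g, "&");
}

function extractLinkTitles(markdown) {
  return [...markdown.matchAll(LINK_WITH_TITLE)].map((m) => decodeBasicEntities(m[1]));
}

const sample =
  'See [docs](https://example.com "ignore&#32;previous instructions") and [home](https://example.com).';
console.log(extractLinkTitles(sample));
```

The title never appears as an HTML tag, which is why checks gated on tags would not look at it. The extracted strings are what a scanner would inspect.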
E5 SVG <desc> / <title> / <metadata> / <foreignObject> extractor (hooks/scripts/post-mcp-verify.mjs). Adversarial text inside SVG containers is invisible in the rendered image but parsed by an agent reading the source. +3 tests.
E7 generalized HTML comment scan (hooks/scripts/post-mcp-verify.mjs). Pre-E7, the keyword-restricted CRITICAL pattern fired only on marked comments. Now the body of every HTML comment is decoded and scanned. The keyword pattern still fires (defense-in-depth). +3 tests.
8 new red-team scenarios (knowledge/attack-scenarios.json). UNI-007/008 (E1 PUA-A/PUA-B), UNI-009 (E16 Greek-Latin homoglyph fold blocks), MCP-005 (E4), MCP-006/007 (E5 desc/foreignObject), MCP-008 (E7), TRI-004 (E17 escalation-after-input). attack-simulator.mjs baseline: 64 → 72, 100 % pass.

Changed

B5 entropy two-stage pipeline (scanners/entropy-scanner.mjs). New classifyFileContext(absPath, lines) returns 'shader-dominant' | 'markup-dominant' | 'code-dominant' | 'mixed', keyed off file extension with a content-density fallback for code-extension files (≥50 % sampled lines matching GLSL/inline-markup → downgrade to mixed). isFalsePositive now accepts the context and gates rules 11-13 (GLSL / CSS-in-JS / inline-markup line-proximity) on context !== 'code-dominant'. Polyglot .ts files with embedded GLSL blocks no longer suppress credentials adjacent to shader keywords (the v7.0.0 false-negative class). Conservative defaults preserve existing rule-11 / 12 / 13 behaviour for the single-line .js / .jsx test fixtures. New fixture tests/fixtures/entropy/polyglot-ts-with-glsl.ts. +3 tests.
E18 entropy rule 18 — markdown-image CDN-aware + secret pre-check (scanners/entropy-scanner.mjs). Pre-E18, every ![…](https?://…) line was suppressed regardless of host or query. Now suppression requires (host matches MARKDOWN_IMAGE_CDN_HOSTS allowlist) AND (no secret-shaped token in query). Non-CDN hosts and CDN hosts carrying ?token=… / ?api_key=… / AWS / GitHub / npm prefixes fall through to entropy classification. +4 tests.
v1 → v2 risk-formula constants unified across docs (commands/scan.md, commands/audit.md, agents/mcp-scanner-agent.md, agents/posture-assessor-agent.md). The four files referenced the legacy v1 score >= 61 / score >= 21 / Critical × 25 constants; authoritative implementation in scanners/lib/severity.mjs has been v2 (BLOCK ≥65, WARNING ≥15, severity-dominated log-scaled tiers) since v7.0.0. tests/lib/doc-consistency.test.mjs adds a guard so these surfaces cannot drift back. +28 tests.

Documentation

B3 info severity is scoring-inert (scanners/lib/severity.mjs JSDoc, CLAUDE.md). Documents the long-standing implementation: info findings appear in OWASP aggregates but contribute zero to risk_score, verdict, and riskBand. +1 anchor test.

Tests

1522 → 1665+ (Wave 1 +29, Wave 2 +43, Wave 3 +53, Wave 4 +9, Wave 5 +7, Wave 6 attack scenarios). All green except the documented pre-compact-scan perf-flake (passes 6/6 in isolation, fluctuates around the 500 ms ceiling under full-suite parallelism). attack-simulator: 64 → 72 scenarios, 100 % pass.

Notes

E15 (.claude/agents/*.md glob) and E18 (entropy rule 18 CDN allowlist) are scanner-only — they have unit / integration coverage in their respective scanner test files and no attack-simulator.mjs scenario.

[7.1.1] - 2026-04-29

Patch release. Closes the narrative-coherence gap that survived v7.0.0: the severity-dominated risk score corrected the numbers, but the agent prompt continued to emit raw signals and walk them back as "false positive" in prose, producing whiplash in the rendered report. v7.1.1 makes severity assignment context-first at the prompt level and adds a structural counter for suppressed signals.

Fixed

Agent prompt context-first severity (agents/skill-scanner-agent.md). New Step 2.5 mandates that every signal has exactly one disposition — suppressed (counted only) or reported (full finding) — with the split happening before severity is assigned. The phrases "false positive", "legitimate framework", and "no action required" are forbidden in finding-body text and reserved for the new ## Suppressed Signals section. Verdict Logic section was also updated to reference v2 tiers and cutoffs from severity.mjs (BLOCK ≥65, WARNING ≥15) — replaces the stale v1 sum-and-cap formula that had been left in place after the v7.0.0 numeric overhaul.
Template v1 → v2 risk constants (templates/unified-report.md). HTML-comment header at lines 55-66 now describes the v2 tiers and cutoffs the engine has been using since v7.0.0. Adds an ### Narrative Audit block inside Executive Summary surfacing summary.narrative_audit.suppressed_findings.{count, by_category} for reviewer transparency. The block does NOT affect verdict computation.

Added

tests/scanners/skill-scanner-narrative.test.mjs — 11 assertions against tests/fixtures/skill-scan/hyperframes-like/. Covers deterministic content-extractor (exactly 1 HIGH HITL trap, ≥ 2 framework env-var refs, has_injection true on any signal, has_critical_injection false), entropy scanner (calibration block present, ≤ 1 finding after suppression), inline co-monotonicity guard ({ high: 1 } → WARNING / High), and prompt-contract static assertions on agents/skill-scanner-agent.md and templates/unified-report.md.
tests/fixtures/skill-scan/hyperframes-like/ — synthetic skill with HTML5 canvas / CSS keyframes / inline SVG data URI noise plus exactly one genuine HITL trap signal. Committed (not gitignored). .llm-security-ignore uses the canonical SCANNER:glob format (ENT:**/*.md).

Tests

1511 → 1522 tests (adds 11 new). Co-monotonicity sweep at tests/lib/severity.test.mjs:252-303 unchanged and green.

Why

Hyperframes.com re-test on 2026-04-19 produced risk_score 20 / WARNING / 1 HIGH numerically (correct after v7.0.0) but the agent listed 8 findings in prose and walked 6 back as "false positive". v7.1.1 closes the structural gap that allowed this: severity is assigned ONCE, context-first, and suppressed signals are categorical telemetry rather than free-text walk-backs.

Out of scope (flagged for Batch B)

commands/scan.md:113-114 retains the v1 risk formula and acts as a third source of truth alongside agent prompt and severity.mjs. Will be unified in v7.2.0.

[7.1.0] - 2026-04-29

Patch release closing the highest-impact items from the v7.0.0 adversarial review (docs/critical-review-2026-04-20.md, grade B-). Bug-fixes plus an honesty-sweep on documentation language. No new features and no behavioral changes outside the listed fixes.

Fixed

Pathguard regex hole — .env.*.*.* could be written without blocking (hooks/scripts/pre-write-pathguard.mjs). The old ENV_PATTERNS only matched a single dotted segment after .env, so .env.production.local.backup, .env.prod.local.bak, etc. slipped through. Replaced with /[\\/]\.env(\.[A-Za-z0-9._-]+)*$/ covering arbitrary multi-segment suffixes. .envrc continues to be allowed. Commit 751f119. (Critical-review B1.)
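The replacement pattern quoted above can be exercised directly. The sample paths are made up for illustration; only the regular expression comes from the entry, and the rest of the hook's logic is not shown.

```javascript
// Pattern as quoted in the changelog entry.
const ENV_PATTERN = /[\\/]\.env(\.[A-Za-z0-9._-]+)*$/;

// Illustrative sample paths.
const samples = [
  "/repo/.env",
  "/repo/.env.production.local.backup",
  "C:\\repo\\.env.prod.local.bak",
  "/repo/.envrc",
  "/repo/src/env.js",
];

for (const p of samples) {
  console.log(ENV_PATTERN.test(p) ? "BLOCK" : "ALLOW", p);
}
```

`.envrc` is allowed because the pattern requires either the end of the string or a dot right after `.env`, and `rc` satisfies neither.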
Distributed trifecta in BLOCK mode only warned (hooks/scripts/post-session-guard.mjs). The previous block-gate required both LLM_SECURITY_TRIFECTA_MODE=block and a "concentrated" or "sensitive-path" qualifier, so a trifecta whose three legs landed on different MCP servers without a sensitive path was advisory-only. Removed the AND-gate; block mode now blocks any detected trifecta. Commit 36be963. (Critical-review B2.)
JSDoc/CHANGELOG arithmetic for riskScore({critical: 4}) (scanners/lib/severity.mjs:23, CHANGELOG.md v7.0.0 tier description). The actual computation has always been 70 + log2(5)*10 = 93.22 → round → 93; only the docs said 90. Fixed; pin test added. (Critical-review B4.)

Changed

Honesty-sweep on documentation language (CLAUDE.md, commands/ide-scan.md, knowledge/mitigation-matrix.md, docs/security-hardening-guide.md). Critical-review §9 flagged a set of overclaim phrasings; rewritten while preserving accurate underlying claims:
- "Trustworthy scoring (BREAKING)" → "Severity-dominated risk scoring (v2 model, BREAKING)"
- "Context-aware entropy scanner" → "Rule-based entropy scanner with file-extension skip, 8 line-level suppression rules, and configurable policy"
- "1487 tests" → "1511 unit and integration tests; mutation-testing coverage not published"
- "Fully Schrems II compatible" → "Schrems II compatible in default offline mode. Optional OSV.dev enrichment is a separate compliance consideration"
- "Rule of Two enforcement" → "Rule of Two detection (configurable; default warn; blocks on high-confidence trifectas in opt-in block mode)"
- "Hardened ZIP extractor" → suffix " — no fuzz-testing results published to date"
- "defense-in-depth" → preserved, but quantified in docs/security-hardening-guide.md §4: "three independent detection layers with documented bypass classes"
CaMeL claims toned down (hooks/scripts/post-session-guard.mjs:646, CLAUDE.md:184). Implementation is opportunistic byte-matching of truncated output fingerprints (first 200 bytes, SHA-256/16-hex tag) — trivially bypassed by mutation, summarisation, or re-encoding. Renamed framing from "CaMeL-inspired data-flow tagging (SHA-256 provenance tracking)" to "output fingerprint matching (inspired by CaMeL but not a CaMeL capability-tracking implementation)". (Critical-review B8.)
Plugin version: 7.0.0 → 7.1.0 across package.json, .claude-plugin/plugin.json, scanners/ide-extension-scanner.mjs (VERSION), README badge, CLAUDE.md header, marketplace root README. Test count 1487 → 1511 in marketplace root README.

Tests

+8 tests for B1 pathguard (tests/hooks/pre-write-pathguard.test.mjs): 6 multi-segment BLOCK + 1 .envrc ALLOW + 1 sentinel.
+1 test for B2 distributed trifecta (tests/hooks/post-session-guard.test.mjs): three legs from different sources blocked under block mode.
+15 sweep tests + 1 anchor test for verdict/riskBand co-monotonicity (tests/lib/severity.test.mjs): asserts (verdict, riskBand) agree under v7.0.0 contract for representative count vectors. Catches future drift between scoring tiers, verdict cutoffs, and riskBand cutoffs. Anchor test pins riskScore({critical: 4}) === 93 so doc/code drift fails loudly.
Total: 1511 tests (was 1487). All green.

Why

Pathguard and trifecta-block bugs were live security holes — both fixed at the hook level so users on the default install get the fix automatically.
The honesty-sweep is a deliberate response to the critical-review CISO-perspective (§F): "Would a CISO install this?" — overclaim language was identified as a blocker for regulated environments. Toning it down does not weaken the actual defenses; it lets users trust the documentation.

[7.0.0] - 2026-04-19

BREAKING CHANGES

Risk-score formula rewritten (scanners/lib/severity.mjs). The v1 sum-and-cap formula (critical*25 + high*10 + medium*4 + low*1, capped at 100) collapsed every non-trivial scan to 100/Extreme regardless of actual risk distribution. v2 is severity-dominated and log-scaled within tier:
- Critical present → 70–95 (1=80, 2=86, 4=93, 10=95)
- High only → 40–65 (1=48, 5=60, 17=65)
- Medium only → 15–35 (1=20, 5=28, 50=33)
- Low only → 1–11 (1=4, 10=11)
- None → 0
Verdict cutoffs realigned to new bands: BLOCK if critical ≥1 or score ≥65, WARNING if high ≥1 or score ≥15. Legacy v1 formula kept as riskScoreV1() for reference only. CI pipelines with --fail-on thresholds may need recalibration; see docs/security-hardening-guide.md §6.
Verdict/band cutoffs aligned for co-monotonicity. Old cutoffs (BLOCK ≥61, WARNING ≥21) could produce "BLOCK / Medium band" or "ALLOW / High band" contradictions. New cutoffs (65, 15) are locked to the v2 riskBand() boundaries.
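The scoring change can be made concrete with a short sketch. The v1 formula and the verdict cutoffs are taken from the text above. The critical-tier formula is inferred from the arithmetic quoted in the v7.1.0 notes (70 + log2(5)*10 for four criticals) and the documented 95 ceiling; it reproduces the four listed critical anchor points but is not confirmed as the code in severity.mjs. The other tiers are omitted because their exact formulas are not stated. All function names are hypothetical.

```javascript
// v1 as documented: weighted sum capped at 100.
function riskScoreV1Sketch({ critical = 0, high = 0, medium = 0, low = 0 }) {
  return Math.min(100, critical * 25 + high * 10 + medium * 4 + low);
}

// Critical tier only. Assumed formula: 70 + 10 * log2(n + 1), rounded, capped at 95.
function criticalTierScore(critical) {
  return Math.min(95, Math.round(70 + Math.log2(critical + 1) * 10));
}

// Verdict cutoffs as documented for v2.
function verdictFor({ critical = 0, high = 0 }, score) {
  if (critical >= 1 || score >= 65) return "BLOCK";
  if (high >= 1 || score >= 15) return "WARNING";
  return "ALLOW";
}

// v1 saturates quickly: a moderately noisy scan already hits the cap.
console.log(riskScoreV1Sketch({ critical: 2, high: 3, medium: 5 }));

// v2 critical tier stays spread out between 80 and 95.
for (const n of [1, 2, 4, 10]) {
  console.log(n, criticalTierScore(n), verdictFor({ critical: n }, criticalTierScore(n)));
}
```

The log scale is what keeps additional findings within a tier from pushing every scan to the ceiling, while the tier floor guarantees that a single critical finding always outranks any number of lower-severity ones.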

Added

Context-aware entropy scanner (scanners/entropy-scanner.mjs). Skip-lists and line-level rules drastically reduce false positives in shader/CSS/HTML/SQL-heavy codebases:
- File-extension skip: .glsl, .frag, .vert, .shader, .wgsl, .css, .scss, .sass, .less, .svg + compound .min.js, .min.css, .map
- Line-level rules 11–18 in isFalsePositive(): GLSL keywords (uniform, vec3, texture2D...), CSS-in-JS templates (styled.), inline <svg> markup, ffmpeg filter_complex syntax, browser User-Agent strings, SQL DDL on dedicated lines (^\s*(SELECT|INSERT|UPDATE|DELETE|CREATE|...)), throw new Error(`…`) templates, markdown image syntax with external URLs (![…](https?://…), common in JSON content indexes)
- Scanner envelope gains calibration block: files_skipped_by_extension, files_skipped_by_path, effective thresholds, and policy_source ('default' | 'policy.json')
Policy-driven entropy configuration — .llm-security/policy.json entropy section accepts:
- thresholds.{critical,high,medium}.{entropy,minLen} — override defaults per project
- suppress_extensions: string[] — additional file extensions to skip
- suppress_line_patterns: string[] — user-defined regexes for line suppression
- suppress_paths: string[] — substring match against relPath to skip entire paths (e.g., "vendored/")
DEP typosquat allowlist expansion (knowledge/typosquat-allowlist.json). 22 npm + 5 PyPI entries for short-name modern tools that tripped Levenshtein detection on nearly every real codebase:
- npm: knip, oxlint, tsx, nx, rimraf, glob, tar, zod, ky, ow, esm, ip, qs, url, prettier, vitest, vite, rollup, swc, turbo, bun, deno
- PyPI: uv, ruff, rich, typer, anyio
Synthesizer "Scan Calibration" section (agents/deep-scan-synthesizer-agent.md). Heuristic: omit if <5% files skipped, flag prominently if >80% skipped by path (signals over-aggressive user policy). Agent instructed to NEVER override scanner verdict with narrative opinion.
26 new unit tests (tests/scanners/entropy-context.test.mjs): A. File-extension skip (4), B. Line-level rules 11–18 (10), C. Policy overrides (3); plus expanded tests/lib/severity.test.mjs with v2 scoring/band/verdict tables (70 tests total, was 52). Total: 1487 tests (was 1461).

Changed

tests/lib/output.test.mjs:243 — "1 critical = score 80" under v2 (was 25 under v1).
scanners/lib/file-discovery.mjs — TEXT_EXTENSIONS now includes .sass and GPU shader source extensions (.glsl, .frag, .vert, .shader, .wgsl) so these files are discovered and explicitly counted as skipped by the entropy scanner instead of invisibly filtered out.
Plugin version: 6.6.0 → 7.0.0 across package.json, .claude-plugin/plugin.json, scanners/ide-extension-scanner.mjs (VERSION), README badge, CLAUDE.md header, marketplace root README.

Why

Real-world scan on hyperframes.com produced BLOCK / Extreme / 100 with ~70% noise (shader strings, CSS gradients, bundled JS, Levenshtein false positives). A scanner that cries "extreme" on every project destroys its own credibility — users learn to ignore findings, so genuine threats slip past.
Trustworthiness comes from calibration, not from detecting everything. v7.0.0 accepts that some detection heuristics are noisy in context (entropy on shaders, typosquat on 2–3 letter tool names) and gives users both built-in suppression and policy-driven override controls.
Verdict/score/band co-monotonicity fixed. A user can now correctly reason: "HIGH band → WARNING verdict" without reading the source. The v1 cutoffs allowed a mid-High score (42) to produce ALLOW and a low-Medium score (22) to produce WARNING.

[6.6.0] - 2026-04-18

Added

JetBrains/IntelliJ plugin scanning. /security ide-scan extends beyond VS Code forks to cover the JetBrains IDE family: IntelliJ IDEA, PyCharm, GoLand, WebStorm, RubyMine, PhpStorm, CLion, DataGrip, RustRover, Rider, Aqua, Writerside, Android Studio. Fleet and Toolbox are intentionally excluded (different plugin model, out of scope)
OS-aware JetBrains plugin discovery in lib/ide-extension-discovery.mjs — macOS ~/Library/Application Support/JetBrains/<IDE><version>/plugins/, Windows %APPDATA%\JetBrains\..., Linux ~/.config/JetBrains/.... Regex excludes Fleet/Toolbox
Zero-dep META-INF/plugin.xml + META-INF/MANIFEST.MF parsers in lib/ide-extension-parser-jb.mjs with nested-jar extraction for the common <plugin-root>/lib/*.jar → META-INF/plugin.xml layout
7 JetBrains-specific checks in runJetBrainsChecks: checkThemeWithCodeJB, checkBroadActivationJB (application-components), checkPremainClassJB (HIGH — javaagent retransform), checkNativeBinariesJB, checkDependsChainJB (long mandatory <depends> = supply-chain pressure), checkTyposquatJB (Levenshtein vs top JetBrains plugins), checkShadedJarsJB (advisory — many bundled jars)
JetBrains Marketplace URL fetch. Supports https://plugins.jetbrains.com/plugin/<numericId>-<slug> (metadata resolves numericId → xmlId, then downloads) and https://plugins.jetbrains.com/plugin/download?pluginId=<xmlId>[&version=<v>] (direct download). Host allowlist: plugins.jetbrains.com only
fetchJetBrainsPlugin in lib/vsix-fetch.mjs with the same safety envelope as VSIX fetch (50 MB cap, 30 s timeout, SHA-256, manual redirect host whitelist)
lib/jetbrains-fetch-worker.mjs — sub-process worker mirroring the VSIX worker's JSON-line IPC. Shares the sandbox primitives through parameterized buildSandboxedWorker(dirs, workerPath)
.kt, .groovy, .scala added to scanners/taint-tracer.mjs CODE_EXTENSIONS so Kotlin/Groovy/Scala plugin sources are covered by taint analysis
Knowledge additions: knowledge/jetbrains-marketplace-api-notes.md, expanded knowledge/ide-extension-threat-patterns.md with JetBrains sections, seeded knowledge/top-jetbrains-plugins.json (no longer a stub) with loadJetBrainsBlocklist helper
8 new test files / suites covering JetBrains data, parsers, discovery, checks, URL fetch (unit + integration), end-to-end scan against a real JetBrains-layout fixture tree, plus a deterministic fixture-jar builder (tests/helpers/build-jetbrains-fixtures.mjs) that produces byte-identical reproducible jars. Total: 1461 tests (was 1352)
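
As an illustration of the two JetBrains Marketplace URL forms and the host allowlist described in this list, a sketch might look like the following. The function name `classifyJetBrainsUrl` and the returned object shape are hypothetical; requiring HTTPS is an assumption carried over from the VSIX fetcher's HTTPS-only rule.

```javascript
// Hypothetical classifier for the two supported URL forms.
// Returns null for anything outside plugins.jetbrains.com.
function classifyJetBrainsUrl(raw) {
  let url;
  try {
    url = new URL(raw);
  } catch {
    return null; // not a parseable URL
  }
  if (url.protocol !== 'https:' || url.hostname !== 'plugins.jetbrains.com') return null;

  // Form 1: /plugin/<numericId>-<slug>
  const page = url.pathname.match(/^\/plugin\/(\d+)-([^/]+)\/?$/);
  if (page) return { kind: 'page', numericId: page[1], slug: page[2] };

  // Form 2: /plugin/download?pluginId=<xmlId>[&version=<v>]
  const xmlId = url.searchParams.get('pluginId');
  if (url.pathname === '/plugin/download' && xmlId) {
    return { kind: 'download', xmlId, version: url.searchParams.get('version') };
  }
  return null;
}

console.log(classifyJetBrainsUrl('https://plugins.jetbrains.com/plugin/12345-example-plugin'));
console.log(classifyJetBrainsUrl('https://plugins.jetbrains.com/plugin/download?pluginId=com.example.demo&version=1.2.3'));
```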

Changed

buildSandboxedWorker(dirs) → buildSandboxedWorker(dirs, workerPath) — parameterized so the same sandbox wrapper is reused for VSIX and JetBrains workers instead of copying the primitives a third time
/security ide-scan command description updated to reflect the JetBrains branch; "JetBrains is a v1.1 stub" wording removed
CLAUDE.md and plugin README updated: scanner bullet rewritten to document the JetBrains branch, the seven JB-specific checks, and the new knowledge files
Plugin version: 6.5.0 → 6.6.0 across package.json, .claude-plugin/plugin.json, scanners/ide-extension-scanner.mjs (VERSION), README badge, CLAUDE.md header, marketplace root README
tests/scanners/git.test.mjs — loosened findings.length caps (were too tight for organic repo growth; baseline already exceeded them)

Why

Parity with the VS Code branch: organizations running IntelliJ-family IDEs get the same pre-install and installed-plugin coverage, now that Koi-style supply-chain attacks target both platforms
Reuse of lib/vsix-sandbox.mjs honors the user-memory rule "don't copy a third sandbox" — one set of primitives, two workers, same kernel-enforced FS confinement
JetBrains-specific checks target the platform's real attack surface: Premain-Class javaagents (class retransform at JVM startup), application-components (global lifecycle hooks), nested-jar shading (dependency opacity), and typosquat on com.intellij.* / org.jetbrains.* namespaces

[6.5.0] - 2026-04-17

Added

OS sandbox for /security ide-scan <url>. VSIX fetch + extract now runs in a sub-process wrapped by sandbox-exec (macOS) or bwrap (Linux), reusing the same primitives proven by the git clone sandbox introduced in v5.1. Defense-in-depth: even if zip-extract.mjs has an undiscovered bypass, the kernel refuses any write outside the per-scan temp directory
scanners/lib/vsix-fetch-worker.mjs — Sub-process worker. Argv: --url <url> --tmpdir <writable-dir>. Emits a single JSON line on stdout ({ok, sha256, size, finalUrl, source, extRoot} or {ok:false, error, code?}). Exit 0 on success, 1 on failure. Silent on stderr
scanners/lib/vsix-sandbox.mjs — Wrapper. Exports buildSandboxProfile, buildBwrapArgs, buildSandboxedWorker(tmpDir, args), runVsixWorker(url, tmpDir, opts). 35 s timeout, 1 MB stdout cap, deterministic JSON-line protocol
scan(url, { useSandbox }) option. Default true for CLI invocations; tests pass false to keep globalThis.fetch mocking working (mocks do not cross process boundaries). When sandbox unavailable on the platform (e.g., Windows), a warning is added to meta.warnings and the scan still completes via the in-process fallback
meta.source.sandbox — New envelope field: 'sandbox-exec' | 'bwrap' | 'none' | 'in-process'. Tells the report which protection layer was actually active
8 new tests in tests/scanners/vsix-sandbox.test.mjs covering profile generation per platform, worker arg construction, and live worker exit behavior on invalid URLs (no network required)
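
The worker's single-JSON-line protocol can be consumed roughly as sketched here. The 1 MB cap and the `{ok, ...}` / `{ok:false, error}` shapes come from the entries above; the function name `parseWorkerStdout` and the error strings are assumptions, not the wrapper's actual code.

```javascript
const MAX_STDOUT_BYTES = 1024 * 1024; // 1 MB stdout cap from the changelog entry

// Hypothetical parser: accepts exactly one non-empty JSON line carrying a boolean "ok".
function parseWorkerStdout(stdout) {
  if (Buffer.byteLength(stdout, 'utf8') > MAX_STDOUT_BYTES) {
    return { ok: false, error: 'stdout exceeded cap' };
  }
  const lines = stdout.split('\n').filter((line) => line.trim() !== '');
  if (lines.length !== 1) {
    return { ok: false, error: `expected one JSON line, got ${lines.length}` };
  }
  let message;
  try {
    message = JSON.parse(lines[0]);
  } catch {
    return { ok: false, error: 'invalid JSON' };
  }
  if (message === null || typeof message !== 'object' || typeof message.ok !== 'boolean') {
    return { ok: false, error: 'missing ok flag' };
  }
  return message;
}

console.log(parseWorkerStdout('{"ok":true,"sha256":"abc","size":3}\n'));
console.log(parseWorkerStdout('not json\n'));
```

Treating anything other than one well-formed line as a failure keeps the parent process from acting on partial or injected output.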

Changed

fetchAndExtractVsixUrl in ide-extension-scanner.mjs is now sandbox-aware (useSandbox option, default true). Existing in-process logic preserved as fallback path
Version bump: 6.4.0 → 6.5.0 across all files

Why

Aligns the IDE-scan URL pipeline with the same defense-in-depth posture as the GitHub clone pipeline — kernel-enforced FS confinement instead of in-process validation alone
VSIX is untrusted bytes from a third-party registry; even with hardened parsing, an OS sandbox is the right blast-radius constraint for filesystem writes

[6.4.0] - 2026-04-17

Added

/security ide-scan <url> — pre-install verification. The IDE extension scanner now accepts URLs as targets and fetches the VSIX before scanning. Supported sources:
- VS Code Marketplace: https://marketplace.visualstudio.com/items?itemName=publisher.name
- OpenVSX: https://open-vsx.org/extension/publisher/name[/version]
- Direct VSIX download: https://example.com/path/foo.vsix (HTTPS only)
scanners/lib/vsix-fetch.mjs — HTTPS-only fetcher with 50 MB compressed cap, 30 s total timeout, SHA-256 streamed during download, manual redirect handling with per-source host whitelist (Marketplace gallerycdn, OpenVSX blob storage). No npm dependencies — uses Node 18+ fetch
scanners/lib/zip-extract.mjs — Zero-dependency ZIP parser + safe extractor. Rejects: zip-slip via .. paths, POSIX absolute paths, Windows drive letters, NUL bytes, encrypted entries, ZIP64, multi-disk archives, unsupported compression methods, symlink entries (Unix 0xA000 mode bits in external_attr). Caps: 10 000 entries, 500 MB uncompressed total, 100× expansion ratio (sum-uncomp / sum-comp), depth 20. STORE + DEFLATE only
Envelope meta.source — When invoked with a URL, the scan envelope's meta.source field carries { type: "url", kind, url, finalUrl, sha256, size, publisher, name, version, requestedUrl } so reports can attribute findings to the upstream artifact
knowledge/marketplace-api-notes.md — Reference notes for the (undocumented but stable) Marketplace direct-download endpoint and the (officially documented) OpenVSX endpoints used by vsix-fetch.mjs
48 new tests across tests/scanners/zip-extract.test.mjs (validateEntryName / isSymlink / extractToDir happy + adversarial), tests/scanners/vsix-fetch.test.mjs (detectUrlType / isAllowedHost / readBodyCapped), tests/scanners/ide-extension-url.test.mjs (URL flow integration with global.fetch mock — Marketplace, OpenVSX, direct VSIX, malformed VSIX, zip-slip VSIX, network failure, unsupported URL, GitHub URL). 1344 tests total (was 1296). Test helper: tests/lib/build-zip.mjs builds adversarial ZIPs that real zip tools refuse to emit
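
A few of the extractor's rejection rules listed above, sketched as standalone checks. This is not the code in zip-extract.mjs: the names `checkEntryName` and `exceedsExpansionRatio` are hypothetical, treating backslashes as path separators is an assumption, and only the name rules and the 100× ratio are covered.

```javascript
// Returns a rejection reason for an unsafe entry name, or null if the name looks safe.
function checkEntryName(name) {
  if (name.includes('\0')) return 'NUL byte';
  if (name.startsWith('/')) return 'POSIX absolute path';
  if (/^[A-Za-z]:/.test(name)) return 'Windows drive letter';
  // Assumption: both "/" and "\" count as separators when looking for "..".
  if (name.split(/[\\/]/).includes('..')) return 'zip-slip via ..';
  return null;
}

// Expansion ratio as described: sum of uncompressed sizes over sum of compressed sizes.
function exceedsExpansionRatio(sumUncompressed, sumCompressed, maxRatio = 100) {
  if (sumCompressed === 0) return sumUncompressed > 0;
  return sumUncompressed / sumCompressed > maxRatio;
}

console.log(checkEntryName('extension/package.json')); // null
console.log(checkEntryName('extension/../../etc/cron.d/x'));
console.log(exceedsExpansionRatio(1000, 5));
```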

Changed

scanners/ide-extension-scanner.mjs early-detects URL targets and routes through fetch + extract → temp dir → existing single-target scan path. Temp directory cleaned in try/finally regardless of success/error/abort
CLI help text in bin/llm-security.mjs and commands/ide-scan.md updated with URL examples and security model
Version bump: 6.3.0 → 6.4.0 across all files

Not supported (intentional)

GitHub repo URLs — would require npm install + vsce package build step. Use the Marketplace, OpenVSX, or a direct .vsix URL instead
VSIX .signature.p7s verification — deferred to v6.5.0 (requires X.509 / PKCS#7 parsing)
ZIP64 archives — real-world VSIX never approaches the 4 GB threshold

[6.3.0] - 2026-04-17

Added

IDE extension prescan — New /security ide-scan command and scanners/ide-extension-scanner.mjs (prefix IDE) discover and audit installed VS Code extensions across 6 roots (~/.vscode/extensions, ~/.vscode-insiders/extensions, ~/.cursor/extensions, ~/.windsurf/extensions, ~/.vscode-oss/extensions, ~/.vscode-server/extensions, plus Linux code-server). OS-aware discovery via scanners/lib/ide-extension-discovery.mjs. Manifest parsing via scanners/lib/ide-extension-parser.mjs. Data loading via scanners/lib/ide-extension-data.mjs. JetBrains discovery is a v1.1 stub.
7 IDE-specific detection categories — Blocklist match (CRITICAL), theme-with-code (HIGH, Material Theme pattern), sideload .vsix (HIGH unsigned / MEDIUM signed), broad activation * / onStartupFinished (MEDIUM/LOW, suppressed for top-100 exact matches), Levenshtein typosquat ≤2 vs top-100 (HIGH distance-1 / MEDIUM distance-2 against top-50), extension-pack expansion ≥3 (MEDIUM), dangerous vscode:uninstall hooks referencing child_process/curl/wget/rm/powershell (HIGH/LOW)
Per-extension scanner orchestration — Each discovered extension runs through UNI, ENT, NET, TNT, MEM, SCR scanners with bounded concurrency (default 4). MEM gets a filtered file list (README.md / CHANGELOG.md / package.json) to catch prompt-injection in marketplace-rendered text
New knowledge files — knowledge/ide-extension-threat-patterns.md (10 categories with 2024-2026 case studies from Koi Security — GlassWorm, WhiteCobra, TigerJack, Material Theme, VS Code Cryptojacking, MaliciousCorgi), knowledge/top-vscode-extensions.json (top ~100 Marketplace IDs + blocklist), knowledge/top-jetbrains-plugins.json (stub)
CLI integration — bin/llm-security.mjs gains ide-scan subcommand with passthrough flags
22 new tests in tests/scanners/ide-extension-scanner.test.mjs (fixtures under tests/fixtures/ide-extensions/). 1296 tests total (was 1274)
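
The typosquat category in this list rests on Levenshtein distance. A minimal sketch, assuming a plain list of known-good extension IDs: distance 1 maps to HIGH and distance 2 to MEDIUM, as in the entry, while the top-100 versus top-50 distinction is deliberately ignored here. The function names and the sample IDs are illustrative only.

```javascript
// Classic two-row dynamic-programming Levenshtein distance.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical classifier: exact matches are never flagged.
function typosquatSeverity(candidate, knownIds) {
  let best = Infinity;
  for (const id of knownIds) {
    const distance = levenshtein(candidate, id);
    if (distance === 0) return null;
    best = Math.min(best, distance);
  }
  if (best === 1) return 'HIGH';
  if (best === 2) return 'MEDIUM';
  return null;
}

const known = ['ms-python.python'];
console.log(typosquatSeverity('ms-pyth0n.python', known)); // HIGH
console.log(typosquatSeverity('ms-pyth00.python', known)); // MEDIUM
console.log(typosquatSeverity('ms-python.python', known)); // null
```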

Changed

Version bump: 6.2.0 → 6.3.0 across all files

[6.2.0] - 2026-04-17

Added

Bash-normalize T5 + T6 — scanners/lib/bash-normalize.mjs now collapses ${IFS} word-splitting (T5) and ANSI-C hex quoting $'\xHH' (T6) before the denylist gate runs. Defense-in-depth layer complementing the Claude Code 2.1.98+ harness fixes. 4 new unit tests in tests/scanners/bash-normalize.test.mjs
PreCompact hook — hooks/scripts/pre-compact-scan.mjs scans the transcript tail (default 500 KB) for injection patterns before Claude Code compacts context. Prevents poisoned summaries from surviving into the next turn. Modes: block / warn / off via LLM_SECURITY_PRECOMPACT_MODE. 6 new tests in tests/hooks/pre-compact-scan.test.mjs. Brings total hooks to 9
Security hardening guide — docs/security-hardening-guide.md documents environment variables (CLAUDE_CODE_EFFORT_LEVEL, ENABLE_PROMPT_CACHING_1H, CLAUDE_CODE_SCRIPT_CAPS, all LLM_SECURITY_* modes), sandboxing (sandbox-exec / bwrap / fallback), T1-T6 normalization table, Opus 4.7 system card §5.2.1 + §6.3.1.1 alignment, baseline production recommendations
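
To make the T5 and T6 normalizations in this list concrete, the sketch below rewrites both evasions before a denylist would run. It is an approximation under stated assumptions, not the logic in bash-normalize.mjs: the function name is hypothetical, only the literal `${IFS}` form is handled, and only `\xHH` escapes inside `$'...'` are decoded.

```javascript
// Hypothetical normalizer covering the two techniques named in the entry.
function normalizeIfsAndHex(command) {
  return command
    // T6: unwrap $'...' and decode \xHH escapes inside it; other escapes are left as-is.
    .replace(/\$'((?:[^'\\]|\\.)*)'/g, (_match, body) =>
      body.replace(/\\x([0-9a-fA-F]{1,2})/g, (_m, hex) =>
        String.fromCharCode(parseInt(hex, 16))))
    // T5: ${IFS} used in place of whitespace between words.
    .replace(/\$\{IFS\}/g, ' ');
}

console.log(normalizeIfsAndHex('cat${IFS}/etc/passwd'));       // cat /etc/passwd
console.log(normalizeIfsAndHex("$'\\x72\\x6d' -rf /tmp/x"));   // rm -rf /tmp/x
```

Decoding hex first means a `${IFS}` hidden behind hex escapes is still collapsed by the second pass.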

Changed

Agent refactor for Opus 4.7 literal instruction following — agents/skill-scanner-agent.md and agents/mcp-scanner-agent.md reframe stacked CANNOT/MUST NOT imperatives in favor of tool-level enforcement via tools: frontmatter. New Step 0 "Generaliseringsgrense" (generalization boundary) blocks require citing evidence as path:line and marking speculation as speculation; new "Parallell Read-strategi" (parallel Read strategy) notes prefer parallel Read calls for independent file reads
Defense Philosophy linked to Opus 4.7 system card — CLAUDE.md §Defense Philosophy now cites Opus 4.7 system card §5.2.1 (multi-layer defenses) and §6.3.1.1 (instruction hierarchy → tool-level enforcement)
Version bump: 6.1.0 → 6.2.0 across all files

[6.1.0] - 2026-04-10

Added

--fail-on <severity> flag — CI-friendly exit codes: exit 1 when any finding at or above the specified severity exists (critical/high/medium/low). Configurable via policy.json ci.failOn
--compact output mode — One-liner per finding format ([SEVERITY] scanner: title (file:line)), reduces CI log noise. Configurable via policy.json ci.compact
CI/CD pipeline templates — Ready-to-use templates in ci/: GitHub Actions (github-action.yml), Azure DevOps (azure-pipelines.yml), GitLab CI (gitlab-ci.yml) with SARIF upload, Node 18 setup
CI/CD integration guide — docs/ci-cd-guide.md with 5-minute setup per platform, Schrems II/NSM compliance documentation, exit code reference
npm publish preparation — files whitelist in package.json (only bin/ + scanners/), .npmignore safety net, homepage field
Policy ci section — New ci: { failOn, compact } section in .llm-security/policy.json for distributable CI configuration
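
The --fail-on semantics described in this list (exit 1 when any finding is at or above the given severity) can be sketched as follows. The helper name `exitCodeFor` and the `{ severity }` finding shape are assumptions for illustration; returning 0 for an unknown threshold is also an assumption.

```javascript
// Severity ladder from lowest to highest, matching the values accepted by --fail-on.
const SEVERITY_ORDER = ['low', 'medium', 'high', 'critical'];

// Hypothetical helper: 1 if any finding meets or exceeds the threshold, else 0.
function exitCodeFor(findings, failOn) {
  const threshold = SEVERITY_ORDER.indexOf(failOn);
  if (threshold === -1) return 0; // assumption: no or unknown threshold never fails the build
  const hit = findings.some(
    (finding) => SEVERITY_ORDER.indexOf(String(finding.severity).toLowerCase()) >= threshold,
  );
  return hit ? 1 : 0;
}

const findings = [{ severity: 'MEDIUM' }, { severity: 'LOW' }];
console.log(exitCodeFor(findings, 'high'));   // 0
console.log(exitCodeFor(findings, 'medium')); // 1
```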

Changed

Version bump: 6.0.0 → 6.1.0 across all files

[6.0.0] - 2026-04-10

Added

Compliance mapping — knowledge/compliance-mapping.md maps plugin capabilities to EU AI Act (Art. 9, 15, 17), NIST AI RMF (Map, Measure, Manage, Govern), ISO 42001 (Annex A), and MITRE ATLAS techniques (AML.T IDs)
Norwegian regulatory context — knowledge/norwegian-context.md covers Datatilsynet (DPIA for AI), NSM (basic security principles), and Digitaliseringsdirektoratet guidance
SARIF 2.1.0 output — scanners/lib/sarif-formatter.mjs converts scan output to OASIS SARIF standard format. Use --format sarif with scan/deep-scan commands
Structured audit trail — scanners/lib/audit-trail.mjs writes JSONL audit events with ISO 8601 timestamps, OWASP category tags, and SIEM-ready schema. Configurable via LLM_SECURITY_AUDIT_* env vars
AI-BOM generator — scanners/ai-bom-generator.mjs + scanners/lib/bom-builder.mjs produce CycloneDX 1.6 Bills of Materials for AI components (models, MCP servers, plugins, knowledge, hooks)
Policy-as-code — scanners/lib/policy-loader.mjs reads .llm-security/policy.json for distributable hook configuration. Integrated into all 8 hooks. Env vars always take precedence
Standalone CLI — bin/llm-security.mjs provides npx llm-security entry point. Subcommands: scan, deep-scan, posture, audit-bom, benchmark
Posture compliance categories — 3 new posture categories (14: EU AI Act, 15: NIST AI RMF, 16: ISO 42001). Advisory only — do not affect Grade A threshold
Attack simulator benchmark mode — --benchmark flag outputs structured pass/fail metrics for CI integration
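
The precedence rule stated for policy-as-code in this list (env vars win over policy.json, which wins over built-in defaults) amounts to a three-step lookup. A sketch under assumptions: `resolveSetting` is a hypothetical name, the policy key `injectionMode` is invented for the example, and treating an empty env value as unset is a guess.

```javascript
// Hypothetical resolver: environment first, then policy file, then default.
function resolveSetting(envValue, policyValue, defaultValue) {
  if (envValue !== undefined && envValue !== '') return envValue;
  if (policyValue !== undefined) return policyValue;
  return defaultValue;
}

// Stand-ins for process.env and a parsed .llm-security/policy.json.
const env = { LLM_SECURITY_INJECTION_MODE: 'warn' };
const policy = { injectionMode: 'off' }; // illustrative key name

console.log(resolveSetting(env.LLM_SECURITY_INJECTION_MODE, policy.injectionMode, 'block')); // warn
console.log(resolveSetting(undefined, policy.injectionMode, 'block'));                        // off
console.log(resolveSetting(undefined, undefined, 'block'));                                   // block
```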

Changed

Version bump: 5.1.0 → 6.0.0 across all files
Knowledge base expanded from 13 to 15 files
Scanner count: 15 → 16 (AI-BOM generator added)
Posture scanner: 13 → 16 categories
All hooks now read policy from .llm-security/policy.json (backward-compatible — defaults match hardcoded values)

[5.1.0] - 2026-04-07

Added

Sandboxed remote cloning — git clone for remote scans is now hardened with two defense layers:
1. Git config flags: core.hooksPath=/dev/null, core.symlinks=false, core.fsmonitor=false, all LFS filter drivers disabled, protocol.file.allow=never, transfer.fsckObjects=true. Environment: GIT_CONFIG_NOSYSTEM=1, GIT_CONFIG_GLOBAL=/dev/null, GIT_ATTR_NOSYSTEM=1, GIT_TERMINAL_PROMPT=0
2. OS-level filesystem sandbox: macOS sandbox-exec and Linux bubblewrap (bwrap) restrict file writes to only the specific temp directory. Even if .gitattributes filter drivers bypass git config, they cannot write outside the clone dir. bwrap probe-tests availability before use (graceful fallback on Ubuntu 24.04+ where AppArmor blocks it). Graceful fallback on Windows (git config flags only, WARN logged)
Post-clone size check — Repos exceeding 100MB after clone are rejected and cleaned up
UUID-unique evidence filenames — fs-utils.mjs tmppath now generates unique filenames with crypto.randomUUID() suffix, preventing race conditions between concurrent scans
Evidence file cleanup — scan.md and plugin-audit.md now clean up evidence files (content-extract, plugin-extract) after scanning
Cleanup guarantee — Both scan.md and plugin-audit.md now carry an explicit cleanup guarantee: the temp dir and evidence file are removed even if the scan fails or errors
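
The first defense layer in this list (git config flags plus environment) could be assembled as below. The flag and variable names are taken from the entry; the function `buildHardenedClone`, its return shape, and the `--` separator before the URL are assumptions. The LFS filter settings are left out because the entry does not name the exact keys.

```javascript
// Hypothetical builder for a hardened `git clone` invocation.
function buildHardenedClone(url, destDir) {
  const configFlags = [
    'core.hooksPath=/dev/null',
    'core.symlinks=false',
    'core.fsmonitor=false',
    'protocol.file.allow=never',
    'transfer.fsckObjects=true',
    // LFS filter driver overrides omitted: exact keys are not given in the changelog.
  ];
  const args = [];
  for (const flag of configFlags) args.push('-c', flag);
  // "--" (an addition in this sketch) stops a hostile URL from being parsed as an option.
  args.push('clone', '--', url, destDir);

  const env = {
    GIT_CONFIG_NOSYSTEM: '1',
    GIT_CONFIG_GLOBAL: '/dev/null',
    GIT_ATTR_NOSYSTEM: '1',
    GIT_TERMINAL_PROMPT: '0',
  };
  return { command: 'git', args, env };
}

console.log(buildHardenedClone('https://example.com/repo.git', '/tmp/scan-target'));
```

The OS-level sandbox (sandbox-exec or bwrap) would wrap this command; that second layer is not shown here.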

Changed

scanners/lib/git-clone.mjs — complete rewrite of clone command with sandbox wrapping
scanners/lib/fs-utils.mjs — tmppath uses crypto.randomUUID() for unique names

[5.0.0] - 2026-04-06

Added

Prompt Injection Hardening (v5.0) — 8-session defense-in-depth overhaul driven by 7 research papers (2025-2026). Defense philosophy: broader detection + increased attack cost + longer monitoring windows + architectural constraints + honest documentation
MEDIUM advisory wiring — pre-prompt-inject-scan.mjs emits advisory for MEDIUM-severity obfuscation signals (leetspeak, homoglyphs, zero-width, multi-language). Never blocks. post-mcp-verify.mjs includes MEDIUM in injection scan advisory
Unicode Tag steganography — string-utils.mjs decodes U+E0001-E007F (invisible ASCII encoding). CRITICAL if decoded content matches injection patterns, HIGH for bare presence. Integrated into normalizeForScan() pipeline
BIDI override stripping — Removes directional override characters before injection scanning
Bash expansion normalization — New bash-normalize.mjs strips ${}, empty quotes, backslash splits before command matching. Applied in pre-bash-destructive.mjs and pre-install-supply-chain.mjs
Rule of Two enforcement — post-session-guard.mjs gains LLM_SECURITY_TRIFECTA_MODE=block|warn|off (default: warn). Block mode exits with code 2 for MCP-concentrated trifecta or sensitive path + exfiltration
100-call long-horizon monitoring — Extended window alongside 20-call sliding window. Slow-burn trifecta detection (legs >50 calls apart = MEDIUM). Behavioral drift via Jensen-Shannon divergence on tool-class distribution
HITL trap detection — HIGH patterns for approval urgency, summary suppression, scope minimization. MEDIUM for cognitive load (injection buried in verbose output)
Sub-agent delegation tracking — post-session-guard.mjs tracks Task/Agent tool usage. Escalation-after-input advisory when delegation occurs within 5 calls of untrusted input (DeepMind Agent Traps category 4)
Natural language indirection — MEDIUM patterns for "fetch this URL and execute", "send this data to", "read ~/.ssh". Strict false-positive tests for benign phrasing
Hybrid attack patterns — P2SQL (SQL keywords in injection text), recursive injection (injection containing injection), XSS in agent context (<script>, javascript:, onerror=)
CaMeL-inspired data flow tagging — SHA-256 provenance tracking in post-session-guard.mjs. Hash of tool output → match against subsequent tool input. Linked data flows elevate trifecta severity
Adaptive red-team — attack-simulator.mjs --adaptive runs 5 mutation rounds per passing scenario: homoglyph substitution, encoding wrapping, zero-width injection, case alternation, synonym substitution. Rules in knowledge/attack-mutations.json
Knowledge base expansion — prompt-injection-research-2025-2026.md (7 papers), deepmind-agent-traps.md (6 categories, 43 techniques), attack-mutations.json (synonym tables). Attack scenarios expanded from 38 to 64 across 12 categories
Posture scanner expanded to 13 categories — New: Prompt Injection Hardening (cat 11), Rule of Two (cat 12), Long-Horizon Monitoring (cat 13). Checks for MEDIUM advisory, Unicode Tag detection, bash normalization, TRIFECTA_MODE, behavioral drift
Defense Philosophy section in CLAUDE.md — honest documentation of what v5.0 can and cannot do, based on joint paper findings (95-100% ASR against all tested defenses)
8 new posture scanner tests (49 total for posture)
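
The Unicode Tag steganography item in this list relies on the fact that the Tags block mirrors ASCII at an offset of 0xE0000. A minimal decoding sketch, assuming that only the printable range U+E0020 to U+E007E carries hidden text; the function name and return shape are hypothetical and do not reflect string-utils.mjs.

```javascript
// Hypothetical inspector: reports whether tag characters are present and what they spell.
function inspectUnicodeTags(text) {
  let present = false;
  let decoded = '';
  for (const ch of text) { // for...of iterates by code point, so astral characters stay whole
    const cp = ch.codePointAt(0);
    if (cp >= 0xE0001 && cp <= 0xE007F) {
      present = true;
      // Printable tag characters map to ASCII by subtracting 0xE0000.
      if (cp >= 0xE0020 && cp <= 0xE007E) decoded += String.fromCharCode(cp - 0xE0000);
    }
  }
  return { present, decoded };
}

// Build a sample payload by shifting ASCII into the Tags block.
const hidden = [...'ignore'].map((c) => String.fromCodePoint(c.codePointAt(0) + 0xE0000)).join('');
console.log(inspectUnicodeTags('hello' + hidden)); // { present: true, decoded: 'ignore' }
```

In the scheme described above, `present` alone would correspond to the HIGH case, and a decoded string matching injection patterns to the CRITICAL case.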

Changed

Posture scanner version updated to 5.0.0
Dashboard aggregator version updated to 5.0.0
Red-team scenarios expanded from 38 to 64 across 12 categories
Knowledge files count: 10 -> 13

[4.5.1] - 2026-04-04

Fixed

Cross-platform support (Windows/Linux). Replaced all Unix-only patterns: fileURLToPath() instead of import.meta.url.replace('file://', ''), path.dirname() instead of lastIndexOf('/'), native fetch() instead of curl subprocess (Node 18+), removed 2>/dev/null from shell commands, fixed tilde expansion regex for Windows backslash paths. 11 files changed, 782 tests pass.

[4.5.0] - 2026-04-04

Added

Attack simulation / red-team mode — scanners/attack-simulator.mjs runs 38 crafted attack scenarios across 7 categories against the plugin's own hooks. Data-driven: scenarios defined in knowledge/attack-scenarios.json, payloads assembled at runtime via fragment concatenation (avoids triggering hooks on source file). Categories: secrets (7), destructive (8), supply-chain (4), prompt-injection (6), pathguard (6), mcp-output (4), session-trifecta (3). CLI: node scanners/attack-simulator.mjs [--category <name>] [--json] [--verbose]. Library: import { loadScenarios, runScenario, resolvePayloads }
/security red-team command — attack simulation with category filter (--category secrets|destructive|...). Narrative report with per-category breakdown and defense score
knowledge/attack-scenarios.json — 38 red-team scenarios with placeholder payloads ({{MARKER}} syntax), resolved at runtime to actual attack strings
31 new tests for attack simulator (unit + integration + CLI)

[4.4.0] - 2026-04-03

Added

Cross-project security dashboard — scanners/dashboard-aggregator.mjs discovers all Claude Code projects under ~/ (depth 3) and ~/.claude/plugins/, runs posture-scanner on each, aggregates results. Machine grade = weakest link across all projects. Cache in ~/.cache/llm-security/dashboard-latest.json (24h staleness). CLI: node scanners/dashboard-aggregator.mjs [--no-cache] [--max-depth N]. Library: import { aggregate, discoverProjects }
/security dashboard command — machine-wide security overview with per-project grade table, sorted by grade (worst first). Shows cache status, total findings, and recommendations based on machine grade
16 new tests for dashboard aggregator (discovery, aggregation, caching, grade logic)

[4.3.0] - 2026-04-03

Added

MCP description drift detection — scanners/lib/mcp-description-cache.mjs caches MCP tool descriptions in ~/.cache/llm-security/mcp-descriptions.json with 7-day TTL. Compares via Levenshtein distance — >10% change triggers advisory (OWASP MCP05 rug-pull). extractMcpServer() exported for server attribution
MCP-concentrated trifecta — post-session-guard.mjs now detects when all 3 lethal trifecta legs (input + access + exfil) originate from the same MCP server, elevating severity. Single compromised server pattern
Cumulative data volume tracking — post-session-guard.mjs tracks total output bytes per session, warns at 100KB (LOW), 500KB (MEDIUM), 1MB (HIGH) thresholds (OWASP ASI02)
Per-MCP-tool volume tracking — post-mcp-verify.mjs tracks cumulative output per MCP tool, warns when a single tool exceeds 100KB (OWASP ASI02, MCP03)
MCP drift integration in post-mcp-verify — checks MCP tool descriptions on every invocation against cached baseline, advisory on significant drift
35 new tests: 16 for mcp-description-cache, 5 for post-mcp-verify drift/volume, 14 for post-session-guard MCP features
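
The cumulative volume thresholds in this list map naturally onto a small lookup. In this sketch the severities and sizes (100KB, 500KB, 1MB) come from the entry, while the function name, the use of 1024-byte kilobytes, and the inclusive comparisons are assumptions.

```javascript
const KB = 1024; // assumption: binary kilobytes

// Hypothetical helper: highest threshold crossed by the session's total output bytes.
function volumeSeverity(totalBytes) {
  if (totalBytes >= 1024 * KB) return 'HIGH';
  if (totalBytes >= 500 * KB) return 'MEDIUM';
  if (totalBytes >= 100 * KB) return 'LOW';
  return null;
}

console.log(volumeSeverity(50 * KB));   // null
console.log(volumeSeverity(120 * KB));  // LOW
console.log(volumeSeverity(600 * KB));  // MEDIUM
console.log(volumeSeverity(2048 * KB)); // HIGH
```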

[4.2.0] - 2026-04-03

Added

Supply chain re-check scanner — scanners/supply-chain-recheck.mjs (prefix SCR) periodically re-audits installed dependencies by parsing lockfiles (package-lock.json, yarn.lock, requirements.txt, Pipfile.lock). Checks against curated blocklists, OSV.dev batch API (/v1/querybatch) for known CVEs, and Levenshtein-based typosquat detection against top-packages knowledge base. Offline fallback: blocklist + typosquat checks run without network, INFO finding notes skipped CVE check. OWASP: LLM03, ASI04, AST06, MCP04
Shared supply chain data module — scanners/lib/supply-chain-data.mjs extracts blocklists (NPM/PIP/Cargo/Gem), helper functions, and OSV.dev API calls shared between the hook (pre-install-supply-chain.mjs) and the new scanner
/security supply-check command — standalone dependency re-audit with focused output. CLI wrapper: node scanners/supply-chain-recheck-cli.mjs <path>
SCR prefix added to all 4 OWASP maps (LLM, ASI, AST, MCP) in severity.mjs
Supply chain scanner integrated into scan-orchestrator (10th scanner, runs before toxic-flow)
Test fixtures: tests/fixtures/supply-chain/ with compromised and clean lockfiles for npm, pip, yarn, Pipfile
30 new tests for supply-chain-recheck scanner and shared module

Changed

pre-install-supply-chain.mjs hook refactored to import blocklists and helpers from shared supply-chain-data.mjs module (reduced duplication by ~160 lines)

[4.1.0] - 2026-04-03

Added

Reference configuration generator — scanners/reference-config-generator.mjs generates Grade A security configuration based on posture scanner gaps. Detects project type (plugin/monorepo/standalone). Templates in templates/reference-config/. CLI: node scanners/reference-config-generator.mjs [path] [--apply]. Library: import { generate } from './reference-config-generator.mjs'
/security harden command — runs posture scanner, identifies gaps, generates settings.json (deny-first), CLAUDE.md security section, and .gitignore additions. Supports --dry-run (default) and --apply (writes with backup). Post-apply verification re-runs posture scanner to confirm improvement
Reference config templates: settings-deny-first.json, claude-md-security-section.md, gitignore-security.txt
23 new tests for reference-config-generator (grade-a, grade-f, apply mode, project type detection)

[4.0.0] - 2026-04-03

Added

Deterministic posture scanner — posture-scanner.mjs replaces the Opus-based posture-assessor-agent for /security posture. 10 categories assessed in <50ms (was ~6 min with agent). Scanner prefix PST. Standalone CLI: node scanners/posture-scanner.mjs [path] → JSON stdout. Categories: Deny-First, Secrets, Path Guarding, MCP Trust, Destructive Blocking, Sandbox, Human Review, Plugin Sources, Session Isolation, Cognitive State Security. Reuses scanForInjection() and gradeFromPassRate() from shared libraries. Grade A/B/C/D/F with risk score, risk band, and verdict
PST prefix added to all 4 OWASP maps (LLM, ASI, AST, MCP) in severity.mjs
Test fixtures: tests/fixtures/posture-scan/grade-a-project/ (Grade A) and grade-f-project/ (Grade F)
41 new tests for posture scanner (interface, grade-a, grade-f)

Changed

/security posture now uses deterministic scanner via Bash instead of spawning posture-assessor-agent. Instant results, zero token cost
/security audit runs posture scanner first for instant category data, then agents for narrative and skill/MCP analysis
Posture-assessor-agent retained for full audit narrative only

[3.1.1] - 2026-04-03

Audit remediation: 6 findings fixed, global settings hardened.

[3.0.0] - 2026-04-03

Public release. 8 development sessions from v2.5 to v3.0.

Added

Toxic flow analysis (v2.7.0) — 8th orchestrated scanner (toxic-flow-analyzer.mjs, prefix TFA) detecting lethal trifecta patterns: untrusted input + sensitive data access + exfiltration sink. Post-processing correlator consuming output from all prior scanners. Direct, cross-component, and project-level detection with mitigation downgrades. OWASP: ASI01, ASI02, ASI05
Runtime session guard (v2.7.1) — PostToolUse hook monitoring tool call sequences for lethal trifecta forming during a session. Sliding window (20 calls), per-session JSONL state in /tmp/, advisory warning (never blocks). Auto-cleanup after 24h
MCP runtime inspection (v2.8.0) — Standalone scanner (mcp-live-inspect.mjs, prefix MCI) connecting to running MCP stdio servers via JSON-RPC 2.0. Fetches live tool/prompt/resource lists, scans descriptions for injection patterns, detects tool shadowing across servers. 10s timeout per server. New /security mcp-inspect command. /security mcp-audit --live flag for combined static + live analysis
Auto update notifications (v2.8.1) — UserPromptSubmit hook checking for newer plugin versions against the public Forgejo repo (max 1x/24h, cached in ~/.cache/llm-security/). Disable: LLM_SECURITY_UPDATE_CHECK=off
Report diffing & baseline (v2.9.0) — diff-engine.mjs library for finding fingerprinting, fuzzy line matching (±3), and diff categorization (new/resolved/unchanged/moved). Scan orchestrator gains --baseline and --save-baseline flags. Baselines stored per target hash in reports/baselines/. New /security diff command
Continuous scanning (v2.9.1) — /security watch [path] [--interval 6h] using built-in /loop for recurring diff scanning. watch-cron.mjs standalone script for system cron/launchd with multi-target config and exit codes
Skill signature registry (v2.9.2) — skill-registry.mjs library for SHA-256 fingerprinting of normalized skill content, scan result caching (7-day staleness), and pattern search. New /security registry command. /security scan checks registry before full scan for instant results on known fingerprints
OWASP Skills Top 10 (v2.6.0) — New knowledge file owasp-skills-top10.md (AST01-AST10) with skill-specific threat definitions and mitigations
MEDIUM injection patterns (v2.6.0) — ~15 new patterns: base64 payloads, leetspeak, homoglyphs, multi-language mixing, markdown/HTML comment injection
4-framework OWASP mapping (v2.6.0) — Full coverage of LLM Top 10, Agentic AI Top 10 (ASI), Skills Top 10 (AST), MCP Top 10 in severity.mjs
Architecture diagram (mermaid) in README
CHANGELOG.md
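
The runtime session guard in this list watches a 20-call sliding window for all three trifecta legs. A sketch of that check, assuming each recorded call has already been classified into a leg; the record shape `{ tool, leg }`, the leg labels, and the function name are hypothetical.

```javascript
const WINDOW_SIZE = 20; // sliding window length from the changelog entry
const LEGS = ['input', 'access', 'exfil']; // untrusted input, sensitive access, exfiltration sink

// Hypothetical check: true when all three legs appear among the most recent calls.
function trifectaInWindow(calls) {
  const recent = calls.slice(-WINDOW_SIZE);
  const seen = new Set(recent.map((call) => call.leg).filter(Boolean));
  return LEGS.every((leg) => seen.has(leg));
}

const session = [
  { tool: 'WebFetch', leg: 'input' },
  { tool: 'Read', leg: 'access' },
  { tool: 'Bash', leg: 'exfil' },
];
console.log(trifectaInWindow(session)); // true
```

A leg that has scrolled out of the window no longer counts, which is the gap the later 100-call long-horizon monitoring addresses.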

Changed

Scan orchestrator now runs 8 scanners (was 7) with TFA running last
Agent prompts updated with ASI/AST/MCP OWASP references
scanForInjection() returns { found, severity, patterns } instead of boolean
Self-scan suppressions updated from ~150 to ~190 (TFA self-referential findings added)
Plugin description updated to reference all 4 OWASP frameworks
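
To illustrate the scanForInjection() change listed above, here is a small sketch of a scanner that returns the `{ found, severity, patterns }` shape rather than a boolean. It is not the plugin's implementation: the function name `scanForInjectionSketch`, the two example patterns, the severity ranking, and the choices to report pattern names and to use `null` severity when nothing matches are assumptions for illustration only.

```javascript
// Hypothetical pattern table; the plugin's real patterns are not reproduced here.
const EXAMPLE_PATTERNS = [
  {
    name: 'ignore-previous',
    severity: 'critical',
    regex: /ignore (all )?previous instructions/i,
  },
  {
    name: 'html-comment-directive',
    severity: 'medium',
    regex: /<!--[^>]*\bsystem\b[^>]*-->/i,
  },
];

// Assumed ordering used to pick the highest severity among the hits.
const SEVERITY_RANK = { medium: 1, high: 2, critical: 3 };

function scanForInjectionSketch(text) {
  const hits = EXAMPLE_PATTERNS.filter((p) => p.regex.test(text));
  if (hits.length === 0) {
    return { found: false, severity: null, patterns: [] };
  }
  // Report the most severe level seen, plus every pattern that matched.
  const severity = hits
    .map((h) => h.severity)
    .sort((a, b) => SEVERITY_RANK[b] - SEVERITY_RANK[a])[0];
  return { found: true, severity, patterns: hits.map((h) => h.name) };
}
```

A structured result lets a caller decide per severity whether to block or only warn, which a plain boolean cannot express.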

Fixed

package.json version sync with plugin.json

[2.5.0] - 2026-04-02

Added

Pre-extraction indirection layer for remote scan defense. Remote scans pre-extract structured evidence via content-extractor.mjs and strip injection patterns BEFORE LLM agents see the content

[2.4.0] - 2026-04-01

Added

GitHub repo URL support for scan and plugin-audit. Clone to temp dir via git-clone.mjs, scan locally, clean up. --branch <name> flag for non-default branches

[2.3.0] - 2026-04-01

Added

PostToolUse expanded to ALL tools (was Bash-only). Scans Read, WebFetch, MCP, and all other tool output for indirect prompt injection
LLM_SECURITY_INJECTION_MODE env var: block (default), warn, off
Complementary Tools section in README (parry-guard, Lasso, Snyk)
CLAUDE.md poisoning documented as known limitation

Changed

Short output skip (<100 chars) for PostToolUse performance
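
The 2.3.0 entries above describe two small gating decisions: which mode LLM_SECURITY_INJECTION_MODE selects, and whether an output is long enough to scan. A minimal sketch of how those could be expressed follows. The function names are hypothetical, and the case-insensitive parsing and the fallback to `block` for unrecognized values are assumptions; the changelog only states that `block` is the default and that outputs under 100 characters are skipped.

```javascript
// Hypothetical helpers; not the plugin's actual hook code.
const VALID_MODES = ['block', 'warn', 'off'];
const MIN_SCAN_LENGTH = 100; // outputs shorter than this are skipped

// Resolve the mode from an env-like object (for example process.env).
// Assumption: unknown or missing values fall back to the default, 'block'.
function resolveInjectionMode(env) {
  const raw = String(env.LLM_SECURITY_INJECTION_MODE ?? '')
    .trim()
    .toLowerCase();
  return VALID_MODES.includes(raw) ? raw : 'block';
}

// Decide whether a tool output should be scanned at all.
function shouldScanOutput(output, mode) {
  if (mode === 'off') return false;
  return typeof output === 'string' && output.length >= MIN_SCAN_LENGTH;
}
```

Falling back to the strictest mode on a typo is a deliberate choice in this sketch, so a misspelled value cannot silently disable protection.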

[2.2.0] - 2026-04-01

Added

UserPromptSubmit hook blocking prompt injection in user input
Obfuscation decoding: unicode-escape, hex-escape, URL-encoding, base64 normalization
Shared injection-patterns.mjs module (21 critical + 8 high patterns)
PostToolUse indirect injection scanning in tool output (LLM01)
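
The obfuscation-decoding entry above lists four encodings that are normalized before pattern matching. As a rough sketch of what such a normalization pass could look like, the hypothetical function `decodeObfuscation` below applies the four decoders in sequence. The ordering, the 16-character minimum for base64 runs, and the "printable ASCII only" check are assumptions chosen for the example, not details taken from the plugin.

```javascript
// Hypothetical normalization pass; heuristics here are illustrative assumptions.
function decodeObfuscation(text) {
  let out = text;

  // \uXXXX unicode escapes written out literally in the text.
  out = out.replace(/\\u([0-9a-fA-F]{4})/g, (_, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );

  // \xXX hex escapes.
  out = out.replace(/\\x([0-9a-fA-F]{2})/g, (_, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );

  // %XX URL encoding; malformed byte sequences are left untouched.
  out = out.replace(/(?:%[0-9a-fA-F]{2})+/g, (run) => {
    try {
      return decodeURIComponent(run);
    } catch {
      return run;
    }
  });

  // Long base64-looking runs: keep the decoded form only if it is printable.
  out = out.replace(/[A-Za-z0-9+/]{16,}={0,2}/g, (run) => {
    const decoded = Buffer.from(run, 'base64').toString('utf8');
    return /^[\x20-\x7e\s]+$/.test(decoded) ? decoded : run;
  });

  return out;
}
```

The decoded text would then be passed to the same pattern matcher as the raw text, so an encoded payload is caught by the patterns that catch its plain form.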

Changed

LLM01 coverage 83% -> 95%, LLM05 80% -> 83%

[2.1.0] - 2026-04-01

Added

383 tests (was 177): full hook coverage (66 tests), auto-cleaner coverage (140 tests)
HTTPS install URL under fromaitochitta org

Fixed

Auto-cleaner import guard
Solo project setup (CONTRIBUTING.md removed)

Changed

Model defaults set to sonnet

[2.0.0] - 2026-03-31

Added

Open-source release: MIT LICENSE, SECURITY.md
Test suite (node:test, 177 tests)
pre-write-pathguard.mjs hook (8 path categories)
.gitignore, .editorconfig

[1.4.0] - 2026-02-21

Added

Unified risk scoring formula (25/10/4/1 weights)
Score-based verdicts and risk bands (Low -> Extreme)
OWASP categorization and A-F grading
Single unified-report.md template replacing 9 separate templates
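
The unified risk scoring entry above gives the weights 25/10/4/1 without the surrounding formula. One plausible reading is a weighted sum over findings by severity, sketched below. The mapping of the four weights to critical/high/medium/low in that order, the plain uncapped sum, and the finding shape `{ severity }` are assumptions; the thresholds for the risk bands and A-F grades are not stated in this changelog and are not guessed at here.

```javascript
// Assumed mapping of the 25/10/4/1 weights to severity levels.
const SEVERITY_WEIGHTS = { critical: 25, high: 10, medium: 4, low: 1 };

// Hypothetical scoring helper: sum the weight of each finding's severity.
// Findings with an unknown severity contribute nothing.
function riskScore(findings) {
  return findings.reduce(
    (sum, finding) => sum + (SEVERITY_WEIGHTS[finding.severity] ?? 0),
    0
  );
}
```

Under these assumptions, one critical, two high, one medium, and one low finding would score 25 + 20 + 4 + 1 = 50.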

[1.3.0] - 2026-02-21

Added

/security clean command with 3-tier remediation (auto/semi-auto/manual)
auto-cleaner.mjs engine (16 fix operations, atomic writes, post-fix validation)
cleaner-agent for semi-auto proposals
--dry-run flag

[1.2.0] - 2026-02-19

Added

7 deterministic Node.js scanners (unicode, entropy, permissions, dependencies, taint, git forensics, network)
/security deep-scan command and --deep flag
Synthesizer agent for scanner JSON interpretation
Shared scanner library (scanners/lib/)
Demo fixture with 85-finding security assessment

Changed

OWASP coverage: LLM01 70% -> 85%, LLM02 90% -> 95%, LLM03 80% -> 90%, LLM06 85% -> 95%

[1.1.0] - 2026-02-19

Added

/security plugin-audit command
/security mcp-audit command
/security pre-deploy command
3 new report templates

Changed

OWASP coverage: LLM03 75% -> 80%

[1.0.0] - 2026-02-19

Added

Initial release
4 agents: skill-scanner, mcp-scanner, posture-assessor, threat-modeler
4 hooks: secret detection, destructive commands, supply chain, output verification
6 knowledge files (2,771 lines)
8 commands: security, scan, audit, posture, threat-model, plugin-audit, mcp-audit, pre-deploy
7 report templates
OWASP LLM Top 10 + Agentic AI Top 10 coverage
