LLM Security Plugin for Claude Code

Automated defense and advisory analysis for the agentic AI attack surface.

Solo-maintained, fork-and-own. This plugin is a starting point, not a vendor product. Issues are welcome as signals; pull requests are not accepted. See GOVERNANCE.md for the full model and what upstream provides.

AI-generated: all code produced by Claude Code through dialog-driven development. Full disclosure →

A Claude Code plugin that provides security scanning, auditing, and threat modeling for agentic AI projects. Built on OWASP LLM Top 10 (2025), OWASP Agentic AI Top 10 (ASI01-ASI10), OWASP Skills Top 10 (AST01-AST10), MCP Top 10, and the AI Agent Traps taxonomy (Google DeepMind, 2025), grounded in published research from ToxicSkills, ClawHavoc, MCPTox, Pillar Security, Invariant Labs, GHSL Security Lab, and Operant AI.

Why this exists

Claude Code's extensibility model — skills, MCP servers, plugins, hooks, IDE extensions — creates an attack surface that mirrors the npm/PyPI supply chain problem with one critical difference: extensions run with LLM agency. A malicious plugin doesn't just execute code in a sandbox. It can instruct the agent to read your SSH keys, exfiltrate environment variables, install persistence mechanisms, and modify its own configuration — all while appearing to be a helpful "Project Health Dashboard."

This is not theoretical. ToxicSkills (Xi'an Jiaotong, 2025), ClawHavoc (Repello AI, 2025), GHSL-class workflow injections, and the November 2024 npm/PyPI typosquat campaigns documented real attack patterns. OWASP, NIST, and the EU AI Act now formalize the controls needed.

This plugin layers three independent kinds of defense — runtime hooks that block, deterministic scanners that compute, and LLM-driven advisory commands that judge — so failures in any one layer are caught by the others.

Important

Scan repos remotely before cloning. A poisoned CLAUDE.md injects instructions into the model context the moment you open a cloned repo — before any hook can intervene. /security scan https://repo-url --deep analyses everything safely via pre-extraction, without loading anything into your session. This is the primary defense against CLAUDE.md poisoning.

Quick Start

Prerequisites

Claude Code v2.x+
Node.js (any recent LTS — required for hook scripts)

Install

claude plugin marketplace add https://git.fromaitochitta.com/open/ktg-plugin-marketplace.git

Or enable directly in ~/.claude/settings.json:

{
  "enabledPlugins": {
    "llm-security@ktg-plugin-marketplace": true
  }
}

Hooks activate immediately on install. Secret detection, path guarding, prompt-injection scanning, destructive-command blocking, supply-chain guardrails, and runtime trifecta detection start working without any commands.

First scan

> /security posture

┌──────────────────────────────────────────────┐
│  Security Posture: 8/16  [B]  77%            │
├──────────────────────────────────────────────┤
│  ✅ Deny-First Config                         │
│  ✅ Secrets Protection                        │
│  ⚠️  MCP Server Trust                          │
│  ✅ Destructive Command Blocking              │
│  ⚠️  Sandbox Config                            │
│  ✅ Prompt Injection Hardening                │
│  ⚠️  Rule of Two                               │
│  ✅ EU AI Act                                 │
│  ⚠️  NIST AI RMF                               │
│  —  ISO 42001                                 │
├──────────────────────────────────────────────┤
│  6 findings (1 high, 3 medium, 2 low)        │
└──────────────────────────────────────────────┘

Tip

Start with /security posture for a 30-second baseline, then /security audit for the full picture, or /security scan <target> for supply-chain gating.

Important

Opus extended-context users: subagents inherit the parent session's context limit but do not support extended context, which causes API errors. Run /model Opus before using security commands to reset to the standard 200K context window, which subagents handle correctly.

What's inside

flowchart TB
    subgraph Runtime["Runtime defense — 9 hooks"]
        direction LR
        H1["UserPromptSubmit<br/>injection scan"]
        H2["PreToolUse<br/>secrets · paths · bash · supply chain"]
        H3["PostToolUse<br/>output verify · session guard"]
        H4["PreCompact<br/>transcript scan"]
    end

    subgraph Scanning["Deterministic analysis — 23 scanners"]
        direction LR
        S1["UNI · ENT · PRM · DEP<br/>TNT · GIT · NET · MEM · SCR · TFA"]
        S2["WFL workflow scanner"]
        S3["MCI · IDE · PST · BOM<br/>+ standalone CLIs"]
    end

    subgraph Advisory["Advisory analysis — 6 agents · 20 commands"]
        direction LR
        A1["Skill scanner<br/>7 threat categories"]
        A2["MCP scanner<br/>5-phase analysis"]
        A3["Posture · audit<br/>16 categories, A-F"]
        A4["Threat model<br/>STRIDE × MAESTRO"]
    end

    subgraph Knowledge["Knowledge base — 22 files"]
        direction LR
        K1["5 OWASP frameworks<br/>+ DeepMind Agent Traps"]
        K2["Threat patterns<br/>skills · MCP · workflows · IDE · secrets"]
        K3["Compliance · research<br/>registry · packages"]
    end

    Runtime -->|"blocks/warns in real time"| User["Claude Code session"]
    User -->|"/security scan"| Scanning
    User -->|"/security audit"| Advisory
    Advisory -.->|"grounded by"| Knowledge
    Scanning -->|"enriches"| Advisory

Each layer is independent. A failure in one (e.g. an injection that slips past the prompt scanner) gets a second chance from the others (e.g. the trifecta session guard catching the attempted exfiltration step downstream).

Commands

20 slash commands grouped by purpose. All accept path or GitHub URL targets unless noted.

Scanning & assessment

Command	Description
`/security`	Router with quick-start guide
`/security scan [path|url]`	Supply-chain gate: ALLOW/WARNING/BLOCK verdict on skills, MCP servers, directories, or remote repos
`/security scan [path|url] --deep`	Adds 10 deterministic scanners on top of the LLM agents
`/security deep-scan [path]`	Run only the 10 orchestrated deterministic scanners. Supports `--fail-on <severity>`, `--compact`, `--format sarif`, `--output-file <path>`
`/security audit`	Full project audit, A-F grade, prioritized action plan
`/security plugin-audit [path|url]`	Plugin trust assessment with Install/Review/Do Not Install verdict
`/security mcp-audit [--live]`	Audit installed MCP server configs (`--live` adds runtime inspection)
`/security mcp-inspect`	Connect to running MCP stdio servers and scan live tool descriptions via JSON-RPC 2.0
`/security mcp-baseline-reset`	Clear cumulative-drift baseline cache after a legitimate MCP server upgrade (E14, v7.3.0)
`/security ide-scan [target|url]`	Scan installed VS Code (+ Cursor, Windsurf, VSCodium, code-server) and JetBrains extensions, OR fetch a remote VSIX from VS Code Marketplace, OpenVSX, direct `.vsix`, or JetBrains Marketplace. 7 VS Code + 7 JetBrains-specific checks plus UNI/ENT/NET/TNT/MEM/SCR per extension
`/security posture`	30-second scorecard across 16 categories incl. EU AI Act, NIST AI RMF, ISO 42001
`/security diff [path]`	Compare scan against stored baseline — new/resolved/unchanged/moved findings
`/security watch [path] [--interval 6h]`	Continuous monitoring via `/loop`
`/security registry [scan|search]`	Skill signature registry: view stats, scan-and-register, search known fingerprints
`/security supply-check [path]`	Re-audit installed dependencies from lockfiles against blocklists, OSV.dev, and typosquats
`/security dashboard`	Cross-project security dashboard — machine-grade aggregation across all projects under `~/`

Remediation

Command	Description
`/security clean [path]`	Three-tier remediation pipeline — auto-fix safe issues, confirm semi-auto with the user, report manual findings
`/security clean [path] --dry-run`	Preview without modifying files
`/security harden [path]`	Generate Grade A reference config (`settings.json`, `CLAUDE.md`, `.gitignore`)
`/security harden [path] --apply`	Apply with automatic backup

Threat modeling & planning

Command	Description
`/security threat-model`	Interactive STRIDE × MAESTRO 7-layer session, 15-30 min
`/security red-team [--category] [--adaptive]`	Attack simulation: 72 scenarios across 12 categories, 100 % block rate in fixed mode. `--adaptive` applies 5 mutation rounds per blocked scenario for evasion testing
`/security pre-deploy`	Pre-deployment checklist — 10 automated + 3 manual checks

Remote scanning safely

/security scan and /security plugin-audit accept GitHub and Forgejo URLs directly. The plugin clones to a temp directory inside an OS sandbox, scans, and cleans up.

/security scan https://github.com/org/repo --deep
/security scan https://github.com/org/repo --branch dev --deep

Defense-in-depth on the clone path (v5.1+):

Layer	Mechanism	Mitigates
Git config hardening	`core.hooksPath=/dev/null`, `core.symlinks=false`, all `filter.lfs.*` neutralized, `protocol.file.allow=never`, `transfer.fsckObjects=true`, plus `GIT_CONFIG_NOSYSTEM=1` and friends	Git hooks at clone, symlink traversal, filter/smudge driver code execution via `.gitattributes` (CVE-2024-32002 class), local-file protocol traversal, malformed objects
OS filesystem sandbox	macOS `sandbox-exec` (Seatbelt) or Linux `bwrap` (bubblewrap) restricts writes to the per-clone temp dir	Even if a filter driver bypasses git config hardening, the kernel refuses writes outside the sandbox
`.gitattributes` post-clone advisory (E12, v7.3.0)	`scanGitAttributes()` scans for `filter=` / `diff=` / `merge=` driver directives and emits MEDIUM advisories	Surfaces driver-based supply-chain surface that survives even a sandboxed clone
Pre-LLM injection strip	`content-extractor.mjs` produces a structured JSON evidence package; `[INJECTION-PATTERN-STRIPPED]` markers are confirmed findings	Agents never see raw poisoned files from untrusted repos
Post-clone size cap	100 MB max	Resource-exhaustion attacks

Windows has no equivalent kernel-level sandbox, so only the git config hardening applies there. That hardening alone is sufficient against all known .gitattributes attack vectors; for full coverage, run Claude Code inside WSL2 or Docker Desktop.
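As a rough illustration of the git config hardening layer, the sketch below assembles a `git clone` invocation with the config keys named in the table passed via `-c`. The function name `buildHardenedClone` and the concrete values used to neutralize `filter.lfs.*` are assumptions made for this example, not the plugin's actual code.

```javascript
// Sketch only: key names come from the table above. The values chosen for
// the filter.lfs.* keys and the function name are assumptions.
function buildHardenedClone(url, destDir, branch) {
  const config = [
    "core.hooksPath=/dev/null",   // no repo-supplied hooks at clone time
    "core.symlinks=false",        // check symlinks out as plain files
    "protocol.file.allow=never",  // refuse local-file protocol
    "transfer.fsckObjects=true",  // reject malformed objects
    // Assumed way of neutralizing the LFS filter driver:
    "filter.lfs.smudge=",
    "filter.lfs.clean=",
    "filter.lfs.process=",
    "filter.lfs.required=false",
  ];

  const args = ["clone"];
  for (const entry of config) args.push("-c", entry);
  if (branch) args.push("--branch", branch);
  // "--" stops option parsing so a hostile URL cannot be read as a flag.
  args.push("--", url, destDir);

  // Extra environment for the child process; the caller would merge this
  // with its own environment before spawning git.
  const env = { GIT_CONFIG_NOSYSTEM: "1" };
  return { command: "git", args, env };
}
```

The function only builds the argument list, so it can be inspected or logged before anything is executed. Running the result inside `sandbox-exec` or `bwrap` would be a separate step.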

Automated hooks (9)

Hooks run on every operation — no commands needed. They activate the moment the plugin is installed.

Hook	Event	What it does
Prompt injection scan	UserPromptSubmit	Blocks direct injection (override instructions, spoofed system headers, identity redefinition) and warns on subtle signals (leetspeak, homoglyphs, zero-width chars, multi-language). Decodes obfuscated payloads (Unicode Tag, hex, URL, base64, rot13) before matching. Mode: `LLM_SECURITY_INJECTION_MODE=block|warn|off` (default block)
Secret detection	Edit, Write	Blocks AWS keys, Azure tokens, GitHub PATs, npm tokens, PEM keys, database URLs, Bearer tokens, and 30+ other secret patterns
Path guarding	Write	Blocks writes to `.env*` (multi-segment-suffix-safe), `.ssh/`, `.aws/`, `.gnupg/`, credentials files, hook scripts, `/etc/`, `settings.json`
Destructive commands	Bash	Blocks `rm -rf /`, `chmod 777`, pipe-to-shell, fork bombs, eval-with-substitution, T8 base64-pipe-shell loaders. Bash-normalize T1-T9 collapses obfuscation (empty quotes, `${IFS}`, ANSI-C hex, process substitution, eval-via-variable) before pattern matching
Supply-chain guardrail	Bash	Blocks known-compromised npm/pip packages, Levenshtein typosquats, age-gated installs (<72 h), OSV.dev CVE checks. Covers npm, pip, brew, docker, go, cargo, gem. v7.3.0: npm scope-hop typosquat advisory (E13) — `@evil/lodash`-class catches scope-jumping when the unscoped name matches a popular package
Output verification	All tools (post)	Advisory: scans ALL tool output for indirect injection (LLM01) and HITL traps (DeepMind category 6). Bash-specific: leaked secrets, unexpected URLs, oversized MCP responses. v7.3.0: per-update MCP description drift AND cumulative drift vs sticky baseline (E14): slow-burn rug-pulls that stay under per-update thresholds but cumulatively diverge ≥25% emit `mcp-cumulative-drift` MEDIUM
Session guard	All tools (post)	Advisory: monitors tool-call sequences for the lethal trifecta (untrusted input + sensitive read + exfiltration sink). 20-call sliding window + 100-call long-horizon window. Mode: `LLM_SECURITY_TRIFECTA_MODE=block|warn|off`. Sub-agent delegation tracking via Task/Agent tools surfaces escalation-after-input as a separate advisory
Pre-compact scan	PreCompact	Scans transcript tail (max 512 KB, <500 ms) for injection patterns + credentials before context compaction. Prevents poisoned content from surviving in compact form. Mode: `LLM_SECURITY_PRECOMPACT_MODE=block|warn|off` (default warn)
Update check	UserPromptSubmit	Checks for newer plugin versions max 1× / 24 h, cached. Disable: `LLM_SECURITY_UPDATE_CHECK=off`

All hooks are Node.js .mjs for cross-platform compatibility (macOS, Linux, Windows).
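To make the scope-hop typosquat advisory from the supply-chain guardrail row more concrete, here is a minimal sketch of the idea: a scoped package whose unscoped name matches a popular package is flagged. The `POPULAR` and `ALLOWED_SCOPES` sets and the function name are illustrative assumptions; the real hook presumably draws on `top-packages.json` and `typosquat-allowlist.json`.

```javascript
// Illustrative data only; not the plugin's real package or allowlist data.
const POPULAR = new Set(["lodash", "express", "react"]);
const ALLOWED_SCOPES = new Set(["types"]); // e.g. @types/lodash is legitimate

// Returns an advisory object for specs like "@evil/lodash" or
// "@evil/lodash@4.17.21", and null for anything that does not look like
// a scope hop.
function scopeHopAdvisory(spec) {
  const m = /^@([^/@\s]+)\/([^/@\s]+)(?:@.+)?$/.exec(spec);
  if (!m) return null; // unscoped or malformed spec
  const [, scope, name] = m;
  if (ALLOWED_SCOPES.has(scope) || !POPULAR.has(name)) return null;
  return { package: `@${scope}/${name}`, resembles: name };
}
```

A call such as `scopeHopAdvisory("@evil/lodash")` yields an advisory naming `lodash`, while `scopeHopAdvisory("lodash")` returns `null`. The scope allowlist matters: without it, well-known scopes would produce constant false positives.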

Important

Five hooks are blocking (prompt injection, secrets, path guarding, destructive commands, supply chain). Four are advisory (output verification, session guard, pre-compact, update check). Blocking modes can be downgraded via env-vars or policy.json for security research or staged rollouts.
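The session guard's sliding-window idea can be sketched as follows. This is a simplified model, not the hook's implementation: it assumes each tool call has already been classified into one of three hypothetical leg labels, and it ignores ordering, the 100-call long-horizon window, and delegation tracking.

```javascript
// Hypothetical leg labels for the lethal trifecta.
const LEGS = ["untrusted_input", "sensitive_read", "exfil_sink"];

class TrifectaWindow {
  constructor(size = 20) {
    this.size = size;  // number of most recent calls to keep
    this.calls = [];
  }

  // Record one tool call. `leg` is one of LEGS, or null when the call
  // touches none of them. Returns true when all three legs are present
  // in the current window.
  record(toolName, leg = null) {
    this.calls.push({ toolName, leg });
    if (this.calls.length > this.size) this.calls.shift(); // drop oldest
    return this.complete();
  }

  complete() {
    const seen = new Set(this.calls.map((c) => c.leg).filter(Boolean));
    return LEGS.every((leg) => seen.has(leg));
  }
}
```

With a window of 20, a fetch of untrusted content, a read of a credentials file, and an outbound request within any 20 consecutive calls would trip the check. Once the oldest leg slides out of the window, the check clears again, which is the gap a longer-horizon window is meant to cover.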

Deterministic scanners

23 scanners. Zero external dependencies. All output JSON.

Orchestrated (10) — run via `node scanners/scan-orchestrator.mjs <target>` or `/security deep-scan`

Scanner	Prefix	Detects	OWASP
`unicode-scanner.mjs`	UNI	Zero-width chars, Unicode Tag steganography (incl. PUA-A/B), BIDI overrides, Cyrillic/Greek homoglyphs (NFKC fold)	LLM01
`entropy-scanner.mjs`	ENT	High-entropy strings, base64/hex blobs, encoded payloads via Shannon entropy. Two-stage context classification suppresses GLSL/CSS/inline-SVG/markdown-CDN false positives	LLM01, LLM03
`permission-mapper.mjs`	PRM	Purpose-vs-tools mismatch, ghost hooks, haiku on sensitive agents, overprivileged components	LLM06
`dep-auditor.mjs`	DEP	CVEs (npm/pip audit + OSV.dev), Levenshtein + token-overlap typosquats, malicious install scripts, unpinned versions	LLM03
`taint-tracer.mjs`	TNT	Source-to-sink data flow (process.env / req.body → eval / exec / fetch / writeFile), 3-pass analysis, destructuring + spread support	LLM01, LLM02
`git-forensics.mjs`	GIT	Force pushes, description drift, hook modifications, new outbound URLs, author changes	LLM03
`network-mapper.mjs`	NET	Undisclosed URLs, suspicious domains (ngrok, webhook.site), IP-based URLs, DNS analysis	LLM02, LLM03
`memory-poisoning-scanner.mjs`	MEM	Injection patterns, shell commands, credential paths, permission expansion, suspicious URLs in `CLAUDE.md` / memory / `.claude/rules` / `.claude/agents/*.md`	LLM01, ASI02
`supply-chain-recheck.mjs`	SCR	Re-audit installed deps from lockfiles against blocklists, OSV.dev batch API, typosquats	LLM03
`toxic-flow-analyzer.mjs`	TFA	Lethal trifecta correlation across prior scanner output (runs last)	ASI01, ASI02, ASI05

Workflow & live (3, run independently)

Scanner	Prefix	Detects	OWASP
`workflow-scanner.mjs` (E11, v7.3.0)	WFL	GitHub Actions and Forgejo Actions injection — dangerous `${{ <field> }}` interpolations inside `run:` blocks across a 23-field GHSL+GlueStack-class blacklist; sink-restricted (only `run:` is a shell sink); severity matrix grades by trigger privilege; tracks env-block re-interpolation (Appsmith GHSL-2024-277 stealth pattern); flags `actor == bot[bot]` auth-bypass (Synacktiv 2023 Dependabot class)	LLM02, LLM06
`mcp-live-inspect.mjs`	MCI	Connects to running MCP servers via JSON-RPC 2.0 and scans live tool descriptions for injection, shadowing, drift	LLM01, LLM02
`ide-extension-scanner.mjs`	IDE	VS Code (+ forks) and JetBrains plugin prescan: blocklist, theme-with-code, sideload, broad activation, typosquat, extension-pack expansion, dangerous uninstall hooks; `Premain-Class` instrumentation; native binaries; nested-jar inspection	LLM01-03, LLM06, ASI02, ASI04
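The sink-restriction idea behind the workflow scanner (only `run:` is treated as a shell sink) can be illustrated with a small line-based sketch. The field list below is a short illustrative subset of well-known attacker-controlled contexts, not the scanner's 23-field blacklist, and the indentation-based parsing is a simplification that does not handle env-block re-interpolation or the severity matrix.

```javascript
// Illustrative subset; the real blacklist is larger.
const DANGEROUS_FIELDS = [
  "github.event.issue.title",
  "github.event.issue.body",
  "github.event.pull_request.title",
  "github.event.pull_request.body",
  "github.event.comment.body",
  "github.head_ref",
];

// Report ${{ ... }} interpolations of dangerous fields that occur inside
// `run:` values, including multi-line block scalars.
function findRunInjections(yamlText) {
  const findings = [];
  let runCol = -1; // column of the active `run:` key, -1 when outside

  yamlText.split("\n").forEach((line, i) => {
    if (line.trim() === "") return;
    const indent = line.length - line.trimStart().length;
    const runMatch = /^(\s*(?:-\s+)?)run:\s*(.*)$/.exec(line);
    let shellText = null;

    if (runMatch) {
      runCol = runMatch[1].length; // column where the key starts
      shellText = runMatch[2];     // inline command, or "|" for a block
    } else if (runCol >= 0 && indent > runCol) {
      shellText = line;            // continuation of the run block
    } else {
      runCol = -1;                 // left the run block
    }
    if (shellText === null) return;

    for (const m of shellText.matchAll(/\$\{\{\s*(.+?)\s*\}\}/g)) {
      const field = DANGEROUS_FIELDS.find((f) => m[1].includes(f));
      if (field) findings.push({ line: i + 1, field });
    }
  });
  return findings;
}
```

An interpolation under `env:` or `with:` is not reported here because those keys are not shell sinks in this model; the same expression inside a `run:` block is. A production scanner would use a real YAML parser rather than indentation heuristics.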

Standalone utilities (10)

Scanner	Purpose
`posture-scanner.mjs`	Deterministic posture assessment, 16 categories, <50 ms
`attack-simulator.mjs`	Red-team harness: 72 scenarios, 12 categories, fixed + adaptive modes, benchmark output
`ai-bom-generator.mjs`	CycloneDX 1.6 AI Bill of Materials
`dashboard-aggregator.mjs`	Cross-project security dashboard with weakest-link machine grade
`reference-config-generator.mjs`	Grade A config generation based on posture gaps
`mcp-baseline-reset.mjs`	Clear cumulative-drift baseline cache (`--list` / `--target <tool>` / clear-all)
`auto-cleaner.mjs`	Remediation engine — 16 fix operations, atomic writes, post-fix validation
`content-extractor.mjs`	Pre-extracts evidence from untrusted repos and strips injection patterns before LLM exposure
`watch-cron.mjs`	Cron wrapper for background scanning
`scan-orchestrator.mjs`	Entry point that runs all 10 orchestrated scanners

Why deterministic? LLMs are powerful at semantic analysis — intent, social engineering, context. They cannot reliably calculate Shannon entropy, measure Levenshtein distance between package names, trace taint flow across function boundaries, or detect individual Unicode codepoints. These scanners fill that gap.
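As an example of the kind of computation meant here, Shannon entropy over the characters of a string takes only a few lines. This is a sketch of the standard formula, not the `entropy-scanner.mjs` source, and it leaves out the context classification that scanner is described as applying.

```javascript
// Shannon entropy in bits per character: H = -sum(p_i * log2(p_i)),
// where p_i is the relative frequency of each distinct character.
function shannonEntropy(text) {
  const counts = new Map();
  let total = 0;
  for (const ch of text) {            // iterates by Unicode code point
    counts.set(ch, (counts.get(ch) || 0) + 1);
    total += 1;
  }
  if (total === 0) return 0;

  let bits = 0;
  for (const n of counts.values()) {
    const p = n / total;
    bits -= p * Math.log2(p);
  }
  return bits;
}
```

A string of one repeated character scores 0, and a string of four distinct characters scores 2 bits per character. Random base64 or hex blobs score well above ordinary identifiers and prose, which is what makes a threshold on this value usable as a signal.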

MCP cumulative drift baseline (E14, v7.3.0)

scanners/lib/mcp-description-cache.mjs anchors a sticky baseline description per MCP tool plus a rolling 10-event history. Cumulative drift = levenshtein(current, baseline) / max(|current|, |baseline|). When the ratio crosses mcp.cumulative_drift_threshold (default 0.25), post-mcp-verify.mjs emits a MEDIUM mcp-cumulative-drift advisory — independent of the existing per-update >10% drift signal. Slow-burn rug-pulls that keep each update under the per-update threshold but cumulatively diverge from baseline are now caught.
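The drift ratio above can be worked through in a short sketch. The function names are hypothetical and this is not the code in `mcp-description-cache.mjs`; it only implements the stated formula with a textbook Levenshtein distance and the default 0.25 threshold.

```javascript
// Classic edit distance using a single rolling row.
function levenshtein(a, b) {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0]; // value of D[i-1][j-1]
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // value of D[i-1][j]
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diag + cost);
      diag = above;
    }
  }
  return row[b.length];
}

// levenshtein(current, baseline) / max(|current|, |baseline|)
function cumulativeDrift(current, baseline) {
  const longest = Math.max(current.length, baseline.length);
  if (longest === 0) return 0; // two empty descriptions: no drift
  return levenshtein(current, baseline) / longest;
}

// Assumes the advisory fires at or above the threshold.
function exceedsDriftThreshold(current, baseline, threshold = 0.25) {
  return cumulativeDrift(current, baseline) >= threshold;
}
```

Because the comparison is always against the sticky baseline rather than the previous description, a series of small edits accumulates: each step may stay under a 10% per-update check while the ratio against the baseline keeps growing.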

The baseline survives the 7-day TTL purge so detection persists across the full window. After a legitimate MCP server upgrade, run /security mcp-baseline-reset (or node scanners/mcp-baseline-reset.mjs --target <tool>) to clear the stale baseline. The next call seeds a fresh baseline; description, firstSeen, lastSeen, and history are preserved across reset for audit. LLM_SECURITY_MCP_CACHE_FILE overrides the cache path for testing.

Agents (6)

Specialized analysts spawned by commands. Read-only by default; clean and harden grant Edit/Write under explicit user confirmation.

Agent	Role	Spawned by
`skill-scanner-agent`	7 threat categories (injection, exfiltration, privilege escalation, scope creep, hidden instructions, toolchain manipulation, persistence) for skills/commands/agents	`scan`, `audit`, `plugin-audit`
`mcp-scanner-agent`	5-phase MCP analysis (tool descriptions, source code, dependencies, configuration, rug pull detection)	`scan`, `mcp-audit`
`posture-assessor-agent`	Full audit narrative with PASS/PARTIAL/FAIL scoring and A-F grading (the deterministic `posture-scanner.mjs` handles quick mode)	`audit`, `posture`
`threat-modeler-agent`	Interactive STRIDE × MAESTRO interview, 5-phase workflow	`threat-model`
`deep-scan-synthesizer-agent`	Interprets deterministic scanner JSON into a human-readable report with executive summary + prioritized recommendations	`deep-scan`, `scan --deep`
`cleaner-agent`	Generates semi-auto remediation proposals for findings requiring human judgment (returns JSON proposals; `clean.md` performs the edits after user approval)	`clean`

All agents run on Opus and reference the knowledge base for grounding. Agents are spawned sequentially to avoid burst rate limits.

Knowledge base (22 files)

All analysis is grounded in published threat intelligence. The knowledge files are read by agents at scan time, not loaded preemptively.

Category	Files
OWASP frameworks	`owasp-llm-top10.md`, `owasp-agentic-top10.md`, `owasp-skills-top10.md`, `mcp-threat-patterns.md` (9 categories), `mitigation-matrix.md`
Threat patterns	`skill-threat-patterns.md` (7 categories from ToxicSkills/ClawHavoc), `secrets-patterns.md` (30+ regex), `ide-extension-threat-patterns.md` (10 categories with 2024-2026 case studies), `workflow-injection-patterns.md` (23-field blacklist + Forgejo divergences)
Research	`prompt-injection-research-2025-2026.md` (7 papers), `deepmind-agent-traps.md` (6 categories, 43 techniques), `attack-scenarios.json` (72 red-team scenarios), `attack-mutations.json` (synonym tables for adaptive testing)
Compliance	`compliance-mapping.md` (EU AI Act, NIST AI RMF, ISO 42001, MITRE ATLAS), `norwegian-context.md` (Datatilsynet, NSM, Digitaliseringsdirektoratet)
Reference data	`top-packages.json` (top 200 npm + 100 PyPI), `top-vscode-extensions.json`, `top-jetbrains-plugins.json`, `typosquat-allowlist.json`, `marketplace-api-notes.md`, `jetbrains-marketplace-api-notes.md`, `skill-registry.json`

Coverage at a glance

OWASP LLM Top 10 (2025) — control-count coverage from knowledge/mitigation-matrix.md:

Category	Hooks	Scanners	Commands	Coverage
LLM01 Prompt Injection	✅	UNI + ENT + TNT	scan, audit	95 %
LLM02 Sensitive Info Disclosure	✅	TNT + NET	audit	83 %
LLM03 Supply Chain	◐	ENT + DEP + GIT + NET	scan, plugin-audit, mcp-audit, supply-check	60 %
LLM04 Data Poisoning	—	—	threat-model	40 %
LLM05 Improper Output Handling	✅	—	audit	83 %
LLM06 Excessive Agency	✅	PRM + WFL	posture	100 %
LLM07 System Prompt Leakage	—	—	audit	60 %
LLM08 Vector/Embedding Weaknesses	—	—	threat-model	40 %
LLM09 Misinformation	—	—	advisory	50 %
LLM10 Unbounded Consumption	—	—	pre-deploy	83 %

Average ~69 %. Strongest at prompt injection (95 % with input + output scanning + obfuscation decoders) and agency controls (100 %). Weakest at LLM04/08, which are better addressed at the model-provider or platform level. /security threat-model and /security pre-deploy surface these gaps as advisories.

Agentic and skill frameworks — full ASI01-ASI10 and AST01-AST10 mapping in knowledge/owasp-agentic-top10.md and knowledge/owasp-skills-top10.md.

Compliance & governance

Capability	Detail
Compliance mapping	EU AI Act (Art. 9, 15, 17), NIST AI RMF (Map / Measure / Manage / Govern), ISO 42001 (Annex A), MITRE ATLAS techniques. Posture categories 14-16 assess readiness
Norwegian context	Datatilsynet DPIA-for-AI guidance, NSM basic security principles, Digitaliseringsdirektoratet — relevant for Norwegian public-sector deployments
SARIF 2.1.0 output	`--format sarif` on scan / deep-scan produces OASIS SARIF for CI/CD ingestion (GitHub Advanced Security, Azure DevOps, SonarQube)
Structured audit trail	JSONL events with ISO 8601 timestamps and OWASP category tags (`LLM_SECURITY_AUDIT_*` env-vars or `audit.log_path` policy key) — SIEM-ready
AI-BOM	CycloneDX 1.6 BOM for AI components — models, MCP servers, plugins, knowledge files, hooks (`llm-security audit-bom <target>`)
Policy-as-code	`.llm-security/policy.json` ships hook configuration with the team. v7.3.0 (D3) adds a one-time-per-process stderr deprecation line when both an env-var AND its `policy.json` equivalent are explicitly set; env still wins through the v7.x runway, env reads removed in v8.0.0. Suppress noise with `LLM_SECURITY_DEPRECATION_QUIET=1`
Standalone CLI	`node bin/llm-security.mjs scan <target>` — runs scanners without Claude Code. Subcommands: `scan`, `deep-scan`, `posture`, `audit-bom`, `benchmark`. Schrems II compatible in default offline mode (optional OSV.dev enrichment is the only network call and is opt-in)
CI/CD integration	`--fail-on <severity>` for threshold-based exit codes, `--compact` for one-liner output. Templates for GitHub Actions, Azure DevOps, GitLab CI in `ci/`. Guide: `docs/ci-cd-guide.md`
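A threshold-based exit code of the kind `--fail-on <severity>` describes could look roughly like this. The severity ordering, the finding shape, and the use of exit code 1 are assumptions for the sketch; consult `docs/ci-cd-guide.md` for the actual behavior.

```javascript
// Assumed ordering from least to most severe.
const SEVERITY_ORDER = ["info", "low", "medium", "high", "critical"];

// Returns 1 when any finding is at or above the failOn severity, else 0.
// Findings with an unrecognized severity are ignored in this sketch.
function exitCodeFor(findings, failOn) {
  const threshold = SEVERITY_ORDER.indexOf(failOn);
  if (threshold === -1) throw new Error(`unknown severity: ${failOn}`);
  const hit = findings.some(
    (f) => SEVERITY_ORDER.indexOf(f.severity) >= threshold
  );
  return hit ? 1 : 0;
}
```

In a pipeline, the wrapper script would pass this value to `process.exit`, so a run with only medium findings passes under `--fail-on high` and fails under `--fail-on medium`.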

Benchmarks

/security red-team (also llm-security benchmark) tests hook defenses with 72 crafted scenarios across 12 categories. Adaptive mode applies 5 mutation rounds per blocked scenario (homoglyph substitution, encoding wrapping, zero-width injection, case alternation, synonym replacement). Current block rate: 100 % fixed mode.

Workflow examples

1 — pre-installation gate

/security scan path/to/plugin           # ALLOW/WARNING/BLOCK
/security plugin-audit path/to/plugin   # Install/Review/Do Not Install

# Remote — scans without installing
/security scan https://github.com/org/repo --deep
/security plugin-audit https://github.com/org/repo

2 — monthly review

/security posture     # 30-second baseline
/security audit       # full A-F grade with action items
                      # → fix critical/high
/security posture     # verify improvement

3 — track over time

/security diff path/to/project    # first run creates baseline
/security watch path/to/project   # continuous, runs diff every 6h via /loop

4 — deep threat analysis

/security threat-model     # 15-30 min STRIDE × MAESTRO interview
/security audit            # verify current controls vs identified threats
/security pre-deploy       # 10 automated + 3 manual checks

5 — remediation

/security clean path/to/project --dry-run   # preview
/security clean path/to/project             # auto + semi-auto + manual report
/security harden path/to/project --apply    # Grade A reference config

What this plugin does NOT cover

Area	Why	Alternative
Post-clone `CLAUDE.md` poisoning	Once a repo is cloned, `CLAUDE.md` loads into the system prompt before any hook runs. Platform limitation, no hook-based fix.	Always scan repos remotely before cloning with `/security scan <url> --deep`. For repos already cloned: review `CLAUDE.md` before opening
ML-based injection classification	Regex patterns cannot catch novel phrasings or adversarial paraphrasing. Joint paper (14 researchers, 2025) reports 95-100 % ASR against all 12 tested defenses for motivated adaptive attackers	Use parry-guard (DeBERTa v3 + Llama Prompt Guard 2) alongside this plugin. No conflict
Enterprise SSO / SCIM	Platform-level configuration	Anthropic Admin Console
RAG infrastructure	Vector DB / embedding pipeline security	Dedicated RAG security tools
LLM gateway / proxy	Network infrastructure layer	API gateway solutions
SIEM integration	Organization security stack	Splunk, Sentinel, etc. — but the JSONL audit trail is SIEM-ready
General agent scheming detection	The session guard catches the lethal trifecta as a known sequence; novel hidden-goal pursuit remains fundamentally hard for any tool.	Trifecta + delegation tracking provide partial coverage; full scheming detection requires monitoring + human oversight

These gaps are surfaced as advisories through /security threat-model and /security pre-deploy.

Complementary tools

Tool	What it adds
parry-guard	ML injection classification (DeBERTa v3 + Llama Prompt Guard 2 86M, Rust, fail-closed). Catches what regex misses
Lasso claude-hooks	Different philosophy: 96 patterns across 5 categories, warn-and-continue. Both can run in the same hook chain
Snyk agent-scan	Commercial skills/MCP scanning with a larger training set (3 984 skills analyzed)

Tip

Recommended combo: llm-security (breadth — static + supply chain + audit + posture + threat modeling) + parry-guard (depth — ML injection classification). Different layers, no conflict.

Project scope

This is a solo open-source project in stabilization mode as of 2026-05-01. The current feature set (5 frameworks, 23 scanners, 9 hooks, 6 agents, 20 commands, 22 knowledge files, 1822+ tests including a dedicated end-to-end suite) is the natural plateau for what a deterministic + advisory plugin can defend against without crossing into commercial-grade territory. Going forward, work focuses on:

Bug fixes and security patches
Compatibility with new Claude Code releases
Knowledge-base refresh (OWASP updates, new published research, new attack patterns)
Deprecation cleanup — v8.0.0 removes the LLM_SECURITY_* env vars and riskScoreV1 constant deprecated in v7.3.0
Opportunistic small additions that fit the existing deterministic architecture

The following are explicitly out of scope — fork the repo and own them under your organization's name. The MIT license permits this and the project is architected to be forkable. See CONTRIBUTING.md for the fork-and-own guide.

Out of scope	Why	Where to look instead
Web dashboard / fleet policy server	Multi-tenant UX + ongoing infra work	Snyk, Lakera Cloud
Runtime prompt firewall (real-time blocking proxy)	Inline gateway architecture	Lakera Guard, Protect AI Rebuff, parry-guard
IDE real-time LSP scanning	IDE integration + always-on perf budget	Snyk IDE, Semgrep IDE
Compliance PDF/DOCX evidence pack	Auditor-formatted reports as a product	Vanta, Drata, Secureframe
Enterprise ticketing / chat connectors (Jira, ServiceNow, Slack, Teams, PagerDuty)	Per-vendor SDK + auth + ongoing API drift	Splunk SOAR, Tines, custom integration
Multi-tenancy / centralized plugin runtime / fleet state	Hosted-product surface area	Build it on a fork
ML-based detectors requiring model hosting	Model-serving infra (training, eval, drift)	parry-guard (DeBERTa v3 + Llama Prompt Guard 2)
Marketplace UI / web catalog	Frontend product	This is not that kind of project
SSO / SCIM / RBAC	Platform-level enterprise concerns	Anthropic Admin Console + your IdP

If you need any of the above and your organization has the headcount to maintain it, fork freely. The maintainer encourages it. Issues and support flow back to the fork, not here.

Defense philosophy

Prompt injection is structurally unsolvable with current architectures (joint paper, 14 researchers, 2025: 95-100 % ASR against all 12 tested defenses by motivated red-teamers). v5.0+ does not claim to "prevent" injection. It implements defense-in-depth:

Broader detection — MEDIUM advisories for obfuscation signals (leetspeak, homoglyphs, zero-width chars, multi-language); Unicode Tag and PUA-A/B steganography; bash expansion evasion T1-T9; rot13 hidden imperatives in comments
Increased attack cost — Rule-of-Two trifecta detection (configurable block/warn/off, default warn), bash normalization before gate matching, MCP cumulative-drift baseline catching slow-burn rug-pulls
Longer monitoring windows — 100-call long-horizon alongside 20-call sliding window; slow-burn trifecta detection (legs >50 calls apart); Jensen-Shannon behavioral drift; sub-agent delegation tracking
Architectural constraints — opportunistic byte-fingerprint matching for output→input lineage (first 200 bytes, SHA-256/16-hex tag — not semantic capability tracking; trivially bypassed by mutation, but raises the cost of casual exfil)
Honest documentation — known limitations are surfaced, not hidden
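
The byte-fingerprint idea from the list above can be illustrated with a minimal sketch. It assumes the tag is the first 16 hex characters of a SHA-256 digest over the first 200 bytes of a tool output, as the description states. The helper names fingerprint and sharesLineage are hypothetical, and this is not the plugin's actual code.

```javascript
const { createHash } = require('node:crypto');

// Parameters taken from the description above: first 200 bytes, 16 hex chars.
const PREFIX_BYTES = 200;
const TAG_HEX_LENGTH = 16;

// Hypothetical helper: derive a short tag from the leading bytes of a string.
function fingerprint(text) {
  const prefix = Buffer.from(text, 'utf8').subarray(0, PREFIX_BYTES);
  return createHash('sha256').update(prefix).digest('hex').slice(0, TAG_HEX_LENGTH);
}

// Hypothetical lineage check: does a later tool input start with the same
// bytes as an earlier tool output?
function sharesLineage(earlierOutput, laterInput) {
  return fingerprint(earlierOutput) === fingerprint(laterInput);
}

const secret = 'API_KEY=abc123\n'.repeat(20); // 300 bytes of synthetic data
console.log(sharesLineage(secret, secret + 'trailing noise')); // true: same first 200 bytes
console.log(sharesLineage(secret, 'x' + secret));              // false: one prepended byte defeats it
```

The second call shows why the README calls this opportunistic: any mutation inside the fingerprinted prefix changes the tag, so the check only raises the cost of verbatim copy-through.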

System-card alignment (Opus 4.7): §5.2.1 documents that multi-layer defenses outperform single-layer against adaptive attacks; this plugin's posture matches. §6.3.1.1 documents that Opus 4.7 follows agent instructions more literally — stacked imperatives are less useful than tool-level enforcement, and agent files have been updated accordingly. Full mapping in docs/security-hardening-guide.md §5.

What v5.0+ cannot do: prevent adaptive attacks from motivated human red-teamers, fix CLAUDE.md loading before hooks (platform limitation), detect novel NL indirection without ML, prevent long-horizon attacks without detectable patterns, provide formal worst-case guarantees.

Compatibility

Claude Code: v2.x+
Platform: macOS, Linux, Windows (all hooks are Node.js .mjs)
Node.js: any recent LTS for hook scripts and CLI
Overlap with claude-code-essentials: safe to run both. This plugin extends with path guarding, MCP verification, and runtime trifecta detection. Duplicate blocking is harmless — hooks run sequentially

Playground (v7.6.0)

A single-file SPA at playground/llm-security-playground.html provides an interactive surface for onboarding, command discovery and report demos without requiring Claude Code installation. Open the file directly in a browser (Chrome/Firefox/Safari over file://) — no build step, no network calls, no npm install. Theme-bootstrap with FOUC-prevention; state persisted in IndexedDB primary + localStorage fallback.

v7.6.0 Tier 3 reference case: The playground is now a visually and structurally complete reference for the shared/playground-design-system/ Tier 3 supplement. 8 new DS components are integrated into the 18 report renderers: tfa-flow (lethal trifecta chain), mat-ladder (maturity ladder), suppressed-group (narrative audit), codepoint-reveal (Unicode steganography), top-risks (ranked top findings), recommendation-card[data-severity] (severity-tinted advisory), risk-meter (band visualization 0-100), card--severity-{level} (severity-colored findings cards). Plus badge--scope-security, verdict-pill-lg and form-progress+fp-step from wave 1.

Layout:

playground/
├── llm-security-playground.html     ← single-file SPA (~10 700 lines)
├── vendor/
│   └── playground-design-system/    ← synced from shared/, checksum-locked
├── test-fixtures/                   ← markdown fixtures (one per command)
├── screenshots/v7.5.0/              ← Playwright-generated demo screenshots (12)
├── screenshots/v7.6.0/              ← v7.6.0 demo screenshots (12, manually generated)
└── A11Y-RAPPORT.md                  ← WCAG 2.1 AA verification + Tier 3 ARIA

What the playground covers:

Onboarding (5 groups): organization, scope, profile, platform, compliance. Values are persisted as shared state and prefilled automatically in all command forms.
Home: project grid, fleet tracks for posture/scan/red-team. The «Last inn demo-data» ("Load demo data") button activates 3 projects, including dft-komplett-demo with all 18 reports pre-parsed.
Catalog: all 20 commands grouped in 5 categories. Search filters the cards, and the «Åpne skjema» ("Open form") button builds a ready-made pipeline string to copy and paste into the terminal.
Project surface: 4 screens (Oversikt / Rapporter / Kontekst / Eksport, i.e. Overview / Reports / Context / Export). The Rapporter tab has category tabs (discover / posture / findings-ops / hardening / adversarial / mcp-ops) and paste-import for each report command.

Parser/renderer architecture: Every produces_report=true command in CATALOG has a parser (markdown → structure) and a renderer (structure → DS components). 18 archetypes are supported: findings, findings-grade, risk-score-meter, posture-cards, dashboard-fleet, red-team-results, diff-report, kanban-buckets, matrix-risk. The parser contract is { ok: true, data: {...} } | { ok: false, errors: [...] }. The test fixtures under playground/test-fixtures/ anchor the contract: one markdown file per command, mirroring the templates/unified-report.md format.
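
A minimal sketch of a parser/renderer pair that honors the { ok, data } | { ok, errors } contract is shown here. The report format (a "Verdict:" line plus "- [SEVERITY] title" bullets) and the function names parseFindingsReport and renderFindings are invented for illustration; the real fixtures follow templates/unified-report.md, and the real renderers emit DS components rather than plain text.

```javascript
// Hypothetical parser: markdown in, discriminated result object out.
function parseFindingsReport(markdown) {
  const errors = [];
  const verdictMatch = markdown.match(/^Verdict:\s*(\S+)/m);
  if (!verdictMatch) errors.push('missing "Verdict:" line');

  const findings = [];
  for (const line of markdown.split('\n')) {
    const m = line.match(/^- \[(CRITICAL|HIGH|MEDIUM|LOW|INFO)\]\s+(.+)$/);
    if (m) findings.push({ severity: m[1], title: m[2].trim() });
  }

  // Contract: never throw on bad input; report problems via `errors`.
  if (errors.length > 0) return { ok: false, errors };
  return { ok: true, data: { verdict: verdictMatch[1], findings } };
}

// Hypothetical renderer: consumes only the structured result, never raw markdown.
function renderFindings(result) {
  if (!result.ok) return `Parse failed: ${result.errors.join('; ')}`;
  const rows = result.data.findings.map((f) => `${f.severity}: ${f.title}`);
  return [`Verdict ${result.data.verdict}`, ...rows].join('\n');
}

const parsed = parseFindingsReport('Verdict: BLOCK\n- [HIGH] Hardcoded token\n');
console.log(renderFindings(parsed));
console.log(renderFindings(parseFindingsReport('no verdict here')));
```

Because the renderer depends only on the result object, each fixture can be checked twice: once against the parser output and once against the rendered view.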

Exposed testing/automation globals: __store, __navigate, __loadDemoState, __scheduleRender, __PARSERS, __RENDERERS, __CATALOG, __inferVerdict, __inferKeyStats, __renderPageShell, __handlePasteImport. These enable Playwright-driven navigation and programmatic parser/renderer testing against the fixture catalog.

Limitations: The SPA is a paste-in surface; it runs no scanners itself. Output must come from Claude Code (/security scan ...), the CLI (node scanners/...) or stub fixtures. The demo state contains only the 3 inline projects; new projects are per-user and stored locally.

Self-scan

Running node scanners/scan-orchestrator.mjs . on this plugin produces 0 findings (ALLOW) with ~190 suppressions via .llm-security-ignore. Every suppression is explained — a security plugin that documents attack patterns, ships a malicious demo fixture, and tests against deliberately evil code will trigger its own scanners. The entropy scanner flags regex patterns in knowledge/secrets-patterns.md. The taint scanner flags eval(user_input) in test fixtures. The toxic flow analyzer flags the plugin's own commands that use Read+Bash. Remove the ignore file and re-run to see the unsuppressed picture.

The examples/malicious-skill-demo/ directory contains a deliberately malicious "Project Health Dashboard" plugin and a full security assessment. The combined LLM + deterministic pipeline produced 85 findings (24 critical, 24 high, 20 medium, 6 low, 11 info) and verdict BLOCK 100/100 — both layers independently maxed the risk score. A human reviewing the plugin's README.md and SKILL.md would likely miss most of them; the Unicode Tag steganography is literally invisible.

node scanners/scan-orchestrator.mjs examples/malicious-skill-demo/evil-project-health/   # ~5s
/security scan examples/malicious-skill-demo/evil-project-health/ --deep                  # full pipeline

Other runnable examples

The examples/ directory contains additional self-contained demonstrations — each with README.md, fixture, run script, and expected-findings.md:

prompt-injection-showcase/ — 61 payloads across 19 categories fed to pre-prompt-inject-scan, post-mcp-verify, and pre-bash-destructive. Run: node examples/prompt-injection-showcase/run-showcase.mjs
lethal-trifecta-walkthrough/ — 5-step Rule-of-Two demonstration (WebFetch → Read .env → Bash curl POST + suppression follow-ups) showing post-session-guard advisory firing on leg 3. State-isolated via run-script PID. Run: node examples/lethal-trifecta-walkthrough/run-trifecta.mjs
mcp-rug-pull/ — 8-stage MCP description drift, each step under the 10% per-update threshold but cumulatively >25% from baseline. Demonstrates the v7.3.0 cumulative-drift advisory (E14, OWASP MCP05). Cache isolated via LLM_SECURITY_MCP_CACHE_FILE. Run: node examples/mcp-rug-pull/run-rug-pull.mjs
supply-chain-attack/ — two-layer demonstration: PreToolUse hook blocks compromised event-stream@3.3.6 and advises on scope-hopping @evilcorp/lodash; offline dep-auditor flags 5 typosquats + a postinstall: curl ... | sh vector in the fixture package.json. Run: node examples/supply-chain-attack/run-supply-chain.mjs
poisoned-claude-md/ — 6 memory-poisoning detectors fire on a fixture CLAUDE.md + agent file (E15 surface). Demonstrates injection, shell-command, suspicious-URL, credential-path, permission-expansion, and base64-encoded-payload detection. Run: node examples/poisoned-claude-md/run-memory-poisoning.mjs
bash-evasion-gallery/ — one disguised variant per T-tag (T1-T9) fed through pre-bash-destructive, verified BLOCK after bash-normalize strips the evasion. T8 has its own BLOCK_RULE. Run: node examples/bash-evasion-gallery/run-evasion-gallery.mjs
toxic-agent-demo/ — single-component lethal trifecta detected by the toxic-flow-analyzer (TFA). A fixture agent with tools: [Bash, Read, WebFetch] covers all three trifecta legs (untrusted input + sensitive data access + exfil sink), and the fixture deliberately ships no hooks/hooks.json so TFA emits a CRITICAL Lethal trifecta: finding without mitigation downgrade. Uses plugin.fixture.json as the plugin marker so the example doesn't trip pre-write-pathguard on .claude-plugin/. Maps to ASI01 / ASI02 / ASI05 / LLM01 / LLM02 / LLM06. Run: node examples/toxic-agent-demo/run-toxic-flow.mjs
pre-compact-poisoning/ — pre-compact-scan PreCompact hook detecting both an injection pattern and a credential-shaped string in a synthetic transcript across all three modes (off / warn / block). The transcript is generated at runtime in a per-invocation tempdir; the AWS-shaped key uses the same 'AK' + 'IA' + ... fragmentation idiom as tests/e2e/attack-chain.test.mjs, so the source contains no literal credentials. Includes a benign-transcript control case in block mode to prove the gate is not a brick wall. Maps to LLM01 / LLM02 / ASI01 / AT-1 / AT-3. Run: node examples/pre-compact-poisoning/run-pre-compact-poisoning.mjs
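
The slow-burn pattern behind mcp-rug-pull/ can be sketched as follows: every update stays under a per-update threshold while the distance from a pinned baseline keeps growing. The 10% and 25% thresholds come from the description above. The drift metric (Jaccard distance over word sets) and the helper names drift and checkDriftHistory are assumptions made for this illustration, not the plugin's actual metric or API.

```javascript
// Thresholds from the example description above.
const PER_UPDATE_THRESHOLD = 0.10;
const CUMULATIVE_THRESHOLD = 0.25;

// Stand-in drift metric (an assumption): Jaccard distance between word sets.
function drift(a, b) {
  const setA = new Set(a.toLowerCase().split(/\s+/).filter(Boolean));
  const setB = new Set(b.toLowerCase().split(/\s+/).filter(Boolean));
  const union = new Set([...setA, ...setB]);
  if (union.size === 0) return 0;
  let shared = 0;
  for (const word of setA) if (setB.has(word)) shared += 1;
  return 1 - shared / union.size;
}

// Hypothetical helper: compare each update with its predecessor AND the baseline.
function checkDriftHistory(baseline, updates) {
  let previous = baseline;
  return updates.map((current, index) => {
    const perUpdate = drift(previous, current);
    const cumulative = drift(baseline, current);
    previous = current;
    return {
      stage: index + 1,
      perUpdateAlert: perUpdate > PER_UPDATE_THRESHOLD,
      cumulativeAlert: cumulative > CUMULATIVE_THRESHOLD,
    };
  });
}

// Synthetic tool description: 20 distinct words, one new word appended per update.
const baseline = Array.from({ length: 20 }, (_, i) => `word${i}`).join(' ');
const updates = [];
let description = baseline;
for (let i = 0; i < 8; i += 1) {
  description += ` extra${i}`;
  updates.push(description);
}

const report = checkDriftHistory(baseline, updates);
console.log(report.some((r) => r.perUpdateAlert));        // false: each step is at most 1/21
console.log(report.find((r) => r.cumulativeAlert).stage); // 7: 7/27 is the first value above 0.25
```

A check that compares only consecutive versions would never fire on this sequence, which is the gap a pinned baseline closes.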

Recent versions

Version	Date	Highlights
7.6.1	2026-05-06	Playground v7.6.0 visual patch. Six bugs caught during maintainer verification in the browser. All were caused by mismatches between DS classes and renderer usage (or by missing DS implementations the playground assumed existed). (1) `renderFindingsBlock` used the `.findings` outer class, which is the DS 2-column list+detail grid → replaced with `<section class="report-meta">` + the correct `findings__list > findings__group` pattern. (2) `.report-table` was missing entirely from the DS but is used in 7+ renderers → local CSS implementation in the playground HTML. (3) `renderPreDeploy` traffic lights used `.sm-card__grade` (28×28 px, sized for a single A-F letter) for "PASS"/"PASS-WITH-NOTES"/"FAIL" → replaced with a width-adaptive status pill. (4) Threat-model matrix bubbles were not clickable → `<button>` with `data-threat-id` + a click handler that scrolls to the Trusler (threats) table. (5) Radar labels overlapped at 6+ axes → SVG 280→380, R 105→125, dynamic `text-anchor` (start/end/middle) based on horizontal position. (6) `recommendation-card__body` overflowed on long text → `overflow-wrap: anywhere`. 4/4 fix-specific smoke tests + 18/18 renderer regression tests pass. No scanner or hook behavior changes; purely additive surface.
7.6.0	2026-05-06	Playground Tier 3 reference case. The playground (`playground/llm-security-playground.html`) was raised to a visually and structurally complete reference for the `shared/playground-design-system/` Tier 3 supplement. 8 new DS components integrated into the 18 report renderers: `tfa-flow` + `tfa-leg` + `tfa-arrow` (lethal trifecta chain with `<button>` elements + ARIA), `mat-ladder` + `mat-step` (5-step maturity ladder with thresholds at 0/25/50/75/95% PASS), `suppressed-group` (narrative audit from `summary.narrative_audit.suppressed_findings`), `codepoint-reveal` + `cp-tag`/`cp-zw`/`cp-bidi` (Unicode steganography side by side), `top-risks` + `top-risk[data-severity]` (ranked top-findings listing, semantic `<ol>`), `recommendation-card[data-severity]` (severity-tinted advisory on `clean`/`harden`/`audit`/`posture`/`pre-deploy`/`plugin-audit`), `risk-meter` (band visualization 0-100 on 5 archetypes), `card--severity-{level}` (severity-color modifier on findings cards). Wave 1: `badge--scope-security` (identity chip), `verdict-pill-lg` (DS Tier 3 pill), `form-progress` + `fp-step` (onboarding wizard). Deleted ~30 duplicate CSS declarations (the DS wins the cascade). 5 new DS helpers + `mapSeverityToCardLevel` + `parseNarrativeAudit`. File size 10209 → 10677 lines. Delivered over 5 sessions, atomic commits. A11Y report updated. No scanner or hook behavior changes; purely additive surface.
7.5.0	2026-05-05	Playground. Single-file SPA at `playground/llm-security-playground.html` (~10 200 lines) for onboarding, demos and workshop use without a Claude Code installation. Parsers + renderers for all 18 `produces_report=true` commands (Phase 2: 10 high-priority + Phase 3: the 8 remaining: mcp-inspect, supply-check, pre-deploy, diff, watch, registry, clean, threat-model). 18 markdown test fixtures under `playground/test-fixtures/` as the contract anchor. The complete demo project `dft-komplett-demo` has all 18 reports pre-parsed inline. Vendor-synced design system under `playground/vendor/` (checksum-locked). 9 Playwright-generated screenshots in `playground/screenshots/v7.5.0/`. 11 new `window` globals for testing/automation. 2 new `KEY_STATS_CONFIG` archetypes (`kanban-buckets`, `matrix-risk`). Bug fix: `normalizeVerdictText` regex order updated so that GO-WITH-CONDITIONS / CONDITIONAL / BETINGET no longer collapse to ALLOW. No scanner or hook behavior changes; purely additive surface.
7.4.0	2026-05-05	Examples + e2e suite. Eight runnable demonstration walkthroughs under `examples/` (`prompt-injection-showcase`, `lethal-trifecta-walkthrough`, `mcp-rug-pull`, `supply-chain-attack`, `poisoned-claude-md`, `bash-evasion-gallery`, `toxic-agent-demo`, `pre-compact-poisoning`), each with `README.md`, runtime-isolated fixture, single-command run-script, and `expected-findings.md` testable contract. Three new `tests/e2e/` suites (attack-chain 17 tests + multi-session 9 tests + scan-pipeline 19 tests = +45 tests, total 1822) prove the framework works as a coordinated system, not just isolated units. No scanner or hook behavior changes; purely additive surface. Scanner `VERSION` constants synced across `dashboard-aggregator.mjs`, `posture-scanner.mjs`, `ide-extension-scanner.mjs`.
7.3.1	2026-05-01	Stabilization patch. Project repositioned as solo, stabilization-only, with explicit "fork & own" stance for enterprise features. New public docs: `CONTRIBUTING.md` (fork-and-own model), README "Project scope" section (out-of-scope table with commercial alternatives), updated `SECURITY.md` (v7.3.x supported, v7.0–v7.2 best-effort, < v7.0 EOL). Coherence: `package.json` files whitelist + `bugs` URL + repo URL fix; scanner `VERSION` constants synced across `dashboard-aggregator.mjs`, `posture-scanner.mjs`, `ide-extension-scanner.mjs`. Test ceiling raised on flaky pre-compact-scan timing test (500 ms → 1000 ms; design target unchanged). No behavior changes.
7.3.0	2026-05-01	Batch C release. Wave A (T7-T9 bash normalization + rot13 comment-block decoder), Wave B (`.gitattributes` post-clone advisory + npm scope-hop typosquat + GitHub/Forgejo workflow-scanner with 23-field blacklist + re-interpolation tracking + auth-bypass detection), Wave C (MCP cumulative-drift baseline + `/security mcp-baseline-reset`), Wave D (riskScoreV1 `@deprecated`; sandbox-architecture rationale docs; env-var deprecation runway to v8.0.0; CLAUDE.md hooks count + consistency test). 1665+ → 1777 tests. Wave E (additional attack-simulator scenarios) deferred indefinitely
7.2.0	2026-04-29	Batch B release. Critical-review B-tier scanner defects + v7.2.0 evasion-arsenal (PUA-A/B Unicode coverage, NFKC homoglyph fold, escalation-after-input window, markdown link-title + SVG `<desc>`/`<foreignObject>` + HTML comment extractors). Two-stage entropy context classification. v1→v2 risk-formula constants unified across docs. 8 new red-team scenarios (64 → 72). 1522 → 1665 tests
7.1.0	2026-04-29	Critical-review patch. Pathguard regex hole closed (`.env.production.local.backup`-class). Distributed-trifecta block-mode AND-gate removed. CaMeL claim toned down to honest "byte-fingerprint matching". Documentation honesty-sweep across 7 overclaim sites. 1487 → 1511 tests

Full history in CHANGELOG.md.

License & attribution

MIT. See LICENSE.

Built on published research from OWASP, ToxicSkills (Xi'an Jiaotong, 2025), ClawHavoc (Repello AI, 2025), MCPTox, Pillar Security, Invariant Labs, GHSL Security Lab, Operant AI, and Google DeepMind's AI Agent Traps taxonomy. Threat patterns and case studies in knowledge/ are cited inline.

Feedback & contributing

Bug reports + feature requests: open an issue on Forgejo
Pull requests: not accepted on this repo (solo project, dialog-driven development with Claude Code). For larger changes, see CONTRIBUTING.md and the fork-and-own model
Security disclosures: see SECURITY.md — please email, do not open a public issue
Project scope: see "Project scope" section above for what is and isn't on the roadmap, and what to fork for instead
