Kjell Tore Guttormsen 5afb9b1f33 feat(llm-security): implement parseIntelliJPlugin with nested-jar extraction		2026-04-18 10:15:12 +02:00

README.md

LLM Security Plugin for Claude Code

Automated defense and advisory analysis for the agentic AI attack surface.

Built for my own Claude Code workflow and shared openly for anyone who finds it useful. This is a solo project — bug reports and feature requests are welcome, but pull requests are not accepted.

A Claude Code plugin that provides security scanning, auditing, and threat modeling for agentic AI projects. Built on OWASP LLM Top 10 (2025), OWASP Agentic AI Top 10, and the AI Agent Traps taxonomy (Google DeepMind, 2025), with threat intelligence from ToxicSkills, ClawHavoc, MCPTox, Pillar Security, Invariant Labs, and Operant AI research.

What Is This?
The Extension Security Problem
Quick Start
Commands
Agent Architecture
Deterministic Scanners
Automated Hooks
Knowledge Base
OWASP Coverage
Workflow Examples
Security Assessment Demo
Architecture
What This Plugin Does Not Cover
Compatibility
Version History
Feedback & Requests
Contributing
License & Attribution

What Is This?

Claude Code plugins, MCP servers, and agentic workflows introduce attack surfaces that traditional security tools don't cover: prompt injection, tool poisoning, secret exfiltration through tool outputs, supply chain attacks via malicious skills, and excessive agency.

This plugin provides three layers of protection:

Automated enforcement — 9 hooks that block dangerous operations in real time (prompt injection in user input, secrets in code, writes to sensitive paths, destructive shell commands, supply chain guardrails, suspicious tool output, runtime trifecta detection, transcript scanning before context compaction, update notifications)
Deterministic scanning — 22 Node.js scanners (10 orchestrated + 12 standalone) that perform byte-level analysis LLMs cannot: Shannon entropy, Unicode codepoints, Levenshtein distance for typosquatting, source-to-sink taint flow, DNS resolution, git history forensics, toxic flow analysis, memory poisoning, live MCP inspection, AI-BOM generation, attack simulation, IDE extension prescan
Advisory analysis — 19 commands that scan, audit, and model threats with structured reports, letter grades, and actionable remediation plans

Key capabilities:

Supply chain gate — scan any plugin, MCP server, or agent file before installation with ALLOW/WARNING/BLOCK verdicts
Full project audit — evaluate 16 security categories with A-F grading and prioritized action items
Plugin trust assessment — dedicated plugin audit with Install/Review/Do Not Install verdict
MCP server audit — focused analysis of all installed MCP configurations with trust scoring
Threat modeling — interactive STRIDE × MAESTRO 7-layer session with risk matrix
Pre-deployment checklist — 10 automated + 3 manual checks before going to production
Automated remediation — scan-and-fix pipeline with 3-tier approach (auto/semi-auto/manual)
Continuous monitoring — recurring diff scanning via /security watch (uses built-in /loop) or system cron via watch-cron.mjs
Quick posture check — 30-second scorecard showing your security baseline (16 categories)

Tip

Start with /security posture for a 30-second baseline, then /security audit for the full picture.

The Extension Security Problem

Claude Code's extensibility model — skills, MCP servers, plugins, hooks — creates an attack surface that mirrors the npm/PyPI supply chain problem, but with a critical difference: extensions run with LLM agency. A malicious plugin doesn't just execute code in a sandbox; it can instruct an AI agent to read your SSH keys, exfiltrate environment variables, install persistence mechanisms, and modify its own configuration — all while appearing to be a helpful "Project Health Dashboard."

This is not theoretical. The ToxicSkills research (Xi'an Jiaotong, 2025) and ClawHavoc campaign (Repello AI, 2025) documented real attack patterns against agentic AI systems. The OWASP LLM Top 10 and OWASP Agentic AI Top 10 now formally categorize these threats.

We built a proof-of-concept — a single plugin called "Project Health Dashboard" that looks legitimate but embeds attacks across every threat category. When scanned with this plugin's combined LLM + deterministic analysis, it produced 85 findings: prompt injection via HTML comments, environment exfiltration via base64-encoded payloads, Unicode steganography invisible to human review, 6 typosquatting packages, 6 source-to-sink taint flows, persistence via crontab and LaunchAgents, and more. Verdict: BLOCK 100/100.

A human reviewing the plugin's README and SKILL.md would likely miss most of these. The Unicode Tag steganography is literally invisible. The base64 payload looks like a configuration block. The typosquatting packages are one character off from the real ones.
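
As a rough illustration of why a codepoint-level scan catches what a reviewer cannot see, the sketch below flags Unicode Tag characters (U+E0000 to U+E007F) and a few common zero-width characters. The function names are hypothetical and are not taken from unicode-scanner.mjs; the zero-width set is a small assumed sample, not a complete list.

```javascript
// Hypothetical helpers, not the plugin's actual unicode-scanner.mjs.
const ZERO_WIDTH = new Set([0x200b, 0x200c, 0x200d, 0x2060, 0xfeff]);

function findInvisibleCodepoints(text) {
  const findings = [];
  let index = 0; // UTF-16 offset of the current code point
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (cp >= 0xe0000 && cp <= 0xe007f) {
      findings.push({ index, codepoint: cp, kind: "unicode-tag" });
    } else if (ZERO_WIDTH.has(cp)) {
      findings.push({ index, codepoint: cp, kind: "zero-width" });
    }
    index += ch.length; // astral code points occupy two UTF-16 units
  }
  return findings;
}

// Tag characters mirror ASCII at an offset of 0xE0000, so hidden text can be recovered.
function decodeTags(text) {
  let out = "";
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (cp >= 0xe0020 && cp <= 0xe007e) out += String.fromCharCode(cp - 0xe0000);
  }
  return out;
}
```

Calling decodeTags on a suspicious file shows the hidden instruction in plain ASCII, which is often the quickest way to confirm a finding.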

What organizations need:

A pre-installation scan gate — automated analysis before any extension is installed (this plugin provides /security scan and /security plugin-audit)
A trusted, curated marketplace — vetted extensions with security review as a prerequisite for listing
Deterministic scanning — byte-level analysis for things LLMs cannot detect: Unicode codepoints, Shannon entropy, Levenshtein distance, source-to-sink taint flows
Automated hooks — always-on primary defense blocking secrets in code, writes to sensitive paths, and destructive commands in real time

Important

Always scan repos remotely before cloning them. A poisoned CLAUDE.md injects instructions into the model context the moment you open a cloned repo — before any hooks can intervene. /security scan https://repo-url --deep analyzes everything safely via pre-extraction, without loading anything into your session. This is the primary defense against CLAUDE.md poisoning.

Quick Start

Prerequisites

Claude Code installed
Node.js (for automated hooks — .mjs scripts)

Important

If you use Opus with extended context (1M tokens): Subagents inherit the parent session's context limit but do not support extended context, causing API errors ("limit reached" or "extra usage required"). Fix: run /model Opus in your session before using any security commands. This resets the session to standard 200K context, which subagents handle correctly.

Installation

Add the marketplace and browse plugins with /plugin:

claude plugin marketplace add https://git.fromaitochitta.com/open/ktg-plugin-marketplace.git

Or enable directly in ~/.claude/settings.json:

{
  "enabledPlugins": {
    "llm-security@ktg-plugin-marketplace": true
  }
}

Note

Hooks activate immediately on installation. Secret detection, path guarding, and destructive command blocking start working without any commands.

First Scan

> /security posture

┌──────────────────────────────────────────────┐
│  Security Posture: 8/16  [B]  77%            │
│  ████████████████░░░░░░░░░░                  │
├──────────────────────────────────────────────┤
│  ✅ Deny-First Config                        │
│  ✅ Secrets Protection                        │
│  ✅ Path Guarding                             │
│  ⚠️  MCP Server Trust                         │
│  ✅ Destructive Command Blocking              │
│  ⚠️  Sandbox Config                           │
│  ⚠️  Human Review                             │
│  ✅ Skill/Plugin Sources                      │
│  ⚠️  Session Isolation                        │
│  ✅ Cognitive State Security                  │
│  ✅ Prompt Injection Hardening                │
│  ⚠️  Rule of Two                              │
│  ⚠️  Long-Horizon Monitoring                  │
│  ✅ EU AI Act                                 │
│  ⚠️  NIST AI RMF                              │
│  —  ISO 42001                                │
├──────────────────────────────────────────────┤
│  6 findings (1 high, 3 medium, 2 low)        │
└──────────────────────────────────────────────┘

Commands

Scanning & Assessment

Command	Description
`/security`	Overview of all commands and quick start guide
`/security scan [path|url]`	Scan skills, MCP servers, directories, or GitHub repos for security issues
`/security scan [path|url] --deep`	Enhanced scan: LLM agents + 10 deterministic scanners
`/security deep-scan [path]`	Run 10 deterministic Node.js scanners directly (entropy, unicode, taint, deps, git, permissions, network, memory poisoning, supply chain recheck, toxic flow). Supports `--fail-on <severity>`, `--compact`, `--format sarif`, `--output-file <path>`
`/security audit`	Full project security audit with A-F grading and remediation plan
`/security plugin-audit [path|url]`	Dedicated plugin security audit with Install/Review/Do Not Install verdict (local or GitHub URL)
`/security mcp-audit [--live]`	Focused audit of all installed MCP server configurations (add `--live` for runtime inspection)
`/security mcp-inspect`	Connect to running MCP stdio servers and scan live tool descriptions
`/security ide-scan [target|url]`	Scan installed VS Code (plus Cursor, Windsurf, VSCodium, code-server) and JetBrains extensions, or fetch a remote VSIX from the VS Code Marketplace, OpenVSX, or a direct `.vsix` URL (v6.4.0). Checks for typosquatting, theme-with-code, sideloading, broad activation, and uninstall hooks, then runs UNI/ENT/NET/TNT/MEM/SCR per extension. Offline by default
`/security posture`	Quick security posture scorecard (16 categories incl. compliance)
`/security diff [path]`	Compare scan against stored baseline — shows new/resolved/unchanged/moved findings
`/security watch [path] [--interval 6h]`	Continuous monitoring — runs diff on a recurring interval via /loop
`/security registry [scan|search]`	Skill signature registry: view stats, scan+register skills, search known fingerprints

Remediation

Command	Description
`/security clean [path]`	Scan and remediate findings — auto-fix, semi-auto confirm, manual report
`/security clean [path] --dry-run`	Preview what would be fixed without modifying files
`/security harden [path]`	Generate Grade A security config — settings.json, CLAUDE.md, .gitignore
`/security harden [path] --apply`	Apply generated config with automatic backup

Threat Modeling & Planning

Command	Description
`/security threat-model`	Interactive STRIDE/MAESTRO threat modeling session (15-30 min)
`/security red-team [--category] [--adaptive]`	Attack simulation — 64 scenarios across 12 categories test hook defenses. `--adaptive` for mutation-based evasion testing
`/security pre-deploy`	Pre-deployment security checklist (10 automated + 3 manual checks)

Scan

/security scan is a supply chain gate. Point it at any local path or GitHub URL before installation. It spawns specialized agents sequentially to analyze:

Skills/agents: 7 threat categories (injection, exfiltration, privilege escalation, scope creep, hidden instructions, toolchain manipulation, persistence)
MCP servers: 5-phase analysis (tool descriptions, source code, dependencies, configuration, rug pull detection)

Remote repo support (v2.4+): Pass a GitHub URL directly — the plugin clones to a temp directory, scans, and cleans up. Use --branch <name> for non-default branches:

/security scan https://github.com/org/repo --branch dev --deep

Injection-safe remote scanning (v2.5+): Remote scans pre-extract structured evidence via content-extractor.mjs and strip injection patterns BEFORE LLM agents see the content. Agents analyze a JSON evidence package, never raw files from untrusted repos. [INJECTION-PATTERN-STRIPPED] markers are confirmed findings.
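
The pre-extraction step can be pictured roughly as follows. This is a minimal sketch: the two patterns, the buildEvidence name, and the shape of the evidence object are assumptions made for illustration, since the actual rule set in content-extractor.mjs is not shown here.

```javascript
// Illustrative patterns only; the real rule set is not listed in this README.
const INJECTION_PATTERNS = [
  /ignore (all )?(previous|prior) instructions/gi,
  /<!--[\s\S]*?-->/g, // HTML comments can hide instructions from human reviewers
];
const MARKER = "[INJECTION-PATTERN-STRIPPED]";

// Hypothetical helper: turns raw file content into a JSON-friendly evidence record.
function buildEvidence(path, rawContent) {
  let stripped = rawContent;
  let strippedCount = 0;
  for (const pattern of INJECTION_PATTERNS) {
    stripped = stripped.replace(pattern, () => {
      strippedCount += 1;
      return MARKER;
    });
  }
  // Agents would receive this object, never the raw file.
  return { path, strippedCount, excerpt: stripped.slice(0, 500) };
}
```

Because every replacement leaves a marker behind, a non-zero strippedCount doubles as a finding in its own right.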

Sandboxed cloning (v5.1+): git clone can execute arbitrary code through hooks, symlinks, and .gitattributes filter/smudge drivers (for example CVE-2024-32002, where a crafted submodule symlink causes a hook to run during clone). Remote clones are now hardened with defense-in-depth:

Layer 1 — Git config hardening (all platforms): 8 config flags neutralize known attack vectors:

Flag	Mitigates
`core.hooksPath=/dev/null`	Git hooks executed during clone/checkout
`core.symlinks=false`	Symlink traversal out of temp directory
`core.fsmonitor=false`	Arbitrary command execution via fsmonitor
`filter.lfs.{process,smudge,clean}=`	Filter/smudge driver code execution (`.gitattributes`)
`protocol.file.allow=never`	Local file protocol traversal
`transfer.fsckObjects=true`	Malformed git objects

Environment variables (GIT_CONFIG_NOSYSTEM=1, GIT_CONFIG_GLOBAL=/dev/null, GIT_ATTR_NOSYSTEM=1, GIT_TERMINAL_PROMPT=0) isolate from system/user git config and block interactive prompts.
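
Put together, Layer 1 amounts to a git invocation along these lines. The flags and environment variables are the ones listed above; the function name and the choice to return a command description instead of running it are assumptions made for this sketch.

```javascript
// Builds a hardened `git clone` invocation without executing it.
// hardenedCloneCommand is a hypothetical name, not the plugin's API.
function hardenedCloneCommand(repoUrl, targetDir) {
  const config = [
    "core.hooksPath=/dev/null",
    "core.symlinks=false",
    "core.fsmonitor=false",
    "filter.lfs.process=",
    "filter.lfs.smudge=",
    "filter.lfs.clean=",
    "protocol.file.allow=never",
    "transfer.fsckObjects=true",
  ];
  const args = [];
  for (const entry of config) args.push("-c", entry);
  // "--" stops a hostile URL from being parsed as an option.
  args.push("clone", "--", repoUrl, targetDir);
  const env = {
    PATH: process.env.PATH,
    GIT_CONFIG_NOSYSTEM: "1",
    GIT_CONFIG_GLOBAL: "/dev/null",
    GIT_ATTR_NOSYSTEM: "1",
    GIT_TERMINAL_PROMPT: "0",
  };
  return { command: "git", args, env };
}
```

The result can be handed to child_process.spawn(command, args, { env }); passing an explicit env object instead of inheriting the parent environment is what keeps user-level git configuration out of the clone.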

Layer 2 — OS-level filesystem sandbox (platform-dependent):

Platform	Sandbox	How it works	Limitations
macOS	`sandbox-exec`	Restricts file writes to only the clone temp dir via Seatbelt profiles	Deprecated by Apple but still functional (no replacement exists)
Linux	`bubblewrap` (bwrap)	Read-only root bind mount + writable clone dir + namespace isolation	Requires `bwrap` package. Fails on Ubuntu 24.04+ without admin AppArmor config. Works on Fedora/Arch
Windows	None available	Git config hardening only (Layer 1)	See Windows guidance below

The plugin probe-tests sandbox availability at runtime and falls back gracefully. When no OS sandbox is available, a WARN is logged and cloning proceeds with git config hardening only.

Additional protections: Post-clone size check (100MB max), UUID-unique evidence filenames (prevents race conditions), cleanup guarantee (temp files removed even on error).
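
One way to get these three guarantees together is a wrapper that owns the temp directory for the whole scan. The sketch below is an assumption about structure, not the plugin's code: withScanWorkspace, directorySize, and the file name prefixes are hypothetical, and CommonJS require is used so the snippet runs as a plain script.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");
const { randomUUID } = require("crypto");

const MAX_CLONE_BYTES = 100 * 1024 * 1024; // 100MB limit stated above

// Recursively sums regular file sizes; symlinks are not followed.
function directorySize(dir) {
  let total = 0;
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) total += directorySize(full);
    else if (entry.isFile()) total += fs.statSync(full).size;
  }
  return total;
}

// Runs clone(cloneDir), enforces the size limit, then analyze(cloneDir, evidencePath).
// The finally block removes temp files whether or not anything threw.
function withScanWorkspace(clone, analyze) {
  const cloneDir = fs.mkdtempSync(path.join(os.tmpdir(), "scan-"));
  const evidencePath = path.join(os.tmpdir(), `evidence-${randomUUID()}.json`);
  try {
    clone(cloneDir);
    if (directorySize(cloneDir) > MAX_CLONE_BYTES) {
      throw new Error("clone exceeds 100MB limit");
    }
    return analyze(cloneDir, evidencePath);
  } finally {
    fs.rmSync(cloneDir, { recursive: true, force: true });
    fs.rmSync(evidencePath, { force: true });
  }
}
```

The UUID in the evidence file name means two concurrent scans cannot overwrite each other's evidence, which is the race condition the text refers to.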

Windows guidance: Windows has no CLI-level filesystem sandbox equivalent to sandbox-exec or bwrap. The alternatives either require additional software or admin privileges:

Option	Isolation level	Requirements
Windows Sandbox	Full VM (Hyper-V)	Windows Pro/Enterprise, Hyper-V enabled. GUI-oriented, not scriptable
Docker Desktop	Container	Requires Docker install. Best option for automated isolation
WSL2	Linux VM	Requires WSL2 install. Once inside, `bwrap` is available (except Ubuntu 24.04+ caveat)
AppContainer	Process sandbox	Requires native C++ helper binary — not practical to ship in a Node.js plugin

Recommendation for Windows users: Run Claude Code inside WSL2 or Docker Desktop for full sandbox coverage. The git config hardening (Layer 1) provides baseline protection on all platforms and neutralizes all known .gitattributes attack vectors even without an OS sandbox.

Why not Node.js --permission? Node's permission model restricts fs module access within the Node process, but does not sandbox child processes like git which run as separate OS processes. It is therefore not useful for this use case.

Output: structured report with ALLOW / WARNING / BLOCK verdict, risk score (0-100), and findings sorted by severity.

Audit

/security audit is a comprehensive project review. It spawns up to 3 agents to evaluate 9 security categories:

Secret management
Permission model
Input validation
Output handling
Supply chain
Data protection
Logging and monitoring
Network security
Agent autonomy controls

Output: A-F letter grade, risk matrix, and prioritized action items.

Plugin Audit

/security plugin-audit [path|url] is a dedicated trust assessment for Claude Code plugins. Point it at any local plugin directory or GitHub URL to get a comprehensive evaluation before installation. It analyzes:

Manifest metadata — name, version, author, auto_discover settings
Component inventory — commands, agents, hooks, skills with tool grants
Permission matrix — aggregated tool access across all components, flagging Bash, Write+Bash, and Task access
Hook safety — classifies hook behavior (block/warn/advisory), flags state-modifying or network-calling hooks
Content scan — spawns skill-scanner-agent for 7 threat categories

Output: structured report with Install / Review / Do Not Install trust verdict.

Clean

/security clean is a scan-and-remediate pipeline. It runs the full deterministic scanner suite, classifies each finding into one of three tiers, and acts accordingly:

Auto — Deterministic, safe fixes applied without confirmation (e.g., removing zero-width characters, BIDI overrides, Unicode Tag steganography, upgrading haiku models)
Semi-auto — Fixes generated by an LLM agent, presented for user confirmation before applying (e.g., homoglyph replacement, permission adjustments, dependency fixes)
Manual — Findings that require human judgment, included in the report but not auto-fixed (e.g., taint flow refactoring, architecture changes)

The remediation engine (auto-cleaner.mjs) performs 16 fix operations as pure functions (content → content) with atomic writes and post-fix validation. Use --dry-run to preview all proposed changes without modifying any files.
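
The pure-function design can be sketched like this. It shows one auto-tier fix (zero-width removal) only; applyFix, writeAtomic, and the temp file naming are hypothetical, and the validation step shown (re-running the fix must be a no-op) is one plausible reading of "post-fix validation", not a description of auto-cleaner.mjs.

```javascript
const fs = require("fs");
const path = require("path");

// Pure fix operation: content in, content out, no I/O.
function stripZeroWidth(content) {
  return content.replace(/[\u200B-\u200D\u2060\uFEFF]/g, "");
}

// Atomic write: write a sibling temp file, then rename it over the target.
function writeAtomic(filePath, content) {
  const tmp = path.join(
    path.dirname(filePath),
    `.${path.basename(filePath)}.${process.pid}.tmp`
  );
  fs.writeFileSync(tmp, content, "utf8");
  fs.renameSync(tmp, filePath);
}

function applyFix(filePath, fix, { dryRun = false } = {}) {
  const before = fs.readFileSync(filePath, "utf8");
  const after = fix(before);
  const changed = after !== before;
  if (changed && !dryRun) {
    writeAtomic(filePath, after);
    // Assumed validation: a second pass over the written file changes nothing.
    if (fix(fs.readFileSync(filePath, "utf8")) !== after) {
      throw new Error(`fix did not converge for ${filePath}`);
    }
  }
  return { changed, dryRun };
}
```

Keeping the fix itself free of I/O is what makes a dry run trivial: the same function computes the proposed content, and only the write step is skipped.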

Threat Model

/security threat-model runs a guided 15-30 minute interview session that maps your system through two frameworks:

STRIDE — Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege
MAESTRO 7-layer model — Foundation Models, Data/Knowledge, Agent Frameworks, Tool Integration, Agent Capabilities, Multi-Agent Systems, Ecosystem

Output: complete threat model document with prioritized threats, risk scores, and mitigation status.

Agent Architecture

The plugin delegates specialized work to 6 purpose-built agents. Each agent has focused threat detection capabilities and its own knowledge base routing.

Agent	Role	Model	Spawned By	Tools
`skill-scanner-agent`	7 threat categories (injection, exfiltration, escalation, scope creep, hidden instructions, toolchain manipulation, persistence)	Opus	`/security scan`, `/security audit`, `/security plugin-audit`	Read, Glob, Grep
`mcp-scanner-agent`	5-phase MCP analysis (tool descriptions, source code, dependencies, config, rug pull detection)	Opus	`/security scan`, `/security mcp-audit`	Read, Glob, Grep, Bash
`posture-assessor-agent`	16-category assessment with PASS/PARTIAL/FAIL scoring and A-F grading	Opus	`/security audit`, `/security posture`	Read, Glob, Grep
`threat-modeler-agent`	Interactive STRIDE × MAESTRO 7-layer interview with 5-phase workflow	Opus	`/security threat-model`	Read, Glob, Grep, AskUserQuestion
`deep-scan-synthesizer-agent`	Interprets deterministic scanner JSON into human-readable report with executive summary and prioritized recommendations	Opus	`/security deep-scan`, `/security scan --deep`	Read, Glob, Grep
`cleaner-agent`	Generates semi-auto remediation proposals for findings requiring human judgment (read-only, returns JSON proposals)	Opus	`/security clean`	Read, Glob, Grep

Scan Pipelines

For commands like /security audit, the plugin orchestrates multiple agents in parallel:

                  ┌──────────────┐
                  │  /security   │
                  │    audit     │
                  └──────┬───────┘
                         │
            ┌────────────┼────────────┐
            ▼            ▼            ▼
   ┌─────────────┐ ┌───────────┐ ┌──────────┐
   │    Skill    │ │    MCP    │ │ Posture  │
   │   Scanner   │ │  Scanner  │ │ Assessor │
   └──────┬──────┘ └─────┬─────┘ └────┬─────┘
          │              │             │
          └──────────────┼─────────────┘
                         ▼
                ┌────────────────┐
                │  Audit Report  │
                │  (A-F grade)   │
                └────────────────┘

For deep scans (/security scan --deep or /security deep-scan), deterministic scanners run in parallel followed by synthesis:

                  ┌──────────────┐
                  │  /security   │
                  │  scan --deep │
                  └──────┬───────┘
                         │
         ┌───────────────┼───────────────┐
         ▼               ▼               ▼
   ┌───────────┐  ┌────────────┐  ┌────────────┐
   │ LLM Skill │  │   10 Det.  │  │    MCP     │
   │  Scanner  │  │  Scanners  │  │  Scanner   │
   └─────┬─────┘  └──────┬─────┘  └──────┬─────┘
         │        UNI ENT PRM            │
         │        DEP TNT GIT            │
         │        NET MEM SCR TFA        │
         │               │               │
         │        ┌──────┴─────┐         │
         │        │ Synthesizer│         │
         │        │   Agent    │         │
         │        └──────┬─────┘         │
         └───────────────┼───────────────┘
                         ▼
                ┌────────────────┐
                │ Combined Report│
                │ (BLOCK/WARN/OK)│
                └────────────────┘

Deterministic Scanners

10 orchestrated + 12 standalone Node.js scanner scripts that perform byte-level analysis an LLM cannot. Zero external dependencies. Orchestrated scanners run via node scanners/scan-orchestrator.mjs <target> or through /security deep-scan. Both entry points support --fail-on <severity>, --compact, --format sarif, and --output-file <path>.

Orchestrated (10)

Scanner	Prefix	Detects	OWASP
`unicode-scanner.mjs`	UNI	Zero-width chars, Unicode Tag steganography, BIDI overrides, Cyrillic homoglyphs	LLM01
`entropy-scanner.mjs`	ENT	High-entropy strings, base64/hex blobs, encoded payloads via Shannon entropy	LLM01, LLM03
`permission-mapper.mjs`	PRM	Purpose-vs-tools mismatch, ghost hooks, haiku on sensitive agents, overprivileged components	LLM06
`dep-auditor.mjs`	DEP	CVEs (npm/pip audit), typosquatting (Levenshtein distance), malicious install scripts, unpinned versions	LLM03
`taint-tracer.mjs`	TNT	Source-to-sink data flow (process.env/req.body to eval/exec/fetch/writeFile), 3-pass analysis	LLM01, LLM02
`git-forensics.mjs`	GIT	Force pushes, description drift, hook modifications, new outbound URLs, author changes	LLM03
`network-mapper.mjs`	NET	Undisclosed URLs, suspicious domains (ngrok, webhook.site), IP-based URLs, DNS analysis	LLM02, LLM03
`memory-poisoning-scanner.mjs`	MEM	Injection patterns, shell commands, credential paths, permission expansion, suspicious URLs, encoded payloads in CLAUDE.md/memory/rules files	LLM01, ASI02
`supply-chain-recheck.mjs`	SCR	Re-audit installed deps from lockfiles against blocklists, OSV.dev batch API, typosquat detection	LLM03
`toxic-flow-analyzer.mjs`	TFA	Lethal trifecta detection: untrusted input + sensitive data access + exfiltration sink. Cross-component correlation (runs last)	ASI01, ASI02, ASI05

Standalone (12)

Scanner	Prefix	Purpose
`scan-orchestrator.mjs`	—	Entry point: runs all 10 orchestrated scanners, outputs JSON
`posture-scanner.mjs`	PST	Deterministic posture assessment, 16 categories (incl. EU AI Act, NIST AI RMF, ISO 42001), <50ms
`mcp-live-inspect.mjs`	MCI	Live MCP server inspection via JSON-RPC 2.0 (tool injection, shadowing, URL/IP)
`ide-extension-scanner.mjs`	IDE	VS Code (+ Cursor, Windsurf, VSCodium, code-server) / JetBrains extension prescan: blocklist, theme-with-code, sideload, broad activation, typosquat, extension-pack expansion, dangerous uninstall hooks — then UNI/ENT/NET/TNT/MEM/SCR per extension
`attack-simulator.mjs`	—	Red-team harness: 64 scenarios, 12 categories, adaptive mutation mode
`ai-bom-generator.mjs`	BOM	CycloneDX 1.6 AI Bill of Materials
`dashboard-aggregator.mjs`	—	Cross-project security dashboard, machine-grade aggregation
`reference-config-generator.mjs`	—	Grade A config generation based on posture gaps
`supply-chain-recheck-cli.mjs`	—	CLI wrapper for SCR scanner
`auto-cleaner.mjs`	—	Remediation engine: 16 fix operations, atomic writes, post-fix validation
`content-extractor.mjs`	—	Pre-extracts evidence from untrusted repos, strips injection patterns
`watch-cron.mjs`	—	Cron wrapper: scans all targets in config, writes summary, exits with verdict code

Why deterministic? LLMs are powerful at semantic analysis — understanding intent, detecting social engineering, assessing context. But they cannot reliably calculate Shannon entropy, measure Levenshtein distance between package names, trace taint flow across function boundaries, or detect individual Unicode codepoints. These scanners fill that gap.
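
Shannon entropy is a good example of a calculation that is trivial in code and unreliable when estimated by a model. The sketch below computes bits per character using the standard formula H = -Σ p·log2(p); the function name is illustrative and any flagging threshold would be an additional assumption.

```javascript
// Shannon entropy in bits per character over the string's code points.
function shannonEntropy(str) {
  const counts = new Map();
  let total = 0;
  for (const ch of str) {
    counts.set(ch, (counts.get(ch) || 0) + 1);
    total += 1;
  }
  if (total === 0) return 0;
  let entropy = 0;
  for (const n of counts.values()) {
    const p = n / total;
    entropy -= p * Math.log2(p);
  }
  return entropy;
}
```

Repetitive text such as "aaaa" scores 0 and a string of four distinct characters scores 2 bits, while long random-looking tokens score noticeably higher than natural language, which is what makes entropy useful for spotting encoded payloads and secrets.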

Shared library (scanners/lib/): severity classification, string utilities (entropy, Levenshtein, base64 detection), output formatting, file discovery, and YAML frontmatter parsing.
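
For the Levenshtein utility, a minimal sketch of how edit distance supports typosquatting detection might look like the following. The implementation is the textbook dynamic-programming algorithm; findTyposquatTargets and its default threshold of 1 are assumptions, not the shared library's actual interface.

```javascript
// Classic dynamic-programming edit distance, keeping only two rows in memory.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost // substitution or match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Returns popular names within maxDistance edits of `name`, excluding exact matches.
function findTyposquatTargets(name, popular, maxDistance = 1) {
  return popular.filter((p) => {
    const d = levenshtein(name, p);
    return d > 0 && d <= maxDistance;
  });
}
```

An exact match returns distance 0 and is excluded, so the legitimate package never flags itself; an allowlist such as typosquat-allowlist.json would then suppress known-good near neighbours.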

Automated Hooks

These hooks run on every operation — no commands needed. They activate the moment the plugin is installed.

Hook	Event	What It Does
Prompt injection scan	UserPromptSubmit	Blocks direct prompt injection (override instructions, spoofed headers, identity redefinition); warns on subtle manipulation signals. Decodes obfuscated payloads (unicode, hex, URL, base64) before matching. Configurable: `LLM_SECURITY_INJECTION_MODE=block|warn|off` (default: block)
Secret detection	Edit, Write	Blocks AWS keys, Azure tokens, GitHub PATs, npm tokens, PEM keys, database URLs, Bearer tokens, passwords (13 patterns)
Path guarding	Write	Blocks writes to `.env`, `.ssh/`, `.aws/`, `.gnupg/`, credentials files, hook scripts, `/etc/`, `settings.json` (8 path categories)
Destructive commands	Bash	Blocks `rm -rf /`, `chmod 777`, pipe-to-shell, fork bombs, eval injection (8 block rules + 6 warnings)
Supply chain guardrail	Bash	Blocks known-compromised npm/pip packages, typosquatting (Levenshtein), age-gated installs (<72h), OSV.dev CVE checks across 7 package managers
Output verification	All tools (post)	Advisory: scans ALL tool output for indirect prompt injection (LLM01). Bash-specific: also flags leaked secrets, unexpected URLs, oversized MCP responses. Skips short output (<100 chars) for performance
Session guard	All tools (post)	Advisory: monitors tool call sequences for lethal trifecta (untrusted input + sensitive data access + exfiltration sink). Sliding window of 20 calls, per-session JSONL state, warns when all 3 legs present (OWASP ASI01, ASI02)
Update check	UserPromptSubmit	Checks for newer plugin versions (max 1x/24h, cached). Disable: `LLM_SECURITY_UPDATE_CHECK=off`

All hooks are Node.js (.mjs) for cross-platform compatibility (macOS, Linux, Windows).

Important

Prompt injection scan, secret detection, path guarding, destructive commands, and supply chain guardrail are blocking — they prevent the operation if a pattern matches. Output verification and session guard are advisory — they warn but do not block. Update check is informational — notifies when a newer version is available. Prompt injection blocking can be changed to warn-only (LLM_SECURITY_INJECTION_MODE=warn) or disabled (off) for security research or testing environments. Update check can be disabled with LLM_SECURITY_UPDATE_CHECK=off.
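
The session guard's sliding-window logic can be sketched as below. The window size of 20 comes from the hook description; the classify rules, tool names, and regexes are simplified stand-ins invented for this example, and persistence to per-session JSONL is left out.

```javascript
const WINDOW_SIZE = 20; // sliding window from the session guard description

// Hypothetical classifier: maps one tool call to zero or more trifecta legs.
function classify(call) {
  const target = call.target || "";
  const legs = [];
  if (call.tool === "WebFetch") legs.push("untrusted-input");
  if (call.tool === "Read" && /\.env|\.ssh|\.aws/.test(target)) legs.push("sensitive-access");
  if (call.tool === "Bash" && /\b(curl|wget|nc)\b/.test(target)) legs.push("exfil-sink");
  return legs;
}

function createSessionGuard() {
  const recent = []; // legs observed for each of the last WINDOW_SIZE calls
  return function observe(call) {
    recent.push(classify(call));
    if (recent.length > WINDOW_SIZE) recent.shift();
    const present = new Set(recent.flat());
    // Advisory only: the caller decides how to surface the warning.
    return { warn: present.size === 3, legs: [...present].sort() };
  };
}
```

Each leg is harmless on its own; the warning fires only when all three appear within the same window, and it clears again once older calls slide out.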

Knowledge Base

18 research-backed reference files grounding all analysis in published threat intelligence:

File	Scope
`owasp-llm-top10.md`	OWASP LLM Top 10 (2025) — attack vectors, detection signals, Claude Code mitigations
`owasp-agentic-top10.md`	OWASP Agentic AI Top 10 (ASI01-ASI10) — agent-specific threats mapped to Claude Code
`owasp-skills-top10.md`	OWASP Skills Top 10 (AST01-AST10) — skill-specific threats and mitigations
`skill-threat-patterns.md`	7 threat categories from ToxicSkills/ClawHavoc research with concrete detection patterns
`mcp-threat-patterns.md`	9 MCP threat categories from MCPTox/Pillar Security/Invariant Labs/Operant AI research
`secrets-patterns.md`	30+ regex patterns for secret detection across 10 provider categories
`mitigation-matrix.md`	OWASP LLM Top 10 → Claude Code control mapping with verification checks and coverage scores
`top-packages.json`	Top 200 npm + top 100 PyPI package names for typosquatting detection (Levenshtein baseline)
`skill-registry.json`	Seed data for skill signature registry — known fingerprints and risk profiles
`compliance-mapping.md`	EU AI Act, NIST AI RMF, ISO 42001, MITRE ATLAS — article/control mappings to plugin capabilities
`norwegian-context.md`	Norwegian regulatory landscape — Datatilsynet, NSM, Digitaliseringsdirektoratet guidance for AI security
`prompt-injection-research-2025-2026.md`	7 research papers (2025-2026) with implications for hook defenses
`deepmind-agent-traps.md`	DeepMind AI Agent Traps — 6 categories, 43 techniques, coverage matrix
`attack-scenarios.json`	64 red-team scenarios across 12 categories for attack simulation
`attack-mutations.json`	Synonym tables and mutation rules for adaptive red-team testing
`typosquat-allowlist.json`	Allowlisted package names to reduce false positives in typosquatting detection
`ide-extension-threat-patterns.md`	10 IDE-extension detection categories (VS Code + JetBrains) with 2024-2026 case studies (GlassWorm, WhiteCobra, TigerJack, Material Theme)
`top-vscode-extensions.json`	Top ~100 VS Code Marketplace extension IDs (Levenshtein typosquat seed) + blocklist of known-malicious publisher.name entries

Note

All knowledge base content is derived from published OWASP standards and peer-reviewed security research. The knowledge files provide grounding for agent analysis — agents read relevant sections before producing findings.

Compliance & Governance

v6.0.0 adds an enterprise governance layer for standards-aware security operations:

Capability	Description
Compliance Mapping	Maps plugin capabilities to EU AI Act (Art. 9, 15, 17), NIST AI RMF (Map, Measure, Manage, Govern), ISO 42001 (Annex A), and MITRE ATLAS techniques. Posture categories 14-16 assess compliance readiness.
Norwegian Context	Regulatory guidance from Datatilsynet (DPIA for AI), NSM (basic security principles), and Digitaliseringsdirektoratet. Relevant for Norwegian public sector AI deployments.
SARIF 2.1.0 Output	`--format sarif` flag on scan/deep-scan produces OASIS SARIF standard output for CI/CD integration (GitHub Advanced Security, Azure DevOps, SonarQube).
Structured Audit Trail	JSONL audit events (`audit-trail.mjs`) with ISO 8601 timestamps, OWASP category tags, and SIEM-ready schema. Configurable via `LLM_SECURITY_AUDIT_*` env vars.
AI-BOM	CycloneDX 1.6 Bill of Materials for AI components — models, MCP servers, plugins, knowledge files, hooks. `llm-security audit-bom <target>`.
Policy-as-Code	`.llm-security/policy.json` for distributable hook configuration. Teams can enforce consistent security thresholds without per-developer env var setup.
Standalone CLI	`node bin/llm-security.mjs scan <target>` — runs scanners without Claude Code. Subcommands: `scan`, `posture`, `audit-bom`, `benchmark`.
CI/CD Integration	`--fail-on <severity>` for threshold-based exit codes, `--compact` for one-liner output. Pipeline templates for GitHub Actions, Azure DevOps, GitLab CI in `ci/`. Guide: `docs/ci-cd-guide.md`.
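
As a rough illustration of the `--fail-on <severity>` gate, the sketch below compares each finding's severity rank against a threshold. It is not the plugin's actual code: the function name `exitCodeFor`, the finding shape, and the IDs are hypothetical. Only the severity levels and the "exit 1 at or above the threshold" rule come from this README.

```javascript
// Hypothetical sketch of threshold-based exit codes (not the plugin's real implementation).
// Severity names follow the levels used in this README's reports, lowest to highest.
const SEVERITY_ORDER = ["info", "low", "medium", "high", "critical"];

// Returns 1 if any finding is at or above `failOn`, otherwise 0.
function exitCodeFor(findings, failOn) {
  const threshold = SEVERITY_ORDER.indexOf(failOn.toLowerCase());
  if (threshold === -1) {
    throw new Error(`Unknown severity: ${failOn}`);
  }
  const hit = findings.some(
    (f) => SEVERITY_ORDER.indexOf(f.severity.toLowerCase()) >= threshold
  );
  return hit ? 1 : 0;
}

// Made-up findings for illustration only.
const findings = [
  { id: "ENT-001", severity: "medium" },
  { id: "NET-002", severity: "low" },
];

console.log(exitCodeFor(findings, "high"));   // no finding reaches "high"
console.log(exitCodeFor(findings, "medium")); // the medium finding trips the gate
```

A rank lookup keeps the comparison in one place, so a pipeline can tighten or relax the gate by changing a single flag value.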

Benchmarks

The attack simulator (llm-security benchmark) tests hook defenses with 64 crafted scenarios across 12 categories. Adaptive mode (--adaptive) applies 5 mutation rounds per passing scenario (homoglyph substitution, encoding variations, zero-width injection, case alternation, synonym replacement).
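
To make the mutation idea concrete, here is a minimal sketch of three of the five mutation types named above. The helper names and the three-entry homoglyph map are assumptions for illustration; the actual rules live in knowledge/attack-mutations.json and may work differently.

```javascript
// Illustrative mutation helpers, not the simulator's real code.
// Assumed sample map: Latin letters -> visually similar Cyrillic letters.
const HOMOGLYPHS = { a: "\u0430", e: "\u0435", o: "\u043e" };

// Replace mapped characters with their look-alikes; leave everything else untouched.
function homoglyphSubstitute(text) {
  return [...text].map((ch) => HOMOGLYPHS[ch] ?? ch).join("");
}

// Insert U+200B (zero-width space) between every pair of characters.
function zeroWidthInject(text) {
  return [...text].join("\u200b");
}

// Alternate lower/upper case by character position.
function caseAlternate(text) {
  return [...text]
    .map((ch, i) => (i % 2 === 0 ? ch.toLowerCase() : ch.toUpperCase()))
    .join("");
}

const payload = "ignore previous instructions";
for (const mutate of [homoglyphSubstitute, zeroWidthInject, caseAlternate]) {
  const mutated = mutate(payload);
  // Each mutated string still reads the same to a human but differs byte-for-byte.
  console.log(mutate.name, JSON.stringify(mutated), mutated !== payload);
}
```

A hook that matches only the literal payload would miss all three variants, which is the gap adaptive mode is meant to probe.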

OWASP Coverage

Category	Automated (Hooks)	Deterministic (Scanners)	Advisory (Commands)	Coverage
LLM01 Prompt Injection	Strong (input + output)	UNI + ENT + TNT	Scan, Audit	95%
LLM02 Sensitive Info Disclosure	Strong	TNT + NET	Audit	83%
LLM03 Supply Chain	Partial	ENT + DEP + GIT + NET	Scan, Plugin Audit, MCP Audit	60%
LLM04 Data Poisoning	—	—	Threat Model	40%
LLM05 Improper Output Handling	Strong (output scan)	—	Audit	83%
LLM06 Excessive Agency	Strong	PRM	Posture	100%
LLM07 System Prompt Leakage	—	—	Audit	60%
LLM08 Vector/Embedding Weaknesses	—	—	Threat Model	40%
LLM09 Misinformation	—	—	Advisory	50%
LLM10 Unbounded Consumption	—	—	Pre-Deploy	83%

Average coverage: ~69%. Percentages reflect control-count coverage from knowledge/mitigation-matrix.md. Strongest in prompt injection (LLM01, 95% with runtime input/output scanning + obfuscation decoding) and agency controls (LLM06, 100%). Weakest in areas requiring model-provider or infrastructure controls (LLM04, LLM08), which are better addressed at the platform level.
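
The ~69% figure is the unweighted mean of the ten Coverage values in the table. A quick check, with the numbers copied from the table:

```javascript
// Coverage percentages copied from the OWASP Coverage table above.
const coverage = {
  LLM01: 95, LLM02: 83, LLM03: 60, LLM04: 40, LLM05: 83,
  LLM06: 100, LLM07: 60, LLM08: 40, LLM09: 50, LLM10: 83,
};

const values = Object.values(coverage);
const total = values.reduce((sum, v) => sum + v, 0); // 694
const average = total / values.length;               // 694 / 10

console.log(average.toFixed(1)); // "69.4"
```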

Workflow Examples

1. Pre-Installation Gate

Evaluate a plugin or MCP server before installing it — locally or from a remote repo:

/security scan path/to/plugin          # Quick scan with ALLOW/WARNING/BLOCK verdict
/security plugin-audit path/to/plugin  # Deep trust assessment with Install/Review/Do Not Install
                                       # → Install if both pass, investigate if flagged

# Remote repo — scans without installing (v2.4+)
/security scan https://github.com/org/repo --deep
/security scan https://github.com/org/repo --branch dev --deep
/security plugin-audit https://github.com/org/repo

2. Monthly Security Review

Regular cadence for maintaining security posture:

/security posture                      # 30-second baseline scorecard (16 categories)
/security audit                        # Full audit with A-F grade and action items
                                       # → Fix critical/high findings
/security posture                      # Verify improvement

3. Track Security Over Time

Compare scan results against a stored baseline to see what changed:

/security diff path/to/project         # First run creates baseline, subsequent runs show delta
                                       # → Shows new, resolved, unchanged, and moved findings
/security watch path/to/project        # Continuous: runs diff every 6h via /loop
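
The four delta buckets can be sketched as a fingerprint comparison. This is a simplified illustration, not the logic in diff-engine.mjs: it assumes a finding's identity is its rule plus evidence text and uses exact matching, and the field names (`rule`, `evidence`, `file`, `line`) are hypothetical.

```javascript
// Hypothetical diff categorization; the plugin's diff engine may fingerprint differently.
// Assumption: identity = rule + evidence, location = file:line.
function fingerprint(f) {
  return `${f.rule}|${f.evidence}`;
}

function location(f) {
  return `${f.file}:${f.line}`;
}

function diffFindings(baseline, current) {
  const before = new Map(baseline.map((f) => [fingerprint(f), f]));
  const result = { new: [], resolved: [], unchanged: [], moved: [] };
  const seen = new Set();

  for (const f of current) {
    const key = fingerprint(f);
    const old = before.get(key);
    seen.add(key);
    if (!old) result.new.push(f);                                     // not in baseline
    else if (location(old) === location(f)) result.unchanged.push(f); // same place
    else result.moved.push(f);                                        // same finding, new place
  }
  for (const [key, f] of before) {
    if (!seen.has(key)) result.resolved.push(f);                      // gone since baseline
  }
  return result;
}

// Made-up findings for illustration only.
const baseline = [
  { rule: "A", evidence: "x", file: "a.js", line: 1 },
  { rule: "B", evidence: "y", file: "b.js", line: 2 },
  { rule: "C", evidence: "z", file: "c.js", line: 3 },
];
const current = [
  { rule: "A", evidence: "x", file: "a.js", line: 1 },
  { rule: "B", evidence: "y", file: "b.js", line: 9 },
  { rule: "D", evidence: "w", file: "d.js", line: 4 },
];

const delta = diffFindings(baseline, current);
console.log(Object.fromEntries(
  Object.entries(delta).map(([bucket, list]) => [bucket, list.map((f) => f.rule)])
));
```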

4. Deep Threat Analysis

For new architectures, major changes, or compliance requirements:

/security threat-model                 # 15-30 min guided STRIDE × MAESTRO session
/security audit                        # Verify current controls against identified threats
/security pre-deploy                   # Pre-deployment checklist before production

5. Remediation

Fix findings from scans and audits:

/security clean path/to/project --dry-run   # Preview fixes without modifying files
/security clean path/to/project             # Auto-fix safe issues, confirm semi-auto, report manual
                                            # → Review semi-auto proposals, handle manual findings

Prompt Injection Showcase (v5.0)

The examples/prompt-injection-showcase/ demonstrates runtime hook detection against 61 attack payloads across 19 categories — from classic instruction overrides to v5.0's Unicode steganography, HITL traps, NL indirection, hybrid P2SQL, and bash evasion techniques. Includes 6 false positive checks.

node examples/prompt-injection-showcase/run-showcase.mjs           # Run all 61 payloads
node examples/prompt-injection-showcase/run-showcase.mjs --verbose # Show hook output

See examples/prompt-injection-showcase/README.md for the full category breakdown.

Security Assessment Demo

The examples/malicious-skill-demo/ directory contains a deliberately malicious plugin called "Project Health Dashboard" and a full security assessment produced by the combined LLM + deterministic scanning pipeline.

What it demonstrates: A single plugin that looks like a legitimate project health monitoring tool but embeds attacks across every threat category — prompt injection, data exfiltration, Unicode steganography, typosquatting, taint flows, persistence mechanisms, and more.

Key stats:

85 total findings (24 Critical, 24 High, 20 Medium, 6 Low, 11 Info)
Verdict: BLOCK 100/100 — both LLM and deterministic scanners independently maxed the risk score
All 9 deterministic scanners active: every scanner produced findings
25 LLM findings detecting semantic patterns (social engineering, intent, context normalization)
60 deterministic findings detecting byte-level patterns (entropy, Unicode codepoints, taint flow, Levenshtein distance)

Run it yourself:

# Deterministic scanners only (~5 seconds)
node scanners/scan-orchestrator.mjs examples/malicious-skill-demo/evil-project-health/

# Full LLM-enhanced deep scan (both layers)
/security scan examples/malicious-skill-demo/evil-project-health/ --deep

Key takeaway: A single "Project Health Dashboard" plugin embedded 7 categories of attacks invisible to human review. The Unicode Tag steganography, base64-encoded exfiltration payloads, and one-character-off typosquatting packages would pass casual inspection. Automated scanning caught all of them.
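
The one-character-off typosquatting check mentioned above rests on Levenshtein (edit) distance. The sketch below shows the standard dynamic-programming algorithm; the helper `findTyposquatTargets`, its default threshold, and the three-name package list are assumptions for illustration, not the dep-auditor's actual code or data.

```javascript
// Classic dynamic-programming Levenshtein distance (two-row variant).
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: popular packages that `name` is suspiciously close to.
// An exact match (distance 0) is the real package, so it is not flagged.
function findTyposquatTargets(name, popular, maxDistance = 1) {
  return popular.filter((p) => {
    const d = levenshtein(name, p);
    return d > 0 && d <= maxDistance;
  });
}

const popular = ["express", "lodash", "react"]; // tiny stand-in for knowledge/top-packages.json
console.log(findTyposquatTargets("expresss", popular)); // one extra "s" away from "express"
console.log(findTyposquatTargets("lodash", popular));   // exact match, nothing flagged
```

Excluding distance 0 matters: without it, every legitimate install of a popular package would be reported as a typosquat of itself.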

Self-scan: scanning the scanner

Running node scanners/scan-orchestrator.mjs . on this plugin produces 0 findings (ALLOW) with ~190 suppressions via .llm-security-ignore.

Why ~190 suppressed? A security plugin that documents attack patterns, ships a malicious demo fixture, and tests against deliberately evil code will trigger its own scanners. The entropy scanner flags regex patterns in knowledge/secrets-patterns.md. The taint scanner flags eval(user_input) in test fixtures. The network scanner flags evil.com in documentation. The toxic flow analyzer flags the plugin's own commands that use Read+Bash (they're security scanners). Every suppression is explained in the ignore file. Remove .llm-security-ignore and re-run to see all ~190.
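
Why would regex patterns in a knowledge file trip the entropy scanner? Shannon entropy measures how evenly characters are distributed in a string, and dense pattern text can look as "random" as an encoded secret. A minimal sketch of the measure follows; the function name and sample strings are illustrative, and the scanner's real thresholds are not shown here.

```javascript
// Shannon entropy in bits per character.
function shannonEntropy(text) {
  const chars = [...text];
  if (chars.length === 0) return 0;

  // Count occurrences of each character.
  const counts = new Map();
  for (const ch of chars) counts.set(ch, (counts.get(ch) ?? 0) + 1);

  // Sum -p * log2(p) over the observed character frequencies.
  let entropy = 0;
  for (const n of counts.values()) {
    const p = n / chars.length;
    entropy -= p * Math.log2(p);
  }
  return entropy;
}

// Made-up sample strings; the second is a random-looking placeholder, not a real secret.
const samples = ["aaaaaaaaaaaaaaaa", "q8Zt3LxV0pRmK7yB", "the quick brown fox"];
for (const s of samples) {
  console.log(JSON.stringify(s), shannonEntropy(s).toFixed(2));
}
```

A repeated character scores 0 bits, while a string of all-distinct characters scores log2 of its length, so a threshold on this value separates ordinary prose from token-like strings only approximately. That is why suppressions are needed for documentation that contains patterns on purpose.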

Architecture

flowchart TB
    subgraph Runtime["Runtime Defense (9 hooks)"]
        direction LR
        H1["UserPromptSubmit<br/>Injection scan"]
        H2["PreToolUse<br/>Secrets · Paths · Bash · Supply chain"]
        H3["PostToolUse<br/>Output verify · Session guard"]
        H4["Update check"]
    end

    subgraph Scanning["Deterministic Analysis (10+11 scanners)"]
        direction LR
        S1["UNI · ENT · PRM · DEP<br/>TNT · GIT · NET · MEM · SCR"]
        S2["TFA<br/>Toxic flow correlator"]
        S3["MCI · PST · BOM<br/>Standalone scanners"]
    end

    subgraph Advisory["Advisory Analysis (6 agents, 19 commands)"]
        direction LR
        A1["Skill Scanner<br/>7 threat categories"]
        A2["MCP Scanner<br/>5-phase analysis"]
        A3["Posture · Audit<br/>16 categories, A-F grade"]
        A4["Threat Model<br/>STRIDE × MAESTRO"]
    end

    subgraph Knowledge["Knowledge Base (16 files)"]
        direction LR
        K1["5 OWASP frameworks"]
        K2["Threat patterns<br/>Skills · MCP · Secrets"]
        K3["Compliance · Research<br/>Registry · Packages"]
    end

    Runtime -->|"blocks/warns in real time"| User["Claude Code Session"]
    User -->|"/security scan"| Scanning
    User -->|"/security audit"| Advisory
    Advisory -.->|"grounded by"| Knowledge
    Scanning -->|"enriches"| Advisory
    S1 -->|"prior results"| S2
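
The session guard and the toxic flow correlator in the diagram both look for the "lethal trifecta". The sketch below illustrates the sliding-window idea only. The leg names follow the commonly cited framing (private data access, untrusted content exposure, external communication), and the `classify` rules, call shape, and function names are assumptions rather than the plugin's actual heuristics.

```javascript
// Assumed leg names, following the commonly cited "lethal trifecta" framing.
const LEGS = ["private_data", "untrusted_input", "external_comm"];

// Hypothetical classifier: maps a tool call to the leg it contributes, or null.
function classify(call) {
  if (call.tool === "Read" && /\.env|\.ssh|credentials/.test(call.target)) return "private_data";
  if (call.tool === "WebFetch") return "untrusted_input";
  if (call.tool === "Bash" && /\b(curl|wget)\b/.test(call.target)) return "external_comm";
  return null;
}

// Returns true when all three legs appear within the last `windowSize` calls.
function trifectaInWindow(calls, windowSize = 100) {
  const recent = calls.slice(-windowSize);
  const seen = new Set(recent.map(classify).filter(Boolean));
  return LEGS.every((leg) => seen.has(leg));
}

// Made-up session history for illustration only.
const calls = [
  { tool: "Read", target: "/home/u/.ssh/id_rsa" },
  { tool: "WebFetch", target: "https://example.com/page" },
  { tool: "Bash", target: "curl https://example.com -d @out.txt" },
];

console.log(trifectaInWindow(calls));    // all three legs fall inside the window
console.log(trifectaInWindow(calls, 2)); // the Read call falls outside a 2-call window
```

Each call on its own is unremarkable; the signal is the combination within one window, which is why this runs as a stateful check rather than a per-call pattern match.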

Directory Structure

llm-security/
├── .claude-plugin/plugin.json     # Manifest (v6.5.0)
├── CLAUDE.md                      # Plugin documentation
├── README.md                      # This file
├── LICENSE                        # MIT License
├── SECURITY.md                    # Vulnerability disclosure policy
├── package.json                   # type: module, engines, test script, bin field
├── bin/                           # Standalone CLI
│   └── llm-security.mjs          #   node bin/llm-security.mjs scan/posture/audit-bom/benchmark
├── ci/                            # CI/CD pipeline templates
│   ├── github-action.yml          #   GitHub Actions with SARIF upload
│   ├── azure-pipelines.yml        #   Azure DevOps with SARIF upload
│   └── gitlab-ci.yml              #   GitLab CI with SARIF upload
├── docs/                          # Guides
│   └── ci-cd-guide.md             #   CI/CD integration guide (Schrems II, NSM)
├── commands/                      # 18 slash commands
│   ├── security.md                #   Router + quick start
│   ├── scan.md                    #   Supply chain gate (+ --deep, --fail-on, --compact, --format sarif)
│   ├── deep-scan.md               #   Deterministic-only deep scan
│   ├── diff.md                    #   Compare scan against stored baseline
│   ├── watch.md                   #   Continuous monitoring via /loop
│   ├── registry.md                #   Skill signature registry
│   ├── supply-check.md            #   Re-audit installed dependencies
│   ├── clean.md                   #   Scan + remediate (auto/semi-auto/manual)
│   ├── dashboard.md               #   Cross-project security dashboard
│   ├── audit.md                   #   Full project audit
│   ├── plugin-audit.md            #   Plugin trust assessment
│   ├── mcp-audit.md               #   MCP-focused audit (+ --live flag)
│   ├── mcp-inspect.md             #   Live MCP server inspection via JSON-RPC 2.0
│   ├── posture.md                 #   Quick scorecard (16 categories)
│   ├── harden.md                  #   Generate Grade A security config
│   ├── red-team.md                #   Attack simulation (64 scenarios, adaptive mode)
│   ├── threat-model.md            #   Interactive STRIDE/MAESTRO
│   └── pre-deploy.md              #   Deployment checklist
├── agents/                        # 6 specialized agents
│   ├── skill-scanner-agent.md     #   7 threat categories
│   ├── mcp-scanner-agent.md       #   5-phase MCP analysis
│   ├── posture-assessor-agent.md  #   16-category assessment
│   ├── threat-modeler-agent.md    #   STRIDE × MAESTRO interview
│   ├── deep-scan-synthesizer-agent.md # JSON → human-readable report
│   └── cleaner-agent.md           #   Semi-auto remediation proposals
├── scanners/                      # 10 orchestrated + 11 standalone
│   ├── scan-orchestrator.mjs      #   Entry point — runs all 10 orchestrated, outputs JSON
│   ├── posture-scanner.mjs        #   Standalone: 16-category posture assessment, <50ms
│   ├── attack-simulator.mjs       #   Standalone: red-team harness, 64 scenarios, adaptive mode
│   ├── ai-bom-generator.mjs       #   Standalone: CycloneDX 1.6 AI Bill of Materials
│   ├── dashboard-aggregator.mjs   #   Standalone: cross-project dashboard aggregation
│   ├── reference-config-generator.mjs # Standalone: Grade A config generation
│   ├── supply-chain-recheck-cli.mjs #  Standalone: CLI for supply chain re-audit
│   ├── auto-cleaner.mjs           #   Standalone: remediation engine — 16 fix ops, atomic writes
│   ├── content-extractor.mjs      #   Standalone: pre-extracts evidence, strips injection patterns
│   ├── mcp-live-inspect.mjs       #   Standalone: live MCP server inspection via JSON-RPC 2.0
│   ├── watch-cron.mjs             #   Standalone: cron wrapper for background scanning
│   ├── lib/
│   │   ├── severity.mjs           #   Constants, risk score, verdict logic
│   │   ├── string-utils.mjs       #   Entropy, Levenshtein, base64, redact, obfuscation decoders
│   │   ├── injection-patterns.mjs #   Shared prompt injection patterns (21 critical, 8 high, 15 medium)
│   │   ├── output.mjs             #   Finding/result builders, JSON envelope
│   │   ├── diff-engine.mjs        #   Baseline storage, fingerprinting, diff categorization
│   │   ├── skill-registry.mjs     #   Fingerprinting, caching, pattern search
│   │   ├── file-discovery.mjs     #   Walk tree, filter, binary detect
│   │   ├── yaml-frontmatter.mjs   #   Regex-based frontmatter parser
│   │   ├── git-clone.mjs          #   Sandboxed clone/cleanup (sandbox-exec + git config hardening)
│   │   ├── fs-utils.mjs           #   Backup, restore, cleanup, tmppath (UUID-unique) utilities
│   │   ├── bash-normalize.mjs     #   Bash evasion normalization (empty quotes, ${}, backslash)
│   │   ├── supply-chain-data.mjs  #   Shared blocklists and supply chain data
│   │   ├── sarif-formatter.mjs    #   OASIS SARIF 2.1.0 output formatter
│   │   ├── audit-trail.mjs        #   Structured JSONL audit events (ISO 8601, OWASP tags)
│   │   ├── bom-builder.mjs        #   CycloneDX BOM construction
│   │   ├── distribution-stats.mjs #   Statistical analysis (Jensen-Shannon divergence)
│   │   ├── policy-loader.mjs      #   Reads .llm-security/policy.json for distributable config
│   │   └── mcp-description-cache.mjs # MCP tool description caching + drift detection
│   ├── unicode-scanner.mjs        #   Zero-width, Tags, BIDI, homoglyphs
│   ├── entropy-scanner.mjs        #   Shannon entropy, base64/hex detection
│   ├── permission-mapper.mjs      #   Plugin permission analysis
│   ├── dep-auditor.mjs            #   CVE, typosquatting, install scripts
│   ├── taint-tracer.mjs           #   Source-to-sink data flow tracing
│   ├── git-forensics.mjs          #   Rug pull signals, history analysis
│   ├── network-mapper.mjs         #   URL discovery, DNS, domain classification
│   ├── memory-poisoning-scanner.mjs # Injection in CLAUDE.md, memory, rules files
│   ├── supply-chain-recheck.mjs   #   Re-audit installed deps from lockfiles
│   └── toxic-flow-analyzer.mjs    #   Post-processing correlator: lethal trifecta detection
├── hooks/                         # 9 automated hooks
│   ├── hooks.json                 #   Hook registration
│   └── scripts/
│       ├── pre-prompt-inject-scan.mjs # 21 critical + 8 high + 15 medium patterns, obfuscation decode, configurable mode
│       ├── pre-edit-secrets.mjs   #   13 secret patterns, knowledge/ exclusion
│       ├── pre-write-pathguard.mjs #  8 path categories (env, ssh, aws, gnupg, creds, hooks, system, settings)
│       ├── pre-bash-destructive.mjs # 8 block + 6 warn rules, T1-T6 bash-normalize
│       ├── pre-install-supply-chain.mjs # 7 package managers, CVE/typosquat/age-gate
│       ├── pre-compact-scan.mjs   #   PreCompact: scans transcript tail (500 KB) for injection before compaction, mode: block/warn/off
│       ├── post-mcp-verify.mjs    #   Advisory: ALL tools injection scan, Bash secrets/URLs/size
│       ├── post-session-guard.mjs #   Advisory: runtime trifecta detection (sliding window, JSONL state)
│       └── update-check.mjs      #   Informational: version check (1x/24h, cached, disable: LLM_SECURITY_UPDATE_CHECK=off)
├── knowledge/                     # 16 reference files
│   ├── owasp-llm-top10.md
│   ├── owasp-agentic-top10.md
│   ├── owasp-skills-top10.md      #   OWASP Skills Top 10 (AST01-AST10)
│   ├── skill-threat-patterns.md
│   ├── mcp-threat-patterns.md
│   ├── secrets-patterns.md
│   ├── mitigation-matrix.md
│   ├── compliance-mapping.md      #   EU AI Act, NIST AI RMF, ISO 42001, MITRE ATLAS
│   ├── norwegian-context.md       #   Datatilsynet, NSM, Digitaliseringsdirektoratet
│   ├── deepmind-agent-traps.md    #   6 categories, 43 techniques
│   ├── prompt-injection-research-2025-2026.md # 7 research papers
│   ├── attack-scenarios.json      #   64 red-team scenarios across 12 categories
│   ├── attack-mutations.json      #   Synonym tables for adaptive testing
│   ├── typosquat-allowlist.json   #   False positive reduction
│   ├── top-packages.json          #   Top 200 npm + 100 PyPI for typosquatting
│   └── skill-registry.json        #   Seed data for skill signature registry
├── tests/                         # Test suite (node:test, zero external deps)
│   ├── lib/                       #   Unit tests for shared library
│   ├── scanners/                  #   Integration tests against fixture
│   └── fixtures/                  #   Test-specific data (dep-test)
├── reports/                        # Scan reports (.docx + .md source)
│   ├── baselines/                 #   Stored scan baselines for diff comparison
│   └── watch/                     #   Cron scan results (latest.json) + config
├── examples/                      # Demo fixtures
│   └── malicious-skill-demo/      #   Regression test (47+ findings, BLOCK)
└── templates/                     # Report templates (1 unified + archive)
    ├── unified-report.md          #   All 9 analysis types via conditional sections
    └── archive/                   #   9 original templates preserved for reference

~25,400 lines across ~100 active files (+10 archived). Minimal persistent state: scan baselines in reports/baselines/, watch results in reports/watch/, skill registry in reports/skill-registry.json, session guard JSONL in /tmp/, update-check cache in ~/.cache/. All scan outputs generated fresh per invocation.

What This Plugin Does Not Cover

Area	Why	Alternative
CLAUDE.md poisoning (post-clone)	Once a repo is cloned, CLAUDE.md loads into the system prompt before any hooks run. No hook-based solution can intercept this after cloning. This is exactly why you should scan repos remotely before cloning: `/security scan https://repo-url --deep` analyzes CLAUDE.md and all other files via the pre-extraction layer without ever loading them into your session.	Always scan before cloning unknown repos. For repos already cloned: manually review CLAUDE.md before opening with Claude Code. See context-filter for experimental OS-level interposition (macOS only, requires re-signing after Claude Code updates).
ML-based injection classification	Regex patterns cannot catch novel phrasings, multilingual injection, or adversarial rephrasing that semantic models can.	Use parry-guard alongside this plugin for DeBERTa/Llama Prompt Guard 2 ML classification.
Enterprise SSO/SCIM	Platform-level configuration	Anthropic Admin Console
RAG infrastructure	Vector DB / embedding pipeline security	Dedicated RAG security tools
LLM gateway/proxy	Network infrastructure	API gateway solutions
SIEM integration	Organization security stack	Splunk, Sentinel, etc.
Runtime scheming detection	The session guard hook detects lethal trifecta patterns (a known attack sequence), but general scheming — where an agent pursues hidden goals through novel strategies — remains fundamentally hard for any tool.	Session guard provides partial coverage. Full scheming detection requires monitoring + human oversight

These gaps are surfaced as advisories through /security threat-model and /security pre-deploy.

Complementary Tools

This plugin provides full-stack security hardening (static analysis + supply chain + audit + threat modeling). For organizations wanting defense in depth, these tools cover areas we intentionally leave to specialists:

Tool	What It Adds	How It Complements
parry-guard	ML-based prompt injection detection (DeBERTa v3 + Llama Prompt Guard 2 86M) in Rust. Fail-closed: uncertain = unsafe.	Our regex patterns catch known injection signatures. parry-guard catches novel phrasings, multilingual injection, and adversarial rephrasing via semantic ML models. No overlap, no conflict.
Lasso claude-hooks	Warn-and-continue PostToolUse hook. 96 patterns across 5 categories. `allowManagedHooksOnly` for team deployment.	Different philosophy: Lasso warns but never blocks, letting Claude decide with context. Our hooks block critical patterns. Both can run together; hooks execute sequentially.
Snyk agent-scan	Commercial skills/MCP scanning with a larger dataset (3,984 skills analyzed). Tool poisoning and shadowing detection.	Our skill-scanner-agent covers the same 7 threat categories. Snyk has a larger training set from scanning the full ClawHub marketplace. Use both for maximum coverage.

Tip

Recommended combo: llm-security (breadth: static + supply chain + audit + posture + threat modeling) + parry-guard (depth: ML injection classification). They cover different layers with no conflicts.

Compatibility

Claude Code: v2.x+
Platform: macOS, Linux, Windows (all hooks are Node.js .mjs)
Node.js: Required for hook scripts (any recent LTS version)
Overlap with claude-code-essentials: Safe to run both. This plugin extends claude-code-essentials with path guarding and MCP verification. Duplicate blocking is harmless — hooks run sequentially.

Version History

Version	Date	Highlights
6.5.0	2026-04-17	OS sandbox for `/security ide-scan <url>`. VSIX fetch + extract now runs in a sub-process wrapped by `sandbox-exec` (macOS) or `bwrap` (Linux), reusing the same primitives proven by the v5.1 git-clone sandbox. Defense-in-depth — even if `lib/zip-extract.mjs` ever has a bypass, the kernel refuses any write outside the per-scan temp directory. New: `lib/vsix-fetch-worker.mjs` (sub-process worker with deterministic JSON-line IPC) and `lib/vsix-sandbox.mjs` (`buildSandboxProfile` / `buildBwrapArgs` / `buildSandboxedWorker` / `runVsixWorker`, 35 s timeout, 1 MB stdout cap). New `scan(target, { useSandbox })` option (default `true` for CLI; tests use `false` since `globalThis.fetch` mocks do not cross processes). Windows fallback: in-process with `meta.warnings` advisory. Envelope `meta.source.sandbox` field: `'sandbox-exec' \| 'bwrap' \| 'none' \| 'in-process'`. 1352 tests (was 1344).
6.4.0	2026-04-17	`/security ide-scan <url>` — pre-install verification. The IDE extension scanner now accepts URLs and fetches the VSIX before scanning. Supported: VS Code Marketplace (`https://marketplace.visualstudio.com/items?itemName=publisher.name`), OpenVSX (`https://open-vsx.org/extension/publisher/name[/version]`), and direct `.vsix` URLs. New libraries: `lib/vsix-fetch.mjs` (HTTPS-only fetch with 50MB cap, 30s timeout, SHA-256, manual host-whitelisted redirects) and `lib/zip-extract.mjs` (zero-dep ZIP parser, rejects zip-slip / symlinks / absolute paths / drive letters / encrypted entries / ZIP64; caps: 10 000 entries, 500MB uncompressed, 100x expansion ratio, depth 20). Temp dir always cleaned in `try/finally`. Envelope `meta.source` carries `{ type: "url", kind, url, finalUrl, sha256, size, publisher, name, version }`. New knowledge file: `marketplace-api-notes.md`. GitHub repo URLs intentionally not supported (would require a build step). 1344 tests (was 1296).
6.3.0	2026-04-17	IDE extension prescan. New `/security ide-scan` command and `ide-extension-scanner.mjs` (prefix IDE) discover and audit installed VS Code extensions (and forks: Cursor, Windsurf, VSCodium, code-server, Insiders, Remote-SSH; JetBrains is a v1.1 stub). 7 IDE-specific checks: blocklist match, theme-with-code, sideload (`.vsix`), broad activation (`*`, `onStartupFinished`), Levenshtein typosquat ≤2 vs top-100, extension-pack expansion, dangerous `vscode:uninstall` hooks. Per-extension orchestration of UNI/ENT/NET/TNT/MEM/SCR scanners with bounded concurrency. OS-aware discovery via `lib/ide-extension-discovery.mjs` (Platform-specific suffix parsing for `darwin-x64`, `linux-arm64`, etc.). Offline-first; `--online` opt-in for future Marketplace/OSV.dev lookups. New knowledge files: `ide-extension-threat-patterns.md` (10 categories, 2024-2026 case studies from Koi Security — GlassWorm, WhiteCobra, TigerJack, Material Theme), `top-vscode-extensions.json` (typosquat seed + blocklist), `top-jetbrains-plugins.json` (stub). 1296 tests (was 1274).
6.2.0	2026-04-17	Opus 4.7 + Claude Code 2.1.112 alignment. Bash-normalize extended with T5 (`${IFS}` word-splitting) and T6 (ANSI-C `$'\xHH'` hex quoting) layers. New `pre-compact-scan.mjs` PreCompact hook scans transcript tail (500 KB cap, <500 ms) for injection + credentials before context compaction. Modes: `block` / `warn` / `off` via `LLM_SECURITY_PRECOMPACT_MODE`. Agent files reframed for Opus 4.7's more literal instruction-following (Step 0 generalization boundary + parallel Read hint in skill-scanner + mcp-scanner). New `docs/security-hardening-guide.md` with env-var reference, sandboxing notes, system-card §5.2.1 / §6.3.1.1 mapping. CLAUDE.md Defense Philosophy links to system card. 1274 tests (was 1264).
6.1.0	2026-04-10	CI/CD integration. `--fail-on <severity>` flag for threshold-based exit codes (exit 1 if findings at/above level). `--compact` output mode (one-liner per finding). Policy `ci` section in `policy.json`. Pipeline templates: GitHub Actions, Azure DevOps, GitLab CI with SARIF upload. CI/CD guide (`docs/ci-cd-guide.md`) with Schrems II/NSM compliance docs. npm publish preparation (`files` whitelist). 1264 tests.
6.0.0	2026-04-10	CAISS-readiness release. Enterprise compliance and governance layer: compliance mapping (EU AI Act, NIST AI RMF, ISO 42001, MITRE ATLAS), Norwegian regulatory context (Datatilsynet, NSM, Digitaliseringsdirektoratet), SARIF 2.1.0 output format (`--format sarif`), structured JSONL audit trail (`audit-trail.mjs`), AI-BOM generator (CycloneDX 1.6), policy-as-code (`.llm-security/policy.json`), standalone CLI (`bin/llm-security.mjs` — `node bin/llm-security.mjs scan`). Posture scanner expanded to 16 categories (+EU AI Act, NIST AI RMF, ISO 42001). Attack simulator benchmark mode (`--benchmark`). 15 knowledge docs, 16 scanners, 1242+ tests.
5.1.0	2026-04-07	Sandboxed remote cloning. Defense-in-depth for `git clone` attack surface: (1) 8 git config flags disable hooks, symlinks, filter/smudge drivers, fsmonitor, local file protocol; 4 env vars isolate from system/user config. (2) OS sandbox: macOS `sandbox-exec` + Linux `bubblewrap` restrict file writes to only the clone temp dir. Graceful fallback on Windows (git config only). Post-clone size check (100MB max). UUID-unique evidence filenames prevent race conditions. Cleanup guarantee in scan/plugin-audit commands. 1147 tests (was 1115).
5.0.0	2026-04-06	Prompt Injection Hardening (v5.0). 8-session defense-in-depth overhaul driven by 7 research papers (2025-2026). MEDIUM advisory for obfuscation signals (leetspeak, homoglyphs, zero-width, multi-language). Unicode Tag steganography detection (U+E0000-E007F). Bash expansion normalization (`bash-normalize.mjs`). Rule of Two enforcement (configurable `LLM_SECURITY_TRIFECTA_MODE=block\|warn\|off`). 100-call long-horizon monitoring window with slow-burn trifecta detection. Behavioral drift via Jensen-Shannon divergence. HITL trap detection (approval urgency, summary suppression, scope minimization). Sub-agent delegation tracking (escalation-after-input advisory). NL indirection patterns. Hybrid attacks (P2SQL, recursive injection, XSS-in-agent). CaMeL-inspired data flow tagging (SHA-256 provenance, output-to-input linking). Adaptive red-team (5 mutation rounds per scenario: homoglyph, encoding, zero-width, case alternation, synonym). Knowledge base expanded: `prompt-injection-research-2025-2026.md`, `deepmind-agent-traps.md`, `attack-mutations.json`. Posture scanner expanded to 13 categories (+Prompt Injection Hardening, Rule of Two, Long-Horizon Monitoring). Defense Philosophy section documenting honest limitations. 1115 tests.
4.5.1	2026-04-04	Cross-platform support. Windows/Linux compatibility: `fileURLToPath()`, `path.dirname()`, native `fetch()` replaces `curl` subprocess, fixed tilde expansion regex. 11 files, 782 tests pass.
4.5.0	2026-04-04	Attack simulation / red-team mode. New `attack-simulator.mjs` runs 38 crafted attack scenarios across 7 categories (secrets, destructive, supply-chain, prompt-injection, pathguard, mcp-output, session-trifecta) against the plugin's own hooks. Data-driven via `knowledge/attack-scenarios.json` with runtime payload assembly. New `/security red-team` command with `--category` filter. Capstone release: v4.0 roadmap complete (S1-S6). 18 commands, 16 scanners (10 orchestrated + 6 standalone). 782 tests.
4.4.0	2026-04-03	Cross-project security dashboard. New `dashboard-aggregator.mjs` discovers all Claude Code projects under ~/ (depth 3) and ~/.claude/plugins/, runs posture-scanner on each. Machine grade = weakest link. Cache in `~/.cache/llm-security/dashboard-latest.json` (24h staleness). New `/security dashboard` command. 17 commands, 15 scanners (10 orchestrated + 5 standalone). 751 tests.
4.3.0	2026-04-03	Enhanced MCP session monitoring. MCP description drift detection via `mcp-description-cache.mjs` — caches tool descriptions, alerts on >10% Levenshtein drift (OWASP MCP05 rug-pull). MCP-concentrated trifecta in `post-session-guard.mjs` — elevated severity when all 3 lethal trifecta legs trace to the same MCP server. Cumulative data volume tracking (100KB/500KB/1MB thresholds, OWASP ASI02). Per-MCP-tool volume tracking in `post-mcp-verify.mjs` (>100KB per tool = advisory). 735 tests.
4.2.0	2026-04-03	Supply chain re-check scanner. New `supply-chain-recheck.mjs` (prefix SCR) periodically re-audits installed dependencies from lockfiles against blocklists, OSV.dev batch API, and typosquat detection. Shared data module extracts blocklists from hook. New `/security supply-check` command. 16 commands, 14 scanners (10 orchestrated + 4 standalone). 700 tests.
4.1.0	2026-04-03	Reference configuration generator. New `/security harden` command generates Grade A security config based on posture scanner gaps. New `reference-config-generator.mjs` standalone scanner detects project type (plugin/monorepo/standalone) and generates `settings.json` (deny-first), CLAUDE.md security section, and `.gitignore` additions. `--dry-run` (default) shows JSON output; `--apply` writes files with backup. Post-apply verification re-runs posture scanner. Templates in `templates/reference-config/`. 15 commands, 13 scanners (9 orchestrated + 4 standalone). 670 tests.
4.0.0	2026-04-03	Deterministic posture scanner. New `posture-scanner.mjs` — standalone scanner (prefix PST) replacing Opus agent for `/security posture`. 10 categories assessed in <50ms (was ~6 min). Categories: Deny-First, Secrets, Path Guarding, MCP Trust, Destructive Blocking, Sandbox, Human Review, Plugin Sources, Session Isolation, Cognitive State Security. Reuses `scanForInjection()` and `gradeFromPassRate()`. `/security audit` now runs scanner first for instant data, then agents for narrative. 12 scanners (9 orchestrated + 3 standalone). 647 tests.
3.1.1	2026-04-03	Memory poisoning scanner (Cognitive State Traps). New `memory-poisoning-scanner.mjs` — scanner #9 in orchestrator (prefix MEM, OWASP: LLM01+ASI02). Detects 6 threat categories in CLAUDE.md, memory files, `.claude/rules`, REMEMBER.md, and `*.local.md`: injection patterns (via shared injection-patterns.mjs), shell commands in memory files, suspicious exfiltration URLs (webhook.site/ngrok/pipedream/etc.), credential path references (.ssh/.aws/id_rsa/credentials.json), permission expansion directives (bypassPermissions/dangerouslySkipPermissions), encoded payloads (base64 >40 chars, hex >64 chars). Posture assessor gains Category 10: Cognitive State Security. 11 scanners (9 orchestrated + 2 standalone). 606 tests (was 588).
3.1.0	2026-04-03	AI Agent Traps defense. Gap analysis against AI Agent Traps (Franklin et al., Google DeepMind, 2025). New detections: HTML/CSS content obfuscation (6 patterns for `display:none`, `visibility:hidden`, off-screen positioning, zero font-size/opacity, `aria-label` injection), oversight evasion (9 patterns for educational/hypothetical/red-team/research framing), markdown syntactic masking (anchor text injection payloads). Encoding hardening: HTML entity decoding (named, decimal, hex), recursive multi-layer decode (max 3 iterations), letter-spacing collapse. `post-mcp-verify` hook gains HTML content trap detection for WebFetch/Read/MCP output. Knowledge base updated with Agent Traps taxonomy mapping. 588 tests (was 544).
3.0.0	2026-04-03	Public release. 8 sessions from v2.5→v3.0. New in v3: toxic flow analysis (TFA scanner — lethal trifecta detection via cross-component correlation), runtime session guard (PostToolUse trifecta monitoring with sliding window), MCP live inspection (JSON-RPC 2.0 connect to running servers), report diffing with baselines (fuzzy matching, new/resolved/moved), continuous scanning (watch command + cron wrapper), skill signature registry (SHA-256 fingerprinting + cache). 4 OWASP frameworks (LLM Top 10, Agentic AI, Skills, MCP). 15 commands, 8 hooks, 10 scanners (8 orchestrated + 2 standalone), 6 agents, 9 knowledge files, 544 tests. Architecture diagram added.
2.9.2	2026-04-03	Skill signature registry. New `skill-registry.mjs` library for SHA-256 fingerprinting of normalized skill content, scan result caching, and pattern search. New `/security registry` command with stats, scan+register, and search sub-commands. `/security scan` now checks registry before full scan — instant result for known fingerprints (7-day staleness threshold). Seed data in `knowledge/skill-registry.json`, active registry in `reports/skill-registry.json`. 15 commands, 9 knowledge files total.
2.9.1	2026-04-03	Continuous/background scanning. New `/security watch [path] [--interval 6h]` command uses the built-in /loop skill to run `/security diff` on a recurring interval. New `watch-cron.mjs` standalone script for system cron/launchd — reads multi-target config from `reports/watch/config.json`, writes summary to `reports/watch/latest.json`, exits with worst verdict code (0/1/2). 13 commands total.
2.9.0	2026-04-03	Report diffing & baseline. New `diff-engine.mjs` library for finding fingerprinting, fuzzy line matching (±3), and diff categorization (new/resolved/unchanged/moved). Scan orchestrator gains `--baseline` and `--save-baseline` flags. Baselines stored per target hash in `reports/baselines/`. New `/security diff` command compares current scan against stored baseline and shows delta. 12 commands total.
2.8.1	2026-04-03	Auto update notifications. New `update-check.mjs` UserPromptSubmit hook checks for newer plugin versions against the public Forgejo repo (max 1x/24h, cached in `~/.cache/llm-security/`). Notifies via systemMessage when a newer version is available. Disable: `LLM_SECURITY_UPDATE_CHECK=off`. 8 hooks total.
2.8.0	2026-04-02	MCP Runtime Inspection. New `mcp-live-inspect.mjs` standalone scanner connects to MCP stdio servers via JSON-RPC 2.0, fetches live tool/prompt/resource lists, scans descriptions for injection (MCP03, MCP06), tool shadowing across servers (MCP09), URL/IP in descriptions. New `/security mcp-inspect` command. `/security mcp-audit --live` flag for combined static + live analysis with cross-reference escalation. Scanner prefix: MCI. 9 scanners (8 orchestrated + 1 standalone), 11 commands total.
2.7.1	2026-04-02	Runtime session guard hook. PostToolUse hook monitoring tool call sequences for lethal trifecta (untrusted input + sensitive data access + exfiltration sink). Sliding window (20 calls), per-session JSONL state, advisory warning. 7 hooks total.
2.7.0	2026-04-02	Toxic flow analysis scanner. 8th deterministic scanner detecting lethal trifecta patterns in plugin component definitions. Post-processing correlator consuming output from all prior scanners. Direct, cross-component, and project-level trifecta detection with mitigation downgrades.
2.6.0	2026-04-02	MEDIUM injection patterns + 4-framework OWASP mapping. Added ~15 MEDIUM-severity patterns (base64 payloads, leetspeak, homoglyphs). Full OWASP mapping: LLM Top 10, Agentic AI Top 10 (ASI), Skills Top 10 (AST), MCP Top 10. New knowledge file `owasp-skills-top10.md`. 8 knowledge files total.
2.5.0	2026-04-02	Pre-extraction indirection layer for remote scan defense. Remote scans now pre-extract structured evidence via `content-extractor.mjs` and strip injection patterns BEFORE LLM agents see the content. Agents analyze a JSON evidence package, never raw files from untrusted repos. `[INJECTION-PATTERN-STRIPPED]` markers are confirmed findings.
2.4.0	2026-04-01	GitHub repo URL support for scan and plugin-audit. `scan` and `plugin-audit` accept `https://github.com/...` URLs directly. Clones to temp dir via `scanners/lib/git-clone.mjs`, scans locally, cleans up. `--branch <name>` flag for non-default branches.
2.3.0	2026-04-01	PostToolUse expanded to ALL tools + configurable injection mode. 498 tests (was 470). PostToolUse hook now scans Read, WebFetch, MCP, and all other tool output for indirect prompt injection (was Bash-only). Bash-specific checks (secrets, URLs, large output) preserved. Short output skip (<100 chars) for performance. `LLM_SECURITY_INJECTION_MODE` env var: `block` (default), `warn` (advisory-only), `off` (disable). Complementary Tools section documenting parry-guard, Lasso, Snyk compatibility. CLAUDE.md poisoning gap documented as known limitation.
2.2.0	2026-04-01	Prompt injection runtime defense (Gaps 1-3). 470 tests (was 383). New `UserPromptSubmit` hook blocks injection in user input. `post-mcp-verify` extended with indirect injection scanning in tool output (LLM01). Obfuscation decoding: unicode-escape, hex-escape, URL-encoding, base64 normalization before pattern matching. Shared `injection-patterns.mjs` module with 21 critical + 8 high patterns from skill-scanner-agent Category 1. LLM01 coverage 83%→95%, LLM05 80%→83%.
2.1.0	2026-04-01	383 tests (was 177): full hook coverage (66 tests), auto-cleaner coverage (140 tests), auto-cleaner import guard fix, solo project (CONTRIBUTING.md removed), HTTPS install URL under fromaitochitta org, model defaults set to sonnet
2.0.0	2026-03-31	Open-source release: MIT LICENSE, SECURITY.md, test suite (`node:test`), path guarding hook (`pre-write-pathguard.mjs`), supply chain hook documentation, version alignment, `.gitignore`, `.editorconfig`
1.4.0	2026-02-21	Unified risk scoring formula (25/10/4/1 weights), score-based verdicts, risk bands (Low→Extreme), OWASP categorization, A-F grading function, single `unified-report.md` template replacing 9 separate templates with conditional sections per analysis type
1.3.0	2026-02-21	`/security clean` command with 3-tier remediation (auto/semi-auto/manual), `auto-cleaner.mjs` engine (16 fix operations, atomic writes, post-fix validation), `cleaner-agent` for semi-auto proposals, `clean-report.md` template, `--dry-run` flag
1.2.0	2026-02-19	7 deterministic Node.js scanners (unicode, entropy, permissions, dependencies, taint, git forensics, network), deep-scan command + `--deep` flag, synthesizer agent, shared scanner library, demo fixture with 85-finding security assessment, OWASP coverage improvements (LLM01 70→85%, LLM02 90→95%, LLM03 80→90%, LLM06 85→95%)
1.1.0	2026-02-19	Plugin audit command (`/security plugin-audit`), MCP audit command (`/security mcp-audit`), pre-deployment checklist (`/security pre-deploy`), 3 new report templates, updated OWASP coverage (LLM03 75%→80%)
1.0.0	2026-02-19	Initial release — 4 agents, 4 hooks, 6 knowledge files (2,771 lines), 8 commands, 7 report templates. OWASP LLM Top 10 + Agentic AI Top 10 coverage
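
The runtime session guard from the 2.7.1 entry above (a sliding window of 20 tool calls, watching for the lethal trifecta of untrusted input, sensitive data access, and an exfiltration sink) can be illustrated with a minimal sketch. This is not the plugin's hook code: the tag names, the `SessionWindow` class, and `checkCall` are hypothetical, and how the real hook classifies tool calls or persists its per-session JSONL state is not shown in this README.

```javascript
// Hypothetical tags for the three trifecta legs named in the 2.7.1 entry.
const TRIFECTA = ["untrusted_input", "sensitive_access", "exfil_sink"];
const WINDOW_SIZE = 20; // window length stated in the changelog

class SessionWindow {
  constructor(size = WINDOW_SIZE) {
    this.size = size;
    this.calls = [];
  }

  // Record one tool call, drop the oldest call once the window is full,
  // and return the trifecta legs that are still missing from the window.
  record(tool, tags) {
    this.calls.push({ tool, tags });
    if (this.calls.length > this.size) this.calls.shift();
    const seen = new Set(this.calls.flatMap((call) => call.tags));
    return TRIFECTA.filter((leg) => !seen.has(leg));
  }
}

// Advisory only: returns a warning object instead of blocking the call.
function checkCall(win, tool, tags) {
  const missing = win.record(tool, tags);
  if (missing.length === 0) {
    return {
      warn: true,
      message: `Advisory: lethal trifecta seen within the last ${win.size} tool calls`,
    };
  }
  return { warn: false, missing };
}

// Usage: three calls, each contributing one leg of the trifecta.
const session = new SessionWindow();
checkCall(session, "WebFetch", ["untrusted_input"]);
checkCall(session, "Read", ["sensitive_access"]);
const result = checkCall(session, "Bash", ["exfil_sink"]);
console.log(result.warn); // true
```

Because old calls slide out of the window, a leg observed more than 20 calls ago no longer counts toward a warning in this sketch.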

License & Attribution

This project is licensed under the MIT License.

Knowledge base files in knowledge/ are derived from published OWASP standards and security research papers. OWASP content is used under the CC BY-SA 4.0 license.

Threat intelligence sources: AI Agent Traps (Franklin et al., Google DeepMind, 2025), ToxicSkills (Xi'an Jiaotong, 2025), ClawHavoc (Repello AI, 2025), MCPTox (Invariant Labs, 2025), Pillar Security MCP Research (2025), Operant AI Agentic Security (2025).

The plugin architecture, scan pipeline, threat detection patterns, and security assessment methodology are original work.

Part of From AI to Chitta. Source: git.fromaitochitta.com/open/claude-code-llm-security.

Feedback & Requests

Bug reports: Open an issue on Forgejo
Feature requests: Open an issue with a [Request] prefix
Security vulnerabilities: See SECURITY.md — do not open a public issue
General questions: Email security@fromaitochitta.com or use the contact form

Contributing

This is a solo project. See Feedback & Requests for how to report bugs or suggest features. Pull requests are not accepted.

Microsoft and OWASP product names are trademarks of their respective owners. This project is not endorsed by or affiliated with any referenced organization.
