LLM Security Plugin for Claude Code
Automated defense and advisory analysis for the agentic AI attack surface.
Built for my own Claude Code workflow and shared openly for anyone who finds it useful. This is a solo project — bug reports and feature requests are welcome, but pull requests are not accepted.
AI-generated: all code produced by Claude Code through dialog-driven development. Full disclosure →
A Claude Code plugin that provides security scanning, auditing, and threat modeling for agentic AI projects. Built on OWASP LLM Top 10 (2025), OWASP Agentic AI Top 10, and the AI Agent Traps taxonomy (Google DeepMind, 2025), with threat intelligence from ToxicSkills, ClawHavoc, MCPTox, Pillar Security, Invariant Labs, and Operant AI research.
Table of Contents
- What Is This?
- The Extension Security Problem
- Quick Start
- Commands
- Agent Architecture
- Deterministic Scanners
- Automated Hooks
- Knowledge Base
- OWASP Coverage
- Workflow Examples
- Security Assessment Demo
- Architecture
- What This Plugin Does Not Cover
- Compatibility
- Version History
- Feedback & Requests
- Contributing
- License & Attribution
What Is This?
Claude Code plugins, MCP servers, and agentic workflows introduce attack surfaces that traditional security tools don't cover: prompt injection, tool poisoning, secret exfiltration through tool outputs, supply chain attacks via malicious skills, and excessive agency.
This plugin provides three layers of protection:
- Automated enforcement — 9 hooks that block dangerous operations in real time (prompt injection in user input, secrets in code, writes to sensitive paths, destructive shell commands, supply chain guardrails, suspicious tool output, runtime trifecta detection, transcript scanning before context compaction, update notifications)
- Deterministic scanning — 22 Node.js scanners (10 orchestrated + 12 standalone) that perform byte-level analysis LLMs cannot: Shannon entropy, Unicode codepoints, Levenshtein distance for typosquatting, source-to-sink taint flow, DNS resolution, git history forensics, toxic flow analysis, memory poisoning, live MCP inspection, AI-BOM generation, attack simulation, IDE extension prescan
- Advisory analysis — 19 commands that scan, audit, and model threats with structured reports, letter grades, and actionable remediation plans
Key capabilities:
- Supply chain gate — scan any plugin, MCP server, or agent file before installation with ALLOW/WARNING/BLOCK verdicts
- Full project audit — evaluate 16 security categories with A-F grading and prioritized action items
- Plugin trust assessment — dedicated plugin audit with Install/Review/Do Not Install verdict
- MCP server audit — focused analysis of all installed MCP configurations with trust scoring
- Threat modeling — interactive STRIDE × MAESTRO 7-layer session with risk matrix
- Pre-deployment checklist — 10 automated + 3 manual checks before going to production
- Automated remediation — scan-and-fix pipeline with 3-tier approach (auto/semi-auto/manual)
- Continuous monitoring — recurring diff scanning via /security watch (uses built-in /loop) or system cron via watch-cron.mjs
- Quick posture check — 30-second scorecard showing your security baseline (16 categories)
Tip
Start with /security posture for a 30-second baseline, then /security audit for the full picture.
The Extension Security Problem
Claude Code's extensibility model — skills, MCP servers, plugins, hooks — creates an attack surface that mirrors the npm/PyPI supply chain problem, but with a critical difference: extensions run with LLM agency. A malicious plugin doesn't just execute code in a sandbox; it can instruct an AI agent to read your SSH keys, exfiltrate environment variables, install persistence mechanisms, and modify its own configuration — all while appearing to be a helpful "Project Health Dashboard."
This is not theoretical. The ToxicSkills research (Xi'an Jiaotong, 2025) and ClawHavoc campaign (Repello AI, 2025) documented real attack patterns against agentic AI systems. The OWASP LLM Top 10 and OWASP Agentic AI Top 10 now formally categorize these threats.
We built a proof-of-concept — a single plugin called "Project Health Dashboard" that looks legitimate but embeds attacks across every threat category. When scanned with this plugin's combined LLM + deterministic analysis, it produced 85 findings: prompt injection via HTML comments, environment exfiltration via base64-encoded payloads, Unicode steganography invisible to human review, 6 typosquatting packages, 6 source-to-sink taint flows, persistence via crontab and LaunchAgents, and more. Verdict: BLOCK 100/100.
A human reviewing the plugin's README and SKILL.md would likely miss most of these. The Unicode Tag steganography is literally invisible. The base64 payload looks like a configuration block. The typosquatting packages are one character off from the real ones.
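To make the "invisible to human review" point concrete, here is a minimal sketch of how hidden code points can be found and decoded. The code point ranges are standard Unicode ranges; the function names and result shape are illustrative assumptions, not the API of this plugin's unicode-scanner.mjs.

```javascript
// Hypothetical helper: report invisible or direction-altering code points.
const HIDDEN_RANGES = [
  { name: "zero-width", from: 0x200b, to: 0x200d },
  { name: "bidi-override", from: 0x202a, to: 0x202e },
  { name: "bidi-isolate", from: 0x2066, to: 0x2069 },
  { name: "unicode-tag", from: 0xe0000, to: 0xe007f },
];

function findHiddenCodepoints(text) {
  const findings = [];
  let offset = 0; // position in UTF-16 code units
  for (const ch of text) { // for...of iterates by code point
    const cp = ch.codePointAt(0);
    const range = HIDDEN_RANGES.find((r) => cp >= r.from && cp <= r.to);
    if (range) {
      const hex = cp.toString(16).toUpperCase().padStart(4, "0");
      findings.push({ offset, kind: range.name, codepoint: "U+" + hex });
    }
    offset += ch.length;
  }
  return findings;
}

// Tag characters mirror printable ASCII at an offset of 0xE0000, so a
// hidden message can be recovered by subtracting that offset.
function decodeTagCharacters(text) {
  let decoded = "";
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (cp >= 0xe0020 && cp <= 0xe007e) decoded += String.fromCodePoint(cp - 0xe0000);
  }
  return decoded;
}
```

A string that renders as plain "Safe" in most editors can carry a full instruction in tag characters; findHiddenCodepoints reports each one with its offset, and decodeTagCharacters shows what the model would actually be told.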
What organizations need:
- A pre-installation scan gate — automated analysis before any extension is installed (this plugin provides /security scan and /security plugin-audit)
- A trusted, curated marketplace — vetted extensions with security review as a prerequisite for listing
- Deterministic scanning — byte-level analysis for things LLMs cannot detect: Unicode codepoints, Shannon entropy, Levenshtein distance, source-to-sink taint flows
- Automated hooks — always-on primary defense blocking secrets in code, writes to sensitive paths, and destructive commands in real time
Important
Always scan repos remotely before cloning them. A poisoned CLAUDE.md injects instructions into the model context the moment you open a cloned repo — before any hooks can intervene.
/security scan https://repo-url --deep analyzes everything safely via pre-extraction, without loading anything into your session. This is the primary defense against CLAUDE.md poisoning.
Quick Start
Prerequisites
- Claude Code installed
- Node.js (for the automated hooks, which are .mjs scripts)
Important
If you use Opus with extended context (1M tokens): Subagents inherit the parent session's context limit but do not support extended context, causing API errors ("limit reached" or "extra usage required"). Fix: run /model Opus in your session before using any security commands. This resets the session to standard 200K context, which subagents handle correctly.
Installation
Add the marketplace and browse plugins with /plugin:
claude plugin marketplace add https://git.fromaitochitta.com/open/ktg-plugin-marketplace.git
Or enable directly in ~/.claude/settings.json:
{
  "enabledPlugins": {
    "llm-security@ktg-plugin-marketplace": true
  }
}
Note
Hooks activate immediately on installation. Secret detection, path guarding, and destructive command blocking start working without any commands.
First Scan
> /security posture
┌──────────────────────────────────────────────┐
│ Security Posture: 8/16 [B] 77% │
│ ████████████████░░░░░░░░░░ │
├──────────────────────────────────────────────┤
│ ✅ Deny-First Config │
│ ✅ Secrets Protection │
│ ✅ Path Guarding │
│ ⚠️ MCP Server Trust │
│ ✅ Destructive Command Blocking │
│ ⚠️ Sandbox Config │
│ ⚠️ Human Review │
│ ✅ Skill/Plugin Sources │
│ ⚠️ Session Isolation │
│ ✅ Cognitive State Security │
│ ✅ Prompt Injection Hardening │
│ ⚠️ Rule of Two │
│ ⚠️ Long-Horizon Monitoring │
│ ✅ EU AI Act │
│ ⚠️ NIST AI RMF │
│ — ISO 42001 │
├──────────────────────────────────────────────┤
│ 6 findings (1 high, 3 medium, 2 low) │
└──────────────────────────────────────────────┘
Commands
Scanning & Assessment
| Command | Description |
|---|---|
| /security | Overview of all commands and quick start guide |
| /security scan [path\|url] | Scan skills, MCP servers, directories, or GitHub repos for security issues |
| /security scan [path\|url] --deep | Enhanced scan: LLM agents + 10 deterministic scanners |
| /security deep-scan [path] | Run 10 deterministic Node.js scanners directly (entropy, unicode, taint, deps, git, permissions, network, memory poisoning, supply chain recheck, toxic flow). Supports --fail-on <severity>, --compact, --format sarif, --output-file <path> |
| /security audit | Full project security audit with A-F grading and remediation plan |
| /security plugin-audit [path\|url] | Dedicated plugin security audit with Install/Review/Do Not Install verdict (local or GitHub URL) |
| /security mcp-audit [--live] | Focused audit of all installed MCP server configurations (add --live for runtime inspection) |
| /security mcp-inspect | Connect to running MCP stdio servers and scan live tool descriptions |
| /security ide-scan [target\|url] | Scan installed VS Code (+ Cursor, Windsurf, VSCodium, code-server) / JetBrains extensions — OR fetch a remote VSIX from VS Code Marketplace, OpenVSX, or direct .vsix URL (v6.4.0). Typosquat, theme-with-code, sideload, broad activation, uninstall hooks, plus UNI/ENT/NET/TNT/MEM/SCR per extension. Offline by default |
| /security posture | Quick security posture scorecard (16 categories incl. compliance) |
| /security diff [path] | Compare scan against stored baseline — shows new/resolved/unchanged/moved findings |
| /security watch [path] [--interval 6h] | Continuous monitoring — runs diff on a recurring interval via /loop |
| /security registry [scan\|search] | Skill signature registry — view stats, scan+register skills, search known fingerprints |
Remediation
| Command | Description |
|---|---|
| /security clean [path] | Scan and remediate findings — auto-fix, semi-auto confirm, manual report |
| /security clean [path] --dry-run | Preview what would be fixed without modifying files |
| /security harden [path] | Generate Grade A security config — settings.json, CLAUDE.md, .gitignore |
| /security harden [path] --apply | Apply generated config with automatic backup |
Threat Modeling & Planning
| Command | Description |
|---|---|
| /security threat-model | Interactive STRIDE/MAESTRO threat modeling session (15-30 min) |
| /security red-team [--category] [--adaptive] | Attack simulation — 64 scenarios across 12 categories test hook defenses. --adaptive for mutation-based evasion testing |
| /security pre-deploy | Pre-deployment security checklist (10 automated + 3 manual checks) |
Scan
/security scan is a supply chain gate. Point it at any local path or GitHub URL before installation. It spawns specialized agents sequentially to analyze:
- Skills/agents: 7 threat categories (injection, exfiltration, privilege escalation, scope creep, hidden instructions, toolchain manipulation, persistence)
- MCP servers: 5-phase analysis (tool descriptions, source code, dependencies, configuration, rug pull detection)
Remote repo support (v2.4+): Pass a GitHub URL directly — the plugin clones to a temp directory, scans, and cleans up. Use --branch <name> for non-default branches:
/security scan https://github.com/org/repo --branch dev --deep
Injection-safe remote scanning (v2.5+): Remote scans pre-extract structured evidence via content-extractor.mjs and strip injection patterns BEFORE LLM agents see the content. Agents analyze a JSON evidence package, never raw files from untrusted repos. [INJECTION-PATTERN-STRIPPED] markers are confirmed findings.
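The pre-extraction step can be pictured roughly as follows. This is a sketch under stated assumptions: the three patterns, the function name extractEvidence, and the evidence fields are hypothetical, since this README does not list what content-extractor.mjs actually matches. Only the marker string comes from the text above.

```javascript
// Illustrative patterns only; the real pattern list is not shown in this README.
const INJECTION_PATTERNS = [
  /<!--[\s\S]*?-->/g, // HTML comments, a common hiding place
  /ignore (all )?(previous|prior) instructions/gi,
  /you are now [^.\n]*/gi,
];
const MARKER = "[INJECTION-PATTERN-STRIPPED]";

// Returns a JSON-serializable evidence record in which every match has
// been replaced by the marker, so downstream agents never see the payload.
function extractEvidence(path, rawContent) {
  let strippedCount = 0;
  let content = rawContent;
  for (const pattern of INJECTION_PATTERNS) {
    content = content.replace(pattern, () => {
      strippedCount += 1;
      return MARKER;
    });
  }
  return { path, strippedCount, content };
}
```

Because the marker replaces the payload instead of deleting it silently, the location of each stripped pattern stays visible in the evidence package, which is what lets a marker count as a confirmed finding.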
Sandboxed cloning (v5.1+): git clone can execute attacker-controlled code, for example through hooks planted during a recursive clone (CVE-2024-32002) or through filter/smudge drivers referenced from .gitattributes. Remote clones are now hardened with defense-in-depth:
Layer 1 — Git config hardening (all platforms): 8 config flags neutralize known attack vectors:
| Flag | Mitigates |
|---|---|
| core.hooksPath=/dev/null | Git hooks executed during clone/checkout |
| core.symlinks=false | Symlink traversal out of temp directory |
| core.fsmonitor=false | Arbitrary command execution via fsmonitor |
| filter.lfs.{process,smudge,clean}= | Filter/smudge driver code execution (.gitattributes) |
| protocol.file.allow=never | Local file protocol traversal |
| transfer.fsckObjects=true | Malformed git objects |
Environment variables (GIT_CONFIG_NOSYSTEM=1, GIT_CONFIG_GLOBAL=/dev/null, GIT_ATTR_NOSYSTEM=1, GIT_TERMINAL_PROMPT=0) isolate from system/user git config and block interactive prompts.
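As a rough illustration of Layer 1, the flags and environment variables above can be assembled into a single clone invocation. The function name buildHardenedClone and the returned object shape are hypothetical; the `--` separator before the URL is an addition of this sketch (it stops a URL that starts with a dash from being parsed as an option) and is not something the README states the plugin does.

```javascript
// The eight config flags from the table, with the filter.lfs.* entry expanded.
const HARDENING_FLAGS = [
  "core.hooksPath=/dev/null",
  "core.symlinks=false",
  "core.fsmonitor=false",
  "filter.lfs.process=",
  "filter.lfs.smudge=",
  "filter.lfs.clean=",
  "protocol.file.allow=never",
  "transfer.fsckObjects=true",
];

// Environment variables that isolate git from system and user config.
const HARDENING_ENV = {
  GIT_CONFIG_NOSYSTEM: "1",
  GIT_CONFIG_GLOBAL: "/dev/null",
  GIT_ATTR_NOSYSTEM: "1",
  GIT_TERMINAL_PROMPT: "0",
};

// Builds the argument list without running anything, so it can be
// inspected or logged before being handed to a process spawner.
function buildHardenedClone(url, targetDir, branch) {
  const args = [];
  for (const flag of HARDENING_FLAGS) args.push("-c", flag);
  args.push("clone");
  if (branch) args.push("--branch", branch);
  args.push("--", url, targetDir);
  return { command: "git", args, env: { ...HARDENING_ENV } };
}
```

The result could be passed to Node's child_process.spawn(command, args, { env: { ...process.env, ...env } }); passing arguments as an array rather than a shell string also keeps the URL from being interpreted by a shell.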
Layer 2 — OS-level filesystem sandbox (platform-dependent):
| Platform | Sandbox | How it works | Limitations |
|---|---|---|---|
| macOS | sandbox-exec | Restricts file writes to only the clone temp dir via Seatbelt profiles | Deprecated by Apple but still functional (no replacement exists) |
| Linux | bubblewrap (bwrap) | Read-only root bind mount + writable clone dir + namespace isolation | Requires bwrap package. Fails on Ubuntu 24.04+ without admin AppArmor config. Works on Fedora/Arch |
| Windows | None available | Git config hardening only (Layer 1) | See Windows guidance below |
The plugin probe-tests sandbox availability at runtime and falls back gracefully. When no OS sandbox is available, a WARN is logged and cloning proceeds with git config hardening only.
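The probe-and-fall-back behavior might look something like the sketch below. The names selectSandbox, probe, and warn are hypothetical, and the probe is injected as a function so the selection logic can be exercised without executing any binaries; how the plugin actually probes is not described in this README.

```javascript
// Maps Node's process.platform values to the sandbox tool from the table.
const SANDBOX_BY_PLATFORM = { darwin: "sandbox-exec", linux: "bwrap" };

// probe(tool) should return true only if the tool actually works on this
// machine (for example, bwrap can be installed but blocked by AppArmor).
function selectSandbox(platform, probe, warn) {
  const tool = SANDBOX_BY_PLATFORM[platform];
  if (tool && probe(tool)) {
    return { sandbox: tool, layers: ["git-config", "os-sandbox"] };
  }
  warn(`No OS sandbox available on ${platform}; cloning with git config hardening only`);
  return { sandbox: null, layers: ["git-config"] };
}
```

Probing at runtime rather than checking only for the binary's presence matters for the Ubuntu 24.04+ case in the table, where bwrap is installed but fails without extra configuration.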
Additional protections: Post-clone size check (100MB max), UUID-unique evidence filenames (prevents race conditions), cleanup guarantee (temp files removed even on error).
Windows guidance: Windows has no CLI-level filesystem sandbox equivalent to sandbox-exec or bwrap. The alternatives either require additional software or admin privileges:
| Option | Isolation level | Requirements |
|---|---|---|
| Windows Sandbox | Full VM (Hyper-V) | Windows Pro/Enterprise, Hyper-V enabled. GUI-oriented, not scriptable |
| Docker Desktop | Container | Requires Docker install. Best option for automated isolation |
| WSL2 | Linux VM | Requires WSL2 install. Once inside, bwrap is available (except Ubuntu 24.04+ caveat) |
| AppContainer | Process sandbox | Requires native C++ helper binary — not practical to ship in a Node.js plugin |
Recommendation for Windows users: Run Claude Code inside WSL2 or Docker Desktop for full sandbox coverage. The git config hardening (Layer 1) provides baseline protection on all platforms and neutralizes all known .gitattributes attack vectors even without an OS sandbox.
Why not Node.js --permission? Node's permission model restricts fs module access within the Node process, but does not sandbox child processes like git, which run as separate OS processes. It is therefore not useful for this use case.
Output: structured report with ALLOW / WARNING / BLOCK verdict, risk score (0-100), and findings sorted by severity.
Audit
/security audit is a comprehensive project review. It spawns up to 3 agents to evaluate 9 security categories:
- Secret management
- Permission model
- Input validation
- Output handling
- Supply chain
- Data protection
- Logging and monitoring
- Network security
- Agent autonomy controls
Output: A-F letter grade, risk matrix, and prioritized action items.
Plugin Audit
/security plugin-audit [path|url] is a dedicated trust assessment for Claude Code plugins. Point it at any local plugin directory or GitHub URL to get a comprehensive evaluation before installation. It analyzes:
- Manifest metadata — name, version, author, auto_discover settings
- Component inventory — commands, agents, hooks, skills with tool grants
- Permission matrix — aggregated tool access across all components, flagging Bash, Write+Bash, and Task access
- Hook safety — classifies hook behavior (block/warn/advisory), flags state-modifying or network-calling hooks
- Content scan — spawns skill-scanner-agent for 7 threat categories
Output: structured report with Install / Review / Do Not Install trust verdict.
Clean
/security clean is a scan-and-remediate pipeline. It runs the full deterministic scanner suite, classifies each finding into one of three tiers, and acts accordingly:
- Auto — Deterministic, safe fixes applied without confirmation (e.g., removing zero-width characters, BIDI overrides, Unicode Tag steganography, upgrading haiku models)
- Semi-auto — Fixes generated by an LLM agent, presented for user confirmation before applying (e.g., homoglyph replacement, permission adjustments, dependency fixes)
- Manual — Findings that require human judgment, included in the report but not auto-fixed (e.g., taint flow refactoring, architecture changes)
The remediation engine (auto-cleaner.mjs) performs 16 fix operations as pure functions (content → content) with atomic writes and post-fix validation. Use --dry-run to preview all proposed changes without modifying any files.
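The "pure function (content → content)" design can be sketched as below. Both removeZeroWidth and applyFixes are hypothetical names written for illustration; they are not taken from auto-cleaner.mjs, and the atomic write and post-fix validation steps are left out.

```javascript
// One example fix operation: a pure function from content to content.
function removeZeroWidth(content) {
  return content.replace(/[\u200B-\u200D]/g, "");
}

// Runs fix operations in order and records which ones changed the content.
// With dryRun set, the proposed result is reported but the returned
// content is the untouched original.
function applyFixes(content, fixes, { dryRun = false } = {}) {
  let current = content;
  const applied = [];
  for (const [name, fix] of Object.entries(fixes)) {
    const next = fix(current);
    if (next !== current) applied.push(name);
    current = next;
  }
  return {
    applied,
    changed: applied.length > 0,
    proposed: current,
    content: dryRun ? content : current,
  };
}
```

Keeping each fix free of file I/O is what makes a dry run cheap: the same functions run in both modes, and only the final write is skipped.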
Threat Model
/security threat-model runs a guided 15-30 minute interview session that maps your system through two frameworks:
- STRIDE — Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege
- MAESTRO 7-layer model — Foundation Models, Data/Knowledge, Agent Frameworks, Tool Integration, Agent Capabilities, Multi-Agent Systems, Ecosystem
Output: complete threat model document with prioritized threats, risk scores, and mitigation status.
Agent Architecture
The plugin delegates specialized work to 6 purpose-built agents. Each agent has focused threat detection capabilities and its own knowledge base routing.
| Agent | Role | Model | Spawned By | Tools |
|---|---|---|---|---|
| skill-scanner-agent | 7 threat categories (injection, exfiltration, escalation, scope creep, hidden instructions, toolchain manipulation, persistence) | Opus | /security scan, /security audit, /security plugin-audit | Read, Glob, Grep |
| mcp-scanner-agent | 5-phase MCP analysis (tool descriptions, source code, dependencies, config, rug pull detection) | Opus | /security scan, /security mcp-audit | Read, Glob, Grep, Bash |
| posture-assessor-agent | 16-category assessment with PASS/PARTIAL/FAIL scoring and A-F grading | Opus | /security audit, /security posture | Read, Glob, Grep |
| threat-modeler-agent | Interactive STRIDE × MAESTRO 7-layer interview with 5-phase workflow | Opus | /security threat-model | Read, Glob, Grep, AskUserQuestion |
| deep-scan-synthesizer-agent | Interprets deterministic scanner JSON into human-readable report with executive summary and prioritized recommendations | Opus | /security deep-scan, /security scan --deep | Read, Glob, Grep |
| cleaner-agent | Generates semi-auto remediation proposals for findings requiring human judgment (read-only, returns JSON proposals) | Opus | /security clean | Read, Glob, Grep |
Scan Pipelines
For commands like /security audit, the plugin orchestrates multiple agents in parallel:
               ┌──────────────┐
               │  /security   │
               │    audit     │
               └──────┬───────┘
                      │
         ┌────────────┼────────────┐
         ▼            ▼            ▼
┌─────────────┐ ┌───────────┐ ┌──────────┐
│ Skill       │ │ MCP       │ │ Posture  │
│ Scanner     │ │ Scanner   │ │ Assessor │
└──────┬──────┘ └─────┬─────┘ └────┬─────┘
       │              │            │
       └──────────────┼─────────────┘
                      ▼
              ┌────────────────┐
              │  Audit Report  │
              │  (A-F grade)   │
              └────────────────┘
For deep scans (/security scan --deep or /security deep-scan), deterministic scanners run in parallel followed by synthesis:
               ┌──────────────┐
               │  /security   │
               │ scan --deep  │
               └──────┬───────┘
                      │
      ┌───────────────┼───────────────┐
      ▼               ▼               ▼
┌───────────┐  ┌────────────┐  ┌────────────┐
│ LLM Skill │  │ 10 Det.    │  │ MCP        │
│ Scanner   │  │ Scanners   │  │ Scanner    │
└─────┬─────┘  └──────┬─────┘  └──────┬─────┘
      │          UNI ENT PRM          │
      │          DEP TNT GIT          │
      │        NET MEM SCR TFA        │
      │               │               │
      │        ┌──────┴─────┐         │
      │        │ Synthesizer│         │
      │        │ Agent      │         │
      │        └──────┬─────┘         │
      └───────────────┼───────────────┘
                      ▼
              ┌────────────────┐
              │ Combined Report│
              │ (BLOCK/WARN/OK)│
              └────────────────┘
Deterministic Scanners
10 orchestrated + 12 standalone Node.js scanner scripts that perform byte-level analysis an LLM cannot. Zero external dependencies. Orchestrated scanners run via node scanners/scan-orchestrator.mjs <target> or through /security deep-scan. Supports --fail-on <severity>, --compact, --format sarif, --output-file <path>.
Orchestrated (10)
| Scanner | Prefix | Detects | OWASP |
|---|---|---|---|
| unicode-scanner.mjs | UNI | Zero-width chars, Unicode Tag steganography, BIDI overrides, Cyrillic homoglyphs | LLM01 |
| entropy-scanner.mjs | ENT | High-entropy strings, base64/hex blobs, encoded payloads via Shannon entropy | LLM01, LLM03 |
| permission-mapper.mjs | PRM | Purpose-vs-tools mismatch, ghost hooks, haiku on sensitive agents, overprivileged components | LLM06 |
| dep-auditor.mjs | DEP | CVEs (npm/pip audit), typosquatting (Levenshtein distance), malicious install scripts, unpinned versions | LLM03 |
| taint-tracer.mjs | TNT | Source-to-sink data flow (process.env/req.body to eval/exec/fetch/writeFile), 3-pass analysis | LLM01, LLM02 |
| git-forensics.mjs | GIT | Force pushes, description drift, hook modifications, new outbound URLs, author changes | LLM03 |
| network-mapper.mjs | NET | Undisclosed URLs, suspicious domains (ngrok, webhook.site), IP-based URLs, DNS analysis | LLM02, LLM03 |
| memory-poisoning-scanner.mjs | MEM | Injection patterns, shell commands, credential paths, permission expansion, suspicious URLs, encoded payloads in CLAUDE.md/memory/rules files | LLM01, ASI02 |
| supply-chain-recheck.mjs | SCR | Re-audit installed deps from lockfiles against blocklists, OSV.dev batch API, typosquat detection | LLM03 |
| toxic-flow-analyzer.mjs | TFA | Lethal trifecta detection: untrusted input + sensitive data access + exfiltration sink. Cross-component correlation (runs last) | ASI01, ASI02, ASI05 |
Standalone (12)
| Scanner | Prefix | Purpose |
|---|---|---|
| scan-orchestrator.mjs | — | Entry point: runs all 10 orchestrated scanners, outputs JSON |
| posture-scanner.mjs | PST | Deterministic posture assessment, 16 categories (incl. EU AI Act, NIST AI RMF, ISO 42001), <50ms |
| mcp-live-inspect.mjs | MCI | Live MCP server inspection via JSON-RPC 2.0 (tool injection, shadowing, URL/IP) |
| ide-extension-scanner.mjs | IDE | VS Code (+ Cursor, Windsurf, VSCodium, code-server) / JetBrains extension prescan: blocklist, theme-with-code, sideload, broad activation, typosquat, extension-pack expansion, dangerous uninstall hooks — then UNI/ENT/NET/TNT/MEM/SCR per extension |
| attack-simulator.mjs | — | Red-team harness: 64 scenarios, 12 categories, adaptive mutation mode |
| ai-bom-generator.mjs | BOM | CycloneDX 1.6 AI Bill of Materials |
| dashboard-aggregator.mjs | — | Cross-project security dashboard, machine-grade aggregation |
| reference-config-generator.mjs | — | Grade A config generation based on posture gaps |
| supply-chain-recheck-cli.mjs | — | CLI wrapper for SCR scanner |
| auto-cleaner.mjs | — | Remediation engine: 16 fix operations, atomic writes, post-fix validation |
| content-extractor.mjs | — | Pre-extracts evidence from untrusted repos, strips injection patterns |
| watch-cron.mjs | — | Cron wrapper: scans all targets in config, writes summary, exits with verdict code |
Why deterministic? LLMs are powerful at semantic analysis — understanding intent, detecting social engineering, assessing context. But they cannot reliably calculate Shannon entropy, measure Levenshtein distance between package names, trace taint flow across function boundaries, or detect individual Unicode codepoints. These scanners fill that gap.
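For reference, Shannon entropy is simple to compute exactly, which is the point of doing it in code. The sketch below uses the standard formula H = -Σ p·log2(p) over character frequencies; the function name is illustrative, and any threshold for calling a string "high entropy" is a scanner design choice that this README does not specify.

```javascript
// Shannon entropy in bits per character: H = -sum(p * log2(p)),
// where p is the relative frequency of each distinct character.
function shannonEntropy(str) {
  if (str.length === 0) return 0;
  const counts = new Map();
  let total = 0;
  for (const ch of str) {
    counts.set(ch, (counts.get(ch) || 0) + 1);
    total += 1;
  }
  let entropy = 0;
  for (const n of counts.values()) {
    const p = n / total;
    entropy -= p * Math.log2(p);
  }
  return entropy;
}
```

A string of one repeated character scores 0 bits, and a string of four distinct characters scores exactly 2 bits. Random key material and base64 blobs tend to score well above ordinary identifiers and prose, which is why entropy works as a signal for embedded secrets and encoded payloads.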
Shared library (scanners/lib/): severity classification, string utilities (entropy, Levenshtein, base64 detection), output formatting, file discovery, and YAML frontmatter parsing.
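The Levenshtein utility mentioned above is a standard dynamic-programming algorithm. The following is a minimal sketch, not the code in scanners/lib/; findTyposquatTargets and its default maxDistance of 1 are assumptions chosen to match the "one character off" description earlier in this README.

```javascript
// Classic edit distance using a single rolling row.
function levenshtein(a, b) {
  // At the start of iteration i, prev[j] is the distance between the
  // first i-1 characters of a and the first j characters of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost // substitution or match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Returns the popular packages that a name is suspiciously close to.
// An exact match is the real package, so it yields no targets.
function findTyposquatTargets(name, popular, maxDistance = 1) {
  if (popular.includes(name)) return [];
  return popular.filter((p) => levenshtein(name, p) <= maxDistance);
}
```

For example, findTyposquatTargets("expres", ["express", "lodash"]) returns ["express"]. Note that plain Levenshtein counts a transposition such as "lodahs" as two edits, so catching swapped letters needs either a higher threshold or a transposition-aware variant.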
Automated Hooks
These hooks run on every operation — no commands needed. They activate the moment the plugin is installed.
| Hook | Event | What It Does |
|---|---|---|
| Prompt injection scan | UserPromptSubmit | Blocks direct prompt injection (override instructions, spoofed headers, identity redefinition); warns on subtle manipulation signals. Decodes obfuscated payloads (unicode, hex, URL, base64) before matching. Configurable: LLM_SECURITY_INJECTION_MODE=block\|warn\|off (default: block) |
| Secret detection | Edit, Write | Blocks AWS keys, Azure tokens, GitHub PATs, npm tokens, PEM keys, database URLs, Bearer tokens, passwords (13 patterns) |
| Path guarding | Write | Blocks writes to .env, .ssh/, .aws/, .gnupg/, credentials files, hook scripts, /etc/, settings.json (8 path categories) |
| Destructive commands | Bash | Blocks rm -rf /, chmod 777, pipe-to-shell, fork bombs, eval injection (8 block rules + 6 warnings) |
| Supply chain guardrail | Bash | Blocks known-compromised npm/pip packages, typosquatting (Levenshtein), age-gated installs (<72h), OSV.dev CVE checks across 7 package managers |
| Output verification | All tools (post) | Advisory: scans ALL tool output for indirect prompt injection (LLM01). Bash-specific: also flags leaked secrets, unexpected URLs, oversized MCP responses. Skips short output (<100 chars) for performance |
| Session guard | All tools (post) | Advisory: monitors tool call sequences for lethal trifecta (untrusted input + sensitive data access + exfiltration sink). Sliding window of 20 calls, per-session JSONL state, warns when all 3 legs present (OWASP ASI01, ASI02) |
| Update check | UserPromptSubmit | Checks for newer plugin versions (max 1x/24h, cached). Disable: LLM_SECURITY_UPDATE_CHECK=off |
All hooks are Node.js (.mjs) for cross-platform compatibility (macOS, Linux, Windows).
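The session guard's sliding-window idea can be sketched in a few lines. The window size of 20 comes from the table above; everything else is an assumption made for illustration, including the leg names, the classify rules, and the in-memory window (the README describes per-session JSONL state, which is omitted here).

```javascript
const WINDOW_SIZE = 20; // from the hook table
const LEGS = ["untrusted-input", "sensitive-access", "exfil-sink"];

// Hypothetical classifier: maps a tool call to one trifecta leg or null.
function classify(call) {
  const target = call.target || "";
  if (call.tool === "WebFetch") return "untrusted-input";
  if (call.tool === "Read" && /\.(env|ssh|aws)\b/.test(target)) return "sensitive-access";
  if (call.tool === "Bash" && /\bcurl\b/.test(target)) return "exfil-sink";
  return null;
}

// Returns an observe() function that tracks the most recent calls and
// warns once all three legs appear inside the window.
function createSessionGuard(classifyCall) {
  const recent = [];
  return function observe(toolCall) {
    recent.push(classifyCall(toolCall));
    if (recent.length > WINDOW_SIZE) recent.shift();
    const present = LEGS.filter((leg) => recent.includes(leg));
    return { warn: present.length === LEGS.length, present };
  };
}
```

No single call in the sequence is suspicious on its own; the warning only fires on the combination, and it clears again once the older calls slide out of the window.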
Important
Prompt injection scan, secret detection, path guarding, destructive commands, and supply chain guardrail are blocking: they prevent the operation if a pattern matches. Output verification and session guard are advisory: they warn but do not block. Update check is informational: it notifies you when a newer version is available. Prompt injection blocking can be changed to warn-only (LLM_SECURITY_INJECTION_MODE=warn) or disabled (off) for security research or testing environments. Update check can be disabled with LLM_SECURITY_UPDATE_CHECK=off.
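A rough sketch of how decode-then-match and the mode switch could fit together is shown below. The single regex, the function names, and the choice to fall back to block on an unrecognized mode value are all assumptions of this sketch; only the environment variable name, its three values, and the default come from this README.

```javascript
const MODES = ["block", "warn", "off"];
const OVERRIDE_PATTERN = /ignore (all )?(previous|prior) instructions/i; // illustrative

function resolveInjectionMode(env) {
  const raw = (env.LLM_SECURITY_INJECTION_MODE || "").trim().toLowerCase();
  return MODES.includes(raw) ? raw : "block"; // documented default
}

// Produces the raw input plus URL-, hex-, and base64-decoded variants so
// that a pattern hidden behind one layer of encoding is still matched.
function decodeVariants(input) {
  const variants = new Set([input]);
  try {
    variants.add(decodeURIComponent(input));
  } catch {
    // malformed percent-escape: keep the raw input only
  }
  variants.add(
    input.replace(/\\x([0-9a-fA-F]{2})/g, (_, h) => String.fromCharCode(parseInt(h, 16)))
  );
  for (const m of input.matchAll(/[A-Za-z0-9+/]{16,}={0,2}/g)) {
    variants.add(Buffer.from(m[0], "base64").toString("utf8"));
  }
  return [...variants];
}

function checkPrompt(prompt, env) {
  const mode = resolveInjectionMode(env);
  if (mode === "off") return { action: "allow", mode };
  const matched = decodeVariants(prompt).some((v) => OVERRIDE_PATTERN.test(v));
  if (!matched) return { action: "allow", mode };
  return { action: mode === "block" ? "block" : "warn", mode };
}
```

Matching against decoded variants rather than only the raw prompt is what keeps a base64-wrapped instruction from slipping past a plain-text pattern.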
Knowledge Base
18 research-backed reference files grounding all analysis in published threat intelligence:
| File | Scope |
|---|---|
| owasp-llm-top10.md | OWASP LLM Top 10 (2025) — attack vectors, detection signals, Claude Code mitigations |
| owasp-agentic-top10.md | OWASP Agentic AI Top 10 (ASI01-ASI10) — agent-specific threats mapped to Claude Code |
| owasp-skills-top10.md | OWASP Skills Top 10 (AST01-AST10) — skill-specific threats and mitigations |
| skill-threat-patterns.md | 7 threat categories from ToxicSkills/ClawHavoc research with concrete detection patterns |
| mcp-threat-patterns.md | 9 MCP threat categories from MCPTox/Pillar Security/Invariant Labs/Operant AI research |
| secrets-patterns.md | 30+ regex patterns for secret detection across 10 provider categories |
| mitigation-matrix.md | OWASP LLM Top 10 → Claude Code control mapping with verification checks and coverage scores |
| top-packages.json | Top 200 npm + top 100 PyPI package names for typosquatting detection (Levenshtein baseline) |
| skill-registry.json | Seed data for skill signature registry — known fingerprints and risk profiles |
| compliance-mapping.md | EU AI Act, NIST AI RMF, ISO 42001, MITRE ATLAS — article/control mappings to plugin capabilities |
| norwegian-context.md | Norwegian regulatory landscape — Datatilsynet, NSM, Digitaliseringsdirektoratet guidance for AI security |
| prompt-injection-research-2025-2026.md | 7 research papers (2025-2026) with implications for hook defenses |
| deepmind-agent-traps.md | DeepMind AI Agent Traps — 6 categories, 43 techniques, coverage matrix |
| attack-scenarios.json | 64 red-team scenarios across 12 categories for attack simulation |
| attack-mutations.json | Synonym tables and mutation rules for adaptive red-team testing |
| typosquat-allowlist.json | Allowlisted package names to reduce false positives in typosquatting detection |
| ide-extension-threat-patterns.md | 10 IDE-extension detection categories (VS Code + JetBrains) with 2024-2026 case studies (GlassWorm, WhiteCobra, TigerJack, Material Theme) |
| top-vscode-extensions.json | Top ~100 VS Code Marketplace extension IDs (Levenshtein typosquat seed) + blocklist of known-malicious publisher.name entries |
Note
All knowledge base content is derived from published OWASP standards and peer-reviewed security research. The knowledge files provide grounding for agent analysis — agents read relevant sections before producing findings.
Compliance & Governance
v6.0.0 adds an enterprise governance layer for standards-aware security operations:
| Capability | Description |
|---|---|
| Compliance Mapping | Maps plugin capabilities to EU AI Act (Art. 9, 15, 17), NIST AI RMF (Map, Measure, Manage, Govern), ISO 42001 (Annex A), and MITRE ATLAS techniques. Posture categories 14-16 assess compliance readiness. |
| Norwegian Context | Regulatory guidance from Datatilsynet (DPIA for AI), NSM (basic security principles), and Digitaliseringsdirektoratet. Relevant for Norwegian public sector AI deployments. |
| SARIF 2.1.0 Output | --format sarif flag on scan/deep-scan produces OASIS SARIF standard output for CI/CD integration (GitHub Advanced Security, Azure DevOps, SonarQube). |
| Structured Audit Trail | JSONL audit events (audit-trail.mjs) with ISO 8601 timestamps, OWASP category tags, and SIEM-ready schema. Configurable via LLM_SECURITY_AUDIT_* env vars. |
| AI-BOM | CycloneDX 1.6 Bill of Materials for AI components — models, MCP servers, plugins, knowledge files, hooks. llm-security audit-bom <target>. |
| Policy-as-Code | .llm-security/policy.json for distributable hook configuration. Teams can enforce consistent security thresholds without per-developer env var setup. |
| Standalone CLI | node bin/llm-security.mjs scan <target> — runs scanners without Claude Code. Subcommands: scan, posture, audit-bom, benchmark. |
| CI/CD Integration | --fail-on <severity> for threshold-based exit codes, --compact for one-liner output. Pipeline templates for GitHub Actions, Azure DevOps, GitLab CI in ci/. Guide: docs/ci-cd-guide.md. |
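Threshold-based exit codes of the kind --fail-on provides can be sketched as follows. The severity ordering, the function name, and the 0/1 exit values are assumptions of this sketch; the README does not spell out the actual codes.

```javascript
// Assumed ordering, lowest to highest.
const SEVERITY_ORDER = ["info", "low", "medium", "high", "critical"];

// Returns 1 if any finding is at or above the failOn severity, else 0.
function exitCodeFor(findings, failOn) {
  const threshold = SEVERITY_ORDER.indexOf(failOn);
  if (threshold === -1) throw new Error(`Unknown severity: ${failOn}`);
  const failing = findings.filter(
    (f) => SEVERITY_ORDER.indexOf(f.severity) >= threshold
  );
  return failing.length > 0 ? 1 : 0;
}
```

With the posture example from earlier (1 high, 3 medium, 2 low), a pipeline using a high threshold would fail while one using critical would pass, which lets teams tighten the gate gradually.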
Benchmarks
The attack simulator (llm-security benchmark) tests hook defenses with 64 crafted scenarios across 12 categories. Adaptive mode (--adaptive) applies 5 mutation rounds per passing scenario (homoglyph substitution, encoding variations, zero-width injection, case alternation, synonym replacement).
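Three of the mutation types named above are easy to illustrate. These functions are a sketch for intuition only: the names and the three-letter homoglyph map are assumptions, and the real rules live in attack-mutations.json, which is not reproduced in this README.

```javascript
// Case alternation: lower, upper, lower, ... by character position.
function alternateCase(s) {
  return [...s]
    .map((c, i) => (i % 2 === 0 ? c.toLowerCase() : c.toUpperCase()))
    .join("");
}

// Zero-width injection: U+200B between every pair of characters.
function injectZeroWidth(s) {
  return [...s].join("\u200B");
}

// Homoglyph substitution: Latin a/e/o replaced by Cyrillic look-alikes.
const HOMOGLYPHS = { a: "\u0430", e: "\u0435", o: "\u043E" };
function substituteHomoglyphs(s) {
  return [...s].map((c) => HOMOGLYPHS[c] || c).join("");
}
```

Each mutation leaves the text readable to a human (and usually to a model) while defeating a naive byte-for-byte pattern match, which is why defenses are tested against mutated scenarios and not just the original payload.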
OWASP Coverage
| Category | Automated (Hooks) | Deterministic (Scanners) | Advisory (Commands) | Coverage |
|---|---|---|---|---|
| LLM01 Prompt Injection | Strong (input + output) | UNI + ENT + TNT | Scan, Audit | 95% |
| LLM02 Sensitive Info Disclosure | Strong | TNT + NET | Audit | 83% |
| LLM03 Supply Chain | Partial | ENT + DEP + GIT + NET | Scan, Plugin Audit, MCP Audit | 60% |
| LLM04 Data Poisoning | — | — | Threat Model | 40% |
| LLM05 Improper Output Handling | Strong (output scan) | — | Audit | 83% |
| LLM06 Excessive Agency | Strong | PRM | Posture | 100% |
| LLM07 System Prompt Leakage | — | — | Audit | 60% |
| LLM08 Vector/Embedding Weaknesses | — | — | Threat Model | 40% |
| LLM09 Misinformation | — | — | Advisory | 50% |
| LLM10 Unbounded Consumption | — | — | Pre-Deploy | 83% |
Average coverage: ~69%. Percentages reflect control-count coverage from knowledge/mitigation-matrix.md. Strongest in prompt injection (LLM01, 95% with runtime input/output scanning + obfuscation decoding) and agency controls (LLM06, 100%). Weakest in areas requiring model-provider or infrastructure controls (LLM04, LLM08), which are better addressed at the platform level.
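The average can be checked directly from the table. The snippet below simply averages the ten Coverage values; it assumes an unweighted mean, which is consistent with the stated ~69%.

```javascript
// Coverage percentages copied from the OWASP Coverage table.
const coverage = {
  LLM01: 95, LLM02: 83, LLM03: 60, LLM04: 40, LLM05: 83,
  LLM06: 100, LLM07: 60, LLM08: 40, LLM09: 50, LLM10: 83,
};

const values = Object.values(coverage);
const sum = values.reduce((acc, v) => acc + v, 0); // 694
const average = sum / values.length;

console.log(average.toFixed(1)); // prints "69.4"
```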
Workflow Examples
1. Pre-Installation Gate
Evaluate a plugin or MCP server before installing it — locally or from a remote repo:
/security scan path/to/plugin # Quick scan with ALLOW/WARNING/BLOCK verdict
/security plugin-audit path/to/plugin # Deep trust assessment with Install/Review/Do Not Install
# → Install if both pass, investigate if flagged
# Remote repo — scans without installing (v2.4+)
/security scan https://github.com/org/repo --deep
/security scan https://github.com/org/repo --branch dev --deep
/security plugin-audit https://github.com/org/repo
2. Monthly Security Review
Regular cadence for maintaining security posture:
/security posture # 30-second baseline scorecard (16 categories)
/security audit # Full audit with A-F grade and action items
# → Fix critical/high findings
/security posture # Verify improvement
3. Track Security Over Time
Compare scan results against a stored baseline to see what changed:
/security diff path/to/project # First run creates baseline, subsequent runs show delta
# → Shows new, resolved, unchanged, and moved findings
/security watch path/to/project # Continuous: runs diff every 6h via /loop
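The four delta categories can be sketched as follows. This assumes a simplified finding shape (`rule`, `file`, `line`) and treats a finding as "moved" when the same rule and file reappear within 3 lines of the baseline position, in line with the ±3 fuzzy matching mentioned in the version history. The plugin's real fingerprinting is likely more involved.

```javascript
// Hypothetical finding shape: { rule, file, line }. Not the plugin's real schema.
const LINE_TOLERANCE = 3; // same rule + file within this many lines counts as "moved"

function diffFindings(baseline, current) {
  const result = { new: [], resolved: [], unchanged: [], moved: [] };
  const remaining = [...baseline];

  for (const finding of current) {
    const sameKey = (b) => b.rule === finding.rule && b.file === finding.file;
    // Prefer an exact line match, then fall back to a nearby line.
    let idx = remaining.findIndex((b) => sameKey(b) && b.line === finding.line);
    let bucket = "unchanged";
    if (idx === -1) {
      idx = remaining.findIndex(
        (b) => sameKey(b) && Math.abs(b.line - finding.line) <= LINE_TOLERANCE
      );
      bucket = "moved";
    }
    if (idx === -1) {
      result.new.push(finding);
    } else {
      remaining.splice(idx, 1); // each baseline finding matches at most once
      result[bucket].push(finding);
    }
  }
  result.resolved = remaining; // baseline findings with no counterpart in the current scan
  return result;
}
```

Consuming each baseline entry at most once keeps duplicate findings on nearby lines from all collapsing onto the same baseline record.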
4. Deep Threat Analysis
For new architectures, major changes, or compliance requirements:
/security threat-model # 15-30 min guided STRIDE × MAESTRO session
/security audit # Verify current controls against identified threats
/security pre-deploy # Pre-deployment checklist before production
5. Remediation
Fix findings from scans and audits:
/security clean path/to/project --dry-run # Preview fixes without modifying files
/security clean path/to/project # Auto-fix safe issues, confirm semi-auto, report manual
# → Review semi-auto proposals, handle manual findings
Prompt Injection Showcase (v5.0)
The examples/prompt-injection-showcase/ demonstrates runtime hook detection against 61 attack payloads across 19 categories — from classic instruction overrides to v5.0's Unicode steganography, HITL traps, NL indirection, hybrid P2SQL, and bash evasion techniques. Includes 6 false positive checks.
node examples/prompt-injection-showcase/run-showcase.mjs # Run all 61 payloads
node examples/prompt-injection-showcase/run-showcase.mjs --verbose # Show hook output
See examples/prompt-injection-showcase/README.md for the full category breakdown.
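For readers unfamiliar with Unicode steganography, the sketch below shows one way invisible characters can be detected: it counts code points in the Unicode Tags block (U+E0000 to U+E007F) and common zero-width characters, and decodes Tag characters back to the ASCII they shadow. The helper names are hypothetical and this is not the hook's actual detection code.

```javascript
// Unicode Tags block (U+E0000-U+E007F) and a few common zero-width characters.
const TAG_CHARS = /[\u{E0000}-\u{E007F}]/gu;
const ZERO_WIDTH = /[\u200B\u200C\u200D\u2060\uFEFF]/g;

// Map each Tag character back to the code point it mirrors (offset 0xE0000).
function decodeTags(text) {
  return [...text.matchAll(TAG_CHARS)]
    .map((m) => String.fromCodePoint(m[0].codePointAt(0) - 0xe0000))
    .join("");
}

// Summarize invisible content found in a string.
function findInvisible(text) {
  return {
    tagCount: (text.match(TAG_CHARS) || []).length,
    zeroWidthCount: (text.match(ZERO_WIDTH) || []).length,
    hiddenText: decodeTags(text),
  };
}
```

A string that looks like plain `ok` on screen can carry an entire hidden instruction this way, which is why byte-level scanning complements human review.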
Security Assessment Demo
The examples/malicious-skill-demo/ directory contains a deliberately malicious plugin called "Project Health Dashboard" and a full security assessment produced by the combined LLM + deterministic scanning pipeline.
What it demonstrates: A single plugin that looks like a legitimate project health monitoring tool but embeds attacks across every threat category — prompt injection, data exfiltration, Unicode steganography, typosquatting, taint flows, persistence mechanisms, and more.
Key stats:
- 85 total findings (24 Critical, 24 High, 20 Medium, 6 Low, 11 Info)
- Verdict: BLOCK 100/100 — both LLM and deterministic scanners independently maxed the risk score
- All 9 deterministic scanners active: every scanner reported findings
- 25 LLM findings detecting semantic patterns (social engineering, intent, context normalization)
- 60 deterministic findings detecting byte-level patterns (entropy, Unicode codepoints, taint flow, Levenshtein distance)
Run it yourself:
# Deterministic scanners only (~5 seconds)
node scanners/scan-orchestrator.mjs examples/malicious-skill-demo/evil-project-health/
# Full LLM-enhanced deep scan (both layers)
/security scan examples/malicious-skill-demo/evil-project-health/ --deep
Key takeaway: A single "Project Health Dashboard" plugin embedded 7 categories of attacks invisible to human review. The Unicode Tag steganography, base64-encoded exfiltration payloads, and one-character-off typosquatting packages would pass casual inspection. Automated scanning caught all of them.
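The one-character-off typosquatting check relies on edit distance. Below is a minimal sketch using the standard Levenshtein dynamic-programming algorithm; the `POPULAR` list and `typosquatCandidates` helper are illustrative assumptions, not the plugin's data or API.

```javascript
// Classic dynamic-programming Levenshtein distance (two-row variant).
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Illustrative list only; the plugin ships its own top-packages data.
const POPULAR = ["lodash", "express", "react"];

// Flag names within a small edit distance of a popular package, excluding exact matches.
function typosquatCandidates(name, maxDistance = 1) {
  return POPULAR.filter((pkg) => {
    const d = levenshtein(name, pkg);
    return d > 0 && d <= maxDistance;
  });
}
```

For example, `typosquatCandidates("1odash")` returns `["lodash"]`, while the legitimate name `react` returns an empty list because its distance to itself is 0.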
Self-scan: scanning the scanner
Running node scanners/scan-orchestrator.mjs . on this plugin produces 0 findings (ALLOW) with ~190 suppressions via .llm-security-ignore.
Why ~190 suppressed? A security plugin that documents attack patterns, ships a malicious demo fixture, and tests against deliberately evil code will trigger its own scanners. The entropy scanner flags regex patterns in knowledge/secrets-patterns.md. The taint scanner flags eval(user_input) in test fixtures. The network scanner flags evil.com in documentation. The toxic flow analyzer flags the plugin's own commands that use Read+Bash (they're security scanners). Every suppression is explained in the ignore file. Remove .llm-security-ignore and re-run to see all ~190.
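To see why documentation full of secret-like patterns trips an entropy check, consider a minimal Shannon entropy calculation in bits per character. This is a generic sketch of the metric, not the entropy scanner's code, and any flagging threshold would be an assumption.

```javascript
// Shannon entropy in bits per character (iterates by code point).
function shannonEntropy(text) {
  if (text.length === 0) return 0;
  const counts = new Map();
  for (const ch of text) counts.set(ch, (counts.get(ch) || 0) + 1);
  const total = [...text].length;
  let entropy = 0;
  for (const n of counts.values()) {
    const p = n / total;
    entropy -= p * Math.log2(p);
  }
  return entropy;
}
```

Repetitive text such as `aaaa` scores 0, while strings that draw evenly from many distinct characters (random tokens, base64 blobs, dense regex character classes) score much higher, so they look similar to real secrets from the metric's point of view.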
Architecture
flowchart TB
subgraph Runtime["Runtime Defense (9 hooks)"]
direction LR
H1["UserPromptSubmit<br/>Injection scan"]
H2["PreToolUse<br/>Secrets · Paths · Bash · Supply chain"]
H3["PostToolUse<br/>Output verify · Session guard"]
H4["Update check"]
end
subgraph Scanning["Deterministic Analysis (10+11 scanners)"]
direction LR
S1["UNI · ENT · PRM · DEP<br/>TNT · GIT · NET · MEM · SCR"]
S2["TFA<br/>Toxic flow correlator"]
S3["MCI · PST · BOM<br/>Standalone scanners"]
end
subgraph Advisory["Advisory Analysis (6 agents, 19 commands)"]
direction LR
A1["Skill Scanner<br/>7 threat categories"]
A2["MCP Scanner<br/>5-phase analysis"]
A3["Posture · Audit<br/>16 categories, A-F grade"]
A4["Threat Model<br/>STRIDE × MAESTRO"]
end
subgraph Knowledge["Knowledge Base (16 files)"]
direction LR
K1["5 OWASP frameworks"]
K2["Threat patterns<br/>Skills · MCP · Secrets"]
K3["Compliance · Research<br/>Registry · Packages"]
end
Runtime -->|"blocks/warns in real time"| User["Claude Code Session"]
User -->|"/security scan"| Scanning
User -->|"/security audit"| Advisory
Advisory -.->|"grounded by"| Knowledge
Scanning -->|"enriches"| Advisory
S1 -->|"prior results"| S2
Directory Structure
llm-security/
├── .claude-plugin/plugin.json # Manifest (v3.0.0)
├── CLAUDE.md # Plugin documentation
├── README.md # This file
├── LICENSE # MIT License
├── SECURITY.md # Vulnerability disclosure policy
├── package.json # type: module, engines, test script, bin field
├── bin/ # Standalone CLI
│ └── llm-security.mjs # node bin/llm-security.mjs scan/posture/audit-bom/benchmark
├── ci/ # CI/CD pipeline templates
│ ├── github-action.yml # GitHub Actions with SARIF upload
│ ├── azure-pipelines.yml # Azure DevOps with SARIF upload
│ └── gitlab-ci.yml # GitLab CI with SARIF upload
├── docs/ # Guides
│ └── ci-cd-guide.md # CI/CD integration guide (Schrems II, NSM)
├── commands/ # 18 slash commands
│ ├── security.md # Router + quick start
│ ├── scan.md # Supply chain gate (+ --deep, --fail-on, --compact, --format sarif)
│ ├── deep-scan.md # Deterministic-only deep scan
│ ├── diff.md # Compare scan against stored baseline
│ ├── watch.md # Continuous monitoring via /loop
│ ├── registry.md # Skill signature registry
│ ├── supply-check.md # Re-audit installed dependencies
│ ├── clean.md # Scan + remediate (auto/semi-auto/manual)
│ ├── dashboard.md # Cross-project security dashboard
│ ├── audit.md # Full project audit
│ ├── plugin-audit.md # Plugin trust assessment
│ ├── mcp-audit.md # MCP-focused audit (+ --live flag)
│ ├── mcp-inspect.md # Live MCP server inspection via JSON-RPC 2.0
│ ├── posture.md # Quick scorecard (16 categories)
│ ├── harden.md # Generate Grade A security config
│ ├── red-team.md # Attack simulation (64 scenarios, adaptive mode)
│ ├── threat-model.md # Interactive STRIDE/MAESTRO
│ └── pre-deploy.md # Deployment checklist
├── agents/ # 6 specialized agents
│ ├── skill-scanner-agent.md # 7 threat categories
│ ├── mcp-scanner-agent.md # 5-phase MCP analysis
│ ├── posture-assessor-agent.md # 16-category assessment
│ ├── threat-modeler-agent.md # STRIDE × MAESTRO interview
│ ├── deep-scan-synthesizer-agent.md # JSON → human-readable report
│ └── cleaner-agent.md # Semi-auto remediation proposals
├── scanners/ # 10 orchestrated + 11 standalone
│ ├── scan-orchestrator.mjs # Entry point — runs all 10 orchestrated, outputs JSON
│ ├── posture-scanner.mjs # Standalone: 16-category posture assessment, <50ms
│ ├── attack-simulator.mjs # Standalone: red-team harness, 64 scenarios, adaptive mode
│ ├── ai-bom-generator.mjs # Standalone: CycloneDX 1.6 AI Bill of Materials
│ ├── dashboard-aggregator.mjs # Standalone: cross-project dashboard aggregation
│ ├── reference-config-generator.mjs # Standalone: Grade A config generation
│ ├── supply-chain-recheck-cli.mjs # Standalone: CLI for supply chain re-audit
│ ├── auto-cleaner.mjs # Standalone: remediation engine — 16 fix ops, atomic writes
│ ├── content-extractor.mjs # Standalone: pre-extracts evidence, strips injection patterns
│ ├── mcp-live-inspect.mjs # Standalone: live MCP server inspection via JSON-RPC 2.0
│ ├── watch-cron.mjs # Standalone: cron wrapper for background scanning
│ ├── lib/
│ │ ├── severity.mjs # Constants, risk score, verdict logic
│ │ ├── string-utils.mjs # Entropy, Levenshtein, base64, redact, obfuscation decoders
│ │ ├── injection-patterns.mjs # Shared prompt injection patterns (21 critical, 8 high, 15 medium)
│ │ ├── output.mjs # Finding/result builders, JSON envelope
│ │ ├── diff-engine.mjs # Baseline storage, fingerprinting, diff categorization
│ │ ├── skill-registry.mjs # Fingerprinting, caching, pattern search
│ │ ├── file-discovery.mjs # Walk tree, filter, binary detect
│ │ ├── yaml-frontmatter.mjs # Regex-based frontmatter parser
│ │ ├── git-clone.mjs # Sandboxed clone/cleanup (sandbox-exec + git config hardening)
│ │ ├── fs-utils.mjs # Backup, restore, cleanup, tmppath (UUID-unique) utilities
│ │ ├── bash-normalize.mjs # Bash evasion normalization (empty quotes, ${}, backslash)
│ │ ├── supply-chain-data.mjs # Shared blocklists and supply chain data
│ │ ├── sarif-formatter.mjs # OASIS SARIF 2.1.0 output formatter
│ │ ├── audit-trail.mjs # Structured JSONL audit events (ISO 8601, OWASP tags)
│ │ ├── bom-builder.mjs # CycloneDX BOM construction
│ │ ├── distribution-stats.mjs # Statistical analysis (Jensen-Shannon divergence)
│ │ ├── policy-loader.mjs # Reads .llm-security/policy.json for distributable config
│ │ └── mcp-description-cache.mjs # MCP tool description caching + drift detection
│ ├── unicode-scanner.mjs # Zero-width, Tags, BIDI, homoglyphs
│ ├── entropy-scanner.mjs # Shannon entropy, base64/hex detection
│ ├── permission-mapper.mjs # Plugin permission analysis
│ ├── dep-auditor.mjs # CVE, typosquatting, install scripts
│ ├── taint-tracer.mjs # Source-to-sink data flow tracing
│ ├── git-forensics.mjs # Rug pull signals, history analysis
│ ├── network-mapper.mjs # URL discovery, DNS, domain classification
│ ├── memory-poisoning-scanner.mjs # Injection in CLAUDE.md, memory, rules files
│ ├── supply-chain-recheck.mjs # Re-audit installed deps from lockfiles
│ └── toxic-flow-analyzer.mjs # Post-processing correlator: lethal trifecta detection
├── hooks/ # 9 automated hooks
│ ├── hooks.json # Hook registration
│ └── scripts/
│ ├── pre-prompt-inject-scan.mjs # 21 critical + 8 high + 15 medium patterns, obfuscation decode, configurable mode
│ ├── pre-edit-secrets.mjs # 13 secret patterns, knowledge/ exclusion
│ ├── pre-write-pathguard.mjs # 8 path categories (env, ssh, aws, gnupg, creds, hooks, system, settings)
│ ├── pre-bash-destructive.mjs # 8 block + 6 warn rules, T1-T6 bash-normalize
│ ├── pre-install-supply-chain.mjs # 7 package managers, CVE/typosquat/age-gate
│ ├── pre-compact-scan.mjs # PreCompact: scans transcript tail (500 KB) for injection before compaction, mode: block/warn/off
│ ├── post-mcp-verify.mjs # Advisory: ALL tools injection scan, Bash secrets/URLs/size
│ ├── post-session-guard.mjs # Advisory: runtime trifecta detection (sliding window, JSONL state)
│ └── update-check.mjs # Informational: version check (1x/24h, cached, disable: LLM_SECURITY_UPDATE_CHECK=off)
├── knowledge/ # 16 reference files
│ ├── owasp-llm-top10.md
│ ├── owasp-agentic-top10.md
│ ├── owasp-skills-top10.md # OWASP Skills Top 10 (AST01-AST10)
│ ├── skill-threat-patterns.md
│ ├── mcp-threat-patterns.md
│ ├── secrets-patterns.md
│ ├── mitigation-matrix.md
│ ├── compliance-mapping.md # EU AI Act, NIST AI RMF, ISO 42001, MITRE ATLAS
│ ├── norwegian-context.md # Datatilsynet, NSM, Digitaliseringsdirektoratet
│ ├── deepmind-agent-traps.md # 6 categories, 43 techniques
│ ├── prompt-injection-research-2025-2026.md # 7 research papers
│ ├── attack-scenarios.json # 64 red-team scenarios across 12 categories
│ ├── attack-mutations.json # Synonym tables for adaptive testing
│ ├── typosquat-allowlist.json # False positive reduction
│ ├── top-packages.json # Top 200 npm + 100 PyPI for typosquatting
│ └── skill-registry.json # Seed data for skill signature registry
├── tests/ # Test suite (node:test, zero external deps)
│ ├── lib/ # Unit tests for shared library
│ ├── scanners/ # Integration tests against fixture
│ └── fixtures/ # Test-specific data (dep-test)
├── reports/ # Scan reports (.docx + .md source)
│ ├── baselines/ # Stored scan baselines for diff comparison
│ └── watch/ # Cron scan results (latest.json) + config
├── examples/ # Demo fixtures
│ └── malicious-skill-demo/ # Regression test (47+ findings, BLOCK)
└── templates/ # Report templates (1 unified + archive)
├── unified-report.md # All 9 analysis types via conditional sections
└── archive/ # 9 original templates preserved for reference
~25,400 lines across ~100 active files (+10 archived). Minimal persistent state: scan baselines in reports/baselines/, watch results in reports/watch/, skill registry in reports/skill-registry.json, session guard JSONL in /tmp/, update-check cache in ~/.cache/. All scan outputs generated fresh per invocation.
What This Plugin Does Not Cover
| Area | Why | Alternative |
|---|---|---|
| CLAUDE.md poisoning (post-clone) | Once a repo is cloned, CLAUDE.md loads into the system prompt before any hooks run. No hook-based solution can intercept this after cloning. This is exactly why you should scan repos remotely before cloning: /security scan https://repo-url --deep analyzes CLAUDE.md and all other files via the pre-extraction layer without ever loading them into your session. | Always scan before cloning unknown repos. For repos already cloned: manually review CLAUDE.md before opening with Claude Code. See context-filter for experimental OS-level interposition (macOS only, requires re-signing after Claude Code updates). |
| ML-based injection classification | Regex patterns cannot catch novel phrasings, multilingual injection, or adversarial rephrasing that semantic models can. | Use parry-guard alongside this plugin for DeBERTa/Llama Prompt Guard 2 ML classification. |
| Enterprise SSO/SCIM | Platform-level configuration | Anthropic Admin Console |
| RAG infrastructure | Vector DB / embedding pipeline security | Dedicated RAG security tools |
| LLM gateway/proxy | Network infrastructure | API gateway solutions |
| SIEM integration | Organization security stack | Splunk, Sentinel, etc. |
| Runtime scheming detection | The session guard hook detects lethal trifecta patterns (a known attack sequence), but general scheming — where an agent pursues hidden goals through novel strategies — remains fundamentally hard for any tool. | Session guard provides partial coverage. Full scheming detection requires monitoring + human oversight |
These gaps are surfaced as advisories through /security threat-model and /security pre-deploy.
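The lethal trifecta pattern mentioned in the table above (untrusted input, sensitive data access, and an exfiltration sink within one session) can be sketched as a sliding-window check over recent tool calls. The `classify` rules, the call shape, and the default window of 20 are simplifying assumptions for illustration; the session guard hook's real heuristics are richer.

```javascript
// Hypothetical mapping from tool calls to trifecta legs; the real hook's rules are richer.
function classify(call) {
  const legs = [];
  if (call.tool === "WebFetch") legs.push("untrusted_input");
  if (call.tool === "Read" && /\.env|\.ssh|credentials/.test(call.target)) {
    legs.push("sensitive_access");
  }
  if (call.tool === "Bash" && /\bcurl\b|\bwget\b/.test(call.target)) {
    legs.push("exfil_sink");
  }
  return legs;
}

// Returns true when all three legs appear within the last `windowSize` calls.
function trifectaInWindow(calls, windowSize = 20) {
  const seen = new Set();
  for (const call of calls.slice(-windowSize)) {
    for (const leg of classify(call)) seen.add(leg);
  }
  return seen.size === 3;
}
```

A check like this only recognizes a known sequence of call types, which is why it offers partial coverage rather than general scheming detection.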
Complementary Tools
This plugin provides full-stack security hardening (static analysis + supply chain + audit + threat modeling). For organizations wanting defense in depth, these tools cover areas we intentionally leave to specialists:
| Tool | What It Adds | How It Complements |
|---|---|---|
| parry-guard | ML-based prompt injection detection (DeBERTa v3 + Llama Prompt Guard 2 86M) in Rust. Fail-closed: uncertain = unsafe. | Our regex patterns catch known injection signatures. parry-guard catches novel phrasings, multilingual injection, and adversarial rephrasing via semantic ML models. No overlap, no conflict. |
| Lasso claude-hooks | Warn-and-continue PostToolUse hook. 96 patterns across 5 categories. allowManagedHooksOnly for team deployment. | Different philosophy: Lasso warns but never blocks, letting Claude decide with context. Our hooks block critical patterns. Both can run together; hooks execute sequentially. |
| Snyk agent-scan | Commercial skills/MCP scanning with a larger dataset (3,984 skills analyzed). Tool poisoning and shadowing detection. | Our skill-scanner-agent covers the same 7 threat categories. Snyk has a larger training set from scanning the full ClawHub marketplace. Use both for maximum coverage. |
Tip
Recommended combo: llm-security (breadth: static + supply chain + audit + posture + threat modeling) + parry-guard (depth: ML injection classification). They cover different layers with no conflicts.
Compatibility
- Claude Code: v2.x+
- Platform: macOS, Linux, Windows (all hooks are Node.js .mjs)
- Node.js: Required for hook scripts (any recent LTS version)
- Overlap with claude-code-essentials: Safe to run both. This plugin extends claude-code-essentials with path guarding and MCP verification. Duplicate blocking is harmless; hooks run sequentially.
Version History
| Version | Date | Highlights |
|---|---|---|
| 6.6.0 | 2026-04-18 | JetBrains/IntelliJ plugin scanning. /security ide-scan now covers JetBrains IDEs (IntelliJ IDEA, PyCharm, GoLand, WebStorm, RubyMine, PhpStorm, CLion, DataGrip, RustRover, Rider, Aqua, Writerside, Android Studio) — Fleet and Toolbox excluded. OS-aware discovery of ~/Library/Application Support/JetBrains/<IDE><version>/plugins/ (macOS), %APPDATA%\JetBrains\... (Windows), ~/.config/JetBrains/... (Linux). Zero-dep parsers for META-INF/plugin.xml and META-INF/MANIFEST.MF with nested-jar extraction. 7 JetBrains-specific checks: theme-with-code, broad activation (application-components), Premain-Class instrumentation (HIGH — javaagent retransform), native binaries (.so/.dylib/.dll/.jnilib), long <depends> chains (supply-chain pressure), typosquat vs top JetBrains plugins, shaded-jar advisory. URL fetch for plugins.jetbrains.com/plugin/<numericId>-<slug> + direct /plugin/download?pluginId=<xmlId>; metadata resolves numericId → xmlId before download. .kt, .groovy, .scala added to taint-tracer code extensions. Reuses existing OS sandbox (lib/vsix-sandbox.mjs parameterized via buildSandboxedWorker(..., workerPath)). Knowledge: knowledge/jetbrains-marketplace-api-notes.md, expanded knowledge/ide-extension-threat-patterns.md, seeded knowledge/top-jetbrains-plugins.json. 1461 tests (was 1352). |
| 6.5.0 | 2026-04-17 | OS sandbox for /security ide-scan <url>. VSIX fetch + extract now runs in a sub-process wrapped by sandbox-exec (macOS) or bwrap (Linux), reusing the same primitives proven by the v5.1 git-clone sandbox. Defense-in-depth — even if lib/zip-extract.mjs ever has a bypass, the kernel refuses any write outside the per-scan temp directory. New: lib/vsix-fetch-worker.mjs (sub-process worker with deterministic JSON-line IPC) and lib/vsix-sandbox.mjs (buildSandboxProfile / buildBwrapArgs / buildSandboxedWorker / runVsixWorker, 35 s timeout, 1 MB stdout cap). New scan(target, { useSandbox }) option (default true for CLI; tests use false since globalThis.fetch mocks do not cross processes). Windows fallback: in-process with meta.warnings advisory. Envelope meta.source.sandbox field: 'sandbox-exec' | 'bwrap' | 'none' | 'in-process'. 1352 tests (was 1344). |
| 6.4.0 | 2026-04-17 | /security ide-scan <url> — pre-install verification. The IDE extension scanner now accepts URLs and fetches the VSIX before scanning. Supported: VS Code Marketplace (https://marketplace.visualstudio.com/items?itemName=publisher.name), OpenVSX (https://open-vsx.org/extension/publisher/name[/version]), and direct .vsix URLs. New libraries: lib/vsix-fetch.mjs (HTTPS-only fetch with 50MB cap, 30s timeout, SHA-256, manual host-whitelisted redirects) and lib/zip-extract.mjs (zero-dep ZIP parser, rejects zip-slip / symlinks / absolute paths / drive letters / encrypted entries / ZIP64; caps: 10 000 entries, 500MB uncompressed, 100x expansion ratio, depth 20). Temp dir always cleaned in try/finally. Envelope meta.source carries { type: "url", kind, url, finalUrl, sha256, size, publisher, name, version }. New knowledge file: marketplace-api-notes.md. GitHub repo URLs intentionally not supported (would require a build step). 1344 tests (was 1296). |
| 6.3.0 | 2026-04-17 | IDE extension prescan. New /security ide-scan command and ide-extension-scanner.mjs (prefix IDE) discover and audit installed VS Code extensions (and forks: Cursor, Windsurf, VSCodium, code-server, Insiders, Remote-SSH; JetBrains is a v1.1 stub). 7 IDE-specific checks: blocklist match, theme-with-code, sideload (.vsix), broad activation (*, onStartupFinished), Levenshtein typosquat ≤2 vs top-100, extension-pack expansion, dangerous vscode:uninstall hooks. Per-extension orchestration of UNI/ENT/NET/TNT/MEM/SCR scanners with bounded concurrency. OS-aware discovery via lib/ide-extension-discovery.mjs (Platform-specific suffix parsing for darwin-x64, linux-arm64, etc.). Offline-first; --online opt-in for future Marketplace/OSV.dev lookups. New knowledge files: ide-extension-threat-patterns.md (10 categories, 2024-2026 case studies from Koi Security — GlassWorm, WhiteCobra, TigerJack, Material Theme), top-vscode-extensions.json (typosquat seed + blocklist), top-jetbrains-plugins.json (stub). 1296 tests (was 1274). |
| 6.2.0 | 2026-04-17 | Opus 4.7 + Claude Code 2.1.112 alignment. Bash-normalize extended with T5 (${IFS} word-splitting) and T6 (ANSI-C $'\xHH' hex quoting) layers. New pre-compact-scan.mjs PreCompact hook that scans the transcript tail (500 KB cap, <500 ms) for injection + credentials before context compaction. Modes: block / warn / off via LLM_SECURITY_PRECOMPACT_MODE. Agent files reframed for Opus 4.7's more literal instruction-following (Step 0 generalization boundary + parallel Read hint in skill-scanner + mcp-scanner). New docs/security-hardening-guide.md with env-var reference, sandboxing notes, system-card §5.2.1 / §6.3.1.1 mapping. CLAUDE.md Defense Philosophy links to system card. 1274 tests (was 1264). |
| 6.1.0 | 2026-04-10 | CI/CD integration. --fail-on <severity> flag for threshold-based exit codes (exit 1 if findings at/above level). --compact output mode (one-liner per finding). Policy ci section in policy.json. Pipeline templates: GitHub Actions, Azure DevOps, GitLab CI with SARIF upload. CI/CD guide (docs/ci-cd-guide.md) with Schrems II/NSM compliance docs. npm publish preparation (files whitelist). 1264 tests. |
| 6.0.0 | 2026-04-10 | CAISS-readiness release. Enterprise compliance and governance layer: compliance mapping (EU AI Act, NIST AI RMF, ISO 42001, MITRE ATLAS), Norwegian regulatory context (Datatilsynet, NSM, Digitaliseringsdirektoratet), SARIF 2.1.0 output format (--format sarif), structured JSONL audit trail (audit-trail.mjs), AI-BOM generator (CycloneDX 1.6), policy-as-code (.llm-security/policy.json), standalone CLI (bin/llm-security.mjs — node bin/llm-security.mjs scan). Posture scanner expanded to 16 categories (+EU AI Act, NIST AI RMF, ISO 42001). Attack simulator benchmark mode (--benchmark). 15 knowledge docs, 16 scanners, 1242+ tests. |
| 5.1.0 | 2026-04-07 | Sandboxed remote cloning. Defense-in-depth for git clone attack surface: (1) 8 git config flags disable hooks, symlinks, filter/smudge drivers, fsmonitor, local file protocol; 4 env vars isolate from system/user config. (2) OS sandbox: macOS sandbox-exec + Linux bubblewrap restrict file writes to only the clone temp dir. Graceful fallback on Windows (git config only). Post-clone size check (100MB max). UUID-unique evidence filenames prevent race conditions. Cleanup guarantee in scan/plugin-audit commands. 1147 tests (was 1115). |
| 5.0.0 | 2026-04-06 | Prompt Injection Hardening (v5.0). 8-session defense-in-depth overhaul driven by 7 research papers (2025-2026). MEDIUM advisory for obfuscation signals (leetspeak, homoglyphs, zero-width, multi-language). Unicode Tag steganography detection (U+E0000-E007F). Bash expansion normalization (bash-normalize.mjs). Rule of Two enforcement (configurable LLM_SECURITY_TRIFECTA_MODE=block|warn|off). 100-call long-horizon monitoring window with slow-burn trifecta detection. Behavioral drift via Jensen-Shannon divergence. HITL trap detection (approval urgency, summary suppression, scope minimization). Sub-agent delegation tracking (escalation-after-input advisory). NL indirection patterns. Hybrid attacks (P2SQL, recursive injection, XSS-in-agent). CaMeL-inspired data flow tagging (SHA-256 provenance, output-to-input linking). Adaptive red-team (5 mutation rounds per scenario: homoglyph, encoding, zero-width, case alternation, synonym). Knowledge base expanded: prompt-injection-research-2025-2026.md, deepmind-agent-traps.md, attack-mutations.json. Posture scanner expanded to 13 categories (+Prompt Injection Hardening, Rule of Two, Long-Horizon Monitoring). Defense Philosophy section documenting honest limitations. 1115 tests. |
| 4.5.1 | 2026-04-04 | Cross-platform support. Windows/Linux compatibility: fileURLToPath(), path.dirname(), native fetch() replaces curl subprocess, fixed tilde expansion regex. 11 files, 782 tests pass. |
| 4.5.0 | 2026-04-04 | Attack simulation / red-team mode. New attack-simulator.mjs runs 38 crafted attack scenarios across 7 categories (secrets, destructive, supply-chain, prompt-injection, pathguard, mcp-output, session-trifecta) against the plugin's own hooks. Data-driven via knowledge/attack-scenarios.json with runtime payload assembly. New /security red-team command with --category filter. Capstone release: v4.0 roadmap complete (S1-S6). 18 commands, 16 scanners (10 orchestrated + 6 standalone). 782 tests. |
| 4.4.0 | 2026-04-03 | Cross-project security dashboard. New dashboard-aggregator.mjs discovers all Claude Code projects under ~/ (depth 3) and ~/.claude/plugins/, runs posture-scanner on each. Machine grade = weakest link. Cache in ~/.cache/llm-security/dashboard-latest.json (24h staleness). New /security dashboard command. 17 commands, 15 scanners (10 orchestrated + 5 standalone). 751 tests. |
| 4.3.0 | 2026-04-03 | Enhanced MCP session monitoring. MCP description drift detection via mcp-description-cache.mjs — caches tool descriptions, alerts on >10% Levenshtein drift (OWASP MCP05 rug-pull). MCP-concentrated trifecta in post-session-guard.mjs — elevated severity when all 3 lethal trifecta legs trace to the same MCP server. Cumulative data volume tracking (100KB/500KB/1MB thresholds, OWASP ASI02). Per-MCP-tool volume tracking in post-mcp-verify.mjs (>100KB per tool = advisory). 735 tests. |
| 4.2.0 | 2026-04-03 | Supply chain re-check scanner. New supply-chain-recheck.mjs (prefix SCR) periodically re-audits installed dependencies from lockfiles against blocklists, OSV.dev batch API, and typosquat detection. Shared data module extracts blocklists from hook. New /security supply-check command. 16 commands, 14 scanners (10 orchestrated + 4 standalone). 700 tests. |
| 4.1.0 | 2026-04-03 | Reference configuration generator. New /security harden command generates Grade A security config based on posture scanner gaps. New reference-config-generator.mjs standalone scanner detects project type (plugin/monorepo/standalone) and generates settings.json (deny-first), CLAUDE.md security section, and .gitignore additions. --dry-run (default) shows JSON output; --apply writes files with backup. Post-apply verification re-runs posture scanner. Templates in templates/reference-config/. 15 commands, 13 scanners (9 orchestrated + 4 standalone). 670 tests. |
| 4.0.0 | 2026-04-03 | Deterministic posture scanner. New posture-scanner.mjs — standalone scanner (prefix PST) replacing Opus agent for /security posture. 10 categories assessed in <50ms (was ~6 min). Categories: Deny-First, Secrets, Path Guarding, MCP Trust, Destructive Blocking, Sandbox, Human Review, Plugin Sources, Session Isolation, Cognitive State Security. Reuses scanForInjection() and gradeFromPassRate(). /security audit now runs scanner first for instant data, then agents for narrative. 12 scanners (9 orchestrated + 3 standalone). 647 tests. |
| 3.1.1 | 2026-04-03 | Memory poisoning scanner (Cognitive State Traps). New memory-poisoning-scanner.mjs — scanner #9 in orchestrator (prefix MEM, OWASP: LLM01+ASI02). Detects 6 threat categories in CLAUDE.md, memory files, .claude/rules, REMEMBER.md, and *.local.md: injection patterns (via shared injection-patterns.mjs), shell commands in memory files, suspicious exfiltration URLs (webhook.site/ngrok/pipedream/etc.), credential path references (.ssh/.aws/id_rsa/credentials.json), permission expansion directives (bypassPermissions/dangerouslySkipPermissions), encoded payloads (base64 >40 chars, hex >64 chars). Posture assessor gains Category 10: Cognitive State Security. 11 scanners (9 orchestrated + 2 standalone). 606 tests (was 588). |
| 3.1.0 | 2026-04-03 | AI Agent Traps defense. Gap analysis against AI Agent Traps (Franklin et al., Google DeepMind, 2025). New detections: HTML/CSS content obfuscation (6 patterns for display:none, visibility:hidden, off-screen positioning, zero font-size/opacity, aria-label injection), oversight evasion (9 patterns for educational/hypothetical/red-team/research framing), markdown syntactic masking (anchor text injection payloads). Encoding hardening: HTML entity decoding (named, decimal, hex), recursive multi-layer decode (max 3 iterations), letter-spacing collapse. post-mcp-verify hook gains HTML content trap detection for WebFetch/Read/MCP output. Knowledge base updated with Agent Traps taxonomy mapping. 588 tests (was 544). |
| 3.0.0 | 2026-04-03 | Public release. 8 sessions from v2.5→v3.0. New in v3: toxic flow analysis (TFA scanner — lethal trifecta detection via cross-component correlation), runtime session guard (PostToolUse trifecta monitoring with sliding window), MCP live inspection (JSON-RPC 2.0 connect to running servers), report diffing with baselines (fuzzy matching, new/resolved/moved), continuous scanning (watch command + cron wrapper), skill signature registry (SHA-256 fingerprinting + cache). 4 OWASP frameworks (LLM Top 10, Agentic AI, Skills, MCP). 15 commands, 8 hooks, 10 scanners (8 orchestrated + 2 standalone), 6 agents, 9 knowledge files, 544 tests. Architecture diagram added. |
| 2.9.2 | 2026-04-03 | Skill signature registry. New skill-registry.mjs library for SHA-256 fingerprinting of normalized skill content, scan result caching, and pattern search. New /security registry command with stats, scan+register, and search sub-commands. /security scan now checks registry before full scan — instant result for known fingerprints (7-day staleness threshold). Seed data in knowledge/skill-registry.json, active registry in reports/skill-registry.json. 15 commands, 9 knowledge files total. |
| 2.9.1 | 2026-04-03 | Continuous/background scanning. New /security watch [path] [--interval 6h] command uses the built-in /loop skill to run /security diff on a recurring interval. New watch-cron.mjs standalone script for system cron/launchd — reads multi-target config from reports/watch/config.json, writes summary to reports/watch/latest.json, exits with worst verdict code (0/1/2). 13 commands total. |
| 2.9.0 | 2026-04-03 | Report diffing & baseline. New diff-engine.mjs library for finding fingerprinting, fuzzy line matching (±3), and diff categorization (new/resolved/unchanged/moved). Scan orchestrator gains --baseline and --save-baseline flags. Baselines stored per target hash in reports/baselines/. New /security diff command compares current scan against stored baseline and shows delta. 12 commands total. |
| 2.8.1 | 2026-04-03 | Auto update notifications. New update-check.mjs UserPromptSubmit hook checks for newer plugin versions against the public Forgejo repo (max 1x/24h, cached in ~/.cache/llm-security/). Notifies via systemMessage when a newer version is available. Disable: LLM_SECURITY_UPDATE_CHECK=off. 8 hooks total. |
| 2.8.0 | 2026-04-02 | MCP Runtime Inspection. New mcp-live-inspect.mjs standalone scanner connects to MCP stdio servers via JSON-RPC 2.0, fetches live tool/prompt/resource lists, scans descriptions for injection (MCP03, MCP06), tool shadowing across servers (MCP09), URL/IP in descriptions. New /security mcp-inspect command. /security mcp-audit --live flag for combined static + live analysis with cross-reference escalation. Scanner prefix: MCI. 9 scanners (8 orchestrated + 1 standalone), 11 commands total. |
| 2.7.1 | 2026-04-02 | Runtime session guard hook. PostToolUse hook monitoring tool call sequences for lethal trifecta (untrusted input + sensitive data access + exfiltration sink). Sliding window (20 calls), per-session JSONL state, advisory warning. 7 hooks total. |
| 2.7.0 | 2026-04-02 | Toxic flow analysis scanner. 8th deterministic scanner detecting lethal trifecta patterns in plugin component definitions. Post-processing correlator consuming output from all prior scanners. Direct, cross-component, and project-level trifecta detection with mitigation downgrades. |
| 2.6.0 | 2026-04-02 | MEDIUM injection patterns + 4-framework OWASP mapping. Added ~15 MEDIUM-severity patterns (base64 payloads, leetspeak, homoglyphs). Full OWASP mapping: LLM Top 10, Agentic AI Top 10 (ASI), Skills Top 10 (AST), MCP Top 10. New knowledge file owasp-skills-top10.md. 8 knowledge files total. |
| 2.5.0 | 2026-04-02 | Pre-extraction indirection layer for remote scan defense. Remote scans now pre-extract structured evidence via content-extractor.mjs and strip injection patterns BEFORE LLM agents see the content. Agents analyze a JSON evidence package, never raw files from untrusted repos. [INJECTION-PATTERN-STRIPPED] markers are confirmed findings. |
| 2.4.0 | 2026-04-01 | GitHub repo URL support for scan and plugin-audit. scan and plugin-audit accept https://github.com/... URLs directly. Clones to temp dir via scanners/lib/git-clone.mjs, scans locally, cleans up. --branch <name> flag for non-default branches. |
| 2.3.0 | 2026-04-01 | PostToolUse expanded to ALL tools + configurable injection mode. 498 tests (was 470). PostToolUse hook now scans Read, WebFetch, MCP, and all other tool output for indirect prompt injection (was Bash-only). Bash-specific checks (secrets, URLs, large output) preserved. Short output skip (<100 chars) for performance. LLM_SECURITY_INJECTION_MODE env var: block (default), warn (advisory-only), off (disable). Complementary Tools section documenting parry-guard, Lasso, Snyk compatibility. CLAUDE.md poisoning gap documented as known limitation. |
| 2.2.0 | 2026-04-01 | Prompt injection runtime defense (Gaps 1-3). 470 tests (was 383). New UserPromptSubmit hook blocks injection in user input. post-mcp-verify extended with indirect injection scanning in tool output (LLM01). Obfuscation decoding: unicode-escape, hex-escape, URL-encoding, base64 normalization before pattern matching. Shared injection-patterns.mjs module with 21 critical + 8 high patterns from skill-scanner-agent Category 1. LLM01 coverage 83%→95%, LLM05 80%→83%. |
| 2.1.0 | 2026-04-01 | 383 tests (was 177): full hook coverage (66 tests), auto-cleaner coverage (140 tests), auto-cleaner import guard fix, solo project (CONTRIBUTING.md removed), HTTPS install URL under fromaitochitta org, model defaults set to sonnet |
| 2.0.0 | 2026-03-31 | Open-source release: MIT LICENSE, SECURITY.md, test suite (node:test), path guarding hook (pre-write-pathguard.mjs), supply chain hook documentation, version alignment, .gitignore, .editorconfig |
| 1.4.0 | 2026-02-21 | Unified risk scoring formula (25/10/4/1 weights), score-based verdicts, risk bands (Low→Extreme), OWASP categorization, A-F grading function, single unified-report.md template replacing 9 separate templates with conditional sections per analysis type |
| 1.3.0 | 2026-02-21 | /security clean command with 3-tier remediation (auto/semi-auto/manual), auto-cleaner.mjs engine (16 fix operations, atomic writes, post-fix validation), cleaner-agent for semi-auto proposals, clean-report.md template, --dry-run flag |
| 1.2.0 | 2026-02-19 | 7 deterministic Node.js scanners (unicode, entropy, permissions, dependencies, taint, git forensics, network), deep-scan command + --deep flag, synthesizer agent, shared scanner library, demo fixture with 85-finding security assessment, OWASP coverage improvements (LLM01 70→85%, LLM02 90→95%, LLM03 80→90%, LLM06 85→95%) |
| 1.1.0 | 2026-02-19 | Plugin audit command (/security plugin-audit), MCP audit command (/security mcp-audit), pre-deployment checklist (/security pre-deploy), 3 new report templates, updated OWASP coverage (LLM03 75%→80%) |
| 1.0.0 | 2026-02-19 | Initial release — 4 agents, 4 hooks, 6 knowledge files (2,771 lines), 8 commands, 7 report templates. OWASP LLM Top 10 + Agentic AI Top 10 coverage |
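The 2.9.0 entry above describes baseline diffing with fuzzy line matching (±3) and four categories (new/resolved/unchanged/moved). The sketch below shows one way such a comparison could work. It is not the code in diff-engine.mjs: the finding shape `{ ruleId, file, line }`, the function name `diffFindings`, and the choice to call a finding "moved" when it shifts within the tolerance window are all assumptions made for illustration.

```javascript
// Illustrative sketch only. The finding shape { ruleId, file, line } and
// the name diffFindings are hypothetical, not the plugin's actual schema.
const LINE_TOLERANCE = 3; // the ±3 fuzzy window named in the 2.9.0 entry

function diffFindings(baseline, current, tolerance = LINE_TOLERANCE) {
  const result = { new: [], resolved: [], unchanged: [], moved: [] };
  const unmatched = [...baseline]; // baseline findings not yet paired

  for (const finding of current) {
    // Look for the closest baseline finding with the same rule and file
    // whose line number lies within the tolerance window.
    let bestIdx = -1;
    let bestDist = Infinity;
    unmatched.forEach((candidate, i) => {
      if (candidate.ruleId !== finding.ruleId || candidate.file !== finding.file) return;
      const dist = Math.abs(candidate.line - finding.line);
      if (dist <= tolerance && dist < bestDist) {
        bestIdx = i;
        bestDist = dist;
      }
    });

    if (bestIdx === -1) {
      result.new.push(finding); // nothing comparable in the baseline
      continue;
    }
    unmatched.splice(bestIdx, 1); // each baseline finding pairs at most once
    // Assumption: same line means unchanged, a shift within tolerance means moved.
    (bestDist === 0 ? result.unchanged : result.moved).push(finding);
  }

  result.resolved = unmatched; // baseline findings that no longer appear
  return result;
}

const baseline = [
  { ruleId: 'SEC-001', file: 'hooks/a.mjs', line: 10 },
  { ruleId: 'SEC-002', file: 'hooks/a.mjs', line: 40 },
  { ruleId: 'SEC-003', file: 'agents/b.md', line: 5 },
];
const current = [
  { ruleId: 'SEC-001', file: 'hooks/a.mjs', line: 10 }, // same line
  { ruleId: 'SEC-002', file: 'hooks/a.mjs', line: 42 }, // shifted by 2 lines
  { ruleId: 'SEC-004', file: 'agents/b.md', line: 9 },  // not in the baseline
];

const delta = diffFindings(baseline, current);
console.log(Object.fromEntries(Object.entries(delta).map(([k, v]) => [k, v.length])));
// prints { new: 1, resolved: 1, unchanged: 1, moved: 1 }
```

The matching here is greedy (each current finding takes the nearest unpaired baseline finding), which is simple but can mispair when several findings of the same rule sit close together in one file. A real implementation may fingerprint findings differently.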
License & Attribution
This project is licensed under the MIT License.
Knowledge base files in knowledge/ are derived from published OWASP standards and security research papers. OWASP content is used under the CC BY-SA 4.0 license.
Threat intelligence sources: AI Agent Traps (Franklin et al., Google DeepMind, 2025), ToxicSkills (Xi'an Jiaotong, 2025), ClawHavoc (Repello AI, 2025), MCPTox (Invariant Labs, 2025), Pillar Security MCP Research (2025), Operant AI Agentic Security (2025).
The plugin architecture, scan pipeline, threat detection patterns, and security assessment methodology are original work.
Part of From AI to Chitta. Source: git.fromaitochitta.com/open/claude-code-llm-security.
Feedback & Requests
- Bug reports: Open an issue on Forgejo
- Feature requests: Open an issue with a [Request] prefix
- Security vulnerabilities: See SECURITY.md; do not open a public issue
- General questions: Email security@fromaitochitta.com or use the contact form
Contributing
This is a solo project. See Feedback & Requests for how to report bugs or suggest features. Pull requests are not accepted.
Microsoft and OWASP product names are trademarks of their respective owners. This project is not endorsed by or affiliated with any referenced organization.