E2E verification against content-heavy repo (`content-claude-code`) revealed 413 entropy findings (8 HIGH / 405 MEDIUM) from markdown image CDN URLs in JSON content indexes — e.g., ``. These are legitimate content-repo artifacts, not credentials. The 40-char hash segment in the CDN URL trips Shannon entropy (H=5.29 over 300 chars), and rule 13 (inline <svg>) doesn't match since there's no literal `<svg>` tag — the `.svg` is just a URL path suffix. Added rule 18 `MARKDOWN_IMAGE = /!\[[^\]]*\]\(\s*https?:\/\//` — matches `` / ``. Line-level (not string-level) so URL is not over-specific. E2E impact on `content-claude-code`: - Before: BLOCK / 65 / 8H 437M 0L - After: WARNING / 56 / 3H 427M 0L Hyperframes unchanged: BLOCK / 80 / 1C 4H 92M — real CRITICAL SQL-injection and HIGH findings still detected. Tests: 2 new (positive + negative fixture) bringing entropy-context to 26, total suite 1485 → 1487. Docs updated to "rules 11-18" and "8 new line-suppression rules". Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
892 lines
80 KiB
Markdown
892 lines
80 KiB
Markdown
# LLM Security Plugin for Claude Code
|
||
|
||
> Automated defense and advisory analysis for the agentic AI attack surface.
|
||
|
||
*Built for my own Claude Code workflow and shared openly for anyone who finds it useful. This is a solo project — bug reports and feature requests are welcome, but pull requests are not accepted.*
|
||
|
||
*AI-generated: all code produced by Claude Code through dialog-driven development. [Full disclosure →](../../README.md#ai-generated-code-disclosure)*
|
||
|
||

|
||

|
||

|
||

|
||

|
||

|
||

|
||
|
||
A Claude Code plugin that provides security scanning, auditing, and threat modeling for agentic AI projects. Built on [OWASP LLM Top 10 (2025)](https://genai.owasp.org/llm-top-10/), [OWASP Agentic AI Top 10](https://genai.owasp.org/agentic-ai/), and the [AI Agent Traps](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6372438) taxonomy (Google DeepMind, 2025), with threat intelligence from ToxicSkills, ClawHavoc, MCPTox, Pillar Security, Invariant Labs, and Operant AI research.
|
||
|
||
---
|
||
|
||
## Table of Contents
|
||
|
||
- [What Is This?](#what-is-this)
|
||
- [The Extension Security Problem](#the-extension-security-problem)
|
||
- [Quick Start](#quick-start)
|
||
- [Commands](#commands)
|
||
- [Agent Architecture](#agent-architecture)
|
||
- [Deterministic Scanners](#deterministic-scanners)
|
||
- [Automated Hooks](#automated-hooks)
|
||
- [Knowledge Base](#knowledge-base)
|
||
- [OWASP Coverage](#owasp-coverage)
|
||
- [Workflow Examples](#workflow-examples)
|
||
- [Security Assessment Demo](#security-assessment-demo)
|
||
- [Architecture](#architecture)
|
||
- [What This Plugin Does Not Cover](#what-this-plugin-does-not-cover)
|
||
- [Compatibility](#compatibility)
|
||
- [Version History](#version-history)
|
||
- [Feedback & Requests](#feedback--requests)
|
||
- [Contributing](#contributing)
|
||
- [License & Attribution](#license--attribution)
|
||
|
||
---
|
||
|
||
## What Is This?
|
||
|
||
Claude Code plugins, MCP servers, and agentic workflows introduce attack surfaces that traditional security tools don't cover: prompt injection, tool poisoning, secret exfiltration through tool outputs, supply chain attacks via malicious skills, and excessive agency.
|
||
|
||
This plugin provides three layers of protection:
|
||
|
||
- **Automated enforcement** — 9 hooks that block dangerous operations in real time (prompt injection in user input, secrets in code, writes to sensitive paths, destructive shell commands, supply chain guardrails, suspicious tool output, runtime trifecta detection, transcript scanning before context compaction, update notifications)
|
||
- **Deterministic scanning** — 22 Node.js scanners (10 orchestrated + 12 standalone) that perform byte-level analysis LLMs cannot: Shannon entropy, Unicode codepoints, Levenshtein distance for typosquatting, source-to-sink taint flow, DNS resolution, git history forensics, toxic flow analysis, memory poisoning, live MCP inspection, AI-BOM generation, attack simulation, IDE extension prescan
|
||
- **Advisory analysis** — 19 commands that scan, audit, and model threats with structured reports, letter grades, and actionable remediation plans
|
||
|
||
Key capabilities:
|
||
|
||
- **Supply chain gate** — scan any plugin, MCP server, or agent file before installation with ALLOW/WARNING/BLOCK verdicts
|
||
- **Full project audit** — evaluate 16 security categories with A-F grading and prioritized action items
|
||
- **Plugin trust assessment** — dedicated plugin audit with Install/Review/Do Not Install verdict
|
||
- **MCP server audit** — focused analysis of all installed MCP configurations with trust scoring
|
||
- **Threat modeling** — interactive STRIDE × MAESTRO 7-layer session with risk matrix
|
||
- **Pre-deployment checklist** — 10 automated + 3 manual checks before going to production
|
||
- **Automated remediation** — scan-and-fix pipeline with 3-tier approach (auto/semi-auto/manual)
|
||
- **Continuous monitoring** — recurring diff scanning via `/security watch` (uses built-in /loop) or system cron via `watch-cron.mjs`
|
||
- **Quick posture check** — 30-second scorecard showing your security baseline (16 categories)
|
||
|
||
> [!TIP]
|
||
> Start with `/security posture` for a 30-second baseline, then `/security audit` for the full picture.
|
||
|
||
---
|
||
|
||
## The Extension Security Problem
|
||
|
||
Claude Code's extensibility model — skills, MCP servers, plugins, hooks — creates an attack surface that mirrors the npm/PyPI supply chain problem, but with a critical difference: **extensions run with LLM agency**. A malicious plugin doesn't just execute code in a sandbox; it can instruct an AI agent to read your SSH keys, exfiltrate environment variables, install persistence mechanisms, and modify its own configuration — all while appearing to be a helpful "Project Health Dashboard."
|
||
|
||
This is not theoretical. The [ToxicSkills research](https://arxiv.org/abs/2502.01063) (Xi'an Jiaotong, 2025) and [ClawHavoc campaign](https://blog.repello.ai/clawhavoc-framework) (Repello AI, 2025) documented real attack patterns against agentic AI systems. The [OWASP LLM Top 10](https://genai.owasp.org/llm-top-10/) and [OWASP Agentic AI Top 10](https://genai.owasp.org/agentic-ai/) now formally categorize these threats.
|
||
|
||
**We built a proof-of-concept** — a single plugin called "Project Health Dashboard" that looks legitimate but embeds attacks across every threat category. When scanned with this plugin's combined LLM + deterministic analysis, it produced **[85 findings](examples/malicious-skill-demo/security-assessment.md)**: prompt injection via HTML comments, environment exfiltration via base64-encoded payloads, Unicode steganography invisible to human review, 6 typosquatting packages, 6 source-to-sink taint flows, persistence via crontab and LaunchAgents, and more. Verdict: **BLOCK 100/100**.
|
||
|
||
A human reviewing the plugin's README and SKILL.md would likely miss most of these. The Unicode Tag steganography is literally invisible. The base64 payload looks like a configuration block. The typosquatting packages are one character off from the real ones.
|
||
|
||
**What organizations need:**
|
||
|
||
1. **A pre-installation scan gate** — automated analysis before any extension is installed (this plugin provides `/security scan` and `/security plugin-audit`)
|
||
2. **A trusted, curated marketplace** — vetted extensions with security review as a prerequisite for listing
|
||
3. **Deterministic scanning** — byte-level analysis for things LLMs cannot detect: Unicode codepoints, Shannon entropy, Levenshtein distance, source-to-sink taint flows
|
||
4. **Automated hooks** — always-on primary defense blocking secrets in code, writes to sensitive paths, and destructive commands in real time
|
||
|
||
> [!IMPORTANT]
|
||
> **Always scan repos remotely before cloning them.** A poisoned CLAUDE.md injects instructions into the model context the moment you open a cloned repo — before any hooks can intervene. `/security scan https://repo-url --deep` analyzes everything safely via pre-extraction, without loading anything into your session. This is the primary defense against CLAUDE.md poisoning.
|
||
|
||
---
|
||
|
||
## Quick Start
|
||
|
||
### Prerequisites
|
||
|
||
- [Claude Code](https://docs.anthropic.com/en/docs/claude-code) installed
|
||
- Node.js (for automated hooks — `.mjs` scripts)
|
||
|
||
> [!IMPORTANT]
|
||
> **If you use Opus with extended context (1M tokens):** Subagents inherit the parent session's context limit but do not support extended context, causing API errors ("limit reached" or "extra usage required"). Fix: run `/model Opus` in your session before using any security commands. This resets the session to standard 200K context, which subagents handle correctly.
|
||
|
||
### Installation
|
||
|
||
Add the marketplace and browse plugins with `/plugin`:
|
||
|
||
```bash
|
||
claude plugin marketplace add https://git.fromaitochitta.com/open/ktg-plugin-marketplace.git
|
||
```
|
||
|
||
Or enable directly in `~/.claude/settings.json`:
|
||
|
||
```json
|
||
{
|
||
"enabledPlugins": {
|
||
"llm-security@ktg-plugin-marketplace": true
|
||
}
|
||
}
|
||
```
|
||
|
||
> [!NOTE]
|
||
> Hooks activate immediately on installation. Secret detection, path guarding, and destructive command blocking start working without any commands.
|
||
|
||
### First Scan
|
||
|
||
```
|
||
> /security posture
|
||
|
||
┌──────────────────────────────────────────────┐
|
||
│ Security Posture: 8/16 [B] 77% │
|
||
│ ████████████████░░░░░░░░░░ │
|
||
├──────────────────────────────────────────────┤
|
||
│ ✅ Deny-First Config │
|
||
│ ✅ Secrets Protection │
|
||
│ ✅ Path Guarding │
|
||
│ ⚠️ MCP Server Trust │
|
||
│ ✅ Destructive Command Blocking │
|
||
│ ⚠️ Sandbox Config │
|
||
│ ⚠️ Human Review │
|
||
│ ✅ Skill/Plugin Sources │
|
||
│ ⚠️ Session Isolation │
|
||
│ ✅ Cognitive State Security │
|
||
│ ✅ Prompt Injection Hardening │
|
||
│ ⚠️ Rule of Two │
|
||
│ ⚠️ Long-Horizon Monitoring │
|
||
│ ✅ EU AI Act │
|
||
│ ⚠️ NIST AI RMF │
|
||
│ — ISO 42001 │
|
||
├──────────────────────────────────────────────┤
|
||
│ 6 findings (1 high, 3 medium, 2 low) │
|
||
└──────────────────────────────────────────────┘
|
||
```
|
||
|
||
---
|
||
|
||
## Commands
|
||
|
||
### Scanning & Assessment
|
||
|
||
| Command | Description |
|
||
|---------|-------------|
|
||
| `/security` | Overview of all commands and quick start guide |
|
||
| `/security scan [path\|url]` | Scan skills, MCP servers, directories, or GitHub repos for security issues |
|
||
| `/security scan [path\|url] --deep` | Enhanced scan: LLM agents + 10 deterministic scanners |
|
||
| `/security deep-scan [path]` | Run 10 deterministic Node.js scanners directly (entropy, unicode, taint, deps, git, permissions, network, memory poisoning, supply chain recheck, toxic flow). Supports `--fail-on <severity>`, `--compact`, `--format sarif`, `--output-file <path>` |
|
||
| `/security audit` | Full project security audit with A-F grading and remediation plan |
|
||
| `/security plugin-audit [path\|url]` | Dedicated plugin security audit with Install/Review/Do Not Install verdict (local or GitHub URL) |
|
||
| `/security mcp-audit [--live]` | Focused audit of all installed MCP server configurations (add `--live` for runtime inspection) |
|
||
| `/security mcp-inspect` | Connect to running MCP stdio servers and scan live tool descriptions |
|
||
| `/security ide-scan [target\|url]` | Scan installed VS Code (+ Cursor, Windsurf, VSCodium, code-server) / JetBrains extensions — OR fetch a remote VSIX from VS Code Marketplace, OpenVSX, or direct `.vsix` URL (v6.4.0). Typosquat, theme-with-code, sideload, broad activation, uninstall hooks, plus UNI/ENT/NET/TNT/MEM/SCR per extension. Offline by default |
|
||
| `/security posture` | Quick security posture scorecard (16 categories incl. compliance) |
|
||
| `/security diff [path]` | Compare scan against stored baseline — shows new/resolved/unchanged/moved findings |
|
||
| `/security watch [path] [--interval 6h]` | Continuous monitoring — runs diff on a recurring interval via /loop |
|
||
| `/security registry [scan\|search]` | Skill signature registry — view stats, scan+register skills, search known fingerprints |
|
||
|
||
### Remediation
|
||
|
||
| Command | Description |
|
||
|---------|-------------|
|
||
| `/security clean [path]` | Scan and remediate findings — auto-fix, semi-auto confirm, manual report |
|
||
| `/security clean [path] --dry-run` | Preview what would be fixed without modifying files |
|
||
| `/security harden [path]` | Generate Grade A security config — settings.json, CLAUDE.md, .gitignore |
|
||
| `/security harden [path] --apply` | Apply generated config with automatic backup |
|
||
|
||
### Threat Modeling & Planning
|
||
|
||
| Command | Description |
|
||
|---------|-------------|
|
||
| `/security threat-model` | Interactive STRIDE/MAESTRO threat modeling session (15-30 min) |
|
||
| `/security red-team [--category] [--adaptive]` | Attack simulation — 64 scenarios across 12 categories test hook defenses. `--adaptive` for mutation-based evasion testing |
|
||
| `/security pre-deploy` | Pre-deployment security checklist (10 automated + 3 manual checks) |
|
||
|
||
### Scan
|
||
|
||
`/security scan` is a supply chain gate. Point it at any local path or GitHub URL before installation. It spawns specialized agents sequentially to analyze:
|
||
|
||
- **Skills/agents:** 7 threat categories (injection, exfiltration, privilege escalation, scope creep, hidden instructions, toolchain manipulation, persistence)
|
||
- **MCP servers:** 5-phase analysis (tool descriptions, source code, dependencies, configuration, rug pull detection)
|
||
|
||
**Remote repo support (v2.4+):** Pass a GitHub URL directly — the plugin clones to a temp directory, scans, and cleans up. Use `--branch <name>` for non-default branches:
|
||
|
||
```
|
||
/security scan https://github.com/org/repo --branch dev --deep
|
||
```
|
||
|
||
**Injection-safe remote scanning (v2.5+):** Remote scans pre-extract structured evidence via `content-extractor.mjs` and strip injection patterns BEFORE LLM agents see the content. Agents analyze a JSON evidence package, never raw files from untrusted repos. `[INJECTION-PATTERN-STRIPPED]` markers are confirmed findings.
|
||
|
||
**Sandboxed cloning (v5.1+):** `git clone` can execute arbitrary code via `.gitattributes` filter/smudge drivers ([CVE-2024-32002](https://github.blog/open-source/git/git-security-vulnerabilities-announced-5/) and related). Remote clones are now hardened with defense-in-depth:
|
||
|
||
**Layer 1 — Git config hardening (all platforms):** 8 config flags neutralize known attack vectors:
|
||
|
||
| Flag | Mitigates |
|
||
|------|-----------|
|
||
| `core.hooksPath=/dev/null` | Git hooks executed during clone/checkout |
|
||
| `core.symlinks=false` | Symlink traversal out of temp directory |
|
||
| `core.fsmonitor=false` | Arbitrary command execution via fsmonitor |
|
||
| `filter.lfs.{process,smudge,clean}=` | Filter/smudge driver code execution (`.gitattributes`) |
|
||
| `protocol.file.allow=never` | Local file protocol traversal |
|
||
| `transfer.fsckObjects=true` | Malformed git objects |
|
||
|
||
Environment variables (`GIT_CONFIG_NOSYSTEM=1`, `GIT_CONFIG_GLOBAL=/dev/null`, `GIT_ATTR_NOSYSTEM=1`, `GIT_TERMINAL_PROMPT=0`) isolate from system/user git config and block interactive prompts.
|
||
|
||
**Layer 2 — OS-level filesystem sandbox (platform-dependent):**
|
||
|
||
| Platform | Sandbox | How it works | Limitations |
|
||
|----------|---------|-------------|-------------|
|
||
| **macOS** | [`sandbox-exec`](https://keith.github.io/xcode-man-pages/sandbox-exec.1.html) | Restricts file writes to only the clone temp dir via Seatbelt profiles | Deprecated by Apple but still functional (no replacement exists) |
|
||
| **Linux** | [`bubblewrap`](https://github.com/containers/bubblewrap) (bwrap) | Read-only root bind mount + writable clone dir + namespace isolation | Requires `bwrap` package. [Fails on Ubuntu 24.04+](https://discourse.ubuntu.com/t/understanding-apparmor-user-namespace-restriction/58007) without admin AppArmor config. Works on Fedora/Arch |
|
||
| **Windows** | None available | Git config hardening only (Layer 1) | See Windows guidance below |
|
||
|
||
The plugin probe-tests sandbox availability at runtime and falls back gracefully. When no OS sandbox is available, a WARN is logged and cloning proceeds with git config hardening only.
|
||
|
||
**Additional protections:** Post-clone size check (100MB max), UUID-unique evidence filenames (prevents race conditions), cleanup guarantee (temp files removed even on error).
|
||
|
||
**Windows guidance:** Windows has no CLI-level filesystem sandbox equivalent to `sandbox-exec` or `bwrap`. The alternatives either require additional software or admin privileges:
|
||
|
||
| Option | Isolation level | Requirements |
|
||
|--------|----------------|--------------|
|
||
| [Windows Sandbox](https://learn.microsoft.com/en-us/windows/security/application-security/application-isolation/windows-sandbox/windows-sandbox-overview) | Full VM (Hyper-V) | Windows Pro/Enterprise, Hyper-V enabled. GUI-oriented, not scriptable |
|
||
| [Docker Desktop](https://docs.docker.com/desktop/setup/install/windows-install/) | Container | Requires Docker install. Best option for automated isolation |
|
||
| [WSL2](https://learn.microsoft.com/en-us/windows/wsl/install) | Linux VM | Requires WSL2 install. Once inside, `bwrap` is available (except Ubuntu 24.04+ caveat) |
|
||
| [AppContainer](https://learn.microsoft.com/en-us/windows/win32/secauthz/appcontainer-isolation) | Process sandbox | Requires native C++ helper binary — not practical to ship in a Node.js plugin |
|
||
|
||
**Recommendation for Windows users:** Run Claude Code inside WSL2 or Docker Desktop for full sandbox coverage. The git config hardening (Layer 1) provides baseline protection on all platforms and neutralizes all known `.gitattributes` attack vectors even without an OS sandbox.
|
||
|
||
> **Why not Node.js `--permission`?** Node's [permission model](https://nodejs.org/api/permissions.html) restricts `fs` module access within the Node process, but does **not** sandbox child processes like `git` which run as separate OS processes. It is therefore not useful for this use case.
|
||
|
||
Output: structured report with ALLOW / WARNING / BLOCK verdict, risk score (0-100), and findings sorted by severity.
|
||
|
||
### Audit
|
||
|
||
`/security audit` is a comprehensive project review. It spawns up to 3 agents to evaluate 9 security categories:
|
||
|
||
1. Secret management
|
||
2. Permission model
|
||
3. Input validation
|
||
4. Output handling
|
||
5. Supply chain
|
||
6. Data protection
|
||
7. Logging and monitoring
|
||
8. Network security
|
||
9. Agent autonomy controls
|
||
|
||
Output: A-F letter grade, risk matrix, and prioritized action items.
|
||
|
||
### Plugin Audit
|
||
|
||
`/security plugin-audit [path|url]` is a dedicated trust assessment for Claude Code plugins. Point it at any local plugin directory or GitHub URL to get a comprehensive evaluation before installation. It analyzes:
|
||
|
||
- **Manifest metadata** — name, version, author, auto_discover settings
|
||
- **Component inventory** — commands, agents, hooks, skills with tool grants
|
||
- **Permission matrix** — aggregated tool access across all components, flagging Bash, Write+Bash, and Task access
|
||
- **Hook safety** — classifies hook behavior (block/warn/advisory), flags state-modifying or network-calling hooks
|
||
- **Content scan** — spawns skill-scanner-agent for 7 threat categories
|
||
|
||
Output: structured report with **Install / Review / Do Not Install** trust verdict.
|
||
|
||
### Clean
|
||
|
||
`/security clean` is a scan-and-remediate pipeline. It runs the full deterministic scanner suite, classifies each finding into one of three tiers, and acts accordingly:
|
||
|
||
- **Auto** — Deterministic, safe fixes applied without confirmation (e.g., removing zero-width characters, BIDI overrides, Unicode Tag steganography, upgrading haiku models)
|
||
- **Semi-auto** — Fixes generated by an LLM agent, presented for user confirmation before applying (e.g., homoglyph replacement, permission adjustments, dependency fixes)
|
||
- **Manual** — Findings that require human judgment, included in the report but not auto-fixed (e.g., taint flow refactoring, architecture changes)
|
||
|
||
The remediation engine (`auto-cleaner.mjs`) performs 16 fix operations as pure functions (content → content) with atomic writes and post-fix validation. Use `--dry-run` to preview all proposed changes without modifying any files.
|
||
|
||
### Threat Model
|
||
|
||
`/security threat-model` runs a guided 15-30 minute interview session that maps your system through two frameworks:
|
||
|
||
- **STRIDE** — Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege
|
||
- **MAESTRO 7-layer model** — Foundation Models, Data/Knowledge, Agent Frameworks, Tool Integration, Agent Capabilities, Multi-Agent Systems, Ecosystem
|
||
|
||
Output: complete threat model document with prioritized threats, risk scores, and mitigation status.
|
||
|
||
---
|
||
|
||
## Agent Architecture
|
||
|
||
The plugin delegates specialized work to 6 purpose-built agents. Each agent has focused threat detection capabilities and its own knowledge base routing.
|
||
|
||
| Agent | Role | Model | Spawned By | Tools |
|
||
|-------|------|-------|------------|-------|
|
||
| `skill-scanner-agent` | 7 threat categories (injection, exfiltration, escalation, scope creep, hidden instructions, toolchain manipulation, persistence) | Opus | `/security scan`, `/security audit`, `/security plugin-audit` | Read, Glob, Grep |
|
||
| `mcp-scanner-agent` | 5-phase MCP analysis (tool descriptions, source code, dependencies, config, rug pull detection) | Opus | `/security scan`, `/security mcp-audit` | Read, Glob, Grep, Bash |
|
||
| `posture-assessor-agent` | 16-category assessment with PASS/PARTIAL/FAIL scoring and A-F grading | Opus | `/security audit`, `/security posture` | Read, Glob, Grep |
|
||
| `threat-modeler-agent` | Interactive STRIDE × MAESTRO 7-layer interview with 5-phase workflow | Opus | `/security threat-model` | Read, Glob, Grep, AskUserQuestion |
|
||
| `deep-scan-synthesizer-agent` | Interprets deterministic scanner JSON into human-readable report with executive summary and prioritized recommendations | Opus | `/security deep-scan`, `/security scan --deep` | Read, Glob, Grep |
|
||
| `cleaner-agent` | Generates semi-auto remediation proposals for findings requiring human judgment (read-only, returns JSON proposals) | Opus | `/security clean` | Read, Glob, Grep |
|
||
|
||
### Scan Pipelines
|
||
|
||
For commands like `/security audit`, the plugin orchestrates multiple agents in parallel:
|
||
|
||
```
|
||
┌──────────────┐
|
||
│ /security │
|
||
│ audit │
|
||
└──────┬───────┘
|
||
│
|
||
┌────────────┼────────────┐
|
||
▼ ▼ ▼
|
||
┌─────────────┐ ┌───────────┐ ┌──────────┐
|
||
│ Skill │ │ MCP │ │ Posture │
|
||
│ Scanner │ │ Scanner │ │ Assessor │
|
||
└──────┬──────┘ └─────┬─────┘ └────┬─────┘
|
||
│ │ │
|
||
└──────────────┼─────────────┘
|
||
▼
|
||
┌────────────────┐
|
||
│ Audit Report │
|
||
│ (A-F grade) │
|
||
└────────────────┘
|
||
```
|
||
|
||
For deep scans (`/security scan --deep` or `/security deep-scan`), deterministic scanners run in parallel followed by synthesis:
|
||
|
||
```
|
||
┌──────────────┐
|
||
│ /security │
|
||
│ scan --deep │
|
||
└──────┬───────┘
|
||
│
|
||
┌───────────────┼───────────────┐
|
||
▼ ▼ ▼
|
||
┌───────────┐ ┌────────────┐ ┌────────────┐
|
||
│ LLM Skill │ │ 10 Det. │ │ MCP │
|
||
│ Scanner │ │ Scanners │ │ Scanner │
|
||
└─────┬─────┘ └──────┬─────┘ └──────┬─────┘
|
||
│ UNI ENT PRM │
|
||
│ DEP TNT GIT │
|
||
│ NET MEM SCR TFA │
|
||
│ │ │
|
||
│ ┌──────┴─────┐ │
|
||
│ │ Synthesizer│ │
|
||
│ │ Agent │ │
|
||
│ └──────┬─────┘ │
|
||
└───────────────┼───────────────┘
|
||
▼
|
||
┌────────────────┐
|
||
│ Combined Report│
|
||
│ (BLOCK/WARN/OK)│
|
||
└────────────────┘
|
||
```
|
||
|
||
---
|
||
|
||
## Deterministic Scanners
|
||
|
||
10 orchestrated + 12 standalone Node.js scanner scripts that perform byte-level analysis an LLM cannot. Zero external dependencies. Orchestrated scanners run via `node scanners/scan-orchestrator.mjs <target>` or through `/security deep-scan`. Supports `--fail-on <severity>`, `--compact`, `--format sarif`, `--output-file <path>`.
|
||
|
||
### Orchestrated (10)
|
||
|
||
| Scanner | Prefix | Detects | OWASP |
|
||
|---------|--------|---------|-------|
|
||
| `unicode-scanner.mjs` | UNI | Zero-width chars, Unicode Tag steganography, BIDI overrides, Cyrillic homoglyphs | LLM01 |
|
||
| `entropy-scanner.mjs` | ENT | High-entropy strings, base64/hex blobs, encoded payloads via Shannon entropy | LLM01, LLM03 |
|
||
| `permission-mapper.mjs` | PRM | Purpose-vs-tools mismatch, ghost hooks, haiku on sensitive agents, overprivileged components | LLM06 |
|
||
| `dep-auditor.mjs` | DEP | CVEs (npm/pip audit), typosquatting (Levenshtein distance), malicious install scripts, unpinned versions | LLM03 |
|
||
| `taint-tracer.mjs` | TNT | Source-to-sink data flow (process.env/req.body to eval/exec/fetch/writeFile), 3-pass analysis | LLM01, LLM02 |
|
||
| `git-forensics.mjs` | GIT | Force pushes, description drift, hook modifications, new outbound URLs, author changes | LLM03 |
|
||
| `network-mapper.mjs` | NET | Undisclosed URLs, suspicious domains (ngrok, webhook.site), IP-based URLs, DNS analysis | LLM02, LLM03 |
|
||
| `memory-poisoning-scanner.mjs` | MEM | Injection patterns, shell commands, credential paths, permission expansion, suspicious URLs, encoded payloads in CLAUDE.md/memory/rules files | LLM01, ASI02 |
|
||
| `supply-chain-recheck.mjs` | SCR | Re-audit installed deps from lockfiles against blocklists, OSV.dev batch API, typosquat detection | LLM03 |
|
||
| `toxic-flow-analyzer.mjs` | TFA | Lethal trifecta detection: untrusted input + sensitive data access + exfiltration sink. Cross-component correlation (runs last) | ASI01, ASI02, ASI05 |
|
||
|
||
### Standalone (12)
|
||
|
||
| Scanner | Prefix | Purpose |
|
||
|---------|--------|---------|
|
||
| `scan-orchestrator.mjs` | — | Entry point: runs all 10 orchestrated scanners, outputs JSON |
|
||
| `posture-scanner.mjs` | PST | Deterministic posture assessment, 16 categories (incl. EU AI Act, NIST AI RMF, ISO 42001), <50ms |
|
||
| `mcp-live-inspect.mjs` | MCI | Live MCP server inspection via JSON-RPC 2.0 (tool injection, shadowing, URL/IP) |
|
||
| `ide-extension-scanner.mjs` | IDE | VS Code (+ Cursor, Windsurf, VSCodium, code-server) / JetBrains extension prescan: blocklist, theme-with-code, sideload, broad activation, typosquat, extension-pack expansion, dangerous uninstall hooks — then UNI/ENT/NET/TNT/MEM/SCR per extension |
|
||
| `attack-simulator.mjs` | — | Red-team harness: 64 scenarios, 12 categories, adaptive mutation mode |
|
||
| `ai-bom-generator.mjs` | BOM | CycloneDX 1.6 AI Bill of Materials |
|
||
| `dashboard-aggregator.mjs` | — | Cross-project security dashboard, machine-grade aggregation |
|
||
| `reference-config-generator.mjs` | — | Grade A config generation based on posture gaps |
|
||
| `supply-chain-recheck-cli.mjs` | — | CLI wrapper for SCR scanner |
|
||
| `auto-cleaner.mjs` | — | Remediation engine: 16 fix operations, atomic writes, post-fix validation |
|
||
| `content-extractor.mjs` | — | Pre-extracts evidence from untrusted repos, strips injection patterns |
|
||
| `watch-cron.mjs` | — | Cron wrapper: scans all targets in config, writes summary, exits with verdict code |
|
||
|
||
**Why deterministic?** LLMs are powerful at semantic analysis — understanding intent, detecting social engineering, assessing context. But they cannot reliably calculate Shannon entropy, measure Levenshtein distance between package names, trace taint flow across function boundaries, or detect individual Unicode codepoints. These scanners fill that gap.
|
||
|
||
**Shared library** (`scanners/lib/`): severity classification, string utilities (entropy, Levenshtein, base64 detection), output formatting, file discovery, and YAML frontmatter parsing.
|
||
|
||
---
|
||
|
||
## Automated Hooks
|
||
|
||
These hooks run on every operation — no commands needed. They activate the moment the plugin is installed.
|
||
|
||
| Hook | Event | What It Does |
|
||
|------|-------|--------------|
|
||
| **Prompt injection scan** | UserPromptSubmit | Blocks direct prompt injection (override instructions, spoofed headers, identity redefinition); warns on subtle manipulation signals. Decodes obfuscated payloads (unicode, hex, URL, base64) before matching. Configurable: `LLM_SECURITY_INJECTION_MODE=block\|warn\|off` (default: block) |
|
||
| **Secret detection** | Edit, Write | Blocks AWS keys, Azure tokens, GitHub PATs, npm tokens, PEM keys, database URLs, Bearer tokens, passwords (13 patterns) |
|
||
| **Path guarding** | Write | Blocks writes to `.env`, `.ssh/`, `.aws/`, `.gnupg/`, credentials files, hook scripts, `/etc/`, `settings.json` (8 path categories) |
|
||
| **Destructive commands** | Bash | Blocks `rm -rf /`, `chmod 777`, pipe-to-shell, fork bombs, eval injection (8 block rules + 6 warnings) |
|
||
| **Supply chain guardrail** | Bash | Blocks known-compromised npm/pip packages, typosquatting (Levenshtein), age-gated installs (<72h), OSV.dev CVE checks across 7 package managers |
|
||
| **Output verification** | All tools (post) | Advisory: scans ALL tool output for indirect prompt injection (LLM01). Bash-specific: also flags leaked secrets, unexpected URLs, oversized MCP responses. Skips short output (<100 chars) for performance |
|
||
| **Session guard** | All tools (post) | Advisory: monitors tool call sequences for lethal trifecta (untrusted input + sensitive data access + exfiltration sink). Sliding window of 20 calls, per-session JSONL state, warns when all 3 legs present (OWASP ASI01, ASI02) |
|
||
| **Update check** | UserPromptSubmit | Checks for newer plugin versions (max 1x/24h, cached). Disable: `LLM_SECURITY_UPDATE_CHECK=off` |
|
||
|
||
All hooks are Node.js (`.mjs`) for cross-platform compatibility (macOS, Linux, Windows).
|
||
|
||
> [!IMPORTANT]
|
||
> Prompt injection scan, secret detection, path guarding, destructive commands, and supply chain guardrail are **blocking** — they prevent the operation if a pattern matches. Output verification and session guard are **advisory** — they warn but do not block. Update check is **informational** — notifies when a newer version is available. Prompt injection blocking can be changed to warn-only (`LLM_SECURITY_INJECTION_MODE=warn`) or disabled (`off`) for security research or testing environments. Update check can be disabled with `LLM_SECURITY_UPDATE_CHECK=off`.
|
||
|
||
---
|
||
|
||
## Knowledge Base
|
||
|
||
18 research-backed reference files grounding all analysis in published threat intelligence:
|
||
|
||
| File | Scope |
|
||
|------|-------|
|
||
| `owasp-llm-top10.md` | OWASP LLM Top 10 (2025) — attack vectors, detection signals, Claude Code mitigations |
|
||
| `owasp-agentic-top10.md` | OWASP Agentic AI Top 10 (ASI01-ASI10) — agent-specific threats mapped to Claude Code |
|
||
| `owasp-skills-top10.md` | OWASP Skills Top 10 (AST01-AST10) — skill-specific threats and mitigations |
|
||
| `skill-threat-patterns.md` | 7 threat categories from ToxicSkills/ClawHavoc research with concrete detection patterns |
|
||
| `mcp-threat-patterns.md` | 9 MCP threat categories from MCPTox/Pillar Security/Invariant Labs/Operant AI research |
|
||
| `secrets-patterns.md` | 30+ regex patterns for secret detection across 10 provider categories |
|
||
| `mitigation-matrix.md` | OWASP LLM Top 10 → Claude Code control mapping with verification checks and coverage scores |
|
||
| `top-packages.json` | Top 200 npm + top 100 PyPI package names for typosquatting detection (Levenshtein baseline) |
|
||
| `skill-registry.json` | Seed data for skill signature registry — known fingerprints and risk profiles |
|
||
| `compliance-mapping.md` | EU AI Act, NIST AI RMF, ISO 42001, MITRE ATLAS — article/control mappings to plugin capabilities |
|
||
| `norwegian-context.md` | Norwegian regulatory landscape — Datatilsynet, NSM, Digitaliseringsdirektoratet guidance for AI security |
|
||
| `prompt-injection-research-2025-2026.md` | 7 research papers (2025-2026) with implications for hook defenses |
|
||
| `deepmind-agent-traps.md` | DeepMind AI Agent Traps — 6 categories, 43 techniques, coverage matrix |
|
||
| `attack-scenarios.json` | 64 red-team scenarios across 12 categories for attack simulation |
|
||
| `attack-mutations.json` | Synonym tables and mutation rules for adaptive red-team testing |
|
||
| `typosquat-allowlist.json` | Allowlisted package names to reduce false positives in typosquatting detection |
|
||
| `ide-extension-threat-patterns.md` | 10 IDE-extension detection categories (VS Code + JetBrains) with 2024-2026 case studies (GlassWorm, WhiteCobra, TigerJack, Material Theme) |
|
||
| `top-vscode-extensions.json` | Top ~100 VS Code Marketplace extension IDs (Levenshtein typosquat seed) + blocklist of known-malicious publisher.name entries |
|
||
|
||
> [!NOTE]
|
||
> All knowledge base content is derived from published OWASP standards and peer-reviewed security research. The knowledge files provide grounding for agent analysis — agents read relevant sections before producing findings.
|
||
|
||
---
|
||
|
||
## Compliance & Governance
|
||
|
||
v6.0.0 adds an enterprise governance layer for standards-aware security operations:
|
||
|
||
| Capability | Description |
|
||
|------------|-------------|
|
||
| **Compliance Mapping** | Maps plugin capabilities to EU AI Act (Art. 9, 15, 17), NIST AI RMF (Map, Measure, Manage, Govern), ISO 42001 (Annex A), and MITRE ATLAS techniques. Posture categories 14-16 assess compliance readiness. |
|
||
| **Norwegian Context** | Regulatory guidance from Datatilsynet (DPIA for AI), NSM (basic security principles), and Digitaliseringsdirektoratet. Relevant for Norwegian public sector AI deployments. |
|
||
| **SARIF 2.1.0 Output** | `--format sarif` flag on scan/deep-scan produces OASIS SARIF standard output for CI/CD integration (GitHub Advanced Security, Azure DevOps, SonarQube). |
|
||
| **Structured Audit Trail** | JSONL audit events (`audit-trail.mjs`) with ISO 8601 timestamps, OWASP category tags, and SIEM-ready schema. Configurable via `LLM_SECURITY_AUDIT_*` env vars. |
|
||
| **AI-BOM** | CycloneDX 1.6 Bill of Materials for AI components — models, MCP servers, plugins, knowledge files, hooks. `llm-security audit-bom <target>`. |
|
||
| **Policy-as-Code** | `.llm-security/policy.json` for distributable hook configuration. Teams can enforce consistent security thresholds without per-developer env var setup. |
|
||
| **Standalone CLI** | `node bin/llm-security.mjs scan <target>` — runs scanners without Claude Code. Subcommands: `scan`, `posture`, `audit-bom`, `benchmark`. |
|
||
| **CI/CD Integration** | `--fail-on <severity>` for threshold-based exit codes, `--compact` for one-liner output. Pipeline templates for GitHub Actions, Azure DevOps, GitLab CI in `ci/`. Guide: `docs/ci-cd-guide.md`. |
|
||
|
||
### Benchmarks
|
||
|
||
The attack simulator (`llm-security benchmark`) tests hook defenses with 64 crafted scenarios across 12 categories. Adaptive mode (`--adaptive`) applies 5 mutation rounds per passing scenario (homoglyph substitution, encoding variations, zero-width injection, case alternation, synonym replacement).
|
||
|
||
---
|
||
|
||
## OWASP Coverage
|
||
|
||
| Category | Automated (Hooks) | Deterministic (Scanners) | Advisory (Commands) | Coverage |
|
||
|----------|-------------------|--------------------------|---------------------|----------|
|
||
| LLM01 Prompt Injection | **Strong** (input + output) | UNI + ENT + TNT | Scan, Audit | **95%** |
|
||
| LLM02 Sensitive Info Disclosure | **Strong** | TNT + NET | Audit | **83%** |
|
||
| LLM03 Supply Chain | Partial | ENT + DEP + GIT + NET | Scan, Plugin Audit, MCP Audit | 60% |
|
||
| LLM04 Data Poisoning | — | — | Threat Model | 40% |
|
||
| LLM05 Improper Output Handling | **Strong** (output scan) | — | Audit | **83%** |
|
||
| LLM06 Excessive Agency | **Strong** | PRM | Posture | **100%** |
|
||
| LLM07 System Prompt Leakage | — | — | Audit | 60% |
|
||
| LLM08 Vector/Embedding Weaknesses | — | — | Threat Model | 40% |
|
||
| LLM09 Misinformation | — | — | Advisory | 50% |
|
||
| LLM10 Unbounded Consumption | — | — | Pre-Deploy | **83%** |
|
||
|
||
**Average coverage: ~69%.** Percentages reflect control-count coverage from `knowledge/mitigation-matrix.md`. Strongest in prompt injection (LLM01, 95% with runtime input/output scanning + obfuscation decoding) and agency controls (LLM06, 100%). Weakest in areas requiring model-provider or infrastructure controls (LLM04, LLM08), which are better addressed at the platform level.
|
||
|
||
---
|
||
|
||
## Workflow Examples
|
||
|
||
### 1. Pre-Installation Gate
|
||
|
||
Evaluate a plugin or MCP server before installing it — locally or from a remote repo:
|
||
|
||
```
|
||
/security scan path/to/plugin # Quick scan with ALLOW/WARNING/BLOCK verdict
|
||
/security plugin-audit path/to/plugin # Deep trust assessment with Install/Review/Do Not Install
|
||
# → Install if both pass, investigate if flagged
|
||
|
||
# Remote repo — scans without installing (v2.4+)
|
||
/security scan https://github.com/org/repo --deep
|
||
/security scan https://github.com/org/repo --branch dev --deep
|
||
/security plugin-audit https://github.com/org/repo
|
||
```
|
||
|
||
### 2. Monthly Security Review
|
||
|
||
Regular cadence for maintaining security posture:
|
||
|
||
```
|
||
/security posture # 30-second baseline scorecard (16 categories)
|
||
/security audit # Full audit with A-F grade and action items
|
||
# → Fix critical/high findings
|
||
/security posture # Verify improvement
|
||
```
|
||
|
||
### 3. Track Security Over Time
|
||
|
||
Compare scan results against a stored baseline to see what changed:
|
||
|
||
```
|
||
/security diff path/to/project # First run creates baseline, subsequent runs show delta
|
||
# → Shows new, resolved, unchanged, and moved findings
|
||
/security watch path/to/project # Continuous: runs diff every 6h via /loop
|
||
```
|
||
|
||
### 4. Deep Threat Analysis
|
||
|
||
For new architectures, major changes, or compliance requirements:
|
||
|
||
```
|
||
/security threat-model # 15-30 min guided STRIDE × MAESTRO session
|
||
/security audit # Verify current controls against identified threats
|
||
/security pre-deploy # Pre-deployment checklist before production
|
||
```
|
||
|
||
### 5. Remediation
|
||
|
||
Fix findings from scans and audits:
|
||
|
||
```
|
||
/security clean path/to/project --dry-run # Preview fixes without modifying files
|
||
/security clean path/to/project # Auto-fix safe issues, confirm semi-auto, report manual
|
||
# → Review semi-auto proposals, handle manual findings
|
||
```
|
||
|
||
---
|
||
|
||
## Prompt Injection Showcase (v5.0)
|
||
|
||
The `examples/prompt-injection-showcase/` demonstrates runtime hook detection against 61 attack payloads across 19 categories — from classic instruction overrides to v5.0's Unicode steganography, HITL traps, NL indirection, hybrid P2SQL, and bash evasion techniques. Includes 6 false positive checks.
|
||
|
||
```bash
|
||
node examples/prompt-injection-showcase/run-showcase.mjs # Run all 61 payloads
|
||
node examples/prompt-injection-showcase/run-showcase.mjs --verbose # Show hook output
|
||
```
|
||
|
||
See [examples/prompt-injection-showcase/README.md](examples/prompt-injection-showcase/README.md) for the full category breakdown.
|
||
|
||
---
|
||
|
||
## Security Assessment Demo
|
||
|
||
The `examples/malicious-skill-demo/` directory contains a deliberately malicious plugin called "Project Health Dashboard" and a [full security assessment](examples/malicious-skill-demo/security-assessment.md) produced by the combined LLM + deterministic scanning pipeline.
|
||
|
||
**What it demonstrates:** A single plugin that looks like a legitimate project health monitoring tool but embeds attacks across every threat category — prompt injection, data exfiltration, Unicode steganography, typosquatting, taint flows, persistence mechanisms, and more.
|
||
|
||
**Key stats:**
|
||
- **85 total findings** (24 Critical, 24 High, 20 Medium, 6 Low, 11 Info)
|
||
- **Verdict: BLOCK 100/100** — both LLM and deterministic scanners independently maxed the risk score
|
||
- **All 9 deterministic scanners active** — every scanner found findings
|
||
- **25 LLM findings** detecting semantic patterns (social engineering, intent, context normalization)
|
||
- **60 deterministic findings** detecting byte-level patterns (entropy, Unicode codepoints, taint flow, Levenshtein distance)
|
||
|
||
**Run it yourself:**
|
||
|
||
```bash
|
||
# Deterministic scanners only (~5 seconds)
|
||
node scanners/scan-orchestrator.mjs examples/malicious-skill-demo/evil-project-health/
|
||
|
||
# Full LLM-enhanced deep scan (both layers)
|
||
/security scan examples/malicious-skill-demo/evil-project-health/ --deep
|
||
```
|
||
|
||
**Key takeaway:** A single "Project Health Dashboard" plugin embedded 7 categories of attacks invisible to human review. The Unicode Tag steganography, base64-encoded exfiltration payloads, and one-character-off typosquatting packages would pass casual inspection. Automated scanning caught all of them.
|
||
|
||
### Self-scan: scanning the scanner
|
||
|
||
Running `node scanners/scan-orchestrator.mjs .` on this plugin produces **0 findings (ALLOW)** with ~190 suppressions via `.llm-security-ignore`.
|
||
|
||
Why ~190 suppressed? A security plugin that documents attack patterns, ships a malicious demo fixture, and tests against deliberately evil code will trigger its own scanners. The entropy scanner flags regex patterns in `knowledge/secrets-patterns.md`. The taint scanner flags `eval(user_input)` in test fixtures. The network scanner flags `evil.com` in documentation. The toxic flow analyzer flags the plugin's own commands that use Read+Bash (they're security scanners). Every suppression is explained in the ignore file. Remove `.llm-security-ignore` and re-run to see all ~190.
|
||
|
||
---
|
||
|
||
## Architecture
|
||
|
||
```mermaid
|
||
flowchart TB
|
||
subgraph Runtime["Runtime Defense (9 hooks)"]
|
||
direction LR
|
||
H1["UserPromptSubmit<br/>Injection scan"]
|
||
H2["PreToolUse<br/>Secrets · Paths · Bash · Supply chain"]
|
||
H3["PostToolUse<br/>Output verify · Session guard"]
|
||
H4["Update check"]
|
||
end
|
||
|
||
subgraph Scanning["Deterministic Analysis (10+11 scanners)"]
|
||
direction LR
|
||
S1["UNI · ENT · PRM · DEP<br/>TNT · GIT · NET · MEM · SCR"]
|
||
S2["TFA<br/>Toxic flow correlator"]
|
||
S3["MCI · PST · BOM<br/>Standalone scanners"]
|
||
end
|
||
|
||
subgraph Advisory["Advisory Analysis (6 agents, 19 commands)"]
|
||
direction LR
|
||
A1["Skill Scanner<br/>7 threat categories"]
|
||
A2["MCP Scanner<br/>5-phase analysis"]
|
||
A3["Posture · Audit<br/>16 categories, A-F grade"]
|
||
A4["Threat Model<br/>STRIDE × MAESTRO"]
|
||
end
|
||
|
||
subgraph Knowledge["Knowledge Base (16 files)"]
|
||
direction LR
|
||
K1["5 OWASP frameworks"]
|
||
K2["Threat patterns<br/>Skills · MCP · Secrets"]
|
||
K3["Compliance · Research<br/>Registry · Packages"]
|
||
end
|
||
|
||
Runtime -->|"blocks/warns in real time"| User["Claude Code Session"]
|
||
User -->|"/security scan"| Scanning
|
||
User -->|"/security audit"| Advisory
|
||
Advisory -.->|"grounded by"| Knowledge
|
||
Scanning -->|"enriches"| Advisory
|
||
S1 -->|"prior results"| S2
|
||
```
|
||
|
||
### Directory Structure
|
||
|
||
```
|
||
llm-security/
|
||
├── .claude-plugin/plugin.json # Manifest (v3.0.0)
|
||
├── CLAUDE.md # Plugin documentation
|
||
├── README.md # This file
|
||
├── LICENSE # MIT License
|
||
├── SECURITY.md # Vulnerability disclosure policy
|
||
├── package.json # type: module, engines, test script, bin field
|
||
├── bin/ # Standalone CLI
|
||
│ └── llm-security.mjs # node bin/llm-security.mjs scan/posture/audit-bom/benchmark
|
||
├── ci/ # CI/CD pipeline templates
|
||
│ ├── github-action.yml # GitHub Actions with SARIF upload
|
||
│ ├── azure-pipelines.yml # Azure DevOps with SARIF upload
|
||
│ └── gitlab-ci.yml # GitLab CI with SARIF upload
|
||
├── docs/ # Guides
|
||
│ └── ci-cd-guide.md # CI/CD integration guide (Schrems II, NSM)
|
||
├── commands/ # 18 slash commands
|
||
│ ├── security.md # Router + quick start
|
||
│ ├── scan.md # Supply chain gate (+ --deep, --fail-on, --compact, --format sarif)
|
||
│ ├── deep-scan.md # Deterministic-only deep scan
|
||
│ ├── diff.md # Compare scan against stored baseline
|
||
│ ├── watch.md # Continuous monitoring via /loop
|
||
│ ├── registry.md # Skill signature registry
|
||
│ ├── supply-check.md # Re-audit installed dependencies
|
||
│ ├── clean.md # Scan + remediate (auto/semi-auto/manual)
|
||
│ ├── dashboard.md # Cross-project security dashboard
|
||
│ ├── audit.md # Full project audit
|
||
│ ├── plugin-audit.md # Plugin trust assessment
|
||
│ ├── mcp-audit.md # MCP-focused audit (+ --live flag)
|
||
│ ├── mcp-inspect.md # Live MCP server inspection via JSON-RPC 2.0
|
||
│ ├── posture.md # Quick scorecard (16 categories)
|
||
│ ├── harden.md # Generate Grade A security config
|
||
│ ├── red-team.md # Attack simulation (64 scenarios, adaptive mode)
|
||
│ ├── threat-model.md # Interactive STRIDE/MAESTRO
|
||
│ └── pre-deploy.md # Deployment checklist
|
||
├── agents/ # 6 specialized agents
|
||
│ ├── skill-scanner-agent.md # 7 threat categories
|
||
│ ├── mcp-scanner-agent.md # 5-phase MCP analysis
|
||
│ ├── posture-assessor-agent.md # 16-category assessment
|
||
│ ├── threat-modeler-agent.md # STRIDE × MAESTRO interview
|
||
│ ├── deep-scan-synthesizer-agent.md # JSON → human-readable report
|
||
│ └── cleaner-agent.md # Semi-auto remediation proposals
|
||
├── scanners/ # 10 orchestrated + 11 standalone
|
||
│ ├── scan-orchestrator.mjs # Entry point — runs all 10 orchestrated, outputs JSON
|
||
│ ├── posture-scanner.mjs # Standalone: 16-category posture assessment, <50ms
|
||
│ ├── attack-simulator.mjs # Standalone: red-team harness, 64 scenarios, adaptive mode
|
||
│ ├── ai-bom-generator.mjs # Standalone: CycloneDX 1.6 AI Bill of Materials
|
||
│ ├── dashboard-aggregator.mjs # Standalone: cross-project dashboard aggregation
|
||
│ ├── reference-config-generator.mjs # Standalone: Grade A config generation
|
||
│ ├── supply-chain-recheck-cli.mjs # Standalone: CLI for supply chain re-audit
|
||
│ ├── auto-cleaner.mjs # Standalone: remediation engine — 16 fix ops, atomic writes
|
||
│ ├── content-extractor.mjs # Standalone: pre-extracts evidence, strips injection patterns
|
||
│ ├── mcp-live-inspect.mjs # Standalone: live MCP server inspection via JSON-RPC 2.0
|
||
│ ├── watch-cron.mjs # Standalone: cron wrapper for background scanning
|
||
│ ├── lib/
|
||
│ │ ├── severity.mjs # Constants, risk score, verdict logic
|
||
│ │ ├── string-utils.mjs # Entropy, Levenshtein, base64, redact, obfuscation decoders
|
||
│ │ ├── injection-patterns.mjs # Shared prompt injection patterns (21 critical, 8 high, 15 medium)
|
||
│ │ ├── output.mjs # Finding/result builders, JSON envelope
|
||
│ │ ├── diff-engine.mjs # Baseline storage, fingerprinting, diff categorization
|
||
│ │ ├── skill-registry.mjs # Fingerprinting, caching, pattern search
|
||
│ │ ├── file-discovery.mjs # Walk tree, filter, binary detect
|
||
│ │ ├── yaml-frontmatter.mjs # Regex-based frontmatter parser
|
||
│ │ ├── git-clone.mjs # Sandboxed clone/cleanup (sandbox-exec + git config hardening)
|
||
│ │ ├── fs-utils.mjs # Backup, restore, cleanup, tmppath (UUID-unique) utilities
|
||
│ │ ├── bash-normalize.mjs # Bash evasion normalization (empty quotes, ${}, backslash)
|
||
│ │ ├── supply-chain-data.mjs # Shared blocklists and supply chain data
|
||
│ │ ├── sarif-formatter.mjs # OASIS SARIF 2.1.0 output formatter
|
||
│ │ ├── audit-trail.mjs # Structured JSONL audit events (ISO 8601, OWASP tags)
|
||
│ │ ├── bom-builder.mjs # CycloneDX BOM construction
|
||
│ │ ├── distribution-stats.mjs # Statistical analysis (Jensen-Shannon divergence)
|
||
│ │ ├── policy-loader.mjs # Reads .llm-security/policy.json for distributable config
|
||
│ │ └── mcp-description-cache.mjs # MCP tool description caching + drift detection
|
||
│ ├── unicode-scanner.mjs # Zero-width, Tags, BIDI, homoglyphs
|
||
│ ├── entropy-scanner.mjs # Shannon entropy, base64/hex detection
|
||
│ ├── permission-mapper.mjs # Plugin permission analysis
|
||
│ ├── dep-auditor.mjs # CVE, typosquatting, install scripts
|
||
│ ├── taint-tracer.mjs # Source-to-sink data flow tracing
|
||
│ ├── git-forensics.mjs # Rug pull signals, history analysis
|
||
│ ├── network-mapper.mjs # URL discovery, DNS, domain classification
|
||
│ ├── memory-poisoning-scanner.mjs # Injection in CLAUDE.md, memory, rules files
|
||
│ ├── supply-chain-recheck.mjs # Re-audit installed deps from lockfiles
|
||
│ └── toxic-flow-analyzer.mjs # Post-processing correlator: lethal trifecta detection
|
||
├── hooks/ # 9 automated hooks
|
||
│ ├── hooks.json # Hook registration
|
||
│ └── scripts/
|
||
│ ├── pre-prompt-inject-scan.mjs # 21 critical + 8 high + 15 medium patterns, obfuscation decode, configurable mode
|
||
│ ├── pre-edit-secrets.mjs # 13 secret patterns, knowledge/ exclusion
|
||
│ ├── pre-write-pathguard.mjs # 8 path categories (env, ssh, aws, gnupg, creds, hooks, system, settings)
|
||
│ ├── pre-bash-destructive.mjs # 8 block + 6 warn rules, T1-T6 bash-normalize
|
||
│ ├── pre-install-supply-chain.mjs # 7 package managers, CVE/typosquat/age-gate
|
||
│ ├── pre-compact-scan.mjs # PreCompact: scans transcript tail (500 KB) for injection before compaction, mode: block/warn/off
|
||
│ ├── post-mcp-verify.mjs # Advisory: ALL tools injection scan, Bash secrets/URLs/size
|
||
│ ├── post-session-guard.mjs # Advisory: runtime trifecta detection (sliding window, JSONL state)
|
||
│ └── update-check.mjs # Informational: version check (1x/24h, cached, disable: LLM_SECURITY_UPDATE_CHECK=off)
|
||
├── knowledge/ # 16 reference files
|
||
│ ├── owasp-llm-top10.md
|
||
│ ├── owasp-agentic-top10.md
|
||
│ ├── owasp-skills-top10.md # OWASP Skills Top 10 (AST01-AST10)
|
||
│ ├── skill-threat-patterns.md
|
||
│ ├── mcp-threat-patterns.md
|
||
│ ├── secrets-patterns.md
|
||
│ ├── mitigation-matrix.md
|
||
│ ├── compliance-mapping.md # EU AI Act, NIST AI RMF, ISO 42001, MITRE ATLAS
|
||
│ ├── norwegian-context.md # Datatilsynet, NSM, Digitaliseringsdirektoratet
|
||
│ ├── deepmind-agent-traps.md # 6 categories, 43 techniques
|
||
│ ├── prompt-injection-research-2025-2026.md # 7 research papers
|
||
│ ├── attack-scenarios.json # 64 red-team scenarios across 12 categories
|
||
│ ├── attack-mutations.json # Synonym tables for adaptive testing
|
||
│ ├── typosquat-allowlist.json # False positive reduction
|
||
│ ├── top-packages.json # Top 200 npm + 100 PyPI for typosquatting
|
||
│ └── skill-registry.json # Seed data for skill signature registry
|
||
├── tests/ # Test suite (node:test, zero external deps)
|
||
│ ├── lib/ # Unit tests for shared library
|
||
│ ├── scanners/ # Integration tests against fixture
|
||
│ └── fixtures/ # Test-specific data (dep-test)
|
||
├── reports/ # Scan reports (.docx + .md source)
|
||
│ ├── baselines/ # Stored scan baselines for diff comparison
|
||
│ └── watch/ # Cron scan results (latest.json) + config
|
||
├── examples/ # Demo fixtures
|
||
│ └── malicious-skill-demo/ # Regression test (47+ findings, BLOCK)
|
||
└── templates/ # Report templates (1 unified + archive)
|
||
├── unified-report.md # All 9 analysis types via conditional sections
|
||
└── archive/ # 9 original templates preserved for reference
|
||
```
|
||
|
||
**~25,400 lines across ~100 active files (+10 archived).** Minimal persistent state: scan baselines in `reports/baselines/`, watch results in `reports/watch/`, skill registry in `reports/skill-registry.json`, session guard JSONL in `/tmp/`, update-check cache in `~/.cache/`. All scan outputs generated fresh per invocation.
|
||
|
||
---
|
||
|
||
## What This Plugin Does Not Cover
|
||
|
||
| Area | Why | Alternative |
|
||
|------|-----|-------------|
|
||
| CLAUDE.md poisoning (post-clone) | Once a repo is cloned, CLAUDE.md loads into the system prompt *before* any hooks run. No hook-based solution can intercept this after cloning. **This is exactly why you should scan repos remotely before cloning:** `/security scan https://repo-url --deep` analyzes CLAUDE.md and all other files via the pre-extraction layer without ever loading them into your session. | **Always scan before cloning unknown repos.** For repos already cloned: manually review CLAUDE.md before opening with Claude Code. See [context-filter](https://github.com/jedi-be/context-filter) for experimental OS-level interposition (macOS only, requires re-signing after Claude Code updates). |
|
||
| ML-based injection classification | Regex patterns cannot catch novel phrasings, multilingual injection, or adversarial rephrasing that semantic models can. | Use [parry-guard](https://github.com/vaporif/parry) alongside this plugin for DeBERTa/Llama Prompt Guard 2 ML classification. |
|
||
| Enterprise SSO/SCIM | Platform-level configuration | Anthropic Admin Console |
|
||
| RAG infrastructure | Vector DB / embedding pipeline security | Dedicated RAG security tools |
|
||
| LLM gateway/proxy | Network infrastructure | API gateway solutions |
|
||
| SIEM integration | Organization security stack | Splunk, Sentinel, etc. |
|
||
| Runtime scheming detection | The session guard hook detects lethal trifecta patterns (a known attack sequence), but general scheming — where an agent pursues hidden goals through novel strategies — remains fundamentally hard for any tool. | Session guard provides partial coverage. Full scheming detection requires monitoring + human oversight |
|
||
|
||
These gaps are surfaced advisorily through `/security threat-model` and `/security pre-deploy`.
|
||
|
||
---
|
||
|
||
## Complementary Tools
|
||
|
||
This plugin provides full-stack security hardening (static analysis + supply chain + audit + threat modeling). For organizations wanting defense in depth, these tools cover areas we intentionally leave to specialists:
|
||
|
||
| Tool | What It Adds | How It Complements |
|
||
|------|-------------|-------------------|
|
||
| [parry-guard](https://github.com/vaporif/parry) | ML-based prompt injection detection (DeBERTa v3 + Llama Prompt Guard 2 86M) in Rust. Fail-closed: uncertain = unsafe. | Our regex patterns catch known injection signatures. parry-guard catches novel phrasings, multilingual injection, and adversarial rephrasing via semantic ML models. No overlap, no conflict. |
|
||
| [Lasso claude-hooks](https://github.com/lasso-security/claude-hooks) | Warn-and-continue PostToolUse hook. 96 patterns across 5 categories. `allowManagedHooksOnly` for team deployment. | Different philosophy: Lasso warns but never blocks, letting Claude decide with context. Our hooks block critical patterns. Both can run together; hooks execute sequentially. |
|
||
| [Snyk agent-scan](https://github.com/snyk/agent-scan) | Commercial skills/MCP scanning with a larger dataset (3,984 skills analyzed). Tool poisoning and shadowing detection. | Our skill-scanner-agent covers the same 7 threat categories. Snyk has a larger training set from scanning the full ClawHub marketplace. Use both for maximum coverage. |
|
||
|
||
> [!TIP]
|
||
> Recommended combo: **llm-security** (breadth: static + supply chain + audit + posture + threat modeling) + **parry-guard** (depth: ML injection classification). They cover different layers with no conflicts.
|
||
|
||
---
|
||
|
||
## Compatibility
|
||
|
||
- **Claude Code:** v2.x+
|
||
- **Platform:** macOS, Linux, Windows (all hooks are Node.js `.mjs`)
|
||
- **Node.js:** Required for hook scripts (any recent LTS version)
|
||
- **Overlap with claude-code-essentials:** Safe to run both. This plugin extends `claude-code-essentials` with path guarding and MCP verification. Duplicate blocking is harmless — hooks run sequentially.
|
||
|
||
---
|
||
|
||
## Version History
|
||
|
||
| Version | Date | Highlights |
|
||
|---------|------|------------|
|
||
| **7.0.0** | 2026-04-19 | **Trustworthy scoring (BREAKING).** Three changes target the false-positive cascade on real codebases (scan of hyperframes.com gave `BLOCK / Extreme / 100` with ~70% noise). **1. Risk-score v2** (`scanners/lib/severity.mjs`) — severity-dominated, log-scaled within tier. Replaces sum-and-cap that collapsed every non-trivial scan to 100/Extreme. Tiers: critical → 70–95, high only → 40–65, medium only → 15–35, low only → 1–11. Verdict cutoffs realigned (BLOCK ≥65, WARNING ≥15) for band co-monotonicity. **2. Context-aware entropy scanner** — file-extension skip (`.glsl/.frag/.vert/.shader/.wgsl/.css/.scss/.sass/.less/.svg/.min.*/.map`) + 8 new line-suppression rules (GLSL keywords, CSS-in-JS templates, inline SVG, ffmpeg `filter_complex`, User-Agent strings, SQL DDL on dedicated lines, `throw new Error(\`...\`)`, markdown image URLs). Configurable via `.llm-security/policy.json` `entropy` section (thresholds, `suppress_extensions`, `suppress_line_patterns`, `suppress_paths`). Envelope `calibration` block reports skip counters + effective thresholds + policy source. **3. DEP typosquat allowlist expansion** — 22 npm + 5 PyPI entries for short-name tools that tripped Levenshtein on every modern codebase (`knip`, `oxlint`, `tsx`, `nx`, `rimraf`, `uv`, `ruff`, etc.). Synthesizer "Scan Calibration" section + "never override verdict" rule added. Legacy `riskScoreV1()` kept for reference. **CI pipelines with `--fail-on` thresholds may need recalibration.** 1487 tests (was 1461). |
|
||
| **6.6.0** | 2026-04-18 | **JetBrains/IntelliJ plugin scanning.** `/security ide-scan` now covers JetBrains IDEs (IntelliJ IDEA, PyCharm, GoLand, WebStorm, RubyMine, PhpStorm, CLion, DataGrip, RustRover, Rider, Aqua, Writerside, Android Studio) — Fleet and Toolbox excluded. OS-aware discovery of `~/Library/Application Support/JetBrains/<IDE><version>/plugins/` (macOS), `%APPDATA%\JetBrains\...` (Windows), `~/.config/JetBrains/...` (Linux). Zero-dep parsers for `META-INF/plugin.xml` and `META-INF/MANIFEST.MF` with nested-jar extraction. 7 JetBrains-specific checks: theme-with-code, broad activation (`application-components`), `Premain-Class` instrumentation (HIGH — javaagent retransform), native binaries (`.so`/`.dylib`/`.dll`/`.jnilib`), long `<depends>` chains (supply-chain pressure), typosquat vs top JetBrains plugins, shaded-jar advisory. URL fetch for `plugins.jetbrains.com/plugin/<numericId>-<slug>` + direct `/plugin/download?pluginId=<xmlId>`; metadata resolves numericId → xmlId before download. `.kt`, `.groovy`, `.scala` added to `taint-tracer` code extensions. Reuses existing OS sandbox (`lib/vsix-sandbox.mjs` parameterized via `buildSandboxedWorker(..., workerPath)`). Knowledge: `knowledge/jetbrains-marketplace-api-notes.md`, expanded `knowledge/ide-extension-threat-patterns.md`, seeded `knowledge/top-jetbrains-plugins.json`. 1461 tests (was 1352). |
|
||
| **6.5.0** | 2026-04-17 | **OS sandbox for `/security ide-scan <url>`.** VSIX fetch + extract now runs in a sub-process wrapped by `sandbox-exec` (macOS) or `bwrap` (Linux), reusing the same primitives proven by the v5.1 git-clone sandbox. Defense-in-depth — even if `lib/zip-extract.mjs` ever has a bypass, the kernel refuses any write outside the per-scan temp directory. New: `lib/vsix-fetch-worker.mjs` (sub-process worker with deterministic JSON-line IPC) and `lib/vsix-sandbox.mjs` (`buildSandboxProfile` / `buildBwrapArgs` / `buildSandboxedWorker` / `runVsixWorker`, 35 s timeout, 1 MB stdout cap). New `scan(target, { useSandbox })` option (default `true` for CLI; tests use `false` since `globalThis.fetch` mocks do not cross processes). Windows fallback: in-process with `meta.warnings` advisory. Envelope `meta.source.sandbox` field: `'sandbox-exec' \| 'bwrap' \| 'none' \| 'in-process'`. 1352 tests (was 1344). |
|
||
| **6.4.0** | 2026-04-17 | **`/security ide-scan <url>` — pre-install verification.** The IDE extension scanner now accepts URLs and fetches the VSIX before scanning. Supported: VS Code Marketplace (`https://marketplace.visualstudio.com/items?itemName=publisher.name`), OpenVSX (`https://open-vsx.org/extension/publisher/name[/version]`), and direct `.vsix` URLs. New libraries: `lib/vsix-fetch.mjs` (HTTPS-only fetch with 50MB cap, 30s timeout, SHA-256, manual host-whitelisted redirects) and `lib/zip-extract.mjs` (zero-dep ZIP parser, rejects zip-slip / symlinks / absolute paths / drive letters / encrypted entries / ZIP64; caps: 10 000 entries, 500MB uncompressed, 100x expansion ratio, depth 20). Temp dir always cleaned in `try/finally`. Envelope `meta.source` carries `{ type: "url", kind, url, finalUrl, sha256, size, publisher, name, version }`. New knowledge file: `marketplace-api-notes.md`. GitHub repo URLs intentionally not supported (would require a build step). 1344 tests (was 1296). |
|
||
| **6.3.0** | 2026-04-17 | **IDE extension prescan.** New `/security ide-scan` command and `ide-extension-scanner.mjs` (prefix IDE) discover and audit installed VS Code extensions (and forks: Cursor, Windsurf, VSCodium, code-server, Insiders, Remote-SSH; JetBrains is a v1.1 stub). 7 IDE-specific checks: blocklist match, theme-with-code, sideload (`.vsix`), broad activation (`*`, `onStartupFinished`), Levenshtein typosquat ≤2 vs top-100, extension-pack expansion, dangerous `vscode:uninstall` hooks. Per-extension orchestration of UNI/ENT/NET/TNT/MEM/SCR scanners with bounded concurrency. OS-aware discovery via `lib/ide-extension-discovery.mjs` (Platform-specific suffix parsing for `darwin-x64`, `linux-arm64`, etc.). Offline-first; `--online` opt-in for future Marketplace/OSV.dev lookups. New knowledge files: `ide-extension-threat-patterns.md` (10 categories, 2024-2026 case studies from Koi Security — GlassWorm, WhiteCobra, TigerJack, Material Theme), `top-vscode-extensions.json` (typosquat seed + blocklist), `top-jetbrains-plugins.json` (stub). 1296 tests (was 1274). |
|
||
| **6.2.0** | 2026-04-17 | **Opus 4.7 + Claude Code 2.1.112 alignment.** Bash-normalize extended with T5 (`${IFS}` word-splitting) and T6 (ANSI-C `$'\xHH'` hex quoting) layers. New `pre-compact-scan.mjs` PreCompact hook — scans transcript tail (500 KB cap, <500 ms) for injection + credentials before context compaction. Modes: `block` / `warn` / `off` via `LLM_SECURITY_PRECOMPACT_MODE`. Agent files reframed for Opus 4.7's more literal instruction-following (Step 0 generaliseringsgrense + parallell Read-hint in skill-scanner + mcp-scanner). New `docs/security-hardening-guide.md` with env-var reference, sandboxing notes, system-card §5.2.1 / §6.3.1.1 mapping. CLAUDE.md Defense Philosophy links to system card. 1274 tests (was 1264). |
|
||
| **6.1.0** | 2026-04-10 | **CI/CD integration.** `--fail-on <severity>` flag for threshold-based exit codes (exit 1 if findings at/above level). `--compact` output mode (one-liner per finding). Policy `ci` section in `policy.json`. Pipeline templates: GitHub Actions, Azure DevOps, GitLab CI with SARIF upload. CI/CD guide (`docs/ci-cd-guide.md`) with Schrems II/NSM compliance docs. npm publish preparation (`files` whitelist). 1264 tests. |
|
||
| **6.0.0** | 2026-04-10 | **CAISS-readiness release.** Enterprise compliance and governance layer: compliance mapping (EU AI Act, NIST AI RMF, ISO 42001, MITRE ATLAS), Norwegian regulatory context (Datatilsynet, NSM, Digitaliseringsdirektoratet), SARIF 2.1.0 output format (`--format sarif`), structured JSONL audit trail (`audit-trail.mjs`), AI-BOM generator (CycloneDX 1.6), policy-as-code (`.llm-security/policy.json`), standalone CLI (`bin/llm-security.mjs` — `node bin/llm-security.mjs scan`). Posture scanner expanded to 16 categories (+EU AI Act, NIST AI RMF, ISO 42001). Attack simulator benchmark mode (`--benchmark`). 15 knowledge docs, 16 scanners, 1242+ tests. |
|
||
| **5.1.0** | 2026-04-07 | **Sandboxed remote cloning.** Defense-in-depth for `git clone` attack surface: (1) 8 git config flags disable hooks, symlinks, filter/smudge drivers, fsmonitor, local file protocol; 4 env vars isolate from system/user config. (2) OS sandbox: macOS `sandbox-exec` + Linux `bubblewrap` restrict file writes to only the clone temp dir. Graceful fallback on Windows (git config only). Post-clone size check (100MB max). UUID-unique evidence filenames prevent race conditions. Cleanup guarantee in scan/plugin-audit commands. 1147 tests (was 1115). |
|
||
| **5.0.0** | 2026-04-06 | **Prompt Injection Hardening (v5.0).** 8-session defense-in-depth overhaul driven by 7 research papers (2025-2026). MEDIUM advisory for obfuscation signals (leetspeak, homoglyphs, zero-width, multi-language). Unicode Tag steganography detection (U+E0000-E007F). Bash expansion normalization (`bash-normalize.mjs`). Rule of Two enforcement (configurable `LLM_SECURITY_TRIFECTA_MODE=block\|warn\|off`). 100-call long-horizon monitoring window with slow-burn trifecta detection. Behavioral drift via Jensen-Shannon divergence. HITL trap detection (approval urgency, summary suppression, scope minimization). Sub-agent delegation tracking (escalation-after-input advisory). NL indirection patterns. Hybrid attacks (P2SQL, recursive injection, XSS-in-agent). CaMeL-inspired data flow tagging (SHA-256 provenance, output-to-input linking). Adaptive red-team (5 mutation rounds per scenario: homoglyph, encoding, zero-width, case alternation, synonym). Knowledge base expanded: `prompt-injection-research-2025-2026.md`, `deepmind-agent-traps.md`, `attack-mutations.json`. Posture scanner expanded to 13 categories (+Prompt Injection Hardening, Rule of Two, Long-Horizon Monitoring). Defense Philosophy section documenting honest limitations. 1115 tests. |
|
||
| **4.5.1** | 2026-04-04 | **Cross-platform support.** Windows/Linux compatibility: `fileURLToPath()`, `path.dirname()`, native `fetch()` replaces `curl` subprocess, fixed tilde expansion regex. 11 files, 782 tests pass. |
|
||
| **4.5.0** | 2026-04-04 | **Attack simulation / red-team mode.** New `attack-simulator.mjs` runs 38 crafted attack scenarios across 7 categories (secrets, destructive, supply-chain, prompt-injection, pathguard, mcp-output, session-trifecta) against the plugin's own hooks. Data-driven via `knowledge/attack-scenarios.json` with runtime payload assembly. New `/security red-team` command with `--category` filter. Capstone release: v4.0 roadmap complete (S1-S6). 18 commands, 16 scanners (10 orchestrated + 6 standalone). 782 tests. |
|
||
| **4.4.0** | 2026-04-03 | **Cross-project security dashboard.** New `dashboard-aggregator.mjs` discovers all Claude Code projects under ~/ (depth 3) and ~/.claude/plugins/, runs posture-scanner on each. Machine grade = weakest link. Cache in `~/.cache/llm-security/dashboard-latest.json` (24h staleness). New `/security dashboard` command. 17 commands, 15 scanners (10 orchestrated + 5 standalone). 751 tests. |
|
||
| **4.3.0** | 2026-04-03 | **Enhanced MCP session monitoring.** MCP description drift detection via `mcp-description-cache.mjs` — caches tool descriptions, alerts on >10% Levenshtein drift (OWASP MCP05 rug-pull). MCP-concentrated trifecta in `post-session-guard.mjs` — elevated severity when all 3 lethal trifecta legs trace to the same MCP server. Cumulative data volume tracking (100KB/500KB/1MB thresholds, OWASP ASI02). Per-MCP-tool volume tracking in `post-mcp-verify.mjs` (>100KB per tool = advisory). 735 tests. |
|
||
| **4.2.0** | 2026-04-03 | **Supply chain re-check scanner.** New `supply-chain-recheck.mjs` (prefix SCR) periodically re-audits installed dependencies from lockfiles against blocklists, OSV.dev batch API, and typosquat detection. Shared data module extracts blocklists from hook. New `/security supply-check` command. 16 commands, 14 scanners (10 orchestrated + 4 standalone). 700 tests. |
|
||
| **4.1.0** | 2026-04-03 | **Reference configuration generator.** New `/security harden` command generates Grade A security config based on posture scanner gaps. New `reference-config-generator.mjs` standalone scanner detects project type (plugin/monorepo/standalone) and generates `settings.json` (deny-first), CLAUDE.md security section, and `.gitignore` additions. `--dry-run` (default) shows JSON output; `--apply` writes files with backup. Post-apply verification re-runs posture scanner. Templates in `templates/reference-config/`. 15 commands, 12 scanners (9 orchestrated + 4 standalone). 670 tests. |
|
||
| **4.0.0** | 2026-04-03 | **Deterministic posture scanner.** New `posture-scanner.mjs` — standalone scanner (prefix PST) replacing Opus agent for `/security posture`. 10 categories assessed in <50ms (was ~6 min). Categories: Deny-First, Secrets, Path Guarding, MCP Trust, Destructive Blocking, Sandbox, Human Review, Plugin Sources, Session Isolation, Cognitive State Security. Reuses `scanForInjection()` and `gradeFromPassRate()`. `/security audit` now runs scanner first for instant data, then agents for narrative. 12 scanners (9 orchestrated + 3 standalone). 647 tests. |
|
||
| **3.1.1** | 2026-04-03 | **Memory poisoning scanner (Cognitive State Traps).** New `memory-poisoning-scanner.mjs` — scanner #9 in orchestrator (prefix MEM, OWASP: LLM01+ASI02). Detects 6 threat categories in CLAUDE.md, memory files, `.claude/rules`, REMEMBER.md, and `*.local.md`: injection patterns (via shared injection-patterns.mjs), shell commands in memory files, suspicious exfiltration URLs (webhook.site/ngrok/pipedream/etc.), credential path references (.ssh/.aws/id_rsa/credentials.json), permission expansion directives (bypassPermissions/dangerouslySkipPermissions), encoded payloads (base64 >40 chars, hex >64 chars). Posture assessor gains Category 10: Cognitive State Security. 11 scanners (9 orchestrated + 2 standalone). 606 tests (was 588). |
|
||
| **3.1.0** | 2026-04-03 | **AI Agent Traps defense.** Gap analysis against [AI Agent Traps](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6372438) (Franklin et al., Google DeepMind, 2025). New detections: HTML/CSS content obfuscation (6 patterns for `display:none`, `visibility:hidden`, off-screen positioning, zero font-size/opacity, `aria-label` injection), oversight evasion (9 patterns for educational/hypothetical/red-team/research framing), markdown syntactic masking (anchor text injection payloads). Encoding hardening: HTML entity decoding (named, decimal, hex), recursive multi-layer decode (max 3 iterations), letter-spacing collapse. `post-mcp-verify` hook gains HTML content trap detection for WebFetch/Read/MCP output. Knowledge base updated with Agent Traps taxonomy mapping. 588 tests (was 544). |
|
||
| **3.0.0** | 2026-04-03 | **Public release.** 8 sessions from v2.5→v3.0. New in v3: toxic flow analysis (TFA scanner — lethal trifecta detection via cross-component correlation), runtime session guard (PostToolUse trifecta monitoring with sliding window), MCP live inspection (JSON-RPC 2.0 connect to running servers), report diffing with baselines (fuzzy matching, new/resolved/moved), continuous scanning (watch command + cron wrapper), skill signature registry (SHA-256 fingerprinting + cache). 4 OWASP frameworks (LLM Top 10, Agentic AI, Skills, MCP). 15 commands, 8 hooks, 10 scanners (8 orchestrated + 2 standalone), 6 agents, 9 knowledge files, 544 tests. Architecture diagram added. |
|
||
| **2.9.2** | 2026-04-03 | **Skill signature registry.** New `skill-registry.mjs` library for SHA-256 fingerprinting of normalized skill content, scan result caching, and pattern search. New `/security registry` command with stats, scan+register, and search sub-commands. `/security scan` now checks registry before full scan — instant result for known fingerprints (7-day staleness threshold). Seed data in `knowledge/skill-registry.json`, active registry in `reports/skill-registry.json`. 15 commands, 9 knowledge files total. |
|
||
| **2.9.1** | 2026-04-03 | **Continuous/background scanning.** New `/security watch [path] [--interval 6h]` command uses the built-in /loop skill to run `/security diff` on a recurring interval. New `watch-cron.mjs` standalone script for system cron/launchd — reads multi-target config from `reports/watch/config.json`, writes summary to `reports/watch/latest.json`, exits with worst verdict code (0/1/2). 13 commands total. |
|
||
| **2.9.0** | 2026-04-03 | **Report diffing & baseline.** New `diff-engine.mjs` library for finding fingerprinting, fuzzy line matching (±3), and diff categorization (new/resolved/unchanged/moved). Scan orchestrator gains `--baseline` and `--save-baseline` flags. Baselines stored per target hash in `reports/baselines/`. New `/security diff` command compares current scan against stored baseline and shows delta. 12 commands total. |
|
||
| **2.8.1** | 2026-04-03 | **Auto update notifications.** New `update-check.mjs` UserPromptSubmit hook checks for newer plugin versions against the public Forgejo repo (max 1x/24h, cached in `~/.cache/llm-security/`). Notifies via systemMessage when a newer version is available. Disable: `LLM_SECURITY_UPDATE_CHECK=off`. 8 hooks total. |
|
||
| **2.8.0** | 2026-04-02 | **MCP Runtime Inspection.** New `mcp-live-inspect.mjs` standalone scanner connects to MCP stdio servers via JSON-RPC 2.0, fetches live tool/prompt/resource lists, scans descriptions for injection (MCP03, MCP06), tool shadowing across servers (MCP09), URL/IP in descriptions. New `/security mcp-inspect` command. `/security mcp-audit --live` flag for combined static + live analysis with cross-reference escalation. Scanner prefix: MCI. 9 scanners (8 orchestrated + 1 standalone), 11 commands total. |
|
||
| **2.7.1** | 2026-04-02 | **Runtime session guard hook.** PostToolUse hook monitoring tool call sequences for lethal trifecta (untrusted input + sensitive data access + exfiltration sink). Sliding window (20 calls), per-session JSONL state, advisory warning. 7 hooks total. |
|
||
| **2.7.0** | 2026-04-02 | **Toxic flow analysis scanner.** 8th deterministic scanner detecting lethal trifecta patterns in plugin component definitions. Post-processing correlator consuming output from all prior scanners. Direct, cross-component, and project-level trifecta detection with mitigation downgrades. |
|
||
| **2.6.0** | 2026-04-02 | **MEDIUM injection patterns + 4-framework OWASP mapping.** Added ~15 MEDIUM-severity patterns (base64 payloads, leetspeak, homoglyphs). Full OWASP mapping: LLM Top 10, Agentic AI Top 10 (ASI), Skills Top 10 (AST), MCP Top 10. New knowledge file `owasp-skills-top10.md`. 8 knowledge files total. |
|
||
| **2.5.0** | 2026-04-02 | **Pre-extraction indirection layer for remote scan defense.** Remote scans now pre-extract structured evidence via `content-extractor.mjs` and strip injection patterns BEFORE LLM agents see the content. Agents analyze a JSON evidence package, never raw files from untrusted repos. `[INJECTION-PATTERN-STRIPPED]` markers are confirmed findings. |
|
||
| **2.4.0** | 2026-04-01 | **GitHub repo URL support for scan and plugin-audit.** `scan` and `plugin-audit` accept `https://github.com/...` URLs directly. Clones to temp dir via `scanners/lib/git-clone.mjs`, scans locally, cleans up. `--branch <name>` flag for non-default branches. |
|
||
| **2.3.0** | 2026-04-01 | **PostToolUse expanded to ALL tools + configurable injection mode.** 498 tests (was 470). PostToolUse hook now scans Read, WebFetch, MCP, and all other tool output for indirect prompt injection (was Bash-only). Bash-specific checks (secrets, URLs, large output) preserved. Short output skip (<100 chars) for performance. `LLM_SECURITY_INJECTION_MODE` env var: `block` (default), `warn` (advisory-only), `off` (disable). Complementary Tools section documenting parry-guard, Lasso, Snyk compatibility. CLAUDE.md poisoning gap documented as known limitation. |
|
||
| **2.2.0** | 2026-04-01 | **Prompt injection runtime defense (Gaps 1-3).** 470 tests (was 383). New `UserPromptSubmit` hook blocks injection in user input. `post-mcp-verify` extended with indirect injection scanning in tool output (LLM01). Obfuscation decoding: unicode-escape, hex-escape, URL-encoding, base64 normalization before pattern matching. Shared `injection-patterns.mjs` module with 21 critical + 8 high patterns from skill-scanner-agent Category 1. LLM01 coverage 83%->95%, LLM05 80%->83%. |
|
||
| **2.1.0** | 2026-04-01 | 383 tests (was 177): full hook coverage (66 tests), auto-cleaner coverage (140 tests), auto-cleaner import guard fix, solo project (CONTRIBUTING.md removed), HTTPS install URL under fromaitochitta org, model defaults set to sonnet |
|
||
| **2.0.0** | 2026-03-31 | Open-source release: MIT LICENSE, SECURITY.md, test suite (`node:test`), path guarding hook (`pre-write-pathguard.mjs`), supply chain hook documentation, version alignment, `.gitignore`, `.editorconfig` |
|
||
| **1.4.0** | 2026-02-21 | Unified risk scoring formula (25/10/4/1 weights), score-based verdicts, risk bands (Low→Extreme), OWASP categorization, A-F grading function, single `unified-report.md` template replacing 9 separate templates with conditional sections per analysis type |
|
||
| **1.3.0** | 2026-02-21 | `/security clean` command with 3-tier remediation (auto/semi-auto/manual), `auto-cleaner.mjs` engine (16 fix operations, atomic writes, post-fix validation), `cleaner-agent` for semi-auto proposals, `clean-report.md` template, `--dry-run` flag |
|
||
| **1.2.0** | 2026-02-19 | 7 deterministic Node.js scanners (unicode, entropy, permissions, dependencies, taint, git forensics, network), deep-scan command + `--deep` flag, synthesizer agent, shared scanner library, demo fixture with 85-finding security assessment, OWASP coverage improvements (LLM01 70→85%, LLM02 90→95%, LLM03 80→90%, LLM06 85→95%) |
|
||
| **1.1.0** | 2026-02-19 | Plugin audit command (`/security plugin-audit`), MCP audit command (`/security mcp-audit`), pre-deployment checklist (`/security pre-deploy`), 3 new report templates, updated OWASP coverage (LLM03 75%→80%) |
|
||
| **1.0.0** | 2026-02-19 | Initial release — 4 agents, 4 hooks, 6 knowledge files (2,771 lines), 8 commands, 7 report templates. OWASP LLM Top 10 + Agentic AI Top 10 coverage |
|
||
|
||
---
|
||
|
||
## License & Attribution
|
||
|
||
This project is licensed under the [MIT License](LICENSE).
|
||
|
||
Knowledge base files in `knowledge/` are derived from published [OWASP](https://owasp.org/) standards and security research papers. OWASP content is used under the [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/) license.
|
||
|
||
Threat intelligence sources: [AI Agent Traps](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6372438) (Franklin et al., Google DeepMind, 2025), [ToxicSkills](https://arxiv.org/abs/2502.01063) (Xi'an Jiaotong, 2025), [ClawHavoc](https://blog.repello.ai/clawhavoc-framework) (Repello AI, 2025), [MCPTox](https://invariantlabs.ai/blog/mcp-security) (Invariant Labs, 2025), [Pillar Security MCP Research](https://www.pillar.security/blog/the-mcp-security-landscape) (2025), [Operant AI Agentic Security](https://www.operant.ai/) (2025).
|
||
|
||
The plugin architecture, scan pipeline, threat detection patterns, and security assessment methodology are original work.
|
||
|
||
Part of [From AI to Chitta](https://fromaitochitta.com). Source: [git.fromaitochitta.com/open/claude-code-llm-security](https://git.fromaitochitta.com/open/claude-code-llm-security).
|
||
|
||
## Feedback & Requests
|
||
|
||
- **Bug reports:** [Open an issue](https://git.fromaitochitta.com/open/claude-code-llm-security/issues) on Forgejo
|
||
- **Feature requests:** [Open an issue](https://git.fromaitochitta.com/open/claude-code-llm-security/issues) with a `[Request]` prefix
|
||
- **Security vulnerabilities:** See [SECURITY.md](SECURITY.md) — do not open a public issue
|
||
- **General questions:** Email security@fromaitochitta.com or use the [contact form](https://fromaitochitta.com)
|
||
|
||
## Contributing
|
||
|
||
This is a solo project. See [Feedback & Requests](#feedback--requests) for how to report bugs or suggest features. Pull requests are not accepted.
|
||
|
||
> Microsoft and OWASP product names are trademarks of their respective owners. This project is not endorsed by or affiliated with any referenced organization.
|