ktg-plugin-marketplace/plugins/llm-security/README.md
Kjell Tore Guttormsen b7d64a6d2b docs(llm-security): three doc levels updated for v7.6.1
CLAUDE.md MANDATORY rule: any feature change pushed to
Forgejo MUST update all three doc levels in the SAME commit or immediately
after. The v7.6.1 fix commit (f9b555a) only bumped the version badge; this
follow-up commit closes the doc gap.

- plugins/llm-security/README.md: new [7.6.1] history table row
- plugins/llm-security/CLAUDE.md: header bumped v7.6.0 → v7.6.1 +
  new v7.6.1 blurb (all 6 fix details)
- README.md (root): llm-security version row bumped v7.6.0 → v7.6.1 +
  v7.6.1 history bullet above the v7.6.0 bullet

No code changes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 14:44:55 +02:00


# LLM Security Plugin for Claude Code
> Automated defense and advisory analysis for the agentic AI attack surface.
> **Solo-maintained, fork-and-own.** This plugin is a starting point, not a vendor product. Issues are welcome as signals; pull requests are not accepted. See [GOVERNANCE.md](GOVERNANCE.md) for the full model and what upstream provides.
*AI-generated: all code produced by Claude Code through dialog-driven development. [Full disclosure →](../../README.md#ai-generated-code-disclosure)*
![Version](https://img.shields.io/badge/version-7.6.1-blue)
![Platform](https://img.shields.io/badge/platform-Claude_Code_Plugin-purple)
![Commands](https://img.shields.io/badge/commands-20-orange)
![Agents](https://img.shields.io/badge/agents-6-orange)
![Scanners](https://img.shields.io/badge/scanners-23-cyan)
![Hooks](https://img.shields.io/badge/hooks-9-red)
![Knowledge](https://img.shields.io/badge/knowledge_docs-22-green)
![Tests](https://img.shields.io/badge/tests-1822-success)
![License](https://img.shields.io/badge/license-MIT-lightgrey)
A Claude Code plugin that provides security scanning, auditing, and threat modeling for agentic AI projects. Built on [OWASP LLM Top 10 (2025)](https://genai.owasp.org/llm-top-10/), [OWASP Agentic AI Top 10 (ASI01-ASI10)](https://genai.owasp.org/agentic-ai/), OWASP Skills Top 10 (AST01-AST10), MCP Top 10, and the [AI Agent Traps](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6372438) taxonomy (Google DeepMind, 2025), grounded in published research from ToxicSkills, ClawHavoc, MCPTox, Pillar Security, Invariant Labs, GHSL Security Lab, and Operant AI.
---
## Why this exists
Claude Code's extensibility model — skills, MCP servers, plugins, hooks, IDE extensions — creates an attack surface that mirrors the npm/PyPI supply chain problem with one critical difference: **extensions run with LLM agency**. A malicious plugin doesn't just execute code in a sandbox. It can instruct the agent to read your SSH keys, exfiltrate environment variables, install persistence mechanisms, and modify its own configuration — all while appearing to be a helpful "Project Health Dashboard."
This is not theoretical. ToxicSkills (Xi'an Jiaotong, 2025), ClawHavoc (Repello AI, 2025), GHSL-class workflow injections, and the November 2024 npm/PyPI typosquat campaigns documented real attack patterns. OWASP, NIST, and the EU AI Act now formalize the controls needed.
This plugin layers three independent kinds of defense — **runtime hooks** that block, **deterministic scanners** that compute, and **LLM-driven advisory commands** that judge — so failures in any one layer are caught by the others.
> [!IMPORTANT]
> **Scan repos remotely before cloning.** A poisoned `CLAUDE.md` injects instructions into the model context the moment you open a cloned repo — before any hook can intervene. `/security scan https://repo-url --deep` analyses everything safely via pre-extraction, without loading anything into your session. This is the primary defense against `CLAUDE.md` poisoning.
---
## Quick Start
### Prerequisites
- [Claude Code](https://docs.anthropic.com/en/docs/claude-code) v2.x+
- Node.js (any recent LTS — required for hook scripts)
### Install
```bash
claude plugin marketplace add https://git.fromaitochitta.com/open/ktg-plugin-marketplace.git
```
Or enable directly in `~/.claude/settings.json`:
```json
{
  "enabledPlugins": {
    "llm-security@ktg-plugin-marketplace": true
  }
}
```
Hooks activate immediately on install. Secret detection, path guarding, prompt-injection scanning, destructive-command blocking, supply-chain guardrails, and runtime trifecta detection start working without any commands.
### First scan
```
> /security posture
┌──────────────────────────────────────────────┐
│ Security Posture: 8/16 [B] 77% │
├──────────────────────────────────────────────┤
│ ✅ Deny-First Config │
│ ✅ Secrets Protection │
│ ⚠️ MCP Server Trust │
│ ✅ Destructive Command Blocking │
│ ⚠️ Sandbox Config │
│ ✅ Prompt Injection Hardening │
│ ⚠️ Rule of Two │
│ ✅ EU AI Act │
│ ⚠️ NIST AI RMF │
│ — ISO 42001 │
├──────────────────────────────────────────────┤
│ 6 findings (1 high, 3 medium, 2 low) │
└──────────────────────────────────────────────┘
```
> [!TIP]
> Start with `/security posture` for a 30-second baseline, then `/security audit` for the full picture, or `/security scan <target>` for supply-chain gating.
> [!IMPORTANT]
> **Opus extended-context users:** subagents inherit the parent session's context limit but do not support extended context, which causes API errors. Run `/model Opus` before using security commands to reset to the standard 200 K context window, which subagents handle correctly.
---
## What's inside
```mermaid
flowchart TB
subgraph Runtime["Runtime defense — 9 hooks"]
direction LR
H1["UserPromptSubmit<br/>injection scan"]
H2["PreToolUse<br/>secrets · paths · bash · supply chain"]
H3["PostToolUse<br/>output verify · session guard"]
H4["PreCompact<br/>transcript scan"]
end
subgraph Scanning["Deterministic analysis — 23 scanners"]
direction LR
S1["UNI · ENT · PRM · DEP<br/>TNT · GIT · NET · MEM · SCR · TFA"]
S2["WFL workflow scanner"]
S3["MCI · IDE · PST · BOM<br/>+ standalone CLIs"]
end
subgraph Advisory["Advisory analysis — 6 agents · 20 commands"]
direction LR
A1["Skill scanner<br/>7 threat categories"]
A2["MCP scanner<br/>5-phase analysis"]
A3["Posture · audit<br/>16 categories, A-F"]
A4["Threat model<br/>STRIDE × MAESTRO"]
end
subgraph Knowledge["Knowledge base — 22 files"]
direction LR
K1["5 OWASP frameworks<br/>+ DeepMind Agent Traps"]
K2["Threat patterns<br/>skills · MCP · workflows · IDE · secrets"]
K3["Compliance · research<br/>registry · packages"]
end
Runtime -->|"blocks/warns in real time"| User["Claude Code session"]
User -->|"/security scan"| Scanning
User -->|"/security audit"| Advisory
Advisory -.->|"grounded by"| Knowledge
Scanning -->|"enriches"| Advisory
```
Each layer is independent. A failure in one (e.g. an injection that slips past the prompt scanner) gets a second chance from the others (e.g. the trifecta session guard catching the attempted exfiltration step downstream).
---
## Commands
20 slash commands grouped by purpose. All accept `path` or GitHub URL targets unless noted.
### Scanning & assessment
| Command | Description |
|---------|-------------|
| `/security` | Router with quick-start guide |
| `/security scan [path\|url]` | Supply-chain gate — ALLOW/WARNING/BLOCK verdict on skills, MCP servers, directories, or remote repos |
| `/security scan [path\|url] --deep` | Adds 10 deterministic scanners on top of the LLM agents |
| `/security deep-scan [path]` | Run only the 10 orchestrated deterministic scanners. Supports `--fail-on <severity>`, `--compact`, `--format sarif`, `--output-file <path>` |
| `/security audit` | Full project audit, A-F grade, prioritized action plan |
| `/security plugin-audit [path\|url]` | Plugin trust assessment with Install/Review/Do Not Install verdict |
| `/security mcp-audit [--live]` | Audit installed MCP server configs (`--live` adds runtime inspection) |
| `/security mcp-inspect` | Connect to running MCP stdio servers and scan live tool descriptions via JSON-RPC 2.0 |
| `/security mcp-baseline-reset` | Clear cumulative-drift baseline cache after a legitimate MCP server upgrade (E14, v7.3.0) |
| `/security ide-scan [target\|url]` | Scan installed VS Code (+ Cursor, Windsurf, VSCodium, code-server) and JetBrains extensions, OR fetch a remote VSIX from VS Code Marketplace, OpenVSX, direct `.vsix`, or JetBrains Marketplace. 7 VS Code + 7 JetBrains-specific checks plus UNI/ENT/NET/TNT/MEM/SCR per extension |
| `/security posture` | 30-second scorecard across 16 categories incl. EU AI Act, NIST AI RMF, ISO 42001 |
| `/security diff [path]` | Compare scan against stored baseline — new/resolved/unchanged/moved findings |
| `/security watch [path] [--interval 6h]` | Continuous monitoring via `/loop` |
| `/security registry [scan\|search]` | Skill signature registry — view stats, scan-and-register, search known fingerprints |
| `/security supply-check [path]` | Re-audit installed dependencies from lockfiles against blocklists, OSV.dev, and typosquats |
| `/security dashboard` | Cross-project security dashboard — machine-grade aggregation across all projects under `~/` |
### Remediation
| Command | Description |
|---------|-------------|
| `/security clean [path]` | Three-tier remediation pipeline — auto-fix safe issues, confirm semi-auto with the user, report manual findings |
| `/security clean [path] --dry-run` | Preview without modifying files |
| `/security harden [path]` | Generate Grade A reference config (`settings.json`, `CLAUDE.md`, `.gitignore`) |
| `/security harden [path] --apply` | Apply with automatic backup |
### Threat modeling & planning
| Command | Description |
|---------|-------------|
| `/security threat-model` | Interactive STRIDE × MAESTRO 7-layer session, 15-30 min |
| `/security red-team [--category] [--adaptive]` | Attack simulation — 72 scenarios across 12 categories, 100 % block rate. `--adaptive` applies 5 mutation rounds per blocked scenario for evasion testing |
| `/security pre-deploy` | Pre-deployment checklist — 10 automated + 3 manual checks |
### Remote scanning safely
`/security scan` and `/security plugin-audit` accept GitHub and Forgejo URLs directly. The plugin clones to a temp directory inside an OS sandbox, scans, and cleans up.
```
/security scan https://github.com/org/repo --deep
/security scan https://github.com/org/repo --branch dev --deep
```
**Defense-in-depth on the clone path** (v5.1+):
| Layer | Mechanism | Mitigates |
|-------|-----------|-----------|
| Git config hardening | `core.hooksPath=/dev/null`, `core.symlinks=false`, all `filter.lfs.*` neutralized, `protocol.file.allow=never`, `transfer.fsckObjects=true`, plus `GIT_CONFIG_NOSYSTEM=1` and friends | Git hooks at clone, symlink traversal, filter/smudge driver code execution via `.gitattributes` (CVE-2024-32002 class), local-file protocol traversal, malformed objects |
| OS filesystem sandbox | macOS `sandbox-exec` (Seatbelt) or Linux `bwrap` (bubblewrap) restricts writes to the per-clone temp dir | Even if a filter driver bypasses git config hardening, the kernel refuses writes outside the sandbox |
| `.gitattributes` post-clone advisory (E12, v7.3.0) | `scanGitAttributes()` scans for `filter=` / `diff=` / `merge=` driver directives and emits MEDIUM advisories | Surfaces driver-based supply-chain surface that survives even a sandboxed clone |
| Pre-LLM injection strip | `content-extractor.mjs` produces a structured JSON evidence package; `[INJECTION-PATTERN-STRIPPED]` markers are confirmed findings | Agents never see raw poisoned files from untrusted repos |
| Post-clone size cap | 100 MB max | Resource-exhaustion attacks |
Windows has no equivalent kernel-level sandbox. For full coverage, run Claude Code inside WSL2 or Docker Desktop; without a sandbox, the git config hardening on its own still covers all known `.gitattributes` attack vectors.
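
As a rough illustration of the git config hardening layer, the sketch below assembles a `git clone` argument list from the settings named in the table. The function name `hardenedCloneArgs` is hypothetical, the `filter.lfs.*` overrides are left out because the table does not give their values, and the plugin's real clone code may differ.

```javascript
// Hypothetical helper; not taken from the plugin source.
// Builds argv and env for a clone that applies the settings listed above.
function hardenedCloneArgs(url, destDir, branch) {
  const config = [
    'core.hooksPath=/dev/null',   // point hooks at an empty location
    'core.symlinks=false',        // check symlinks out as plain files
    'protocol.file.allow=never',  // refuse the local file transport
    'transfer.fsckObjects=true',  // verify objects as they are received
  ];
  // `git -c key=value <command>` applies each setting to this invocation.
  const args = config.flatMap((entry) => ['-c', entry]);
  args.push('clone');
  if (branch) args.push('--branch', branch);
  // `--` keeps a hostile URL from being parsed as an option.
  args.push('--', url, destDir);
  return { args, env: { ...process.env, GIT_CONFIG_NOSYSTEM: '1' } };
}
```

The result could be handed to `child_process.execFile('git', args, { env })`, ideally inside the OS sandbox described above.
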
---
## Automated hooks (9)
Hooks run on every operation — no commands needed. They activate the moment the plugin is installed.
| Hook | Event | What it does |
|------|-------|--------------|
| **Prompt injection scan** | UserPromptSubmit | Blocks direct injection (override instructions, spoofed system headers, identity redefinition) and warns on subtle signals (leetspeak, homoglyphs, zero-width chars, multi-language). Decodes obfuscated payloads (Unicode Tag, hex, URL, base64, rot13) before matching. Mode: `LLM_SECURITY_INJECTION_MODE=block\|warn\|off` (default block) |
| **Secret detection** | Edit, Write | Blocks AWS keys, Azure tokens, GitHub PATs, npm tokens, PEM keys, database URLs, Bearer tokens, and 30+ other secret patterns |
| **Path guarding** | Write | Blocks writes to `.env*` (multi-segment-suffix-safe), `.ssh/`, `.aws/`, `.gnupg/`, credentials files, hook scripts, `/etc/`, `settings.json` |
| **Destructive commands** | Bash | Blocks `rm -rf /`, `chmod 777`, pipe-to-shell, fork bombs, eval-with-substitution, T8 base64-pipe-shell loaders. Bash-normalize T1-T9 collapses obfuscation (empty quotes, `${IFS}`, ANSI-C hex, process substitution, eval-via-variable) before pattern matching |
| **Supply-chain guardrail** | Bash | Blocks known-compromised npm/pip packages, Levenshtein typosquats, age-gated installs (<72 h), OSV.dev CVE checks. Covers npm, pip, brew, docker, go, cargo, gem. v7.3.0: npm scope-hop typosquat advisory (E13) — `@evil/lodash`-class catches scope-jumping when the unscoped name matches a popular package |
| **Output verification** | All tools (post) | Advisory: scans ALL tool output for indirect injection (LLM01) and HITL traps (DeepMind category 6). Bash-specific: leaked secrets, unexpected URLs, oversized MCP responses. v7.3.0: per-update MCP description drift AND cumulative drift vs sticky baseline (E14); slow-burn rug-pulls that stay under per-update thresholds but cumulatively diverge ≥25% emit `mcp-cumulative-drift` MEDIUM |
| **Session guard** | All tools (post) | Advisory: monitors tool-call sequences for the lethal trifecta (untrusted input + sensitive read + exfiltration sink). 20-call sliding window + 100-call long-horizon window. Mode: `LLM_SECURITY_TRIFECTA_MODE=block\|warn\|off` (default warn). Sub-agent delegation tracking via Task/Agent tools surfaces escalation-after-input as a separate advisory |
| **Pre-compact scan** | PreCompact | Scans transcript tail (max 512 KB, <500 ms) for injection patterns + credentials before context compaction. Prevents poisoned content from surviving in compact form. Mode: `LLM_SECURITY_PRECOMPACT_MODE=block\|warn\|off` (default warn) |
| **Update check** | UserPromptSubmit | Checks for newer plugin versions max 1× / 24 h, cached. Disable: `LLM_SECURITY_UPDATE_CHECK=off` |
All hooks are Node.js `.mjs` for cross-platform compatibility (macOS, Linux, Windows).
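
The prompt injection hook in the table above is described as decoding obfuscated payloads before matching. A minimal sketch of that decode-then-match idea follows, assuming a single illustrative pattern and hypothetical helper names. The plugin's real decoders and pattern set are not shown in this README, and Unicode Tag decoding is omitted here.

```javascript
// Illustrative only: the pattern and all function names are assumptions.
const INJECTION_PATTERN = /ignore (all )?previous instructions/i;

function rot13(text) {
  return text.replace(/[a-z]/gi, (ch) => {
    const base = ch <= 'Z' ? 65 : 97; // 'A' or 'a'
    return String.fromCharCode(((ch.charCodeAt(0) - base + 13) % 26) + base);
  });
}

function tryBase64(text) {
  // Only attempt a decode when the whole string looks like padded base64.
  if (!/^[A-Za-z0-9+/]+={0,2}$/.test(text) || text.length % 4 !== 0) return null;
  return Buffer.from(text, 'base64').toString('utf8');
}

function tryHex(text) {
  if (!/^([0-9a-f]{2})+$/i.test(text)) return null;
  return Buffer.from(text, 'hex').toString('utf8');
}

function tryUrlDecode(text) {
  try {
    return decodeURIComponent(text);
  } catch {
    return null; // malformed percent-escape
  }
}

// Returns the raw text plus every decoded variant that differs from it.
function decodeVariants(text) {
  const decoded = [rot13(text), tryBase64(text), tryHex(text), tryUrlDecode(text)];
  return [text, ...decoded.filter((v) => v !== null && v !== text)];
}

function matchesInjection(text) {
  return decodeVariants(text.trim()).some((v) => INJECTION_PATTERN.test(v));
}
```

Matching against every decoded variant, rather than only the raw prompt, is what lets a plain-text pattern catch an encoded payload. This sketch decodes a single layer only.
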
> [!IMPORTANT]
> Five hooks are **blocking** (prompt injection, secrets, path guarding, destructive commands, supply chain). Four are **advisory** (output verification, session guard, pre-compact, update check). Blocking modes can be downgraded via env-vars or `policy.json` for security research or staged rollouts.
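
The `block|warn|off` modes could be resolved along these lines. This is a sketch under assumptions: the function name `resolveMode` and the idea of passing a single policy value are hypothetical, and the env-over-policy precedence follows the policy-as-code note later in this README.

```javascript
const VALID_MODES = ['block', 'warn', 'off'];

// Hypothetical resolver: a valid env value wins, then a valid policy value,
// then the hook's built-in default. Unrecognized values are ignored rather
// than treated as "off", so a typo cannot silently disable a hook.
function resolveMode(envValue, policyValue, defaultMode) {
  for (const candidate of [envValue, policyValue]) {
    const mode = typeof candidate === 'string' ? candidate.trim().toLowerCase() : '';
    if (VALID_MODES.includes(mode)) return mode;
  }
  return defaultMode;
}
```

For the injection hook this might be called as `resolveMode(process.env.LLM_SECURITY_INJECTION_MODE, policyValue, 'block')`, where `policyValue` stands in for whatever key `policy.json` actually uses.
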
---
## Deterministic scanners
23 scanners. Zero external dependencies. All output JSON.
### Orchestrated (10) — run via `node scanners/scan-orchestrator.mjs <target>` or `/security deep-scan`
| Scanner | Prefix | Detects | OWASP |
|---------|--------|---------|-------|
| `unicode-scanner.mjs` | UNI | Zero-width chars, Unicode Tag steganography (incl. PUA-A/B), BIDI overrides, Cyrillic/Greek homoglyphs (NFKC fold) | LLM01 |
| `entropy-scanner.mjs` | ENT | High-entropy strings, base64/hex blobs, encoded payloads via Shannon entropy. Two-stage context classification suppresses GLSL/CSS/inline-SVG/markdown-CDN false positives | LLM01, LLM03 |
| `permission-mapper.mjs` | PRM | Purpose-vs-tools mismatch, ghost hooks, haiku on sensitive agents, overprivileged components | LLM06 |
| `dep-auditor.mjs` | DEP | CVEs (npm/pip audit + OSV.dev), Levenshtein + token-overlap typosquats, malicious install scripts, unpinned versions | LLM03 |
| `taint-tracer.mjs` | TNT | Source-to-sink data flow (process.env / req.body → eval / exec / fetch / writeFile), 3-pass analysis, destructuring + spread support | LLM01, LLM02 |
| `git-forensics.mjs` | GIT | Force pushes, description drift, hook modifications, new outbound URLs, author changes | LLM03 |
| `network-mapper.mjs` | NET | Undisclosed URLs, suspicious domains (ngrok, webhook.site), IP-based URLs, DNS analysis | LLM02, LLM03 |
| `memory-poisoning-scanner.mjs` | MEM | Injection patterns, shell commands, credential paths, permission expansion, suspicious URLs in `CLAUDE.md` / memory / `.claude/rules` / `.claude/agents/*.md` | LLM01, ASI02 |
| `supply-chain-recheck.mjs` | SCR | Re-audit installed deps from lockfiles against blocklists, OSV.dev batch API, typosquats | LLM03 |
| `toxic-flow-analyzer.mjs` | TFA | Lethal trifecta correlation across prior scanner output (runs last) | ASI01, ASI02, ASI05 |
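
To make the UNI row concrete, here is a minimal sketch of codepoint-level detection for three of the listed classes. The ranges are standard Unicode assignments; the function names and the shape of a finding are assumptions, and the real scanner covers more (homoglyphs, PUA, NFKC folding).

```javascript
// Illustrative sketch; not the plugin's unicode-scanner.mjs.
function classifyCodePoint(cp) {
  // Zero-width space, non-joiner, joiner, word joiner, zero-width no-break space
  if ([0x200b, 0x200c, 0x200d, 0x2060, 0xfeff].includes(cp)) return 'zero-width';
  // Bidirectional embeddings/overrides and isolates
  if ((cp >= 0x202a && cp <= 0x202e) || (cp >= 0x2066 && cp <= 0x2069)) return 'bidi-control';
  // Unicode Tags block, usable for invisible text smuggling
  if (cp >= 0xe0000 && cp <= 0xe007f) return 'unicode-tag';
  return null;
}

function findSuspiciousCodePoints(text) {
  const findings = [];
  let index = 0; // UTF-16 offset of the current code point
  for (const ch of text) { // iterates by code point, not by UTF-16 unit
    const cp = ch.codePointAt(0);
    const kind = classifyCodePoint(cp);
    if (kind) {
      const label = 'U+' + cp.toString(16).toUpperCase().padStart(4, '0');
      findings.push({ index, kind, codePoint: label });
    }
    index += ch.length;
  }
  return findings;
}
```

Iterating with `for...of` matters here: Tag characters sit outside the Basic Multilingual Plane, so indexing by UTF-16 unit would only see their surrogate halves.
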
### Workflow & live (3, run independently)
| Scanner | Prefix | Detects | OWASP |
|---------|--------|---------|-------|
| `workflow-scanner.mjs` (E11, v7.3.0) | WFL | GitHub Actions and Forgejo Actions injection — dangerous `${{ <field> }}` interpolations inside `run:` blocks across a 23-field GHSL+GlueStack-class blacklist; sink-restricted (only `run:` is a shell sink); severity matrix grades by trigger privilege; tracks env-block re-interpolation (Appsmith GHSL-2024-277 stealth pattern); flags `actor == bot[bot]` auth-bypass (Synacktiv 2023 Dependabot class) | LLM02, LLM06 |
| `mcp-live-inspect.mjs` | MCI | Connects to running MCP servers via JSON-RPC 2.0 and scans live tool descriptions for injection, shadowing, drift | LLM01, LLM02 |
| `ide-extension-scanner.mjs` | IDE | VS Code (+ forks) and JetBrains plugin prescan: blocklist, theme-with-code, sideload, broad activation, typosquat, extension-pack expansion, dangerous uninstall hooks; `Premain-Class` instrumentation; native binaries; nested-jar inspection | LLM01-03, LLM06, ASI02, ASI04 |
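
The sink-restricted idea in the WFL row (only `run:` counts as a shell sink) can be sketched with a simplified line-based pass. Everything here is an assumption for illustration: the five fields are a small subset commonly cited as attacker-controlled, not the scanner's 23-field blacklist, and a real implementation would parse the YAML properly and also track `env:` re-interpolation.

```javascript
// Illustrative subset only; the real blacklist is larger.
const UNTRUSTED_FIELDS = [
  'github.event.issue.title',
  'github.event.pull_request.title',
  'github.event.pull_request.body',
  'github.event.comment.body',
  'github.head_ref',
];

// Reports `${{ <field> }}` interpolations that appear inside `run:` blocks.
function findRunInjections(yamlText) {
  const findings = [];
  let runColumn = -1; // column of the active `run:` key, -1 when outside
  yamlText.split('\n').forEach((line, i) => {
    const indent = line.search(/\S/);
    if (indent === -1) return; // blank line: keep any open block
    if (/^\s*(?:-\s+)?run:/.test(line)) {
      runColumn = line.indexOf('run:');
    } else if (runColumn !== -1 && indent <= runColumn) {
      runColumn = -1; // a sibling or parent key closes the block
    }
    if (runColumn === -1) return;
    for (const m of line.matchAll(/\$\{\{\s*([\w.]+)\s*\}\}/g)) {
      if (UNTRUSTED_FIELDS.includes(m[1])) findings.push({ line: i + 1, field: m[1] });
    }
  });
  return findings;
}
```

Interpolations outside `run:` (for example in an `env:` block) are deliberately not reported by this sketch, which mirrors the sink restriction but not the env-block tracking described in the table.
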
### Standalone utilities (10)
| Scanner | Purpose |
|---------|---------|
| `posture-scanner.mjs` | Deterministic posture assessment, 16 categories, <50 ms |
| `attack-simulator.mjs` | Red-team harness: 72 scenarios, 12 categories, fixed + adaptive modes, benchmark output |
| `ai-bom-generator.mjs` | CycloneDX 1.6 AI Bill of Materials |
| `dashboard-aggregator.mjs` | Cross-project security dashboard with weakest-link machine grade |
| `reference-config-generator.mjs` | Grade A config generation based on posture gaps |
| `mcp-baseline-reset.mjs` | Clear cumulative-drift baseline cache (`--list` / `--target <tool>` / clear-all) |
| `auto-cleaner.mjs` | Remediation engine — 16 fix operations, atomic writes, post-fix validation |
| `content-extractor.mjs` | Pre-extracts evidence from untrusted repos and strips injection patterns before LLM exposure |
| `watch-cron.mjs` | Cron wrapper for background scanning |
| `scan-orchestrator.mjs` | Entry point that runs all 10 orchestrated scanners |
**Why deterministic?** LLMs are powerful at semantic analysis — intent, social engineering, context. They cannot reliably calculate Shannon entropy, measure Levenshtein distance between package names, trace taint flow across function boundaries, or detect individual Unicode codepoints. These scanners fill that gap.
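
Shannon entropy is a good example of a calculation that is trivial in code. The sketch below computes bits per character using the standard formula H = -Σ p·log2(p); the function name is illustrative and the thresholds the ENT scanner applies are not shown here.

```javascript
// Shannon entropy in bits per character.
function shannonEntropy(text) {
  const counts = new Map();
  let total = 0;
  for (const ch of text) {
    counts.set(ch, (counts.get(ch) || 0) + 1);
    total += 1;
  }
  if (total === 0) return 0;
  let entropy = 0;
  for (const n of counts.values()) {
    const p = n / total; // relative frequency of this character
    entropy -= p * Math.log2(p);
  }
  return entropy;
}
```

A string of one repeated character scores 0, while a string that uses many characters evenly, as random tokens and encoded blobs tend to, scores close to log2 of its alphabet size.
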
### MCP cumulative drift baseline (E14, v7.3.0)
`scanners/lib/mcp-description-cache.mjs` anchors a sticky `baseline` description per MCP tool plus a rolling 10-event history. Cumulative drift = `levenshtein(current, baseline) / max(|current|, |baseline|)`. When the ratio crosses `mcp.cumulative_drift_threshold` (default 0.25), `post-mcp-verify.mjs` emits a MEDIUM `mcp-cumulative-drift` advisory — independent of the existing per-update >10% drift signal. Slow-burn rug-pulls that keep each update under the per-update threshold but cumulatively diverge from baseline are now caught.
The baseline survives the 7-day TTL purge so detection persists across the full window. After a legitimate MCP server upgrade, run `/security mcp-baseline-reset` (or `node scanners/mcp-baseline-reset.mjs --target <tool>`) to clear the stale baseline. The next call seeds a fresh baseline; description, firstSeen, lastSeen, and history are preserved across reset for audit. `LLM_SECURITY_MCP_CACHE_FILE` overrides the cache path for testing.
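
The drift ratio above can be worked through in a few lines. This is a minimal sketch of the stated formula, not the code in `mcp-description-cache.mjs`; the helper names are hypothetical and the threshold default is taken from the text.

```javascript
// Classic dynamic-programming Levenshtein distance (two-row variant).
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// levenshtein(current, baseline) / max(|current|, |baseline|)
function cumulativeDrift(current, baseline) {
  const longest = Math.max(current.length, baseline.length);
  return longest === 0 ? 0 : levenshtein(current, baseline) / longest;
}

// Hypothetical helper: true once drift reaches the threshold (default 0.25).
function exceedsDriftThreshold(current, baseline, threshold = 0.25) {
  return cumulativeDrift(current, baseline) >= threshold;
}
```

Because the comparison is always against the sticky baseline rather than the previous description, a series of small edits accumulates: ten updates that each change 3 % of the text can add up to a ratio well past 0.25 even though no single update trips the per-update check.
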
---
## Agents (6)
Specialized analysts spawned by commands. Read-only by default; `clean` and `harden` grant Edit/Write under explicit user confirmation.
| Agent | Role | Spawned by |
|-------|------|------------|
| `skill-scanner-agent` | 7 threat categories (injection, exfiltration, privilege escalation, scope creep, hidden instructions, toolchain manipulation, persistence) for skills/commands/agents | `scan`, `audit`, `plugin-audit` |
| `mcp-scanner-agent` | 5-phase MCP analysis (tool descriptions, source code, dependencies, configuration, rug pull detection) | `scan`, `mcp-audit` |
| `posture-assessor-agent` | Full audit narrative with PASS/PARTIAL/FAIL scoring and A-F grading (the deterministic `posture-scanner.mjs` handles quick mode) | `audit`, `posture` |
| `threat-modeler-agent` | Interactive STRIDE × MAESTRO interview, 5-phase workflow | `threat-model` |
| `deep-scan-synthesizer-agent` | Interprets deterministic scanner JSON into a human-readable report with executive summary + prioritized recommendations | `deep-scan`, `scan --deep` |
| `cleaner-agent` | Generates semi-auto remediation proposals for findings requiring human judgment (returns JSON proposals; `clean.md` performs the edits after user approval) | `clean` |
All agents run on Opus and reference the knowledge base for grounding. Agents are spawned sequentially to avoid burst rate limits.
---
## Knowledge base (22 files)
All analysis is grounded in published threat intelligence. The knowledge files are read by agents at scan time, not loaded preemptively.
| Category | Files |
|----------|-------|
| **OWASP frameworks** | `owasp-llm-top10.md`, `owasp-agentic-top10.md`, `owasp-skills-top10.md`, `mcp-threat-patterns.md` (9 categories), `mitigation-matrix.md` |
| **Threat patterns** | `skill-threat-patterns.md` (7 categories from ToxicSkills/ClawHavoc), `secrets-patterns.md` (30+ regex), `ide-extension-threat-patterns.md` (10 categories with 2024-2026 case studies), `workflow-injection-patterns.md` (23-field blacklist + Forgejo divergences) |
| **Research** | `prompt-injection-research-2025-2026.md` (7 papers), `deepmind-agent-traps.md` (6 categories, 43 techniques), `attack-scenarios.json` (72 red-team scenarios), `attack-mutations.json` (synonym tables for adaptive testing) |
| **Compliance** | `compliance-mapping.md` (EU AI Act, NIST AI RMF, ISO 42001, MITRE ATLAS), `norwegian-context.md` (Datatilsynet, NSM, Digitaliseringsdirektoratet) |
| **Reference data** | `top-packages.json` (top 200 npm + 100 PyPI), `top-vscode-extensions.json`, `top-jetbrains-plugins.json`, `typosquat-allowlist.json`, `marketplace-api-notes.md`, `jetbrains-marketplace-api-notes.md`, `skill-registry.json` |
---
## Coverage at a glance
**OWASP LLM Top 10 (2025) — control-count coverage from `knowledge/mitigation-matrix.md`:**
| Category | Hooks | Scanners | Commands | Coverage |
|----------|:-----:|:--------:|:--------:|:--------:|
| LLM01 Prompt Injection | ✅ | UNI + ENT + TNT | scan, audit | 95 % |
| LLM02 Sensitive Info Disclosure | ✅ | TNT + NET | audit | 83 % |
| LLM03 Supply Chain | ◐ | ENT + DEP + GIT + NET | scan, plugin-audit, mcp-audit, supply-check | 60 % |
| LLM04 Data Poisoning | — | — | threat-model | 40 % |
| LLM05 Improper Output Handling | ✅ | — | audit | 83 % |
| LLM06 Excessive Agency | ✅ | PRM + WFL | posture | 100 % |
| LLM07 System Prompt Leakage | — | — | audit | 60 % |
| LLM08 Vector/Embedding Weaknesses | — | — | threat-model | 40 % |
| LLM09 Misinformation | — | — | advisory | 50 % |
| LLM10 Unbounded Consumption | — | — | pre-deploy | 83 % |
Average ~69 %. Strongest at prompt injection (95 % with input + output scanning + obfuscation decoders) and agency controls (100 %). Weakest at LLM04/08, which are better addressed at the model-provider or platform level. `/security threat-model` and `/security pre-deploy` surface the gaps advisorily.
**Agentic and skill frameworks** — full ASI01-ASI10 and AST01-AST10 mapping in `knowledge/owasp-agentic-top10.md` and `knowledge/owasp-skills-top10.md`.
---
## Compliance & governance
| Capability | Detail |
|------------|--------|
| **Compliance mapping** | EU AI Act (Art. 9, 15, 17), NIST AI RMF (Map / Measure / Manage / Govern), ISO 42001 (Annex A), MITRE ATLAS techniques. Posture categories 14-16 assess readiness |
| **Norwegian context** | Datatilsynet DPIA-for-AI guidance, NSM basic security principles, Digitaliseringsdirektoratet — relevant for Norwegian public-sector deployments |
| **SARIF 2.1.0 output** | `--format sarif` on scan / deep-scan produces OASIS SARIF for CI/CD ingestion (GitHub Advanced Security, Azure DevOps, SonarQube) |
| **Structured audit trail** | JSONL events with ISO 8601 timestamps and OWASP category tags (`LLM_SECURITY_AUDIT_*` env-vars or `audit.log_path` policy key) — SIEM-ready |
| **AI-BOM** | CycloneDX 1.6 BOM for AI components — models, MCP servers, plugins, knowledge files, hooks (`llm-security audit-bom <target>`) |
| **Policy-as-code** | `.llm-security/policy.json` ships hook configuration with the team. v7.3.0 (D3) adds a one-time-per-process stderr deprecation line when both an env-var AND its `policy.json` equivalent are explicitly set; env still wins through the v7.x runway, env reads removed in v8.0.0. Suppress noise with `LLM_SECURITY_DEPRECATION_QUIET=1` |
| **Standalone CLI** | `node bin/llm-security.mjs scan <target>` — runs scanners without Claude Code. Subcommands: `scan`, `deep-scan`, `posture`, `audit-bom`, `benchmark`. Schrems II compatible in default offline mode (optional OSV.dev enrichment is the only network call and is opt-in) |
| **CI/CD integration** | `--fail-on <severity>` for threshold-based exit codes, `--compact` for one-liner output. Templates for GitHub Actions, Azure DevOps, GitLab CI in `ci/`. Guide: `docs/ci-cd-guide.md` |
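
`--fail-on <severity>` implies a threshold comparison over finding severities. One way to sketch it, assuming a four-level ordering and an exit code of 1 on failure (both are assumptions, not documented behavior of the CLI):

```javascript
// Assumed ordering, lowest to highest.
const SEVERITY_ORDER = ['low', 'medium', 'high', 'critical'];

// Returns 1 when any finding is at or above the `failOn` severity, else 0.
function exitCodeFor(findings, failOn) {
  const threshold = SEVERITY_ORDER.indexOf(String(failOn).toLowerCase());
  if (threshold === -1) throw new Error(`Unknown severity: ${failOn}`);
  const failing = findings.some(
    (f) => SEVERITY_ORDER.indexOf(String(f.severity).toLowerCase()) >= threshold
  );
  return failing ? 1 : 0;
}
```

In a pipeline, a non-zero exit code from the scan step is what fails the job, so a team can start with `--fail-on critical` and tighten the threshold over time.
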
### Benchmarks
`/security red-team` (also `llm-security benchmark`) tests hook defenses with 72 crafted scenarios across 12 categories. Adaptive mode applies 5 mutation rounds per blocked scenario (homoglyph substitution, encoding wrapping, zero-width injection, case alternation, synonym replacement). Current block rate in fixed mode: 100 %.
---
## Workflow examples
### 1 — pre-installation gate
```
/security scan path/to/plugin # ALLOW/WARNING/BLOCK
/security plugin-audit path/to/plugin # Install/Review/Do Not Install
# Remote — scans without installing
/security scan https://github.com/org/repo --deep
/security plugin-audit https://github.com/org/repo
```
### 2 — monthly review
```
/security posture # 30-second baseline
/security audit # full A-F grade with action items
# → fix critical/high
/security posture # verify improvement
```
### 3 — track over time
```
/security diff path/to/project # first run creates baseline
/security watch path/to/project # continuous, runs diff every 6h via /loop
```
### 4 — deep threat analysis
```
/security threat-model # 15-30 min STRIDE × MAESTRO interview
/security audit # verify current controls vs identified threats
/security pre-deploy # 10 automated + 3 manual checks
```
### 5 — remediation
```
/security clean path/to/project --dry-run # preview
/security clean path/to/project # auto + semi-auto + manual report
/security harden path/to/project --apply # Grade A reference config
```
---
## What this plugin does NOT cover
| Area | Why | Alternative |
|------|-----|-------------|
| Post-clone `CLAUDE.md` poisoning | Once a repo is cloned, `CLAUDE.md` loads into the system prompt *before* any hook runs. Platform limitation, no hook-based fix. | **Always scan repos remotely before cloning** with `/security scan <url> --deep`. For repos already cloned: review `CLAUDE.md` before opening |
| ML-based injection classification | Regex patterns cannot catch novel phrasings or adversarial paraphrasing. Joint paper (14 researchers, 2025) reports 95-100 % ASR against all 12 tested defenses for motivated adaptive attackers | Use [parry-guard](https://github.com/vaporif/parry) (DeBERTa v3 + Llama Prompt Guard 2) alongside this plugin. No conflict |
| Enterprise SSO / SCIM | Platform-level configuration | Anthropic Admin Console |
| RAG infrastructure | Vector DB / embedding pipeline security | Dedicated RAG security tools |
| LLM gateway / proxy | Network infrastructure layer | API gateway solutions |
| SIEM integration | Organization security stack | Splunk, Sentinel, etc. — but the JSONL audit trail is SIEM-ready |
| General agent scheming detection | The session guard catches the lethal trifecta as a known sequence; novel hidden-goal pursuit remains fundamentally hard for any tool. | Trifecta + delegation tracking provide partial coverage; full scheming detection requires monitoring + human oversight |
These gaps are surfaced advisorily through `/security threat-model` and `/security pre-deploy`.
### Complementary tools
| Tool | What it adds |
|------|--------------|
| [parry-guard](https://github.com/vaporif/parry) | ML injection classification (DeBERTa v3 + Llama Prompt Guard 2 86M, Rust, fail-closed). Catches what regex misses |
| [Lasso claude-hooks](https://github.com/lasso-security/claude-hooks) | Different philosophy: 96 patterns across 5 categories, warn-and-continue. Both can run in the same hook chain |
| [Snyk agent-scan](https://github.com/snyk/agent-scan) | Commercial skills/MCP scanning with a larger training set (3 984 skills analyzed) |
> [!TIP]
> Recommended combo: **llm-security** (breadth — static + supply chain + audit + posture + threat modeling) + **parry-guard** (depth — ML injection classification). Different layers, no conflict.
---
## Project scope
This is a **solo open-source project in stabilization mode** as of 2026-05-01.
The current feature set (5 frameworks, 23 scanners, 9 hooks, 6 agents,
20 commands, 22 knowledge files, 1822+ tests including a dedicated end-to-end suite) is the natural plateau for
what a deterministic + advisory plugin can defend against without crossing
into commercial-grade territory. Going forward, work focuses on:
- **Bug fixes** and security patches
- **Compatibility** with new Claude Code releases
- **Knowledge-base refresh** (OWASP updates, new published research, new attack patterns)
- **Deprecation cleanup** — v8.0.0 removes the `LLM_SECURITY_*` env vars and `riskScoreV1` constant deprecated in v7.3.0
- **Opportunistic small additions** that fit the existing deterministic architecture
The following are **explicitly out of scope — fork the repo and own them**
under your organization's name. The MIT license permits this and the project
is architected to be forkable. See [`CONTRIBUTING.md`](CONTRIBUTING.md) for
the fork-and-own guide.
| Out of scope | Why | Where to look instead |
|--------------|-----|------------------------|
| Web dashboard / fleet policy server | Multi-tenant UX + ongoing infra work | Snyk, Lakera Cloud |
| Runtime prompt firewall (real-time blocking proxy) | Inline gateway architecture | Lakera Guard, Protect AI Rebuff, [parry-guard](https://github.com/vaporif/parry) |
| IDE real-time LSP scanning | IDE integration + always-on perf budget | Snyk IDE, Semgrep IDE |
| Compliance PDF/DOCX evidence pack | Auditor-formatted reports as a product | Vanta, Drata, Secureframe |
| Enterprise ticketing / chat connectors (Jira, ServiceNow, Slack, Teams, PagerDuty) | Per-vendor SDK + auth + ongoing API drift | Splunk SOAR, Tines, custom integration |
| Multi-tenancy / centralized plugin runtime / fleet state | Hosted-product surface area | Build it on a fork |
| ML-based detectors requiring model hosting | Model-serving infra (training, eval, drift) | parry-guard (DeBERTa v3 + Llama Prompt Guard 2) |
| Marketplace UI / web catalog | Frontend product | This is not that kind of project |
| SSO / SCIM / RBAC | Platform-level enterprise concerns | Anthropic Admin Console + your IdP |
If you need any of the above and your organization has the headcount to
maintain it, **fork freely**. The maintainer encourages it. Issues and
support flow back to the fork, not here.
---
## Defense philosophy
Prompt injection is **structurally unsolvable** with current architectures (joint paper, 14 researchers, 2025: 95-100 % ASR against all 12 tested defenses by motivated red-teamers). v5.0+ does not claim to "prevent" injection. It implements defense-in-depth:
- **Broader detection** — MEDIUM advisories for obfuscation signals (leetspeak, homoglyphs, zero-width chars, multi-language); Unicode Tag and PUA-A/B steganography; bash expansion evasion T1-T9; rot13 hidden imperatives in comments
- **Increased attack cost** — Rule-of-Two trifecta detection (configurable block/warn/off, default warn), bash normalization before gate matching, MCP cumulative-drift baseline catching slow-burn rug-pulls
- **Longer monitoring windows** — 100-call long-horizon alongside 20-call sliding window; slow-burn trifecta detection (legs >50 calls apart); Jensen-Shannon behavioral drift; sub-agent delegation tracking
- **Architectural constraints** — opportunistic byte-fingerprint matching for output→input lineage (first 200 bytes, SHA-256/16-hex tag — not semantic capability tracking; trivially bypassed by mutation, but raises the cost of casual exfil)
- **Honest documentation** — known limitations are surfaced, not hidden
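
The two monitoring windows above can be illustrated with a minimal sketch. It is not the plugin's implementation: the function names, the leg labels, and the assumption that each call arrives pre-classified with a `leg` field are all hypothetical, and only the window sizes (20 and 100 calls) come from the list above.

```javascript
// Hypothetical leg labels; the real hooks classify calls from tool name and arguments.
const LEGS = ["untrusted_input", "sensitive_access", "exfil_sink"];

// True when all three legs appear among the last `windowSize` calls.
function trifectaInWindow(calls, windowSize) {
  const recent = calls.slice(-windowSize);
  const seen = new Set(recent.map((c) => c.leg).filter((l) => LEGS.includes(l)));
  return seen.size === LEGS.length;
}

// Check the short sliding window first, then the long-horizon window.
function classifyTrifecta(calls) {
  if (trifectaInWindow(calls, 20)) return "trifecta";
  if (trifectaInWindow(calls, 100)) return "slow-burn-trifecta";
  return "none";
}
```

A session whose three legs are spread more than 20 but fewer than 100 calls apart is missed by the short window and caught only by the long-horizon one, which is the case the longer window is meant to cover.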
**System-card alignment (Opus 4.7):** §5.2.1 documents that multi-layer defenses outperform single-layer against adaptive attacks; this plugin's posture matches. §6.3.1.1 documents that Opus 4.7 follows agent instructions more literally — stacked imperatives are less useful than tool-level enforcement, and agent files have been updated accordingly. Full mapping in `docs/security-hardening-guide.md` §5.
**What v5.0+ cannot do:** prevent adaptive attacks from motivated human red-teamers, fix `CLAUDE.md` loading before hooks (platform limitation), detect novel NL indirection without ML, prevent long-horizon attacks without detectable patterns, provide formal worst-case guarantees.
---
## Compatibility
- **Claude Code:** v2.x+
- **Platform:** macOS, Linux, Windows (all hooks are Node.js `.mjs`)
- **Node.js:** any recent LTS for hook scripts and CLI
- **Overlap with `claude-code-essentials`:** safe to run both. This plugin extends with path guarding, MCP verification, and runtime trifecta detection. Duplicate blocking is harmless — hooks run sequentially
---
## Playground (v7.6.0)
A single-file SPA at `playground/llm-security-playground.html` provides
an interactive surface for onboarding, command discovery and report demos
**without requiring Claude Code installation**. Open the file directly in
a browser (Chrome/Firefox/Safari over `file://`) — no build step, no
network calls, no npm install. Theme-bootstrap with FOUC-prevention; state
persisted in IndexedDB primary + localStorage fallback.
**v7.6.0 Tier 3 reference case:** The playground is now a visually and
structurally complete reference for the `shared/playground-design-system/`
Tier 3 supplement. Eight new DS components are integrated into the 18 report
renderers: `tfa-flow` (lethal-trifecta chain), `mat-ladder` (maturity
ladder), `suppressed-group` (narrative audit), `codepoint-reveal` (Unicode
steganography), `top-risks` (ranked top findings),
`recommendation-card[data-severity]` (severity-tinted advisory),
`risk-meter` (band visualization 0-100), `card--severity-{level}`
(severity-colored findings cards). Plus `badge--scope-security`,
`verdict-pill-lg`, and `form-progress` + `fp-step` from wave 1.
**Layout:**
```
playground/
├── llm-security-playground.html ← single-file SPA (~10 700 lines)
├── vendor/
│ └── playground-design-system/ ← synced from shared/, checksum-locked
├── test-fixtures/ ← markdown fixtures (one per command)
├── screenshots/v7.5.0/ ← Playwright-generated demo images (12)
├── screenshots/v7.6.0/ ← v7.6.0 demo images (12, manually generated)
└── A11Y-RAPPORT.md ← WCAG 2.1 AA verification + Tier 3 ARIA
```
**What the playground covers:**
- **Onboarding (5 groups):** organization, scope, profile, platform,
compliance. Values are persisted as `shared` state and prefilled
automatically in all command forms.
- **Home:** project grid and fleet tracks for posture/scan/red-team. The
"Load demo data" button («Last inn demo-data») activates 3 projects,
including `dft-komplett-demo` with all 18 reports already parsed.
- **Catalog:** all 20 commands grouped into 5 categories. Search filters the
cards, and the "Open form" button («Åpne skjema») builds a ready-made
pipeline string to copy and paste into the terminal.
- **Project surface:** 4 screens (Overview / Reports / Context / Export).
The Reports tab has category tabs (discover / posture / findings-ops /
hardening / adversarial / mcp-ops) and paste-in import for every report
command.
**Parser/renderer architecture:** Every `produces_report=true` command in
`CATALOG` has a parser (markdown → structure) and a renderer (structure
→ DS components). 18 archetypes are supported, including: `findings`,
`findings-grade`, `risk-score-meter`, `posture-cards`, `dashboard-fleet`,
`red-team-results`, `diff-report`, `kanban-buckets`, `matrix-risk`. The
parser contract is `{ ok: true, data: {...} } | { ok: false, errors: [...] }`.
The test fixtures under `playground/test-fixtures/` serve as contract
anchors: one markdown file per command, mirroring the
`templates/unified-report.md` format.
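
As an illustration of the parser contract, here is a minimal sketch of a parser that returns the `{ ok, data } | { ok, errors }` shape. The function name `parseFindingsSketch` and the markdown layout it reads (a `**Verdict:**` line plus `- [SEVERITY] title` bullets) are invented for this example; they are not the `templates/unified-report.md` format or one of the playground's actual parsers.

```javascript
// Hypothetical parser: the input layout is an assumption made for this sketch.
function parseFindingsSketch(markdown) {
  const errors = [];
  const findings = [];

  // A report without a verdict line is treated as a contract violation.
  const verdictMatch = markdown.match(/^\*\*Verdict:\*\*\s*(\w+)/m);
  if (!verdictMatch) errors.push("missing verdict line");

  // Collect bullets of the form "- [HIGH] Some title".
  for (const line of markdown.split("\n")) {
    const m = line.match(/^- \[(CRITICAL|HIGH|MEDIUM|LOW|INFO)\]\s+(.+)$/);
    if (m) findings.push({ severity: m[1], title: m[2].trim() });
  }

  if (errors.length > 0) return { ok: false, errors };
  return { ok: true, data: { verdict: verdictMatch[1], findings } };
}
```

Returning errors as data rather than throwing lets a renderer show a paste-import failure inline without a try/catch around every parser call.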
**Exposed testing/automation globals:** `__store`, `__navigate`,
`__loadDemoState`, `__scheduleRender`, `__PARSERS`, `__RENDERERS`,
`__CATALOG`, `__inferVerdict`, `__inferKeyStats`, `__renderPageShell`,
`__handlePasteImport`. These enable Playwright-driven navigation and
programmatic parser/renderer testing against the fixture catalog.
**Limitations:** The SPA is a paste-in surface; it runs no scanners itself.
Output must come from Claude Code (`/security scan ...`), the CLI
(`node scanners/...`), or stub fixtures. The demo state contains only the
3 inline projects; new projects are per-user and stored locally.
---
## Self-scan
Running `node scanners/scan-orchestrator.mjs .` on this plugin produces **0 findings (ALLOW)** with ~190 suppressions via `.llm-security-ignore`. Every suppression is explained — a security plugin that documents attack patterns, ships a malicious demo fixture, and tests against deliberately evil code will trigger its own scanners. The entropy scanner flags regex patterns in `knowledge/secrets-patterns.md`. The taint scanner flags `eval(user_input)` in test fixtures. The toxic flow analyzer flags the plugin's own commands that use Read+Bash. Remove the ignore file and re-run to see the unsuppressed picture.
The `examples/malicious-skill-demo/` directory contains a deliberately malicious "Project Health Dashboard" plugin and a [full security assessment](examples/malicious-skill-demo/security-assessment.md). The combined LLM + deterministic pipeline produced **85 findings** (24 critical, 24 high, 20 medium, 6 low, 11 info) and verdict **BLOCK 100/100** — both layers independently maxed the risk score. A human reviewing the plugin's `README.md` and `SKILL.md` would likely miss most of them; the Unicode Tag steganography is literally invisible.
```bash
node scanners/scan-orchestrator.mjs examples/malicious-skill-demo/evil-project-health/ # ~5s
/security scan examples/malicious-skill-demo/evil-project-health/ --deep # full pipeline
```
### Other runnable examples
The `examples/` directory contains additional self-contained
demonstrations — each with `README.md`, fixture, run script, and
`expected-findings.md`:
- **`prompt-injection-showcase/`** — 61 payloads across 19 categories
fed to `pre-prompt-inject-scan`, `post-mcp-verify`, and
`pre-bash-destructive`. Run: `node examples/prompt-injection-showcase/run-showcase.mjs`
- **`lethal-trifecta-walkthrough/`** — 5-step Rule-of-Two demonstration
(WebFetch → Read .env → Bash curl POST + suppression follow-ups)
showing `post-session-guard` advisory firing on leg 3. State-isolated
via run-script PID. Run: `node examples/lethal-trifecta-walkthrough/run-trifecta.mjs`
- **`mcp-rug-pull/`** — 8-stage MCP description drift, each step under
the 10% per-update threshold but cumulatively >25% from baseline.
Demonstrates the v7.3.0 cumulative-drift advisory (E14, OWASP MCP05).
Cache isolated via `LLM_SECURITY_MCP_CACHE_FILE`. Run:
`node examples/mcp-rug-pull/run-rug-pull.mjs`
- **`supply-chain-attack/`** — two-layer demonstration: PreToolUse
hook blocks compromised `event-stream@3.3.6` and advises on
scope-hopping `@evilcorp/lodash`; offline `dep-auditor` flags 5
typosquats + a `postinstall: curl ... | sh` vector in the fixture
`package.json`. Run:
`node examples/supply-chain-attack/run-supply-chain.mjs`
- **`poisoned-claude-md/`** — 6 memory-poisoning detectors fire on a
fixture `CLAUDE.md` + agent file (E15 surface). Demonstrates
injection, shell-command, suspicious-URL, credential-path,
permission-expansion, and base64-encoded-payload detection. Run:
`node examples/poisoned-claude-md/run-memory-poisoning.mjs`
- **`bash-evasion-gallery/`** — one disguised variant per T-tag
(T1-T9) fed through `pre-bash-destructive`, verified BLOCK after
`bash-normalize` strips the evasion. T8 has its own BLOCK_RULE.
Run:
`node examples/bash-evasion-gallery/run-evasion-gallery.mjs`
- **`toxic-agent-demo/`** — single-component lethal trifecta detected
by the `toxic-flow-analyzer` (TFA). A fixture agent with
`tools: [Bash, Read, WebFetch]` covers all three trifecta legs
(untrusted input + sensitive data access + exfil sink), and the
fixture deliberately ships no `hooks/hooks.json` so TFA emits a
CRITICAL `Lethal trifecta:` finding without mitigation downgrade.
Uses `plugin.fixture.json` as the plugin marker so the example
doesn't trip `pre-write-pathguard` on `.claude-plugin/`. Maps to
ASI01 / ASI02 / ASI05 / LLM01 / LLM02 / LLM06. Run:
`node examples/toxic-agent-demo/run-toxic-flow.mjs`
- **`pre-compact-poisoning/`** — `pre-compact-scan` PreCompact hook
detecting both an injection pattern and a credential-shaped string
in a synthetic transcript across all three modes (off / warn /
block). The transcript is generated at runtime in a per-invocation
tempdir; the AWS-shaped key uses the same `'AK' + 'IA' + ...`
fragmentation idiom as `tests/e2e/attack-chain.test.mjs`, so the
source contains no literal credentials. Includes a benign-transcript
control case in block mode to prove the gate is not a brick wall.
Maps to LLM01 / LLM02 / ASI01 / AT-1 / AT-3. Run:
`node examples/pre-compact-poisoning/run-pre-compact-poisoning.mjs`
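
The cumulative-drift idea behind the `mcp-rug-pull/` example can be sketched as follows. Only the two thresholds (10% per update, 25% from baseline) come from the description above; the word-set Jaccard distance and the names `jaccardDistance` and `checkDrift` are assumptions chosen for illustration, and the plugin's actual drift metric may differ.

```javascript
// Split a tool description into a set of lowercase words.
function wordSet(text) {
  return new Set(text.toLowerCase().split(/\s+/).filter(Boolean));
}

// Assumed metric: 1 - |A ∩ B| / |A ∪ B| over word sets.
function jaccardDistance(a, b) {
  const setA = wordSet(a);
  const setB = wordSet(b);
  let shared = 0;
  for (const w of setA) if (setB.has(w)) shared += 1;
  const union = setA.size + setB.size - shared;
  return union === 0 ? 0 : 1 - shared / union;
}

// Compare the new description against both the previous one and the baseline.
function checkDrift(baseline, previous, current, limits = { perUpdate: 0.10, cumulative: 0.25 }) {
  const step = jaccardDistance(previous, current);
  const total = jaccardDistance(baseline, current);
  return {
    step,
    total,
    perUpdateAlert: step > limits.perUpdate,
    cumulativeAlert: total > limits.cumulative,
  };
}
```

Under this metric, a description that gains one new word per update can stay below the per-update limit at every step while its distance from the stored baseline keeps growing, which is why the baseline comparison is needed in addition to the step comparison.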
---
## Recent versions
| Version | Date | Highlights |
|---------|------|------------|
| **7.6.1** | 2026-05-06 | **Playground v7.6.0 visual patch.** Six bugs caught during maintainer verification in the browser. All stemmed from mismatches between DS classes and renderer usage (or from missing DS implementations that the playground assumed existed). (1) `renderFindingsBlock` used the `.findings` outer wrapper, which is the DS two-column list+detail grid → replaced with `<section class="report-meta">` plus the correct `findings__list > findings__group` pattern. (2) `.report-table` was missing from the DS entirely but is used in 7+ renderers → local CSS implementation in the playground HTML. (3) `renderPreDeploy` traffic lights used `.sm-card__grade` (28×28 px, sized for a single A-F letter) for "PASS"/"PASS-WITH-NOTES"/"FAIL" → replaced with a width-fitted status pill. (4) Threat-model matrix bubbles were not clickable → `<button>` with `data-threat-id` plus a click handler that scrolls to the Threats table. (5) Radar labels overlapped at 6+ axes → SVG 280→380, R 105→125, dynamic `text-anchor` (start/end/middle) based on horizontal position. (6) `recommendation-card__body` overflowed on long text → `overflow-wrap: anywhere`. 4/4 fix-specific smoke tests and 18/18 renderer regression tests pass. No scanner or hook behavior changes; purely additive surface. |
| **7.6.0** | 2026-05-06 | **Playground Tier 3 reference case.** The playground (`playground/llm-security-playground.html`) was raised to a visually and structurally complete reference for the `shared/playground-design-system/` Tier 3 supplement. Eight new DS components integrated into the 18 report renderers: `tfa-flow` + `tfa-leg` + `tfa-arrow` (lethal-trifecta chain with `<button>` elements + ARIA), `mat-ladder` + `mat-step` (5-step maturity ladder with thresholds 0/25/50/75/95% PASS), `suppressed-group` (narrative audit from `summary.narrative_audit.suppressed_findings`), `codepoint-reveal` + `cp-tag`/`cp-zw`/`cp-bidi` (Unicode steganography side by side), `top-risks` + `top-risk[data-severity]` (ranked top-findings listing, semantic `<ol>`), `recommendation-card[data-severity]` (severity-tinted advisory on `clean`/`harden`/`audit`/`posture`/`pre-deploy`/`plugin-audit`), `risk-meter` (band visualization 0-100 on 5 archetypes), `card--severity-{level}` (severity-color modifier on findings cards). Wave 1: `badge--scope-security` (identity chip), `verdict-pill-lg` (DS Tier 3 pill), `form-progress` + `fp-step` (onboarding wizard). Removed ~30 duplicate CSS declarations (DS wins the cascade). 5 new DS helpers + `mapSeverityToCardLevel` + `parseNarrativeAudit`. File size 10209 → 10677 lines. Delivered over 5 sessions with atomic commits. A11Y report updated. No scanner or hook behavior changes; purely additive surface. |
| **7.5.0** | 2026-05-05 | **Playground.** Single-file SPA at `playground/llm-security-playground.html` (~10 200 lines) for onboarding, demos, and workshop use without a Claude Code installation. Parsers + renderers for all 18 `produces_report=true` commands (Phase 2: 10 high-priority + Phase 3: 8 remaining: mcp-inspect, supply-check, pre-deploy, diff, watch, registry, clean, threat-model). 18 markdown test fixtures under `playground/test-fixtures/` as contract anchors. The complete demo project `dft-komplett-demo` has all 18 reports pre-parsed inline. Vendor-synced design system under `playground/vendor/` (checksum-locked). 9 Playwright-generated screenshots in `playground/screenshots/v7.5.0/`. 11 new `window` globals for testing/automation. 2 new `KEY_STATS_CONFIG` archetypes (`kanban-buckets`, `matrix-risk`). Bug fix: `normalizeVerdictText` regex order updated so GO-WITH-CONDITIONS / CONDITIONAL / BETINGET no longer collapse to ALLOW. No scanner or hook behavior changes; purely additive surface. |
| **7.4.0** | 2026-05-05 | **Examples + e2e suite.** Eight runnable demonstration walkthroughs under `examples/` (`prompt-injection-showcase`, `lethal-trifecta-walkthrough`, `mcp-rug-pull`, `supply-chain-attack`, `poisoned-claude-md`, `bash-evasion-gallery`, `toxic-agent-demo`, `pre-compact-poisoning`), each with `README.md`, runtime-isolated fixture, single-command run-script, and `expected-findings.md` testable contract. Three new `tests/e2e/` suites (attack-chain 17 tests + multi-session 9 tests + scan-pipeline 19 tests = +45 tests, total 1822) prove the framework works as a coordinated system, not just isolated units. No scanner or hook behavior changes; purely additive surface. Scanner `VERSION` constants synced across `dashboard-aggregator.mjs`, `posture-scanner.mjs`, `ide-extension-scanner.mjs`. |
| **7.3.1** | 2026-05-01 | **Stabilization patch.** Project repositioned as solo, stabilization-only, with explicit "fork & own" stance for enterprise features. New public docs: `CONTRIBUTING.md` (fork-and-own model), README "Project scope" section (out-of-scope table with commercial alternatives), updated `SECURITY.md` (v7.3.x supported, v7.0–v7.2 best-effort, < v7.0 EOL). Coherence: `package.json` files whitelist + `bugs` URL + repo URL fix; scanner `VERSION` constants synced across `dashboard-aggregator.mjs`, `posture-scanner.mjs`, `ide-extension-scanner.mjs`. Test ceiling raised on flaky pre-compact-scan timing test (500 ms → 1000 ms; design target unchanged). No behavior changes. |
| **7.3.0** | 2026-05-01 | **Batch C release.** Wave A (T7-T9 bash normalization + rot13 comment-block decoder), Wave B (`.gitattributes` post-clone advisory + npm scope-hop typosquat + GitHub/Forgejo workflow-scanner with 23-field blacklist + re-interpolation tracking + auth-bypass detection), Wave C (MCP cumulative-drift baseline + `/security mcp-baseline-reset`), Wave D (riskScoreV1 `@deprecated`; sandbox-architecture rationale docs; env-var deprecation runway to v8.0.0; CLAUDE.md hooks count + consistency test). 1665+ → 1777 tests. Wave E (additional attack-simulator scenarios) deferred indefinitely |
| **7.2.0** | 2026-04-29 | **Batch B release.** Critical-review B-tier scanner defects + v7.2.0 evasion-arsenal (PUA-A/B Unicode coverage, NFKC homoglyph fold, escalation-after-input window, markdown link-title + SVG `<desc>`/`<foreignObject>` + HTML comment extractors). Two-stage entropy context classification. v1→v2 risk-formula constants unified across docs. 8 new red-team scenarios (64 → 72). 1522 → 1665 tests |
| **7.1.0** | 2026-04-29 | **Critical-review patch.** Pathguard regex hole closed (`.env.production.local.backup`-class). Distributed-trifecta block-mode AND-gate removed. CaMeL claim toned down to honest "byte-fingerprint matching". Documentation honesty-sweep across 7 overclaim sites. 1487 → 1511 tests |
Full history in [`CHANGELOG.md`](CHANGELOG.md).
---
## License & attribution
MIT. See [`LICENSE`](LICENSE).
Built on published research from OWASP, ToxicSkills (Xi'an Jiaotong, 2025), ClawHavoc (Repello AI, 2025), MCPTox, Pillar Security, Invariant Labs, GHSL Security Lab, Operant AI, and Google DeepMind's AI Agent Traps taxonomy. Threat patterns and case studies in `knowledge/` are cited inline.
---
## Feedback & contributing
- **Bug reports + feature requests:** open an issue on Forgejo
- **Pull requests:** not accepted on this repo (solo project, dialog-driven
development with Claude Code). For larger changes, see
[`CONTRIBUTING.md`](CONTRIBUTING.md) and the **fork-and-own** model
- **Security disclosures:** see [`SECURITY.md`](SECURITY.md) — please email,
do not open a public issue
- **Project scope:** see "Project scope" section above for what is and
isn't on the roadmap, and what to fork for instead