CLAUDE.md OBLIGATORISK-regel: enhver feature-endring som pusher til
Forgejo MÅ oppdatere alle tre doc-nivåer i SAMME commit eller umiddelbart
etter. v7.6.1-fix-commit (
|
||
|---|---|---|
| .. | ||
| .claude-plugin | ||
| agents | ||
| bin | ||
| ci | ||
| commands | ||
| docs | ||
| examples | ||
| hooks | ||
| knowledge | ||
| playground | ||
| reports | ||
| scanners | ||
| scripts | ||
| templates | ||
| test-fixtures/trifecta-plugin | ||
| tests | ||
| --json | ||
| .editorconfig | ||
| .gitignore | ||
| .llm-security-ignore | ||
| .npmignore | ||
| .orphaned_at | ||
| CHANGELOG.md | ||
| CLAUDE.md | ||
| CONTRIBUTING.md | ||
| GOVERNANCE.md | ||
| LICENSE | ||
| package.json | ||
| README.md | ||
| SECURITY.md | ||
| V3-ANNOUNCEMENT.md | ||
| V3-UPGRADE.md | ||
LLM Security Plugin for Claude Code
Automated defense and advisory analysis for the agentic AI attack surface.
Solo-maintained, fork-and-own. This plugin is a starting point, not a vendor product. Issues are welcome as signals; pull requests are not accepted. See GOVERNANCE.md for the full model and what upstream provides.
AI-generated: all code produced by Claude Code through dialog-driven development. Full disclosure →
A Claude Code plugin that provides security scanning, auditing, and threat modeling for agentic AI projects. Built on OWASP LLM Top 10 (2025), OWASP Agentic AI Top 10 (ASI01-ASI10), OWASP Skills Top 10 (AST01-AST10), MCP Top 10, and the AI Agent Traps taxonomy (Google DeepMind, 2025), grounded in published research from ToxicSkills, ClawHavoc, MCPTox, Pillar Security, Invariant Labs, GHSL Security Lab, and Operant AI.
Why this exists
Claude Code's extensibility model — skills, MCP servers, plugins, hooks, IDE extensions — creates an attack surface that mirrors the npm/PyPI supply chain problem with one critical difference: extensions run with LLM agency. A malicious plugin doesn't just execute code in a sandbox. It can instruct the agent to read your SSH keys, exfiltrate environment variables, install persistence mechanisms, and modify its own configuration — all while appearing to be a helpful "Project Health Dashboard."
This is not theoretical. ToxicSkills (Xi'an Jiaotong, 2025), ClawHavoc (Repello AI, 2025), GHSL-class workflow injections, and the November 2024 npm/PyPI typosquat campaigns documented real attack patterns. OWASP, NIST, and the EU AI Act now formalize the controls needed.
This plugin layers three independent kinds of defense — runtime hooks that block, deterministic scanners that compute, and LLM-driven advisory commands that judge — so failures in any one layer are caught by the others.
Important
Scan repos remotely before cloning. A poisoned
CLAUDE.mdinjects instructions into the model context the moment you open a cloned repo — before any hook can intervene./security scan https://repo-url --deepanalyses everything safely via pre-extraction, without loading anything into your session. This is the primary defense againstCLAUDE.mdpoisoning.
Quick Start
Prerequisites
- Claude Code v2.x+
- Node.js (any recent LTS — required for hook scripts)
Install
claude plugin marketplace add https://git.fromaitochitta.com/open/ktg-plugin-marketplace.git
Or enable directly in ~/.claude/settings.json:
{
"enabledPlugins": {
"llm-security@ktg-plugin-marketplace": true
}
}
Hooks activate immediately on install. Secret detection, path guarding, prompt-injection scanning, destructive-command blocking, supply-chain guardrails, and runtime trifecta detection start working without any commands.
First scan
> /security posture
┌──────────────────────────────────────────────┐
│ Security Posture: 8/16 [B] 77% │
├──────────────────────────────────────────────┤
│ ✅ Deny-First Config │
│ ✅ Secrets Protection │
│ ⚠️ MCP Server Trust │
│ ✅ Destructive Command Blocking │
│ ⚠️ Sandbox Config │
│ ✅ Prompt Injection Hardening │
│ ⚠️ Rule of Two │
│ ✅ EU AI Act │
│ ⚠️ NIST AI RMF │
│ — ISO 42001 │
├──────────────────────────────────────────────┤
│ 6 findings (1 high, 3 medium, 2 low) │
└──────────────────────────────────────────────┘
Tip
Start with
/security posturefor a 30-second baseline, then/security auditfor the full picture, or/security scan <target>for supply-chain gating.
Important
Opus extended-context users: subagents inherit the parent session's context limit but do not support extended context, causing API errors. Run
/model Opusbefore using security commands to reset to the standard 200 K context window subagents handle correctly.
What's inside
flowchart TB
subgraph Runtime["Runtime defense — 9 hooks"]
direction LR
H1["UserPromptSubmit<br/>injection scan"]
H2["PreToolUse<br/>secrets · paths · bash · supply chain"]
H3["PostToolUse<br/>output verify · session guard"]
H4["PreCompact<br/>transcript scan"]
end
subgraph Scanning["Deterministic analysis — 23 scanners"]
direction LR
S1["UNI · ENT · PRM · DEP<br/>TNT · GIT · NET · MEM · SCR · TFA"]
S2["WFL workflow scanner"]
S3["MCI · IDE · PST · BOM<br/>+ standalone CLIs"]
end
subgraph Advisory["Advisory analysis — 6 agents · 20 commands"]
direction LR
A1["Skill scanner<br/>7 threat categories"]
A2["MCP scanner<br/>5-phase analysis"]
A3["Posture · audit<br/>16 categories, A-F"]
A4["Threat model<br/>STRIDE × MAESTRO"]
end
subgraph Knowledge["Knowledge base — 22 files"]
direction LR
K1["5 OWASP frameworks<br/>+ DeepMind Agent Traps"]
K2["Threat patterns<br/>skills · MCP · workflows · IDE · secrets"]
K3["Compliance · research<br/>registry · packages"]
end
Runtime -->|"blocks/warns in real time"| User["Claude Code session"]
User -->|"/security scan"| Scanning
User -->|"/security audit"| Advisory
Advisory -.->|"grounded by"| Knowledge
Scanning -->|"enriches"| Advisory
Each layer is independent. A failure in one (e.g. an injection that slips past the prompt scanner) gets a second chance from the others (e.g. the trifecta session guard catching the attempted exfiltration step downstream).
Commands
20 slash commands grouped by purpose. All accept path or GitHub URL targets unless noted.
Scanning & assessment
| Command | Description |
|---|---|
/security |
Router with quick-start guide |
/security scan [path|url] |
Supply-chain gate — ALLOW/WARNING/BLOCK verdict on skills, MCP servers, directories, or remote repos |
/security scan [path|url] --deep |
Adds 10 deterministic scanners on top of the LLM agents |
/security deep-scan [path] |
Run only the 10 orchestrated deterministic scanners. Supports --fail-on <severity>, --compact, --format sarif, --output-file <path> |
/security audit |
Full project audit, A-F grade, prioritized action plan |
/security plugin-audit [path|url] |
Plugin trust assessment with Install/Review/Do Not Install verdict |
/security mcp-audit [--live] |
Audit installed MCP server configs (--live adds runtime inspection) |
/security mcp-inspect |
Connect to running MCP stdio servers and scan live tool descriptions via JSON-RPC 2.0 |
/security mcp-baseline-reset |
Clear cumulative-drift baseline cache after a legitimate MCP server upgrade (E14, v7.3.0) |
/security ide-scan [target|url] |
Scan installed VS Code (+ Cursor, Windsurf, VSCodium, code-server) and JetBrains extensions, OR fetch a remote VSIX from VS Code Marketplace, OpenVSX, direct .vsix, or JetBrains Marketplace. 7 VS Code + 7 JetBrains-specific checks plus UNI/ENT/NET/TNT/MEM/SCR per extension |
/security posture |
30-second scorecard across 16 categories incl. EU AI Act, NIST AI RMF, ISO 42001 |
/security diff [path] |
Compare scan against stored baseline — new/resolved/unchanged/moved findings |
/security watch [path] [--interval 6h] |
Continuous monitoring via /loop |
/security registry [scan|search] |
Skill signature registry — view stats, scan-and-register, search known fingerprints |
/security supply-check [path] |
Re-audit installed dependencies from lockfiles against blocklists, OSV.dev, and typosquats |
/security dashboard |
Cross-project security dashboard — machine-grade aggregation across all projects under ~/ |
Remediation
| Command | Description |
|---|---|
/security clean [path] |
Three-tier remediation pipeline — auto-fix safe issues, confirm semi-auto with the user, report manual findings |
/security clean [path] --dry-run |
Preview without modifying files |
/security harden [path] |
Generate Grade A reference config (settings.json, CLAUDE.md, .gitignore) |
/security harden [path] --apply |
Apply with automatic backup |
Threat modeling & planning
| Command | Description |
|---|---|
/security threat-model |
Interactive STRIDE × MAESTRO 7-layer session, 15-30 min |
/security red-team [--category] [--adaptive] |
Attack simulation — 72 scenarios across 12 categories, 100 % block rate. --adaptive applies 5 mutation rounds per blocked scenario for evasion testing |
/security pre-deploy |
Pre-deployment checklist — 10 automated + 3 manual checks |
Remote scanning safely
/security scan and /security plugin-audit accept GitHub and Forgejo URLs directly. The plugin clones to a temp directory inside an OS sandbox, scans, and cleans up.
/security scan https://github.com/org/repo --deep
/security scan https://github.com/org/repo --branch dev --deep
Defense-in-depth on the clone path (v5.1+):
| Layer | Mechanism | Mitigates |
|---|---|---|
| Git config hardening | core.hooksPath=/dev/null, core.symlinks=false, all filter.lfs.* neutralized, protocol.file.allow=never, transfer.fsckObjects=true, plus GIT_CONFIG_NOSYSTEM=1 and friends |
Git hooks at clone, symlink traversal, filter/smudge driver code execution via .gitattributes (CVE-2024-32002 class), local-file protocol traversal, malformed objects |
| OS filesystem sandbox | macOS sandbox-exec (Seatbelt) or Linux bwrap (bubblewrap) restricts writes to the per-clone temp dir |
Even if a filter driver bypasses git config hardening, the kernel refuses writes outside the sandbox |
.gitattributes post-clone advisory (E12, v7.3.0) |
scanGitAttributes() scans for filter= / diff= / merge= driver directives and emits MEDIUM advisories |
Surfaces driver-based supply-chain surface that survives even a sandboxed clone |
| Pre-LLM injection strip | content-extractor.mjs produces a structured JSON evidence package; [INJECTION-PATTERN-STRIPPED] markers are confirmed findings |
Agents never see raw poisoned files from untrusted repos |
| Post-clone size cap | 100 MB max | Resource-exhaustion attacks |
Windows has no kernel-level sandbox equivalent. Run Claude Code inside WSL2 or Docker Desktop for full coverage; the git config hardening alone is sufficient against all known .gitattributes attack vectors.
Automated hooks (9)
Hooks run on every operation — no commands needed. They activate the moment the plugin is installed.
| Hook | Event | What it does |
|---|---|---|
| Prompt injection scan | UserPromptSubmit | Blocks direct injection (override instructions, spoofed system headers, identity redefinition) and warns on subtle signals (leetspeak, homoglyphs, zero-width chars, multi-language). Decodes obfuscated payloads (Unicode Tag, hex, URL, base64, rot13) before matching. Mode: LLM_SECURITY_INJECTION_MODE=block|warn|off (default block) |
| Secret detection | Edit, Write | Blocks AWS keys, Azure tokens, GitHub PATs, npm tokens, PEM keys, database URLs, Bearer tokens, and 30+ other secret patterns |
| Path guarding | Write | Blocks writes to .env* (multi-segment-suffix-safe), .ssh/, .aws/, .gnupg/, credentials files, hook scripts, /etc/, settings.json |
| Destructive commands | Bash | Blocks rm -rf /, chmod 777, pipe-to-shell, fork bombs, eval-with-substitution, T8 base64-pipe-shell loaders. Bash-normalize T1-T9 collapses obfuscation (empty quotes, ${IFS}, ANSI-C hex, process substitution, eval-via-variable) before pattern matching |
| Supply-chain guardrail | Bash | Blocks known-compromised npm/pip packages, Levenshtein typosquats, age-gated installs (<72 h), OSV.dev CVE checks. Covers npm, pip, brew, docker, go, cargo, gem. v7.3.0: npm scope-hop typosquat advisory (E13) — @evil/lodash-class catches scope-jumping when the unscoped name matches a popular package |
| Output verification | All tools (post) | Advisory: scans ALL tool output for indirect injection (LLM01) and HITL traps (DeepMind kat. 6). Bash-specific: leaked secrets, unexpected URLs, oversized MCP responses. v7.3.0: per-update MCP description drift AND cumulative drift vs sticky baseline (E14) — slow-burn rug-pulls that stay under per-update thresholds but cumulatively diverge ≥25% emit mcp-cumulative-drift MEDIUM |
| Session guard | All tools (post) | Advisory: monitors tool-call sequences for the lethal trifecta (untrusted input + sensitive read + exfiltration sink). 20-call sliding window + 100-call long-horizon window. Mode: LLM_SECURITY_TRIFECTA_MODE=block|warn|off. Sub-agent delegation tracking via Task/Agent tools surfaces escalation-after-input as a separate advisory |
| Pre-compact scan | PreCompact | Scans transcript tail (max 512 KB, <500 ms) for injection patterns + credentials before context compaction. Prevents poisoned content from surviving in compact form. Mode: LLM_SECURITY_PRECOMPACT_MODE=block|warn|off (default warn) |
| Update check | UserPromptSubmit | Checks for newer plugin versions max 1× / 24 h, cached. Disable: LLM_SECURITY_UPDATE_CHECK=off |
All hooks are Node.js .mjs for cross-platform compatibility (macOS, Linux, Windows).
Important
Five hooks are blocking (prompt injection, secrets, path guarding, destructive commands, supply chain). Four are advisory (output verification, session guard, pre-compact, update check). Blocking modes can be downgraded via env-vars or
policy.jsonfor security research or staged rollouts.
Deterministic scanners
23 scanners. Zero external dependencies. All output JSON.
Orchestrated (10) — run via node scanners/scan-orchestrator.mjs <target> or /security deep-scan
| Scanner | Prefix | Detects | OWASP |
|---|---|---|---|
unicode-scanner.mjs |
UNI | Zero-width chars, Unicode Tag steganography (incl. PUA-A/B), BIDI overrides, Cyrillic/Greek homoglyphs (NFKC fold) | LLM01 |
entropy-scanner.mjs |
ENT | High-entropy strings, base64/hex blobs, encoded payloads via Shannon entropy. Two-stage context classification suppresses GLSL/CSS/inline-SVG/markdown-CDN false positives | LLM01, LLM03 |
permission-mapper.mjs |
PRM | Purpose-vs-tools mismatch, ghost hooks, haiku on sensitive agents, overprivileged components | LLM06 |
dep-auditor.mjs |
DEP | CVEs (npm/pip audit + OSV.dev), Levenshtein + token-overlap typosquats, malicious install scripts, unpinned versions | LLM03 |
taint-tracer.mjs |
TNT | Source-to-sink data flow (process.env / req.body → eval / exec / fetch / writeFile), 3-pass analysis, destructuring + spread support | LLM01, LLM02 |
git-forensics.mjs |
GIT | Force pushes, description drift, hook modifications, new outbound URLs, author changes | LLM03 |
network-mapper.mjs |
NET | Undisclosed URLs, suspicious domains (ngrok, webhook.site), IP-based URLs, DNS analysis | LLM02, LLM03 |
memory-poisoning-scanner.mjs |
MEM | Injection patterns, shell commands, credential paths, permission expansion, suspicious URLs in CLAUDE.md / memory / .claude/rules / .claude/agents/*.md |
LLM01, ASI02 |
supply-chain-recheck.mjs |
SCR | Re-audit installed deps from lockfiles against blocklists, OSV.dev batch API, typosquats | LLM03 |
toxic-flow-analyzer.mjs |
TFA | Lethal trifecta correlation across prior scanner output (runs last) | ASI01, ASI02, ASI05 |
Workflow & live (3, run independently)
| Scanner | Prefix | Detects | OWASP |
|---|---|---|---|
workflow-scanner.mjs (E11, v7.3.0) |
WFL | GitHub Actions and Forgejo Actions injection — dangerous ${{ <field> }} interpolations inside run: blocks across a 23-field GHSL+GlueStack-class blacklist; sink-restricted (only run: is a shell sink); severity matrix grades by trigger privilege; tracks env-block re-interpolation (Appsmith GHSL-2024-277 stealth pattern); flags actor == bot[bot] auth-bypass (Synacktiv 2023 Dependabot class) |
LLM02, LLM06 |
mcp-live-inspect.mjs |
MCI | Connects to running MCP servers via JSON-RPC 2.0 and scans live tool descriptions for injection, shadowing, drift | LLM01, LLM02 |
ide-extension-scanner.mjs |
IDE | VS Code (+ forks) and JetBrains plugin prescan: blocklist, theme-with-code, sideload, broad activation, typosquat, extension-pack expansion, dangerous uninstall hooks; Premain-Class instrumentation; native binaries; nested-jar inspection |
LLM01-03, LLM06, ASI02, ASI04 |
Standalone utilities (10)
| Scanner | Purpose |
|---|---|
posture-scanner.mjs |
Deterministic posture assessment, 16 categories, <50 ms |
attack-simulator.mjs |
Red-team harness: 72 scenarios, 12 categories, fixed + adaptive modes, benchmark output |
ai-bom-generator.mjs |
CycloneDX 1.6 AI Bill of Materials |
dashboard-aggregator.mjs |
Cross-project security dashboard with weakest-link machine grade |
reference-config-generator.mjs |
Grade A config generation based on posture gaps |
mcp-baseline-reset.mjs |
Clear cumulative-drift baseline cache (--list / --target <tool> / clear-all) |
auto-cleaner.mjs |
Remediation engine — 16 fix operations, atomic writes, post-fix validation |
content-extractor.mjs |
Pre-extracts evidence from untrusted repos and strips injection patterns before LLM exposure |
watch-cron.mjs |
Cron wrapper for background scanning |
scan-orchestrator.mjs |
Entry point that runs all 10 orchestrated scanners |
Why deterministic? LLMs are powerful at semantic analysis — intent, social engineering, context. They cannot reliably calculate Shannon entropy, measure Levenshtein distance between package names, trace taint flow across function boundaries, or detect individual Unicode codepoints. These scanners fill that gap.
MCP cumulative drift baseline (E14, v7.3.0)
scanners/lib/mcp-description-cache.mjs anchors a sticky baseline description per MCP tool plus a rolling 10-event history. Cumulative drift = levenshtein(current, baseline) / max(|current|, |baseline|). When the ratio crosses mcp.cumulative_drift_threshold (default 0.25), post-mcp-verify.mjs emits a MEDIUM mcp-cumulative-drift advisory — independent of the existing per-update >10% drift signal. Slow-burn rug-pulls that keep each update under the per-update threshold but cumulatively diverge from baseline are now caught.
The baseline survives the 7-day TTL purge so detection persists across the full window. After a legitimate MCP server upgrade, run /security mcp-baseline-reset (or node scanners/mcp-baseline-reset.mjs --target <tool>) to clear the stale baseline. The next call seeds a fresh baseline; description, firstSeen, lastSeen, and history are preserved across reset for audit. LLM_SECURITY_MCP_CACHE_FILE overrides the cache path for testing.
Agents (6)
Specialized analysts spawned by commands. Read-only by default; clean and harden grant Edit/Write under explicit user confirmation.
| Agent | Role | Spawned by |
|---|---|---|
skill-scanner-agent |
7 threat categories (injection, exfiltration, privilege escalation, scope creep, hidden instructions, toolchain manipulation, persistence) for skills/commands/agents | scan, audit, plugin-audit |
mcp-scanner-agent |
5-phase MCP analysis (tool descriptions, source code, dependencies, configuration, rug pull detection) | scan, mcp-audit |
posture-assessor-agent |
Full audit narrative with PASS/PARTIAL/FAIL scoring and A-F grading (the deterministic posture-scanner.mjs handles quick mode) |
audit, posture |
threat-modeler-agent |
Interactive STRIDE × MAESTRO interview, 5-phase workflow | threat-model |
deep-scan-synthesizer-agent |
Interprets deterministic scanner JSON into a human-readable report with executive summary + prioritized recommendations | deep-scan, scan --deep |
cleaner-agent |
Generates semi-auto remediation proposals for findings requiring human judgment (returns JSON proposals; clean.md performs the edits after user approval) |
clean |
All agents run on Opus and reference the knowledge base for grounding. Agents are spawned sequentially to avoid burst rate limits.
Knowledge base (22 files)
All analysis is grounded in published threat intelligence. The knowledge files are read by agents at scan time, not loaded preemptively.
| Category | Files |
|---|---|
| OWASP frameworks | owasp-llm-top10.md, owasp-agentic-top10.md, owasp-skills-top10.md, mcp-threat-patterns.md (9 categories), mitigation-matrix.md |
| Threat patterns | skill-threat-patterns.md (7 categories from ToxicSkills/ClawHavoc), secrets-patterns.md (30+ regex), ide-extension-threat-patterns.md (10 categories with 2024-2026 case studies), workflow-injection-patterns.md (23-field blacklist + Forgejo divergences) |
| Research | prompt-injection-research-2025-2026.md (7 papers), deepmind-agent-traps.md (6 categories, 43 techniques), attack-scenarios.json (72 red-team scenarios), attack-mutations.json (synonym tables for adaptive testing) |
| Compliance | compliance-mapping.md (EU AI Act, NIST AI RMF, ISO 42001, MITRE ATLAS), norwegian-context.md (Datatilsynet, NSM, Digitaliseringsdirektoratet) |
| Reference data | top-packages.json (top 200 npm + 100 PyPI), top-vscode-extensions.json, top-jetbrains-plugins.json, typosquat-allowlist.json, marketplace-api-notes.md, jetbrains-marketplace-api-notes.md, skill-registry.json |
Coverage at a glance
OWASP LLM Top 10 (2025) — control-count coverage from knowledge/mitigation-matrix.md:
| Category | Hooks | Scanners | Commands | Coverage |
|---|---|---|---|---|
| LLM01 Prompt Injection | ✅ | UNI + ENT + TNT | scan, audit | 95 % |
| LLM02 Sensitive Info Disclosure | ✅ | TNT + NET | audit | 83 % |
| LLM03 Supply Chain | ◐ | ENT + DEP + GIT + NET | scan, plugin-audit, mcp-audit, supply-check | 60 % |
| LLM04 Data Poisoning | — | — | threat-model | 40 % |
| LLM05 Improper Output Handling | ✅ | — | audit | 83 % |
| LLM06 Excessive Agency | ✅ | PRM + WFL | posture | 100 % |
| LLM07 System Prompt Leakage | — | — | audit | 60 % |
| LLM08 Vector/Embedding Weaknesses | — | — | threat-model | 40 % |
| LLM09 Misinformation | — | — | advisory | 50 % |
| LLM10 Unbounded Consumption | — | — | pre-deploy | 83 % |
Average ~69 %. Strongest at prompt injection (95 % with input + output scanning + obfuscation decoders) and agency controls (100 %). Weakest at LLM04/08, which are better addressed at the model-provider or platform level. /security threat-model and /security pre-deploy surface the gaps advisorily.
Agentic and skill frameworks — full ASI01-ASI10 and AST01-AST10 mapping in knowledge/owasp-agentic-top10.md and knowledge/owasp-skills-top10.md.
Compliance & governance
| Capability | Detail |
|---|---|
| Compliance mapping | EU AI Act (Art. 9, 15, 17), NIST AI RMF (Map / Measure / Manage / Govern), ISO 42001 (Annex A), MITRE ATLAS techniques. Posture categories 14-16 assess readiness |
| Norwegian context | Datatilsynet DPIA-for-AI guidance, NSM basic security principles, Digitaliseringsdirektoratet — relevant for Norwegian public-sector deployments |
| SARIF 2.1.0 output | --format sarif on scan / deep-scan produces OASIS SARIF for CI/CD ingestion (GitHub Advanced Security, Azure DevOps, SonarQube) |
| Structured audit trail | JSONL events with ISO 8601 timestamps and OWASP category tags (LLM_SECURITY_AUDIT_* env-vars or audit.log_path policy key) — SIEM-ready |
| AI-BOM | CycloneDX 1.6 BOM for AI components — models, MCP servers, plugins, knowledge files, hooks (llm-security audit-bom <target>) |
| Policy-as-code | .llm-security/policy.json ships hook configuration with the team. v7.3.0 (D3) adds a one-time-per-process stderr deprecation line when both an env-var AND its policy.json equivalent are explicitly set; env still wins through the v7.x runway, env reads removed in v8.0.0. Suppress noise with LLM_SECURITY_DEPRECATION_QUIET=1 |
| Standalone CLI | node bin/llm-security.mjs scan <target> — runs scanners without Claude Code. Subcommands: scan, deep-scan, posture, audit-bom, benchmark. Schrems II compatible in default offline mode (optional OSV.dev enrichment is the only network call and is opt-in) |
| CI/CD integration | --fail-on <severity> for threshold-based exit codes, --compact for one-liner output. Templates for GitHub Actions, Azure DevOps, GitLab CI in ci/. Guide: docs/ci-cd-guide.md |
Benchmarks
/security red-team (also llm-security benchmark) tests hook defenses with 72 crafted scenarios across 12 categories. Adaptive mode applies 5 mutation rounds per blocked scenario (homoglyph substitution, encoding wrapping, zero-width injection, case alternation, synonym replacement). Current block rate: 100 % fixed mode.
Workflow examples
1 — pre-installation gate
/security scan path/to/plugin # ALLOW/WARNING/BLOCK
/security plugin-audit path/to/plugin # Install/Review/Do Not Install
# Remote — scans without installing
/security scan https://github.com/org/repo --deep
/security plugin-audit https://github.com/org/repo
2 — monthly review
/security posture # 30-second baseline
/security audit # full A-F grade with action items
# → fix critical/high
/security posture # verify improvement
3 — track over time
/security diff path/to/project # first run creates baseline
/security watch path/to/project # continuous, runs diff every 6h via /loop
4 — deep threat analysis
/security threat-model # 15-30 min STRIDE × MAESTRO interview
/security audit # verify current controls vs identified threats
/security pre-deploy # 10 automated + 3 manual checks
5 — remediation
/security clean path/to/project --dry-run # preview
/security clean path/to/project # auto + semi-auto + manual report
/security harden path/to/project --apply # Grade A reference config
What this plugin does NOT cover
| Area | Why | Alternative |
|---|---|---|
Post-clone CLAUDE.md poisoning |
Once a repo is cloned, CLAUDE.md loads into the system prompt before any hook runs. Platform limitation, no hook-based fix. |
Always scan repos remotely before cloning with /security scan <url> --deep. For repos already cloned: review CLAUDE.md before opening |
| ML-based injection classification | Regex patterns cannot catch novel phrasings or adversarial paraphrasing. Joint paper (14 researchers, 2025) reports 95-100 % ASR against all 12 tested defenses for motivated adaptive attackers | Use parry-guard (DeBERTa v3 + Llama Prompt Guard 2) alongside this plugin. No conflict |
| Enterprise SSO / SCIM | Platform-level configuration | Anthropic Admin Console |
| RAG infrastructure | Vector DB / embedding pipeline security | Dedicated RAG security tools |
| LLM gateway / proxy | Network infrastructure layer | API gateway solutions |
| SIEM integration | Organization security stack | Splunk, Sentinel, etc. — but the JSONL audit trail is SIEM-ready |
| General agent scheming detection | The session guard catches the lethal trifecta as a known sequence; novel hidden-goal pursuit remains fundamentally hard for any tool. | Trifecta + delegation tracking provide partial coverage; full scheming detection requires monitoring + human oversight |
These gaps are surfaced advisorily through /security threat-model and /security pre-deploy.
Complementary tools
| Tool | What it adds |
|---|---|
| parry-guard | ML injection classification (DeBERTa v3 + Llama Prompt Guard 2 86M, Rust, fail-closed). Catches what regex misses |
| Lasso claude-hooks | Different philosophy: 96 patterns across 5 categories, warn-and-continue. Both can run in the same hook chain |
| Snyk agent-scan | Commercial skills/MCP scanning with a larger training set (3 984 skills analyzed) |
Tip
Recommended combo: llm-security (breadth — static + supply chain + audit + posture + threat modeling) + parry-guard (depth — ML injection classification). Different layers, no conflict.
Project scope
This is a solo open-source project in stabilization mode as of 2026-05-01. The current feature set (5 frameworks, 23 scanners, 9 hooks, 6 agents, 20 commands, 22 knowledge files, 1822+ tests including a dedicated end-to-end suite) is the natural plateau for what a deterministic + advisory plugin can defend against without crossing into commercial-grade territory. Going forward, work focuses on:
- Bug fixes and security patches
- Compatibility with new Claude Code releases
- Knowledge-base refresh (OWASP updates, new published research, new attack patterns)
- Deprecation cleanup — v8.0.0 removes the
LLM_SECURITY_*env vars andriskScoreV1constant deprecated in v7.3.0 - Opportunistic small additions that fit the existing deterministic architecture
The following are explicitly out of scope — fork the repo and own them
under your organization's name. The MIT license permits this and the project
is architected to be forkable. See CONTRIBUTING.md for
the fork-and-own guide.
| Out of scope | Why | Where to look instead |
|---|---|---|
| Web dashboard / fleet policy server | Multi-tenant UX + ongoing infra work | Snyk, Lakera Cloud |
| Runtime prompt firewall (real-time blocking proxy) | Inline gateway architecture | Lakera Guard, Protect AI Rebuff, parry-guard |
| IDE real-time LSP scanning | IDE integration + always-on perf budget | Snyk IDE, Semgrep IDE |
| Compliance PDF/DOCX evidence pack | Auditor-formatted reports as a product | Vanta, Drata, Secureframe |
| Enterprise ticketing / chat connectors (Jira, ServiceNow, Slack, Teams, PagerDuty) | Per-vendor SDK + auth + ongoing API drift | Splunk SOAR, Tines, custom integration |
| Multi-tenancy / centralized plugin runtime / fleet state | Hosted-product surface area | Build it on a fork |
| ML-based detectors requiring model hosting | Model-serving infra (training, eval, drift) | parry-guard (DeBERTa v3 + Llama Prompt Guard 2) |
| Marketplace UI / web catalog | Frontend product | This is not that kind of project |
| SSO / SCIM / RBAC | Platform-level enterprise concerns | Anthropic Admin Console + your IdP |
If you need any of the above and your organization has the headcount to maintain it, fork freely. The maintainer encourages it. Issues and support flow back to the fork, not here.
Defense philosophy
Prompt injection is structurally unsolvable with current architectures (joint paper, 14 researchers, 2025: 95-100 % ASR against all 12 tested defenses by motivated red-teamers). v5.0+ does not claim to "prevent" injection. It implements defense-in-depth:
- Broader detection — MEDIUM advisories for obfuscation signals (leetspeak, homoglyphs, zero-width chars, multi-language); Unicode Tag and PUA-A/B steganography; bash expansion evasion T1-T9; rot13 hidden imperatives in comments
- Increased attack cost — Rule-of-Two trifecta detection (configurable block/warn/off, default warn), bash normalization before gate matching, MCP cumulative-drift baseline catching slow-burn rug-pulls
- Longer monitoring windows — 100-call long-horizon alongside 20-call sliding window; slow-burn trifecta detection (legs >50 calls apart); Jensen-Shannon behavioral drift; sub-agent delegation tracking
- Architectural constraints — opportunistic byte-fingerprint matching for output→input lineage (first 200 bytes, SHA-256/16-hex tag — not semantic capability tracking; trivially bypassed by mutation, but raises the cost of casual exfil)
- Honest documentation — known limitations are surfaced, not hidden
System-card alignment (Opus 4.7): §5.2.1 documents that multi-layer defenses outperform single-layer against adaptive attacks; this plugin's posture matches. §6.3.1.1 documents that Opus 4.7 follows agent instructions more literally — stacked imperatives are less useful than tool-level enforcement, and agent files have been updated accordingly. Full mapping in docs/security-hardening-guide.md §5.
What v5.0+ cannot do: prevent adaptive attacks from motivated human red-teamers, fix CLAUDE.md loading before hooks (platform limitation), detect novel NL indirection without ML, prevent long-horizon attacks without detectable patterns, provide formal worst-case guarantees.
Compatibility
- Claude Code: v2.x+
- Platform: macOS, Linux, Windows (all hooks are Node.js
.mjs) - Node.js: any recent LTS for hook scripts and CLI
- Overlap with
claude-code-essentials: safe to run both. This plugin extends with path guarding, MCP verification, and runtime trifecta detection. Duplicate blocking is harmless — hooks run sequentially
Playground (v7.6.0)
A single-file SPA at playground/llm-security-playground.html provides
an interactive surface for onboarding, command discovery and report demos
without requiring Claude Code installation. Open the file directly in
a browser (Chrome/Firefox/Safari over file://) — no build step, no
network calls, no npm install. Theme-bootstrap with FOUC-prevention; state
persisted in IndexedDB primary + localStorage fallback.
v7.6.0 Tier 3-referanse-case: Playgroundet er nå en visuelt og
strukturelt fullført referanse for shared/playground-design-system/
Tier 3-supplementet. 8 nye DS-komponenter integrert i de 18 rapport-
rendererne: tfa-flow (lethal trifecta-kjede), mat-ladder (modenhets-
stige), suppressed-group (narrative-audit), codepoint-reveal (Unicode-
steganografi), top-risks (rangert top-funn), recommendation-card[data- severity] (severity-tinted advisory), risk-meter (band-visualisering
0-100), card--severity-{level} (severity-color findings-cards). Pluss
badge--scope-security, verdict-pill-lg og form-progress+fp-step
fra wave 1.
Layout:
playground/
├── llm-security-playground.html ← single-file SPA (~10 700 lines)
├── vendor/
│ └── playground-design-system/ ← synket fra shared/, sjekksum-låst
├── test-fixtures/ ← markdown-fixtures (én per kommando)
├── screenshots/v7.5.0/ ← Playwright-genererte demobilder (12)
├── screenshots/v7.6.0/ ← v7.6.0 demobilder (12, manuelt generert)
└── A11Y-RAPPORT.md ← WCAG 2.1 AA verifisering + Tier 3 ARIA
Hva playgroundet dekker:
- Onboarding (5 grupper): organisasjon, scope, profil, plattform,
compliance. Verdier persisteres som
shared-state og prefylles automatisk i alle command-skjemaer. - Home: prosjekt-grid, fleet-tracks for posture/scan/red-team. «Last
inn demo-data»-knappen aktiverer 3 prosjekter inkludert
dft-komplett-demomed alle 18 rapporter ferdig parsed. - Catalog: alle 20 kommandoer gruppert i 5 kategorier. Søk filtrerer cards, og «Åpne skjema»-knapp bygger ferdig pipeline-streng for klipp-og- lim til terminalen.
- Project surface: 4 skjermer (Oversikt / Rapporter / Kontekst / Eksport). Rapporter-tabben har category-tabs (discover / posture / findings-ops / hardening / adversarial / mcp-ops) og lim-inn-import for hver rapport-kommando.
Parser/renderer-arkitektur: Hver produces_report=true-kommando i
CATALOG har en parser (markdown → struktur) og en renderer (struktur
→ DS-komponenter). 18 archetypes støttes: findings, findings-grade,
risk-score-meter, posture-cards, dashboard-fleet, red-team-results,
diff-report, kanban-buckets, matrix-risk. Parser-kontrakten er
{ ok: true, data: {...} } | { ok: false, errors: [...] }. Test-fixtures
under playground/test-fixtures/ er kontrakt-anker — én markdown-fil per
kommando som speiler templates/unified-report.md-formatet.
Eksponerte testing/automasjons-globaler: __store, __navigate,
__loadDemoState, __scheduleRender, __PARSERS, __RENDERERS,
__CATALOG, __inferVerdict, __inferKeyStats, __renderPageShell,
__handlePasteImport. Aktiverer Playwright-styrt navigasjon og
programmatisk parser/renderer-test mot fixture-katalogen.
Begrensninger: SPA er en lim-inn-overflate — den kjører ingen scannere
selv. Output må komme fra Claude Code (/security scan ...), CLI
(node scanners/...) eller stub-fixtures. Demo-state inneholder kun de
3 inline-prosjektene; nye prosjekter er per-bruker og lagres lokalt.
Self-scan
Running node scanners/scan-orchestrator.mjs . on this plugin produces 0 findings (ALLOW) with ~190 suppressions via .llm-security-ignore. Every suppression is explained — a security plugin that documents attack patterns, ships a malicious demo fixture, and tests against deliberately evil code will trigger its own scanners. The entropy scanner flags regex patterns in knowledge/secrets-patterns.md. The taint scanner flags eval(user_input) in test fixtures. The toxic flow analyzer flags the plugin's own commands that use Read+Bash. Remove the ignore file and re-run to see the unsuppressed picture.
The examples/malicious-skill-demo/ directory contains a deliberately malicious "Project Health Dashboard" plugin and a full security assessment. The combined LLM + deterministic pipeline produced 85 findings (24 critical, 24 high, 20 medium, 6 low, 11 info) and verdict BLOCK 100/100 — both layers independently maxed the risk score. A human reviewing the plugin's README.md and SKILL.md would likely miss most of them; the Unicode Tag steganography is literally invisible.
node scanners/scan-orchestrator.mjs examples/malicious-skill-demo/evil-project-health/ # ~5s
/security scan examples/malicious-skill-demo/evil-project-health/ --deep # full pipeline
Other runnable examples
The examples/ directory contains additional self-contained
demonstrations — each with README.md, fixture, run script, and
expected-findings.md:
prompt-injection-showcase/— 61 payloads across 19 categories fed topre-prompt-inject-scan,post-mcp-verify, andpre-bash-destructive. Run:node examples/prompt-injection-showcase/run-showcase.mjslethal-trifecta-walkthrough/— 5-step Rule-of-Two demonstration (WebFetch → Read .env → Bash curl POST + suppression follow-ups) showingpost-session-guardadvisory firing on leg 3. State-isolated via run-script PID. Run:node examples/lethal-trifecta-walkthrough/run-trifecta.mjsmcp-rug-pull/— 8-stage MCP description drift, each step under the 10% per-update threshold but cumulatively >25% from baseline. Demonstrates the v7.3.0 cumulative-drift advisory (E14, OWASP MCP05). Cache isolated viaLLM_SECURITY_MCP_CACHE_FILE. Run:node examples/mcp-rug-pull/run-rug-pull.mjssupply-chain-attack/— two-layer demonstration: PreToolUse hook blocks compromisedevent-stream@3.3.6and advises on scope-hopping@evilcorp/lodash; offlinedep-auditorflags 5 typosquats + apostinstall: curl ... | shvector in the fixturepackage.json. Run:node examples/supply-chain-attack/run-supply-chain.mjspoisoned-claude-md/— 6 memory-poisoning detectors fire on a fixtureCLAUDE.md+ agent file (E15 surface). Demonstrates injection, shell-command, suspicious-URL, credential-path, permission-expansion, and base64-encoded-payload detection. Run:node examples/poisoned-claude-md/run-memory-poisoning.mjsbash-evasion-gallery/— one disguised variant per T-tag (T1-T9) fed throughpre-bash-destructive, verified BLOCK afterbash-normalizestrips the evasion. T8 has its own BLOCK_RULE. Run:node examples/bash-evasion-gallery/run-evasion-gallery.mjstoxic-agent-demo/— single-component lethal trifecta detected by thetoxic-flow-analyzer(TFA). A fixture agent withtools: [Bash, Read, WebFetch]covers all three trifecta legs (untrusted input + sensitive data access + exfil sink), and the fixture deliberately ships nohooks/hooks.jsonso TFA emits a CRITICALLethal trifecta:finding without mitigation downgrade. Usesplugin.fixture.jsonas the plugin marker so the example doesn't trippre-write-pathguardon.claude-plugin/. Maps to ASI01 / ASI02 / ASI05 / LLM01 / LLM02 / LLM06. Run:node examples/toxic-agent-demo/run-toxic-flow.mjspre-compact-poisoning/—pre-compact-scanPreCompact hook detecting both an injection pattern and a credential-shaped string in a synthetic transcript across all three modes (off / warn / block). The transcript is generated at runtime in a per-invocation tempdir; the AWS-shaped key uses the same'AK' + 'IA' + ...fragmentation idiom astests/e2e/attack-chain.test.mjs, so the source contains no literal credentials. Includes a benign-transcript control case in block mode to prove the gate is not a brick wall. Maps to LLM01 / LLM02 / ASI01 / AT-1 / AT-3. Run:node examples/pre-compact-poisoning/run-pre-compact-poisoning.mjs
Recent versions
| Version | Date | Highlights |
|---|---|---|
| 7.6.1 | 2026-05-06 | Playground v7.6.0 visuell-patch. Seks bugs fanget under maintainer-verifisering i nettleser. Alle skyldtes mismatch mellom DS-klasser og rendrer-bruk (eller manglende DS-implementasjoner playground antok eksisterte). (1) renderFindingsBlock brukte .findings outer som er DS' 2-kolonners list+detail-grid → erstattet med <section class="report-meta"> + korrekt findings__list > findings__group-mønster. (2) .report-table manglet helt i DS men brukes i 7+ rendrere → lokal CSS-implementasjon i playground-HTML. (3) renderPreDeploy traffic-lights brukte .sm-card__grade (28×28 px for én A-F-bokstav) for "PASS"/"PASS-WITH-NOTES"/"FAIL" → erstattet med bredde-tilpasset status-pill. (4) Threat-model matrix-bobler ikke klikkbare → <button> med data-threat-id + click-handler som scroller til Trusler-tabellen. (5) Radar-labels overlappet ved 6+ akser → SVG 280→380, R 105→125, dynamisk text-anchor (start/end/middle) basert på horisontal-posisjon. (6) recommendation-card__body overflow på lange tekster → overflow-wrap: anywhere. 4/4 fix-spesifikke smoke-tester + 18/18 renderer-regresjon passerer. Ingen scanner- eller hook-atferdsendringer — purely additive surface. |
| 7.6.0 | 2026-05-06 | Playground Tier 3-referanse-case. Playground (playground/llm-security-playground.html) hevet til visuelt og strukturelt fullført referanse for shared/playground-design-system/ Tier 3-supplementet. 8 nye DS-komponenter integrert i de 18 rapport-rendererne: tfa-flow + tfa-leg + tfa-arrow (lethal trifecta-kjede med <button>-elementer + ARIA), mat-ladder + mat-step (5-trinns modenhets-stige med terskler 0/25/50/75/95% PASS), suppressed-group (narrative-audit fra summary.narrative_audit.suppressed_findings), codepoint-reveal + cp-tag/cp-zw/cp-bidi (Unicode-steganografi side-ved-side), top-risks + top-risk[data-severity] (rangert top-funn-listing, semantisk <ol>), recommendation-card[data-severity] (severity-tinted advisory på clean/harden/audit/posture/pre-deploy/plugin-audit), risk-meter (band-visualisering 0-100 på 5 archetypes), card--severity-{level} (severity-color modifier på findings-cards). Wave 1: badge--scope-security (identitets-chip), verdict-pill-lg (DS Tier 3-pill), form-progress + fp-step (onboarding-wizard). Slettet ~30 duplikat-CSS-deklarasjoner (DS vinner cascade). 5 nye DS-helpers + mapSeverityToCardLevel + parseNarrativeAudit. Filendring 10209 → 10677 linjer. Levert over 5 sesjoner, atomic commits. A11Y-rapport oppdatert. Ingen scanner- eller hook-behavior-changes — purely additive surface. |
| 7.5.0 | 2026-05-05 | Playground. Single-file SPA at playground/llm-security-playground.html (~10 200 lines) for onboarding, demoer og workshop-bruk uten Claude Code-installasjon. Parsere + renderere for alle 18 produces_report=true-kommandoer (Fase 2: 10 høy-prio + Fase 3: 8 gjenstående: mcp-inspect, supply-check, pre-deploy, diff, watch, registry, clean, threat-model). 18 markdown test-fixtures under playground/test-fixtures/ som kontrakt-anker. Komplett demo-prosjekt dft-komplett-demo har alle 18 rapporter ferdig parsed inline. Vendor-synket design-system under playground/vendor/ (sjekksum-låst). 9 Playwright-genererte screenshots i playground/screenshots/v7.5.0/. 11 nye window-globaler for testing/automasjon. 2 nye KEY_STATS_CONFIG-archetypes (kanban-buckets, matrix-risk). Bug-fix: normalizeVerdictText regex-rekkefølge oppdatert så GO-WITH-CONDITIONS / CONDITIONAL / BETINGET ikke lenger kollapser til ALLOW. Ingen scanner- eller hook-behavior-changes — purely additive surface. |
| 7.4.0 | 2026-05-05 | Examples + e2e suite. Seven runnable demonstration walkthroughs under examples/ (prompt-injection-showcase, lethal-trifecta-walkthrough, mcp-rug-pull, supply-chain-attack, poisoned-claude-md, bash-evasion-gallery, toxic-agent-demo, pre-compact-poisoning) — each with README.md, runtime-isolated fixture, single-command run-script, and expected-findings.md testable contract. Three new tests/e2e/ suites (attack-chain 17 tests + multi-session 9 tests + scan-pipeline 19 tests = +45 tests, total 1822) prove the framework works as a coordinated system, not just isolated units. No scanner or hook behavior changes — purely additive surface. Scanner VERSION constants synced across dashboard-aggregator.mjs, posture-scanner.mjs, ide-extension-scanner.mjs. |
| 7.3.1 | 2026-05-01 | Stabilization patch. Project repositioned as solo, stabilization-only, with explicit "fork & own" stance for enterprise features. New public docs: CONTRIBUTING.md (fork-and-own model), README "Project scope" section (out-of-scope table with commercial alternatives), updated SECURITY.md (v7.3.x supported, v7.0–v7.2 best-effort, < v7.0 EOL). Coherence: package.json files whitelist + bugs URL + repo URL fix; scanner VERSION constants synced across dashboard-aggregator.mjs, posture-scanner.mjs, ide-extension-scanner.mjs. Test ceiling raised on flaky pre-compact-scan timing test (500 ms → 1000 ms; design target unchanged). No behavior changes. |
| 7.3.0 | 2026-05-01 | Batch C release. Wave A (T7-T9 bash normalization + rot13 comment-block decoder), Wave B (.gitattributes post-clone advisory + npm scope-hop typosquat + GitHub/Forgejo workflow-scanner with 23-field blacklist + re-interpolation tracking + auth-bypass detection), Wave C (MCP cumulative-drift baseline + /security mcp-baseline-reset), Wave D (riskScoreV1 @deprecated; sandbox-architecture rationale docs; env-var deprecation runway to v8.0.0; CLAUDE.md hooks count + consistency test). 1665+ → 1777 tests. Wave E (additional attack-simulator scenarios) deferred indefinitely |
| 7.2.0 | 2026-04-29 | Batch B release. Critical-review B-tier scanner defects + v7.2.0 evasion-arsenal (PUA-A/B Unicode coverage, NFKC homoglyph fold, escalation-after-input window, markdown link-title + SVG <desc>/<foreignObject> + HTML comment extractors). Two-stage entropy context classification. v1→v2 risk-formula constants unified across docs. 8 new red-team scenarios (64 → 72). 1522 → 1665 tests |
| 7.1.0 | 2026-04-29 | Critical-review patch. Pathguard regex hole closed (.env.production.local.backup-class). Distributed-trifecta block-mode AND-gate removed. CaMeL claim toned down to honest "byte-fingerprint matching". Documentation honesty-sweep across 7 overclaim sites. 1487 → 1511 tests |
Full history in CHANGELOG.md.
License & attribution
MIT. See LICENSE.
Built on published research from OWASP, ToxicSkills (Xi'an Jiaotong, 2025), ClawHavoc (Repello AI, 2025), MCPTox, Pillar Security, Invariant Labs, GHSL Security Lab, Operant AI, and Google DeepMind's AI Agent Traps taxonomy. Threat patterns and case studies in knowledge/ are cited inline.
Feedback & contributing
- Bug reports + feature requests: open an issue on Forgejo
- Pull requests: not accepted on this repo (solo project, dialog-driven
development with Claude Code). For larger changes, see
CONTRIBUTING.mdand the fork-and-own model - Security disclosures: see
SECURITY.md— please email, do not open a public issue - Project scope: see "Project scope" section above for what is and isn't on the roadmap, and what to fork for instead